pyabpoa: abPOA Python interface

Introduction

pyabpoa provides an easy-to-use Python interface to abPOA. It exposes the APIs needed to perform multiple sequence alignment (MSA) on a set of sequences and to call consensus sequences from the final alignment graph.

Installation

Install pyabpoa with pip

pyabpoa can be installed with pip:

pip install pyabpoa

Install pyabpoa from source

Alternatively, you can install pyabpoa from source (cython is required):

git clone --recursive https://github.com/yangao07/abPOA.git
cd abPOA
make install_py

Examples

The following code illustrates how to use pyabpoa.

import pyabpoa as pa

a = pa.msa_aligner()
seqs = [
    'CCGAAGA',
    'CCGAACTCGA',
    'CCCGGAAGA',
    'CCGAAGA',
]
a_res = a.msa(seqs, out_cons=True, out_msa=True)  # perform multiple sequence alignment

for seq in a_res.cons_seq:
    print(seq)  # print consensus sequence

a_res.print_msa()  # print row-column multiple sequence alignment in PIR format

# incrementally add new seqs
new_seqs = [
    'CCAGA',
    'CCGAAAGA',
]
b = pa.msa_aligner()
b.msa_align(seqs, out_cons=True, out_msa=True)
b.msa_add(new_seqs)
b_res = b.msa_output()
b_res.print_msa()

You can also try the example script provided in the source folder:

python ./python/example.py
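
The seqs argument is a plain Python list of strings, so sequences stored in a file have to be loaded first. The following is a minimal sketch of a FASTA reader that could be used for that; the helper name read_fasta_sequences and the file name reads.fa are hypothetical and are not part of pyabpoa.

```python
def read_fasta_sequences(path):
    """Return the sequences in a FASTA file as a list of strings.

    Header lines (starting with '>') are dropped and multi-line
    records are joined into a single string.
    """
    seqs, chunks = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: flush the previous one, if any.
                if chunks:
                    seqs.append("".join(chunks))
                    chunks = []
            else:
                chunks.append(line)
    if chunks:
        seqs.append("".join(chunks))
    return seqs


# Possible usage (requires pyabpoa and an existing reads.fa):
# import pyabpoa as pa
# res = pa.msa_aligner().msa(read_fasta_sequences("reads.fa"),
#                            out_cons=True, out_msa=True)
```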

APIs

Class pyabpoa.msa_aligner

pyabpoa.msa_aligner(aln_mode='g', ...)

This constructs a multiple sequence alignment handler. It accepts the following arguments:

aln_mode: alignment mode. 'g': global, 'l': local, 'e': extension; default: 'g'
is_aa: input is amino acid sequence; default: False
match: match score; default: 2
mismatch: mismatch penalty; default: 4
score_matrix: scoring matrix file, match and mismatch are not used when score_matrix is used; default: ''
gap_open1: first gap opening penalty; default: 4
gap_ext1: first gap extension penalty; default: 2
gap_open2: second gap opening penalty; default: 24
gap_ext2: second gap extension penalty; default: 1
extra_b: first adaptive banding parameter; set to a value < 0 to disable adaptive banded DP; default: 10
extra_f: second adaptive banding parameter; the number of extra bases added on both sides of the band is b+f*L, where b is extra_b, f is extra_f, and L is the length of the aligned sequence; default: 0.01
cons_algrm: consensus calling algorithm. 'HB': heaviest bundling, 'MF': most frequent bases; default: 'HB'
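
To make the scoring and banding defaults concrete, the sketch below evaluates two formulas using the default values listed above. The band formula b+f*L is taken from the extra_f description. The two-piece gap cost min(gap_open1 + k*gap_ext1, gap_open2 + k*gap_ext2) is an assumption based on abPOA's convex gap mode and is not stated in this README. Both function names are hypothetical and are not part of pyabpoa.

```python
def gap_cost(k, gap_open1=4, gap_ext1=2, gap_open2=24, gap_ext2=1):
    """Cost of a gap of length k under an assumed convex two-piece model."""
    if k <= 0:
        return 0
    return min(gap_open1 + k * gap_ext1, gap_open2 + k * gap_ext2)


def extra_band_bases(seq_len, extra_b=10, extra_f=0.01):
    """Extra bases added on each side of the band: b + f*L.

    Returns None when extra_b < 0, i.e. adaptive banding is disabled.
    Truncating to an integer is an assumption made for illustration.
    """
    if extra_b < 0:
        return None
    return int(extra_b + extra_f * seq_len)
```

With the defaults, a 1-base gap costs 6 under the first pair of penalties, both pairs give 44 for a 20-base gap, and longer gaps are priced by the second pair (a 30-base gap costs 54). For a 1000-base sequence the band is widened by 20 bases on each side.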

The msa_aligner handler provides the method msa, which performs multiple sequence alignment and accepts the following arguments:

pyabpoa.msa_aligner.msa(seqs, out_cons, out_msa, max_n_cons=1, min_freq=0.25, out_pog='', incr_fn='', qscores=None)

seqs: a list variable containing a set of input sequences; positional
out_cons: a bool variable to ask pyabpoa to generate consensus sequence; positional
out_msa: a bool variable to ask pyabpoa to generate the row-column MSA (RC-MSA); positional
max_n_cons: maximum number of consensus sequences to generate; default: 1
min_freq: minimum frequency of each consensus to output (effective when max_n_cons > 1); default: 0.25
out_pog: name of a file (.png or .pdf) to store the plot of the final alignment graph; default: ''
incr_fn: name of an existing graph (GFA) or MSA (FASTA) file, incrementally align sequence to this graph/MSA; default: ''
qscores: optional per-sequence quality information used to weight the consensus graph. Each entry must match the corresponding sequence length and should be a list of integer Phred scores, e.g. record.letter_annotations["phred_quality"] from Biopython; default: None

When qscores is provided, pyabpoa enables quality-weighted consensus (use_qv) for that run. This is effective for heaviest-bundling consensus (cons_algrm='HB'), matching the CLI -Q/--use-qual-weight behavior.
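
Because every qscores entry has to line up with its sequence, it can help to validate the input before calling msa(). The sketch below decodes FASTQ quality strings into integer Phred scores and checks the lengths. It assumes the standard Phred+33 encoding and one qscores entry per sequence; phred_from_fastq and check_qscores are hypothetical helper names, not pyabpoa functions.

```python
def phred_from_fastq(quality_string, offset=33):
    """Decode a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def check_qscores(seqs, qscores):
    """Raise ValueError unless qscores holds one list of non-negative
    integers per sequence, each as long as its sequence."""
    if len(seqs) != len(qscores):
        raise ValueError("need one qscores entry per sequence")
    for i, (seq, quals) in enumerate(zip(seqs, qscores)):
        if len(seq) != len(quals):
            raise ValueError(
                f"sequence {i}: {len(seq)} bases but {len(quals)} scores")
        if not all(isinstance(q, int) and q >= 0 for q in quals):
            raise ValueError(
                f"sequence {i}: scores must be non-negative integers")
    return qscores


seqs = ['CCGA', 'CCGA']
qscores = check_qscores(
    seqs, [phred_from_fastq('IIII'), phred_from_fastq('II5!')])
# qscores can then be passed on, e.g.
# a.msa(seqs, out_cons=True, out_msa=True, qscores=qscores)
```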

msa_aligner also provides three methods for incrementally adding sequences to graph/MSA:

pyabpoa.msa_aligner.msa_align(seqs, out_cons, out_msa, max_n_cons=1, min_freq=0.25, incr_fn=b'', qscores=None)
pyabpoa.msa_aligner.msa_add(new_seqs, qscores=None)
pyabpoa.msa_aligner.msa_output()

Intuitively, msa() = msa_align()+msa_add()+msa_output().

To collect the consensus sequence and RC-MSA result, call msa_output() after msa_align() and msa_add(); it returns a pyabpoa.msa_result object.
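
The call order described above can be wrapped in a small helper. This is a sketch only: align_in_batches is a hypothetical name, and calling msa_add() more than once before msa_output() is an assumption that this README does not confirm.

```python
def align_in_batches(aligner, first_batch, later_batches):
    """Align first_batch, add each later batch, then collect the result.

    `aligner` is expected to be a pyabpoa.msa_aligner instance. Any object
    with the same three methods works, which keeps the helper easy to test.
    """
    aligner.msa_align(first_batch, out_cons=True, out_msa=True)
    for batch in later_batches:
        # Assumption: msa_add() may be called repeatedly on the same aligner.
        aligner.msa_add(batch)
    return aligner.msa_output()


# Possible usage (requires pyabpoa):
# import pyabpoa as pa
# res = align_in_batches(pa.msa_aligner(), seqs, [new_seqs])
# res.print_msa()
```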

Class pyabpoa.msa_result

pyabpoa.msa_result(seq_n, cons_n, cons_len, ...)

This class holds the generated consensus sequence(s) and the RC-MSA. The value returned by pyabpoa.msa_aligner.msa() is an object of this class with the following properties:

n_seq: number of input aligned sequences
n_cons: number of generated consensus sequences (generally 1, could be 2 or more if max_n_cons is set as > 1)
clu_n_seq: an array of sequence cluster sizes
cons_len: an array of consensus sequence length(s)
cons_seq: an array of consensus sequence(s)
cons_cov: an array of consensus sequence coverage for each base
cons_qv: an array of consensus quality strings in FASTQ encoding
msa_len: size of each row in the RC-MSA
msa_seq: an array of n_seq+n_cons strings representing the RC-MSA; each string consists of one input or consensus sequence padded with - characters that mark alignment gaps.

pyabpoa.msa_result provides a print_msa method, which prints the RC-MSA to the screen:

pyabpoa.msa_result().print_msa()
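
Beyond printing, the msa_seq rows can be processed directly. The sketch below works on plain lists of strings shaped like msa_seq, so it runs without pyabpoa. It assumes that the n_seq input rows come first and the consensus rows last, which this README does not state explicitly; all three function names are hypothetical.

```python
def split_msa_rows(msa_seq, n_seq):
    """Split RC-MSA rows into (input_rows, consensus_rows).

    Assumes the n_seq input rows come first and the consensus rows last.
    """
    return list(msa_seq[:n_seq]), list(msa_seq[n_seq:])


def ungap(row, gap="-"):
    """Remove gap characters to recover the unaligned sequence."""
    return row.replace(gap, "")


def column_counts(rows, gap="-"):
    """Count the non-gap bases in every column of equally long rows."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all RC-MSA rows must have the same length (msa_len)")
    return [sum(1 for r in rows if r[i] != gap) for i in range(width)]


# Made-up rows for illustration: two input rows followed by one consensus row.
rows = ["CC-GA", "CCCGA", "CC-GA"]
inputs, consensus = split_msa_rows(rows, n_seq=2)
```

With a real result object, the same calls would take res.msa_seq and res.n_seq as inputs.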
