pp_simScore - print the similarity score and optimal alignments of a block-profile and a protein sequence to standard output
pp_simScore [OPTIONS] --fasta=protein-sequence-file --prfl=protein-profile-file
Algorithm for calculating the similarity score and the optimal alignments of a block-profile and a protein sequence. The algorithm can optionally take intron positions into account. The result is printed to standard output.
The protein sequence file may contain an optional [Intron] section. This section denotes the intron positions in the protein sequence, specified as a list of (j, f) pairs, where j is the index of the amino acid after which the intron immediately occurs and f is the number of residual nucleotides before the intron. The indices range from 0 to m - 1 if the protein sequence has length m.
>protein sequence header
XXXXXXX protein sequence XXXXXXXXXXX
# index of the position after which an intron occurs | residual nucleotides before the intron
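The index convention for intron positions can be made concrete with a small reader. This is a minimal sketch, not pp_simScore's own parser: the function name parse_protein_file is hypothetical, and it assumes that each line of the [Intron] section holds j and f separated by whitespace, that lines starting with # are comments, and that f is one of 0, 1, 2 (as for the intron profiles below).

```python
def parse_protein_file(text):
    """Hypothetical reader for a FASTA protein file with an optional [Intron] section.

    Assumptions (not confirmed by pp_simScore): intron lines are "j f" separated
    by whitespace, and lines starting with '#' are comments.
    Returns (header, sequence, [(j, f), ...]).
    """
    header, seq_parts, introns = None, [], []
    in_intron_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith(">"):
            header = line[1:].strip()
        elif line.lower() == "[intron]":
            in_intron_section = True
        elif in_intron_section:
            fields = line.split()
            introns.append((int(fields[0]), int(fields[1])))
        else:
            seq_parts.append(line)
    seq = "".join(seq_parts)
    m = len(seq)
    for j, f in introns:
        # j is the index of the amino acid after which the intron occurs: 0 <= j <= m - 1
        if not 0 <= j <= m - 1:
            raise ValueError(f"intron index {j} outside 0..{m - 1}")
        # f = residual nucleotides before the intron, assumed to be 0, 1 or 2
        if f not in (0, 1, 2):
            raise ValueError(f"residual nucleotide count {f} not in 0..2")
    return header, seq, introns
```

For a sequence of length 7, an entry "5 0" places an intron directly after the sixth amino acid, while "7 0" is rejected because the last valid index is 6.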
-p, --prfl= file
This structure can be repeated in this file. The file has to end either with a [dist] section or with a [dist] section followed by an [intron profile] section. The [intron profile] sections are optional.
min and max denote the distance interval of an inter-block section
B denotes a (20 x t) matrix for a block of length t of the block-profile
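To illustrate how the (20 x t) block matrices and the [min, max] distance intervals interact, here is a simplified dynamic-programming sketch. It is not the algorithm implemented in pp_simScore and computes only a score, no alignment. It assumes that the rows of B follow the order in the hypothetical constant AMINO_ACIDS, that entries are additive scores, that blocks are placed without gaps, and that (min_gap, max_gap) bound the number of residues between the end of one block and the start of the next. The names block_scores and chain_score are hypothetical.

```python
# Simplified sketch, NOT pp_simScore's algorithm. Assumptions: rows of B follow
# AMINO_ACIDS, entries are additive scores, blocks are ungapped, and
# (min_gap, max_gap) bound the residues between consecutive blocks.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed row order of each 20 x t matrix B
AA_INDEX = {aa: r for r, aa in enumerate(AMINO_ACIDS)}
NEG_INF = float("-inf")


def block_scores(B, seq):
    """Score of block B placed at every start offset of seq (sum of column scores).

    Residues not in AMINO_ACIDS raise KeyError.
    """
    t = len(B[0])  # block length = number of columns
    return [
        sum(B[AA_INDEX[seq[i + k]]][k] for k in range(t))
        for i in range(len(seq) - t + 1)
    ]


def chain_score(blocks, seq):
    """blocks: list of (B, min_gap, max_gap); the interval of the first block is ignored.

    best[i] is the best total score of all blocks so far, given that the
    current block starts at offset i.
    """
    best, prev_t = None, 0
    for B, min_gap, max_gap in blocks:
        scores = block_scores(B, seq)
        cur = [NEG_INF] * len(scores)
        for i, s in enumerate(scores):
            if best is None:  # first block: any start offset is allowed
                cur[i] = s
                continue
            for j, prev_score in enumerate(best):
                gap = i - (j + prev_t)  # residues between previous block end and offset i
                if min_gap <= gap <= max_gap:
                    cur[i] = max(cur[i], prev_score + s)
        best, prev_t = cur, len(B[0])
    return max(best or [], default=NEG_INF)
```

The quadratic inner loop keeps the sketch short; restricting j to the window allowed by [min_gap, max_gap] would make each step linear in the interval width.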
[intron profile] explanation:
an intron profile describes the positions and frequencies of introns in and
before the associated block
w: number of protein family members used to build the intron profile
inter-block_profile_list: list of (h, v),
where h denotes the number of introns which occurred within a family member,
v denotes the number of family members which have this number of introns
intra-block_profile_list: list of (s, f, v),
where s denotes the index of the position in the block after which an intron occurs,
f denotes the number of nucleotides which are left before the intron (0, 1, 2),
v denotes the number of family members which have an intron at that position
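The counts in an intron profile are relative to w, so dividing by w turns them into frequencies. The following minimal sketch uses a hypothetical IntronProfile class (not part of AUGUSTUS) to hold one [intron profile] section; mean_introns assumes that the (h, v) pairs together account for all w family members.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IntronProfile:
    """Hypothetical in-memory form of one [intron profile] section."""
    w: int  # number of family members used to build the profile
    inter_block: List[Tuple[int, int]] = field(default_factory=list)  # (h, v)
    intra_block: List[Tuple[int, int, int]] = field(default_factory=list)  # (s, f, v)

    def intra_frequencies(self):
        """Fraction of members with an intron after block position s with f residual nucleotides."""
        return {(s, f): v / self.w for s, f, v in self.intra_block}

    def mean_introns(self):
        """Average number of introns per member; assumes the (h, v) pairs cover all w members."""
        return sum(h * v for h, v in self.inter_block) / self.w
```

For example, with w = 4 and an intra-block entry (3, 0, 2), half of the family members have an intron after position 3 of the block with no nucleotides left over.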
Denotes the output format. The following output options are implemented:
pp_simScore --fasta=EDW03868.1.fa --prfl=EOG09150290.prfl --out=alignment
AUGUSTUS was written by M. Stanke, O. Keller, S. König, L. Gerischer, L. Romoth and L. Gabriel.