
PTOF(1) General Commands Manual PTOF(1)

NAME

ptof - convert a protein profile into a frame-search profile

SYNOPSIS

ptof [ -r ] [ protein-profile ] [B=#] [F=#] [I=#] [X=#] [Y=#] [Z=#]

DESCRIPTION

ptof converts a protein profile (generated, for instance, by the pftools programs pfmake, gtop or htop) into a so-called "frame-search profile". A frame-search profile is used to search an "interleaved frame-translated" DNA sequence (generated by the pftools program 2ft) for occurrences of a protein sequence motif. An "interleaved frame-translated" DNA sequence is the amino acid sequence corresponding to the N-2 overlapping codons of a DNA sequence of length N. Note that in such a sequence, the character "O" is used to represent stop codons.
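The definition above can be illustrated with a minimal Python sketch. It is not the 2ft implementation: the function name interleaved_frame_translate is hypothetical, the standard genetic code is assumed, only one strand is handled, and mapping codons that contain ambiguous bases to "X" is an assumption made here.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def interleaved_frame_translate(dna: str) -> str:
    """Translate all N-2 overlapping codons of one strand; stops become 'O'."""
    dna = dna.upper().replace("U", "T")
    residues = []
    for i in range(len(dna) - 2):
        # 'X' for codons with ambiguous bases is an assumption of this sketch.
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        residues.append("O" if aa == "*" else aa)
    return "".join(residues)


if __name__ == "__main__":
    # Codons ATG, TGT, GTA, TAA give M, C, V and a stop.
    print(interleaved_frame_translate("ATGTAA"))
```

For a six-base input the result has four characters, one per overlapping codon, so consecutive characters belong to different reading frames and every third character belongs to the same frame.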

The conversion procedure works as follows. The protein profile is expanded in length by a factor of three to accommodate three translated codons per original match position. Two dummy match positions are placed between each pair of consecutive significant match positions imported from the original profile. The original insert positions are placed between pairs of adjacent dummy match positions. The initiation, termination, and transition scores of the original insert positions are left unchanged; the insert extension scores are multiplied by 1/3, or by the value of the command-line parameter I if it is given.

The two insert positions flanking each significant match position serve to accommodate frame-shift errors and introns, respectively. The frame-shift insert position allows free insertion opening combined with a high insert extension penalty (command-line parameter F), whereas the intron insert position has a high opening penalty but a low extension penalty (command-line parameters Y and Z). The deletion opening and closing penalties next to the significant match positions are set to values that ensure that the total cost of a single-base deletion equals the cost of a single-base insertion at a frame-shift insert position.

Furthermore, the alphabet of the original profile is extended by the stop codon symbol "O", which is assigned a constant negative score (command-line parameter X) at significant match positions and zero at dummy match positions. At insert positions, its score is set to the average of the other insert extension scores.
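The handling of the insert extension scores and of the stop codon symbol can be sketched in Python as follows. This is only an illustration of the rules stated above, not the ptof source: the function names and the dictionary representation of a position's scores are hypothetical, and any rounding to integer profile scores is left out.

```python
def extend_insert_scores(insert_ext, multiplier=1 / 3):
    """Scale insert extension scores by I and add a score for 'O'.

    insert_ext maps each residue symbol to its original insert
    extension score (a hypothetical representation of one position).
    """
    scaled = {aa: score * multiplier for aa, score in insert_ext.items()}
    # 'O' receives the average of the other insert extension scores.
    scaled["O"] = sum(scaled.values()) / len(scaled)
    return scaled


def stop_score_at_match(significant, stop_penalty=-100):
    """Score of 'O' at a match position: X if significant, 0 if dummy."""
    return stop_penalty if significant else 0


if __name__ == "__main__":
    print(extend_insert_scores({"A": -4, "C": -8}, multiplier=0.5))
    print(stop_score_at_match(True), stop_score_at_match(False))
```

Because scaling is linear, averaging before or after applying the multiplier gives the same value for "O", so the sketch does not depend on which order ptof uses.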

OPTIONS

-r
Frame-search parameters are given in normalized score units. This option will only be considered if a linear normalization function with priority over all other normalization functions is specified in the profile. In this case, the frame-search scores specified on the command line will be divided by the slope (R2 parameter) of the normalization function. This option is particularly useful for profiles which are already scaled in units that can be interpreted as −Log(P)-values, e.g. bits.
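As a rough illustration of the -r conversion, the Python sketch below divides a normalized score by the slope R2. The function name to_raw_score is hypothetical, the slope of 0.01 used in the example is an assumption chosen to match the paired defaults listed under PARAMETERS (for instance −100 and −1.0), and whether ptof rounds the result is not stated here.

```python
def to_raw_score(normalized, slope_r2):
    """Convert a score in normalized units to raw profile units."""
    if slope_r2 == 0:
        raise ValueError("the slope R2 must be non-zero")
    return normalized / slope_r2


if __name__ == "__main__":
    # With an assumed slope of 0.01, F=-1.0 corresponds to about -100 raw units.
    print(to_raw_score(-1.0, 0.01))
```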

PARAMETERS

B=#
Minimal initiation and termination score. All internal and external initiation and termination scores will be set to this value if the corresponding value in the original profile is lower than this value. This parameter is used to impose a more local alignment behavior on the frame-search profile in order to deal with discontinuities in DNA sequences (long introns, alternative splicing, chimeric clones, etc.). Default: B=−50 (−0.5 with option -r).
F=#
Frame-shift error penalty. Default: F=−100 (−1.0 with option -r).
I=#
Insert score multiplier. The values of the original insert extension scores will be multiplied by this factor in order to compensate for the fact that a single amino acid corresponds to three overlapping codon positions in the target sequence. Default: I=1/3.
X=#
Stop codon penalty. Default: X=−100 (−1.0 with option -r).
Y=#
Intron opening penalty. Default: Y=−300 (−3.0 with option -r).
Z=#
Intron extension penalty. Default: Z=−1 (−0.01 with option -r).
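The effect of the B parameter amounts to applying a lower bound to the initiation and termination scores. A minimal Python sketch follows; the function name apply_minimum_score and the plain list of scores are hypothetical, and special profile values are not modelled.

```python
def apply_minimum_score(scores, b=-50):
    """Raise every score that is lower than B to B; leave the rest as is."""
    return [max(score, b) for score in scores]


if __name__ == "__main__":
    print(apply_minimum_score([-120, -50, -10, 0]))
```

Scores at or above B are unchanged, so a very low value of B leaves the original profile's behavior in place, while a value closer to zero makes it cheaper to start or end an alignment anywhere.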

EXAMPLES

(1)
ptof -r sh3.prf F=-1.2 I=0.6 X=-1.5 B=-0.5 > sh3.fsp
2ft < R76849.seq | pfsearch -fy sh3.fsp - C=5.0

The protein domain profile in sh3.prf is first converted into a frame-search profile, sh3.fsp. Then both strands of the FASTA-formatted EST sequence in R76849.seq (GenBank/EMBL accession R76849) are converted into interleaved frame-translated protein sequences and searched for SH3 domains with the frame-search profile generated in the preceding step.

The output may be compared to the result of a more conventional search strategy using a protein profile in conjunction with a six-frame translation of the same DNA sequence:

6ft < R76849.seq | pfsearch -fy sh3.prf - C=5.0

See also manual pages of pfsearch, 2ft and 6ft.

AUTHOR

Philipp Bucher
Philipp.Bucher@isrec.unil.ch
July 1999 pftools 2.2