Scroll to navigation

PFSEARCHV2(1) General Commands Manual PFSEARCHV2(1)

NAME

pfsearchV2 - search a protein or DNA sequence library for sequence segments matching a profile

SYNOPSIS

pfsearchV2 [ -abflLrsuxyz ] [ profile-file | - ]
[ seq-library-file | - ] [C=#] [W=#]

DESCRIPTION

pfsearchV2 compares a query profile against a DNA or protein sequence library. The result is an unsorted list of profile-sequence matches written to the standard output. A variety of output formats containing different information can be specified via the options -a, -l, -L, -r, -u, -s, -x, -y, -z, and the command-line parameter C=#. profile-file contains a profile in PROSITE format. seq-library-file contains a sequence library in EMBL/SWISS-PROT format (assumed by default) or in Pearson/Fasta format (indicated by option -f). pfsearchV2 can be used as a filter if - is used instead of one of the input filenames.

OPTIONS

Report optimal alignment scores for all sequences regardless of the cut-off value. This option simultaneously forces DISJOINT=UNIQUE.
Search the complementary strands of DNA sequences as well.
Input sequence-library is in Pearson/Fasta format.
Indicate by number the highest cut-off level exceeded by the match score in the output list.
Indicate by character string the highest cut-off level exceeded by the match score in the output list. Note that the generalized profile format includes a text string field to specify a name for a cut-off level. The -L option causes the program to display the first two characters of this text string (usually something like "!", "?", "??", etc.) at the beginning of each match description.
Use raw scores rather than normalized scores for match selection. Normalized scores will not be listed in the output.
List the sequences of the matched regions as well. The output will be a Pearson/Fasta-formatted sequence library.
Forces DISJOINT=UNIQUE.
List profile-sequence alignments in pftools PSA format.
Display alignments between the profile and the matched sequence regions in a human-friendly format.
Indicate starting and ending position of the matched profile range. The latter position will be given as a negative offset from the end of the profile. Thus the range [ 1, -1] means entire profile.

PARAMETERS

Cut-off value. Over-writes the level zero cut-off value specified in the profile. An integer argument is interpreted as a raw score value, a decimal argument as a normalized score value. An integer value forces option -r.
Output width. Output lines will be truncated after W characters. Default: W=132.

EXAMPLES

(1)
pfsearchV2 -f sh3.prf sh3.seq C=6.0

Searches the Pearson/Fasta-formatted protein sequence library sh3.seq for SH3 domains with a cut-off value of 6.0 normalized score units. sh3.seq contains 20 SH3 domain-containing protein sequences from SWISS-PROT release 32. sh3.prf contains the PROSITE entry SH3/PS50002.

(2)
pfsearchV2 -bx ecp.prf CVPBR322 | psa2msa -du | readseq -p -fMSF > ecp.msf

Generates a multiple sequence alignment of potential E. coli promoters on both strands of plasmid pBR322. ecp.prf contains a profile for E. coli promoters. CVPBR322 contains EMBL entry J01749|CVPBR322. The result file ecp.msf can further be processed by GCG programs accepting MSF files as input.

See also manual pages of psa2msa.

AUTHOR

Philipp Bucher
Philipp.Bucher@isrec.unil.ch

February 1998 pftools 2.1