OPTIONS
-sequence seqall
-onlydend toggle
Default value: N
-dend toggle
Default value: N
-dendfile infile
-slow toggle
A distance is calculated between every pair of sequences,
and these distances are used to construct the dendrogram which guides the final
multiple alignment. The scores are calculated from separate pairwise alignments.
These can be calculated using two methods: dynamic programming (slow but
accurate) or the method of Wilbur and Lipman (extremely fast but approximate).
The slow, accurate method is fine for short sequences but will be VERY SLOW for
many (e.g. >100) long (e.g. >1000 residue) sequences. Default value: Y
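As a rough illustration of how pairwise alignments can be turned into the
distances that feed the dendrogram, the sketch below converts each aligned pair
into a distance of 1 minus the fraction of identical residues. The names
fraction_identity and distance_table are hypothetical, and the distance formula
is an assumption made for illustration; it is not taken from this documentation.

```python
def fraction_identity(aligned_a, aligned_b):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def distance_table(aligned_pairs):
    """Map each (name_a, name_b) key to an assumed distance of 1 - identity.

    aligned_pairs maps a pair of sequence names to the two gapped rows
    produced by a separate pairwise alignment of those sequences.
    """
    return {
        names: 1.0 - fraction_identity(row_a, row_b)
        for names, (row_a, row_b) in aligned_pairs.items()
    }


if __name__ == "__main__":
    example = {
        ("s1", "s2"): ("ACGT", "ACGA"),
        ("s1", "s3"): ("AC-GT", "ACTGA"),
    }
    for names, dist in distance_table(example).items():
        print(names, dist)
```

The pairwise alignments themselves would come from either the slow or the fast
method; this sketch only covers the step from alignment to distance.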
Pairwise align options
-pwmatrix list
The scoring table which describes the similarity of each
amino acid to every other. Three 'in-built' series of weight matrices are
offered. Each consists of several matrices which work differently at different
evolutionary distances. To see the exact details, read the documentation.
Crudely, we store several matrices in memory, spanning the full range of amino
acid distance (from almost identical sequences to highly divergent ones). For
very similar sequences, it is best to use a strict weight matrix which only
gives a high score to identities and the most favoured conservative
substitutions. For more divergent sequences, it is appropriate to use 'softer'
matrices which give a high score to many other frequent substitutions.
1) BLOSUM (Henikoff). These matrices appear to be the best available for
carrying out database similarity (homology) searches. The matrices used are:
Blosum80, 62, 45 and 30.
2) PAM (Dayhoff). These have been extremely widely used since the late '70s.
We use the PAM 120, 160, 250 and 350 matrices.
3) GONNET. These matrices were derived using almost the same procedure as the
Dayhoff one (above) but are much more up to date and are based on a far larger
data set. They appear to be more sensitive than the Dayhoff series. We use the
GONNET 40, 80, 120, 160, 250 and 350 matrices.
We also supply an identity matrix which gives a score of 1.0 to two identical
amino acids and a score of zero otherwise. This matrix is not very useful.
Default value: b
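To make the identity matrix concrete, here is a minimal sketch that builds one
over the 20 standard amino acids and scores an ungapped pair of equal-length
sequences with it. The names identity_matrix and ungapped_score are
hypothetical helpers for illustration and are not part of the program.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def identity_matrix(alphabet=AMINO_ACIDS):
    """Score 1.0 for two identical residues and 0.0 for any other pair."""
    return {(a, b): 1.0 if a == b else 0.0 for a in alphabet for b in alphabet}


def ungapped_score(seq_a, seq_b, matrix):
    """Sum the matrix scores column by column (no gaps considered)."""
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    matrix = identity_matrix()
    print(ungapped_score("ACDE", "ACDF", matrix))
```

Because every substitution scores zero, conservative changes are treated the
same as drastic ones, which is one way to see why this matrix is rarely useful.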
-pwdnamatrix list
The scoring table which describes the scores assigned to
matches and mismatches (including IUB ambiguity codes). Default value: i
-usermatrix variable
-pairwisedatafile infile
Matrix options
-matrix list
This gives a menu where you are offered a choice of
weight matrices. The default for proteins is the PAM series derived by Gonnet
and colleagues. Note, a series is used! The actual matrix that is used depends
on how similar the sequences to be aligned at this alignment step are. Three
'in-built' series of weight matrices are offered. Each consists of several
matrices which work differently at different evolutionary distances. To see
the exact details, read the documentation. Crudely, we store several matrices
in memory, spanning the full range of amino acid distance (from almost
identical sequences to highly divergent ones). For very similar sequences, it
is best to use a strict weight matrix which only gives a high score to
identities and the most favoured conservative substitutions. For more
divergent sequences, it is appropriate to use 'softer' matrices which give a
high score to many other frequent substitutions.
1) BLOSUM (Henikoff). These matrices appear to be the best available for
carrying out database similarity (homology) searches. The matrices used are:
Blosum80, 62, 45 and 30.
2) PAM (Dayhoff). These have been extremely widely used since the late '70s.
We use the PAM 120, 160, 250 and 350 matrices.
3) GONNET. These matrices were derived using almost the same procedure as the
Dayhoff one (above) but are much more up to date and are based on a far larger
data set. They appear to be more sensitive than the Dayhoff series. We use the
GONNET 40, 80, 120, 160, 250 and 350 matrices.
We also supply an identity matrix which gives a score of 1.0 to two identical
amino acids and a score of zero otherwise. This matrix is not very useful.
Alternatively, you can read in your own matrix (just one matrix, not a series).
Default value: b
-usermamatrix variable
-dnamatrix list
This gives a menu where a single matrix (not a series)
can be selected. Default value: i
-umamatrix variable
-mamatrixfile infile
Slow align options
-pwgapopen float
The penalty for opening a gap in the pairwise alignments.
Default value: 10.0
-pwgapextend float
The penalty for extending a gap by 1 residue in the
pairwise alignments. Default value: 0.1
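The two penalties combine into what is usually called an affine gap cost. The
sketch below assumes the common convention that a gap of n residues costs the
opening penalty plus n times the extension penalty; the exact convention used
by the program is not stated here, so treat the formula and the name gap_cost
as assumptions made for illustration.

```python
def gap_cost(length, gap_open=10.0, gap_extend=0.1):
    """Assumed affine cost: one opening penalty plus one extension per residue."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * length


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, gap_cost(n))
```

With the default values, the opening penalty dominates: under this assumed
formula a single long gap is far cheaper than several short ones covering the
same number of residues.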
Fast align options
-ktup integer
This is the size of the exactly matching fragment (k-tuple) that is
used. Increase for speed (max = 2 for proteins; 4 for DNA); decrease for
sensitivity. For longer sequences (e.g. >1000 residues) you may need to
increase the default. Default value: @($(acdprotein)?1:2), i.e. 1 for protein
and 2 for nucleotide sequences
-gapw integer
This is a penalty for each gap in the fast alignments. It
has little effect on the speed or sensitivity except for extreme values.
Default value: @($(acdprotein)?3:5), i.e. 3 for protein and 5 for nucleotide
sequences
-topdiags integer
The number of k-tuple matches on each diagonal (in an
imaginary dot-matrix plot) is calculated. Only the best diagonals (those with
the most matches) are used in the alignment. This parameter specifies how many.
Decrease for speed; increase for sensitivity. Default value:
@($(acdprotein)?5:4), i.e. 5 for protein and 4 for nucleotide sequences
-window integer
This is the number of diagonals around each of the 'best'
diagonals that will be used. Decrease for speed; increase for sensitivity.
Default value: @($(acdprotein)?5:4), i.e. 5 for protein and 4 for nucleotide
sequences
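The roles of -ktup, -topdiags and -window can be sketched in a few lines. The
code below counts exact k-tuple matches per diagonal of the imaginary
dot-matrix, keeps the diagonals with the most matches, and widens each into a
band. It is a simplified illustration, not the Wilbur and Lipman implementation
used by the program, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def ktuple_diagonals(seq_a, seq_b, ktup=1):
    """Count exact k-tuple matches on each diagonal, indexed by offset i - j."""
    positions = defaultdict(list)
    for j in range(len(seq_b) - ktup + 1):
        positions[seq_b[j:j + ktup]].append(j)
    diagonals = Counter()
    for i in range(len(seq_a) - ktup + 1):
        for j in positions.get(seq_a[i:i + ktup], ()):
            diagonals[i - j] += 1
    return diagonals


def best_diagonals(diagonals, topdiags=5):
    """Keep the offsets of the diagonals with the most k-tuple matches."""
    return [offset for offset, _ in diagonals.most_common(topdiags)]


def window_band(best, window=5):
    """Collect every diagonal within 'window' of one of the best diagonals."""
    band = set()
    for offset in best:
        band.update(range(offset - window, offset + window + 1))
    return band


if __name__ == "__main__":
    diags = ktuple_diagonals("ACGTACGT", "ACGTACGT", ktup=2)
    top = best_diagonals(diags, topdiags=1)
    print(top, sorted(window_band(top, window=1)))
```

A larger k-tuple yields fewer chance matches and so less work, while more top
diagonals and a wider window enlarge the region that is searched.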
-nopercent boolean
Default value: N
Gap options
-gapopen float
The penalty for opening a gap in the alignment.
Increasing the gap opening penalty will make gaps less frequent. Default
value: 10.0
-gapextend float
The penalty for extending a gap by 1 residue. Increasing
the gap extension penalty will make gaps shorter. Terminal gaps are not
penalised. Default value: 5.0
-endgaps boolean
End gap separation: treats end gaps just like internal
gaps for the purposes of avoiding gaps that are too close (set by 'gap
separation distance'). If you turn this off, end gaps will be ignored for this
purpose. This is useful when you wish to align fragments where the end gaps
are not biologically meaningful. Default value: Y
-gapdist integer
Gap separation distance: tries to decrease the chances of
gaps being too close to each other. Gaps that are less than this distance
apart are penalised more than other gaps. This does not prevent close gaps; it
makes them less frequent, promoting a block-like appearance of the alignment.
Default value: 8
-norgap boolean
Residue specific penalties: amino acid specific gap
penalties that reduce or increase the gap opening penalties at each position
in the alignment or sequence. As an example, positions that are rich in
glycine are more likely to have an adjacent gap than positions that are rich
in valine. Default value: N
-hgapres string
This is a set of the residues 'considered' to be
hydrophilic. It is used when introducing Hydrophilic gap penalties. Default
value: GPSNDQEKR
-nohgap boolean
Hydrophilic gap penalties: used to increase the chances
of a gap within a run (5 or more residues) of hydrophilic amino acids; these
are likely to be loop or random coil regions where gaps are more common. The
residues that are 'considered' to be hydrophilic are set by '-hgapres'.
Default value: N
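The runs that this option targets can be located with a short sketch such as
the one below, which reports stretches of 5 or more consecutive residues drawn
from the -hgapres set. The function name hydrophilic_runs is hypothetical, and
how the program then adjusts the gap penalty inside such runs is not shown.

```python
import re


def hydrophilic_runs(seq, hgapres="GPSNDQEKR", min_run=5):
    """Return (start, end) spans of runs of at least min_run hydrophilic residues.

    Spans use 0-based start and exclusive end, like Python slices.
    """
    pattern = re.compile("[" + re.escape(hgapres) + "]{" + str(min_run) + ",}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    sequence = "MVLAGSNDQKLLVAGPS"
    for start, end in hydrophilic_runs(sequence):
        print(start, end, sequence[start:end])
```

Changing the residue set passed as hgapres changes which stretches qualify, in
the same way that -hgapres changes them for the program.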
-maxdiv integer
This switch delays the alignment of the most distantly
related sequences until after the most closely related sequences have been
aligned. The setting gives the percent identity level required to delay the
addition of a sequence; sequences that are less identical than this level to
any other sequences will be aligned later. Default value: 30
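One way to read this rule is sketched below: a sequence is delayed when even
its best percent identity to another sequence falls below the threshold. That
reading is an assumption, and the function name delayed_sequences and the input
layout are hypothetical, chosen only for illustration.

```python
def delayed_sequences(identity, maxdiv=30):
    """List the sequences whose best percent identity to any other is below maxdiv.

    identity maps each sequence name to a dict of
    {other sequence name: percent identity}.
    """
    delayed = []
    for name, others in identity.items():
        if others and max(others.values()) < maxdiv:
            delayed.append(name)
    return delayed


if __name__ == "__main__":
    identities = {
        "a": {"b": 80, "c": 20},
        "b": {"a": 80, "c": 25},
        "c": {"a": 20, "b": 25},
    }
    print(delayed_sequences(identities))
```

Under this reading, raising the value delays more sequences and lowering it
delays fewer.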
Output section
-outseq seqoutset
-dendoutfile outfile