CLUSTALW(1)

Clustal Manual


NAME

clustalw - Multiple alignment of nucleic acid and protein sequences

SYNOPSIS

clustalw [-infile] file.ext [OPTIONS]

clustalw [-help | -fullhelp]

DESCRIPTION

Clustal W is a general-purpose multiple alignment program for DNA or proteins.

The program performs simultaneous alignment of many nucleotide or amino acid sequences. It is typically run interactively, through a system of menus with online help. To use it in command-line (batch) mode instead, give the required options on the command line; the minimum is -infile.
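A batch run is simply the program name followed by options. The Python sketch below assembles such an argument list; build_clustalw_args is a hypothetical helper, not part of Clustal W, and it assumes the executable is called clustalw and is on the PATH only if you later hand the list to subprocess.run.

```python
import shlex


def build_clustalw_args(infile, outfile=None, quicktree=False, extra=None):
    """Assemble argv for a non-interactive (batch) clustalw run.

    Hypothetical helper: only -infile is required; -outfile and -quicktree
    are options documented in this manual, and `extra` passes any others
    (for example "-align" or "-output=PHYLIP") through unchanged.
    """
    args = ["clustalw", f"-infile={infile}"]
    if outfile is not None:
        args.append(f"-outfile={outfile}")
    if quicktree:
        args.append("-quicktree")
    args.extend(extra or [])
    return args


if __name__ == "__main__":
    argv = build_clustalw_args("seqs.fasta", outfile="seqs.aln", quicktree=True)
    # Print the equivalent shell command; pass `argv` to subprocess.run()
    # to execute it when clustalw is installed.
    print(shlex.join(argv))
```

Building a list rather than a single string avoids shell quoting problems when file names contain spaces.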

OPTIONS

DATA (sequences)

-infile=file.ext

Input sequences.

-profile1=file.ext and -profile2=file.ext

Profiles (existing alignments).

VERBS (do things)

-options

List the command line parameters.

-help or -check

Outline the command-line parameters.

-fullhelp

Output full help content.

-align

Do full multiple alignment.

-tree

Calculate an NJ (neighbor-joining) tree.

-pim

Output percent identity matrix (while calculating the tree).

-bootstrap=n

Bootstrap an NJ tree (n = number of bootstrap trials; default 1000).

-convert

Output the input sequences in a different file format.
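For -bootstrap=n, the usual procedure is to resample alignment columns with replacement, build a tree from each replicate and count how often each grouping recurs. The Python sketch below covers only the resampling step and assumes the alignment is a dict of equal-length gapped strings; bootstrap_replicates is a hypothetical name, its seed argument plays the role of -seed=n, and Clustal W's internal procedure may differ in detail.

```python
import random


def bootstrap_replicates(alignment, n, seed):
    """Return n bootstrap replicates of an alignment (hypothetical helper).

    `alignment` maps names to equal-length aligned strings. Each replicate
    draws as many columns as the alignment has, with replacement, and applies
    the same column choice to every sequence so that columns stay intact.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must have equal length")
    rng = random.Random(seed)  # fixed seed gives reproducible replicates
    replicates = []
    for _ in range(n):
        cols = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {name: "".join(alignment[name][c] for c in cols) for name in names}
        )
    return replicates
```

Calling it with n=1000 matches the documented default number of bootstrap trials; each replicate would then go through the same distance and NJ steps as the original alignment.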

PARAMETERS (set things)

General settings:

-interactive

Read command line, then enter normal interactive menus.

-quicktree

Use the fast (approximate) pairwise alignment algorithm to build the alignment guide tree.

-type=

PROTEIN or DNA sequences.

-negative

Protein alignment with negative values in matrix.

-outfile=

Sequence alignment file name.

-output=

Output format: GCG, GDE, PHYLIP, PIR or NEXUS.

-outorder=

INPUT or ALIGNED (order of the sequences in the output).

-case=

LOWER or UPPER (for GDE output only).

-seqnos=

OFF or ON (for Clustal output only).

-seqno_range=

OFF or ON (NEW: for all output formats).

-range=m,n

Sequence range to write starting m to m+n.

-maxseqlen=n

Maximum allowed input sequence length.

-quiet

Reduce console output to minimum.

-stats=file

Log some alignment statistics to file.

Fast Pairwise Alignments:

-ktuple=n

Word size.

-topdiags=n

Number of best diagonals.

-window=n

Window around best diagonals.

-pairgap=n

Gap penalty.

-score=

PERCENT or ABSOLUTE.
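The fast pairwise method looks for short exact word matches (k-tuples of length -ktuple) and concentrates on the diagonals richest in such matches. The Python sketch below illustrates only that diagonal-selection idea under simplifying assumptions: diagonals are indexed by i - j, the -topdiags best are kept, and -window is read as a band of that many diagonals on each side. Both function names are hypothetical, and the final scoring with -pairgap and -score is not modelled.

```python
from collections import Counter, defaultdict


def ktuple_diagonals(a, b, ktuple):
    """Count exact k-tuple matches on each diagonal d = i - j."""
    positions = defaultdict(list)  # word -> start positions in b
    for j in range(len(b) - ktuple + 1):
        positions[b[j:j + ktuple]].append(j)
    counts = Counter()
    for i in range(len(a) - ktuple + 1):
        for j in positions.get(a[i:i + ktuple], []):
            counts[i - j] += 1
    return counts


def select_diagonals(a, b, ktuple, topdiags, window):
    """Return the `topdiags` best diagonals and the band kept around them."""
    counts = ktuple_diagonals(a, b, ktuple)
    best = [d for d, _ in counts.most_common(topdiags)]
    band = set()
    for d in best:
        band.update(range(d - window, d + window + 1))
    return best, band
```

Larger k-tuples give fewer chance matches and run faster but are less sensitive; that trade-off is what -ktuple controls.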

Slow Pairwise Alignments:

-pwmatrix=

Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename.

-pwdnamatrix=

DNA weight matrix=IUB, CLUSTALW or filename.

-pwgapopen=f

Gap opening penalty.

-pwgapext=f

Gap extension penalty.
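Gap opening and extension penalties are usually combined as an affine cost. The Python sketch below is a textbook Gotoh-style global alignment score, not Clustal W's code: it assumes a gap of length L costs gapopen + gapext * (L - 1) and uses a simple match/mismatch score in place of a weight matrix. Clustal W's own scoring (weight matrices, scaling of penalties) differs, and the function name is hypothetical.

```python
NEG = float("-inf")


def affine_global_score(a, b, match=1.0, mismatch=-1.0, gapopen=10.0, gapext=0.1):
    """Best global alignment score of a and b with affine gap costs."""
    n, m = len(a), len(b)
    # M: ends with a residue pair; X: ends with a[i-1] against a gap;
    # Y: ends with b[j-1] against a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gapopen + gapext * (i - 1))
    for j in range(1, m + 1):
        Y[0][j] = -(gapopen + gapext * (j - 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gapopen; continuing one pays only gapext.
            X[i][j] = max(M[i - 1][j] - gapopen, X[i - 1][j] - gapext,
                          Y[i - 1][j] - gapopen)
            Y[i][j] = max(M[i][j - 1] - gapopen, Y[i][j - 1] - gapext,
                          X[i][j - 1] - gapopen)
    return max(M[n][m], X[n][m], Y[n][m])
```

With a high opening penalty and a low extension penalty, one long gap is preferred over several short ones, which is the behavior these two options tune.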

Multiple Alignments:

-newtree=

File for new guide tree.

-usetree=

File for old guide tree.

-matrix=

Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename.

-dnamatrix=

DNA weight matrix=IUB, CLUSTALW or filename.

-gapopen=f

Gap opening penalty.

-gapext=f

Gap extension penalty.

-endgaps

No end gap separation penalty.

-gapdist=n

Gap separation penalty range.

-nopgap

Residue-specific gaps off.

-nohgap

Hydrophilic gaps off.

-hgapresidues=

List of hydrophilic residues.

-maxdiv=n

Percent identity threshold for delaying divergent sequences.

-type=

PROTEIN or DNA

-transweight=f

Transitions weighting.

-iteration=

NONE or TREE or ALIGNMENT.

-numiter=n

Maximum number of iterations to perform.
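-maxdiv=n delays the alignment of sequences that are too divergent from the rest. The Python sketch below shows one way to apply such a threshold, assuming the sequences are already aligned and that identity is counted only over positions where neither sequence has a gap. Clustal W derives its identities from its own pairwise alignments, and both function names are hypothetical.

```python
def percent_identity(x, y, gap="-"):
    """Percent of gap-free aligned positions at which x and y are identical."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(x, y) if a != gap and b != gap]
    if not pairs:
        return 0.0
    same = sum(a.upper() == b.upper() for a, b in pairs)
    return 100.0 * same / len(pairs)


def split_divergent(aligned, maxdiv=30):
    """Split names into (align_first, delayed).

    A sequence is delayed when its best identity to any other sequence is
    below `maxdiv` percent (the value 30 here is illustrative only).
    """
    first, delayed = [], []
    for name, seq in aligned.items():
        best = max((percent_identity(seq, other)
                    for o, other in aligned.items() if o != name),
                   default=0.0)
        (delayed if best < maxdiv else first).append(name)
    return first, delayed
```

Delayed sequences are added only after the closer ones have been aligned, so they are placed against a more informative profile.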

Profile Alignments:

-profile

Merge two alignments by profile alignment.

-newtree1=

File for new guide tree for profile1.

-newtree2=

File for new guide tree for profile2.

-usetree1=

File for old guide tree for profile1.

-usetree2=

File for old guide tree for profile2.

Sequence to Profile Alignments:

-sequences

Sequentially add profile2 sequences to profile1 alignment.

-newtree=

File for new guide tree.

-usetree=

File for old guide tree.

Structure Alignments:

-nosecstr1

Do not use secondary structure-gap penalty mask for profile 1.

-nosecstr2

Do not use secondary structure-gap penalty mask for profile 2.

-secstrout=STRUCTURE or MASK or BOTH or NONE

Secondary structure information to write to the alignment file.

-helixgap=n

Gap penalty for helix core residues.

-strandgap=n

Gap penalty for strand core residues.

-loopgap=n

Gap penalty for loop regions.

-terminalgap=n

Gap penalty for structure termini.

-helixendin=n

Number of residues inside helix to be treated as terminal.

-helixendout=n

Number of residues outside helix to be treated as terminal.

-strandendin=n

Number of residues inside strand to be treated as terminal.

-strandendout=n

Number of residues outside strand to be treated as terminal.

Trees:

-outputtree=nj OR phylip OR dist OR nexus

Format of the tree (or distance matrix) output file.

-seed=n

Seed number for bootstraps.

-kimura

Use Kimura's correction for multiple substitutions.

-tossgaps

Ignore positions with gaps.

-bootlabels=node OR branch

Position of bootstrap values in tree display.

-clustering=

NJ or UPGMA.
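-tossgaps and -kimura change how pairwise distances are computed before tree building. The Python sketch below computes the fraction of differing positions (the p-distance D), either per pair over positions where both sequences have a residue or, with tossgaps=True, only over columns that are gap-free in every sequence (our reading of -tossgaps). It can then apply Kimura's empirical protein correction, d = -ln(1 - D - 0.2 D^2). The function names are hypothetical; Clustal W may treat very divergent protein pairs and DNA differently, and this sketch simply refuses distances the formula cannot handle.

```python
import math


def gap_free_columns(alignment, gap="-"):
    """Indices of columns in which no sequence has a gap."""
    length = len(next(iter(alignment.values())))
    return [c for c in range(length)
            if all(seq[c] != gap for seq in alignment.values())]


def p_distance(x, y, columns=None, gap="-"):
    """Fraction of compared positions at which x and y differ.

    If `columns` is None, compare every position where neither sequence
    has a gap; otherwise compare only the given column indices.
    """
    if columns is None:
        columns = [c for c in range(len(x)) if x[c] != gap and y[c] != gap]
    if not columns:
        raise ValueError("no positions to compare")
    diffs = sum(x[c].upper() != y[c].upper() for c in columns)
    return diffs / len(columns)


def kimura_protein(d):
    """Kimura's empirical protein correction: -ln(1 - D - 0.2 * D**2)."""
    arg = 1.0 - d - 0.2 * d * d
    if arg <= 0.0:
        raise ValueError("p-distance too large for this formula")
    return -math.log(arg)


def distance_matrix(alignment, tossgaps=False, kimura=False):
    """Symmetric dict of pairwise distances keyed by (name1, name2)."""
    names = list(alignment)
    shared = gap_free_columns(alignment) if tossgaps else None
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = p_distance(alignment[a], alignment[b], shared)
            if kimura:
                d = kimura_protein(d)
            dist[(a, b)] = dist[(b, a)] = d
    return dist
```

The resulting matrix is the input that NJ or UPGMA clustering (-clustering=) would turn into a tree.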

BUGS

The Clustal bug tracking system can be found at http://bioinf.ucd.ie/bugzilla/buglist.cgi?quicksearch=clustal.

REFERENCES

•Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG. (2007). Clustal W and Clustal X version 2.0.[1] Bioinformatics, 23, 2947-2948.

•Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. (2003). Multiple sequence alignment with the Clustal series of programs.[2] Nucleic Acids Res., 31, 3497-3500.

•Jeanmougin F, Thompson JD, Gouy M, Higgins DG, Gibson TJ. (1998). Multiple sequence alignment with Clustal X.[3] Trends Biochem Sci., 23, 403-405.

•Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. (1997). The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools.[4] Nucleic Acids Res., 25, 4876-4882.

•Higgins DG, Thompson JD, Gibson TJ. (1996). Using CLUSTAL for multiple sequence alignments.[5] Methods Enzymol., 266, 383-402.

•Thompson JD, Higgins DG, Gibson TJ. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.[6] Nucleic Acids Res., 22, 4673-4680.

•Higgins DG. (1994). CLUSTAL V: multiple alignment of DNA and protein sequences.[7] Methods Mol Biol., 25, 307-318.

•Higgins DG, Bleasby AJ, Fuchs R. (1992). CLUSTAL V: improved software for multiple sequence alignment.[8] Comput. Appl. Biosci., 8, 189-191.

•Higgins DG, Sharp PM. (1989). Fast and sensitive multiple sequence alignments on a microcomputer.[9] Comput. Appl. Biosci., 5, 151-153.

•Higgins DG, Sharp PM. (1988). CLUSTAL: a package for performing multiple sequence alignment on a microcomputer.[10] Gene, 73, 237-244.

AUTHORS

Des Higgins

Julie Thompson

Toby Gibson

Charles Plessy <plessy@debian.org>

Prepared this manpage in DocBook XML for the Debian distribution.

COPYRIGHT

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/, or on Debian systems, /usr/share/common-licenses/LGPL-3.

This manual page and its XML source can be used, modified, and redistributed as if it were in public domain.

NOTES

1. Clustal W and Clustal X version 2.0.

http://www.ncbi.nlm.nih.gov/pubmed/17846036

2. Multiple sequence alignment with the Clustal series of programs.

http://www.ncbi.nlm.nih.gov/pubmed/12824352

3. Multiple sequence alignment with Clustal X.

http://www.ncbi.nlm.nih.gov/pubmed/9810230

4. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools.

http://www.ncbi.nlm.nih.gov/pubmed/9396791

5. Using CLUSTAL for multiple sequence alignments.

http://www.ncbi.nlm.nih.gov/pubmed/8743695

6. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

http://www.ncbi.nlm.nih.gov/pubmed/7984417