NAME
- HMMER - profile hidden Markov model software
SYNOPSIS
- hmm2align
- Align multiple sequences to a profile HMM.
- hmm2build
- Build a profile HMM from a given multiple sequence alignment.
- hmm2calibrate
- Determine appropriate statistical significance parameters for a profile
HMM prior to doing database searches.
- hmm2convert
- Convert HMMER profile HMMs to other formats, such as GCG profiles.
- hmm2emit
- Generate sequences probabilistically from a profile HMM.
- hmm2fetch
- Retrieve an HMM from an HMM database.
- hmm2index
- Create a binary SSI index for an HMM database.
- hmm2pfam
- Search a profile HMM database with a sequence (i.e., annotate various
kinds of domains in the query sequence).
- hmm2search
- Search a sequence database with a profile HMM (i.e., find additional
homologues of a modeled family).
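A typical session chains three of these programs: build a model from an
alignment, calibrate it, then search a sequence database. The sketch below only
assembles the command lines and does not run them. The file names (globin.aln,
globin.hmm, seqs.fa) are hypothetical, and the argument order (HMM file first)
is an assumption based on the usual HMMER 2 convention; confirm it with each
program's -h output.

```python
import shlex


def workflow_commands(alignment, hmm_file, seq_db):
    """Return argv lists for a build -> calibrate -> search run.

    Assumption: each program takes the HMM file as its first argument.
    """
    return [
        ["hmm2build", hmm_file, alignment],   # build the profile HMM
        ["hmm2calibrate", hmm_file],          # fit significance parameters
        ["hmm2search", hmm_file, seq_db],     # search the sequence database
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed as shell-quoted command lines.
    for argv in workflow_commands("globin.aln", "globin.hmm", "seqs.fa"):
        print(shlex.join(argv))
```

Each list can be passed to subprocess.run() once the programs are installed.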
DESCRIPTION
These programs use profile hidden Markov models (profile HMMs) to model the
primary structure consensus of a family of protein or nucleic acid sequences.
OPTIONS
All HMMER programs give a brief summary of their command-line syntax and
options if invoked without any arguments. When invoked with the single
argument -h (i.e., help), a program reports more verbose command-line usage
information, including rarely used, experimental, and expert options. -h also
reports version numbers, which are useful if you need to report a bug or
problem to me.
Each HMMER program has its own man page briefly summarizing command-line
usage. There is also a user's guide that comes with the software distribution,
which includes a tutorial introduction and more detailed descriptions of the
programs.
See http://hmmer.janelia.org/ for on-line documentation and the current HMMER
release.
In general, beginning users should not need any command-line options. The
defaults are set up for optimum performance in most situations. Options that
are single lowercase letters (e.g. -a) are "common" options that are expected
to be frequently used and will be important in many applications. Options that
are single uppercase letters (e.g. -B) are usually less common, but may also
be important in some applications. Options that are full words (e.g.
--verbose) are rarely used, experimental, or expert options. Some experimental
options are only there for my own ongoing experiments with HMMER, and may not
be supported or documented adequately.
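The naming convention for options can be expressed as a small classifier. This
is only an illustration of the convention; classify_option is a hypothetical
helper, not part of HMMER, and it does not check whether a given program
actually accepts the option.

```python
def classify_option(opt):
    """Classify an option string by the HMMER naming convention.

    Hypothetical helper: it looks only at the form of the option,
    not at whether any HMMER program accepts it.
    """
    if opt.startswith("--") and len(opt) > 2:
        # Full-word options such as --verbose.
        return "rare, experimental, or expert"
    if len(opt) == 2 and opt[0] == "-" and opt[1].isalpha():
        # Single-letter options: lowercase vs. uppercase.
        return "common" if opt[1].islower() else "less common"
    return "unrecognized"


if __name__ == "__main__":
    for option in ("-a", "-B", "--verbose"):
        print(option, "->", classify_option(option))
```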
In general, HMMER attempts to read most common biological sequence file
formats. It autodetects the file format and whether the sequences are protein
or nucleic acid. Standard IUPAC degeneracy codes are allowed in addition to
the usual 4-letter or 20-letter codes.
- Unaligned sequences
- Unaligned sequence files may be in FASTA, Swissprot, EMBL, GenBank, PIR,
Intelligenetics, Strider, or GCG format. These formats are documented in
the User's Guide.
- Sequence alignments
- Multiple sequence alignments may be in CLUSTALW, SELEX, or GCG MSF format.
These formats are documented in the User's Guide.
ENVIRONMENT VARIABLES
For ease of using large, stable sequence and HMM databases, HMMER looks for
sequence files and HMM files in the current working directory as well as in
system directories specified by environment variables.
- BLASTDB
- Specifies the directory location of sequence databases. Example:
/seqlibs/blast-db/. In installations that use BLAST software, this
environment variable is likely to already be set.
- HMMERDB
- Specifies the directory location of HMM databases. Example:
/seqlibs/pfam/.
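The lookup described in this section can be sketched as follows. This is not
HMMER's own code: find_database is a hypothetical helper, and it rests on two
assumptions that the text does not confirm, namely that the current working
directory is checked first and that each variable names a single directory.

```python
import os


def find_database(filename, env_var):
    """Return the first existing path for filename, or None.

    Assumptions: the current working directory is checked first, and
    env_var (e.g. "BLASTDB" or "HMMERDB") names a single directory.
    """
    candidates = [os.path.join(os.getcwd(), filename)]
    directory = os.environ.get(env_var)
    if directory:
        candidates.append(os.path.join(directory, filename))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # "Pfam" is a hypothetical HMM database file name.
    print(find_database("Pfam", "HMMERDB"))
```

With HMMERDB set to /seqlibs/pfam/, the call above would return a path inside
that directory if a file named Pfam exists there and not in the current
directory.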