HHALIGN(1) User Commands HHALIGN(1)

NAME

hhalign - align a query alignment/HMM to a template alignment/HMM

SYNOPSIS

hhalign -i query [-t template] [options]
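
The synopsis can also be assembled programmatically. The following is a minimal Python sketch, not part of hhalign itself: the helper name build_hhalign_cmd and the file names are hypothetical, and only flags documented on this page (-i, -t, -o, -glob/-loc, -mact) are used.

```python
import shlex


def build_hhalign_cmd(query, template=None, output=None, local=True, mact=None):
    """Assemble an hhalign argument list (hypothetical helper).

    Only flags listed on this man page are emitted.
    """
    cmd = ["hhalign", "-i", query]
    if template is not None:
        # Without -t, hhalign compares the query to itself.
        cmd += ["-t", template]
    if output is not None:
        cmd += ["-o", output]
    cmd.append("-loc" if local else "-glob")
    if mact is not None:
        # The page gives the range of -mact as [0,1[.
        if not 0.0 <= mact < 1.0:
            raise ValueError("mact must lie in [0,1[")
        cmd += ["-mact", str(mact)]
    return cmd


if __name__ == "__main__":
    # File names are placeholders for illustration only.
    print(shlex.join(build_hhalign_cmd("query.a3m", template="template.hhm",
                                       output="out.hhr")))
```

The resulting list can be passed to subprocess.run on a system where hhalign is installed.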

DESCRIPTION

HHalign version 2.0.16 (January 2013)

Align a query alignment/HMM to a template alignment/HMM by HMM-HMM alignment. If only one alignment/HMM is given, it is compared to itself, and the best off-diagonal alignment plus all further non-overlapping alignments above the significance threshold are shown.

Remmert M, Biegert A, Hauser A, and Soding J. HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 9:173-175 (2011).

(C) Johannes Soeding, Michael Remmert, Andreas Biegert, Andreas Hauser

-i <file>
input query alignment (fasta/a2m/a3m) or HMM file (.hhm)
-t <file>
input template alignment (fasta/a2m/a3m) or HMM file (.hhm)

Output options:

-o <file>
write output alignment to file
-ofas <file>
write alignments in FASTA, A2M (-oa2m) or A3M (-oa3m) format
-Oa3m <file>
write query alignment in a3m format to file (default=none)
-Aa3m <file>
append query alignment in a3m format to file (default=none)
-atab <file>
write alignment as a table (with posteriors) to file (default=none)
-index <file>
use given alignment to calculate Viterbi score (default=none)
-v <int>
verbose mode: 0: no screen output 1: only warnings 2: verbose
-seq [1,inf[
max. number of query/template sequences displayed (def=1)
-nocons
don't show consensus sequence in alignments (default=show)
-nopred
don't show predicted 2ndary structure in alignments (default=show)
-nodssp
don't show DSSP 2ndary structure in alignments (default=show)
-ssconf
show confidences for predicted 2ndary structure in alignments
-aliw <int>
number of columns per line in alignment list (def=80)
-P <float>
for self-comparison: max p-value of alignments (def=0.001)
-p <float>
minimum probability in summary and alignment list (def=0)
-E <float>
maximum E-value in summary and alignment list (def=1E+06)
-Z <int>
maximum number of lines in summary hit list (def=100)
-z <int>
minimum number of lines in summary hit list (def=1)
-B <int>
maximum number of alignments in alignment list (def=100)
-b <int>
minimum number of alignments in alignment list (def=1)
-rank <int>
specify rank of alignment to write with -Oa3m or -Aa3m option (default=1)
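
How the thresholds -p and -E interact with the line limits -z and -Z can be pictured with a small Python sketch. It is an illustration under stated assumptions, not hhalign's actual code: hits are assumed to be sorted best first, probabilities are assumed to be percentages, the first -z hits are assumed to be shown regardless of the thresholds, and the function name select_hits is hypothetical.

```python
def select_hits(hits, p_min=0.0, e_max=1e6, z_min=1, z_max=100):
    """Pick the hits that would appear in a summary list (assumed semantics).

    hits: list of (probability, e_value) tuples, best hit first.
    """
    shown = []
    for prob, e_value in hits:
        if len(shown) >= z_max:
            break  # never exceed the maximum number of lines (-Z)
        passes = prob >= p_min and e_value <= e_max
        # Assumption: the first z_min hits are listed even if they fail -p/-E.
        if passes or len(shown) < z_min:
            shown.append((prob, e_value))
    return shown
```

For example, select_hits(hits, p_min=40) keeps only hits with at least 40% probability, but always at least one line because z_min defaults to 1.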

Filter input alignment (options can be combined):

-id [0,100]
maximum pairwise sequence identity (%) (def=90)
-diff [0,inf[
filter most diverse set of sequences, keeping at least this many sequences in each block of >50 columns (def=100)
-cov [0,100]
minimum coverage with query (%) (def=0)
-qid [0,100]
minimum sequence identity with query (%) (def=0)
-qsc [0,100]
minimum score per column with query (def=-20.0)

Input alignment format:

-M a2m
use A2M/A3M (default): upper case = Match; lower case = Insert; '-' = Delete; '.' = gaps aligned to inserts (may be omitted)
-M first
use FASTA: columns with residue in 1st sequence are match states
-M [0,100]
use FASTA: columns with fewer than X% gaps are match states
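
The two FASTA rules above (-M first and -M [0,100]) can be sketched in Python as follows. This is a simplified illustration: it counts gaps without any sequence weighting, which is an assumption and may differ from what hhalign does internally, and the function name match_columns is hypothetical.

```python
def match_columns(alignment, mode):
    """Return the 0-based indices of columns treated as match states.

    alignment: aligned sequences of equal length, '-' marks a gap.
    mode: "first" (like -M first) or a gap percentage X (like -M X).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    if mode == "first":
        # Columns with a residue in the first sequence are match states.
        return [i for i, ch in enumerate(alignment[0]) if ch != "-"]
    threshold = float(mode)
    columns = []
    for i in range(length):
        gaps = sum(1 for seq in alignment if seq[i] == "-")
        # Columns with fewer than X% gaps are match states (unweighted count).
        if 100.0 * gaps / len(alignment) < threshold:
            columns.append(i)
    return columns
```

With the toy alignment ["AC-G", "A--G", "ACTG"], both mode "first" and mode 50 select columns 0, 1 and 3, while mode 30 drops column 1 because one third of its entries are gaps.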

HMM-HMM alignment options:

-glob/-loc
global or local alignment mode (def=local)
-alt <int>
show up to this number of alternative alignments (def=1)
-realign
realign displayed hits with max. accuracy (MAC) algorithm
-norealign
do NOT realign displayed hits with MAC algorithm (def=realign)
-mact [0,1[
posterior probability threshold for MAC alignment (def=0.350) A threshold value of 0.0 yields global alignments.
-sto <int>
use global stochastic sampling algorithm to sample this many alignments
-excl <range>
exclude query positions from the alignment, e.g. '1-33,97-168'
-shift [-1,1]
score offset (def=-0.030)
-corr [0,1]
weight of term for pair correlations (def=0.10)
-ssm 0-4
secondary structure (ss) scoring mode (default=2): 0: no ss scoring 1: ss scoring after alignment 2: ss scoring during alignment
-ssw [0,1]
weight of ss score (def=0.11)
-def
read default options from ./.hhdefaults or <home>/.hhdefaults
Example: hhalign -i T0187.a3m -t d1hz4a_.hhm -png T0187pdb.png
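
The range syntax accepted by -excl (e.g. '1-33,97-168') can be produced and checked with a short Python sketch. The helper names are hypothetical, and the code assumes that positions are 1-based and that both ends of each range are inclusive, which this page does not state explicitly.

```python
def format_excl(ranges):
    """Turn [(1, 33), (97, 168)] into the string '1-33,97-168'."""
    return ",".join(f"{start}-{end}" for start, end in ranges)


def parse_excl(text):
    """Expand a range string into the set of excluded query positions.

    Assumption: ranges are inclusive on both ends.
    """
    positions = set()
    for part in text.split(","):
        start, end = (int(value) for value in part.split("-"))
        if start > end:
            raise ValueError(f"invalid range: {part}")
        positions.update(range(start, end + 1))
    return positions
```

The formatted string can then be passed as the argument of -excl.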

Output options:

-o <file>
write output alignment to file
-ofas <file>
write alignments in FASTA, A2M (-oa2m) or A3M (-oa3m) format
-Oa3m <file>
write query alignment in a3m format to file (default=none)
-Aa3m <file>
append query alignment in a3m format to file (default=none)
-atab <file>
write alignment as a table (with posteriors) to file (default=none)
-v <int>
verbose mode: 0: no screen output 1: only warnings 2: verbose
-seq [1,inf[
max. number of query/template sequences displayed (def=1)
-nocons
don't show consensus sequence in alignments (default=show)
-nopred
don't show predicted 2ndary structure in alignments (default=show)
-nodssp
don't show DSSP 2ndary structure in alignments (default=show)
-ssconf
show confidences for predicted 2ndary structure in alignments
-aliw <int>
number of columns per line in alignment list (def=80)
-P <float>
for self-comparison: max p-value of alignments (def=0.001)
-p <float>
minimum probability in summary and alignment list (def=0)
-E <float>
maximum E-value in summary and alignment list (def=1E+06)
-Z <int>
maximum number of lines in summary hit list (def=100)
-z <int>
minimum number of lines in summary hit list (def=1)
-B <int>
maximum number of alignments in alignment list (def=100)
-b <int>
minimum number of alignments in alignment list (def=1)
-rank <int>
specify rank of alignment to write with -Oa3m or -Aa3m option (default=1)
-tc <file>
write a TCoffee library file for the pairwise comparison
-tct [0,100]
min. probability of residue pairs for TCoffee (def=5%)

Options to filter input alignment (options can be combined):

-id [0,100]
maximum pairwise sequence identity (%) (def=90)
-diff [0,inf[
filter most diverse set of sequences, keeping at least this many sequences in each block of >50 columns (def=100)
-cov [0,100]
minimum coverage with query (%) (def=0)
-qid [0,100]
minimum sequence identity with query (%) (def=0)
-qsc [0,100]
minimum score per column with query (def=-20.0)

HMM-building options:

-M a2m
use A2M/A3M (default): upper case = Match; lower case = Insert; '-' = Delete; '.' = gaps aligned to inserts (may be omitted)
-M first
use FASTA: columns with residue in 1st sequence are match states
-M [0,100]
use FASTA: columns with fewer than X% gaps are match states
-tags
do NOT neutralize His-, C-myc-, FLAG-tags, and trypsin recognition sequence to background distribution

Pseudocount (pc) options:

-pcm 0-2
position dependence of pc admixture 'tau' (pc mode, default=2)
0: no pseudo counts: tau = 0
1: constant: tau = a
2: diversity-dependent: tau = a/(1 + ((Neff[i]-1)/b)^c) (Neff[i]: number of effective seqs in local MSA around column i)
3: constant diversity pseudocounts
-pca [0,1]
overall pseudocount admixture (def=1.0)
-pcb [1,inf[
Neff threshold value for -pcm 2 (def=1.5)
-pcc [0,3]
extinction exponent c for -pcm 2 (def=1.0)
-pre_pca [0,1]
PREFILTER pseudocount admixture (def=0.8)
-pre_pcb [1,inf[
PREFILTER threshold for Neff (def=1.8)
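
The admixture formulas listed under -pcm can be evaluated with a short Python sketch. It transcribes modes 0 to 2 as written above, taking a, b and c from the -pca, -pcb and -pcc defaults; the function name pc_admixture is hypothetical, and mode 3 is left out because this page gives no formula for it.

```python
def pc_admixture(neff, mode=2, a=1.0, b=1.5, c=1.0):
    """Pseudocount admixture tau for one column.

    neff: number of effective sequences around the column (Neff[i] >= 1).
    a, b, c: values of -pca, -pcb and -pcc (defaults from this page).
    """
    if neff < 1.0:
        raise ValueError("Neff is expected to be at least 1")
    if mode == 0:
        return 0.0  # no pseudocounts
    if mode == 1:
        return a    # constant admixture
    if mode == 2:
        # diversity-dependent: tau = a / (1 + ((Neff - 1) / b)^c)
        return a / (1.0 + ((neff - 1.0) / b) ** c)
    raise ValueError("only modes 0, 1 and 2 are sketched here")
```

With the defaults, a column with Neff = 1 gets tau = 1.0, and a column with Neff = 2.5 gets tau = 0.5, so more diverse alignments receive a smaller pseudocount admixture.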

Context-specific pseudo-counts:

-nocontxt
use substitution-matrix instead of context-specific pseudocounts
-contxt <file>
context file for computing context-specific pseudocounts (default=./data/context_data.lib)
-cslib <file>
column state file for fast database prefiltering (default=./data/cs219.lib)

Gap cost options:

-gapb [0,inf[
Transition pseudocount admixture (def=1.00)
-gapd [0,inf[
Transition pseudocount admixture for open gap (default=0.15)
-gape [0,1.5]
Transition pseudocount admixture for extend gap (def=1.00)
-gapf ]0,inf]
factor to increase/reduce the gap open penalty for deletes (def=0.60)
-gapg ]0,inf]
factor to increase/reduce the gap open penalty for inserts (def=0.60)
-gaph ]0,inf]
factor to increase/reduce the gap extend penalty for deletes (def=0.60)
-gapi ]0,inf]
factor to increase/reduce the gap extend penalty for inserts (def=0.60)
-egq [0,inf[
penalty (bits) for end gaps aligned to query residues (def=0.00)
-egt [0,inf[
penalty (bits) for end gaps aligned to template residues (def=0.00)

Alignment options:

-glob/-loc
global or local alignment mode (def=global)
-mac
use Maximum Accuracy (MAC) alignment instead of Viterbi
-mact [0,1]
posterior prob threshold for MAC alignment (def=0.350)
-sto <int>
use global stochastic sampling algorithm to sample this many alignments
-sc <int>
amino acid score (tja: template HMM at column j) (def=1)
0 = log2 Sum(tja*qia/pa) (pa: aa background frequencies)
1 = log2 Sum(tja*qia/pqa) (pqa = 1/2*(pa+ta) )
2 = log2 Sum(tja*qia/ta) (ta: av. aa freqs in template)
3 = log2 Sum(tja*qia/qa) (qa: av. aa freqs in query)
-corr [0,1]
weight of term for pair correlations (def=0.10)
-shift [-1,1]
score offset (def=-0.030)
-r
repeat identification: multiple hits not treated as independent
-ssm 0-2
secondary structure (ss) scoring mode (default=2): 0: no ss scoring 1: ss scoring after alignment 2: ss scoring during alignment
-ssw [0,1]
weight of ss score compared to column score (def=0.11)
-ssa [0,1]
ss confusion matrix = (1-ssa)*I + ssa*psipred-confusion-matrix (def=1.00)
-calm 0-3
empirical score calibration of 0:query 1:template 2:both (def=off)
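
The four amino acid score variants listed under -sc differ only in the denominator. The Python sketch below transcribes those formulas for a single pair of columns; the function names are hypothetical, qia is taken to be the query HMM at column i by analogy with tja, and the usage example uses a toy 4-letter alphabet instead of the 20 amino acids.

```python
import math


def sc_denominator(mode, pa, ta, qa):
    """Per-residue denominator for the -sc modes 0 to 3.

    pa: background frequencies, ta/qa: average frequencies in the
    template/query (all lists over the same alphabet).
    """
    if mode == 0:
        return list(pa)
    if mode == 1:
        return [0.5 * (p + t) for p, t in zip(pa, ta)]  # pqa = 1/2*(pa+ta)
    if mode == 2:
        return list(ta)
    if mode == 3:
        return list(qa)
    raise ValueError("mode must be 0, 1, 2 or 3")


def column_score(q_col, t_col, denom):
    """log2 Sum(tja*qia/denominator) for one query/template column pair."""
    total = sum(t * q / d for t, q, d in zip(t_col, q_col, denom))
    return math.log2(total)  # raises ValueError if the columns share no residue
```

For two identical uniform columns scored against a uniform background the score is 0 bits; for two columns that both put all weight on the same residue it is log2(1/0.25) = 2 bits in the 4-letter toy case.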
Default options can be specified in './.hhdefaults' or '~/.hhdefaults'
November 2014 hhalign 2.0.16