
HHALIGN(1) User Commands HHALIGN(1)

NAME

hhalign - align a query alignment/HMM to a template alignment/HMM

SYNOPSIS

hhalign -i query -t template [options]

DESCRIPTION

HHalign 3.0.0 (15-03-2015)

Align a query alignment/HMM to a template alignment/HMM by HMM-HMM alignment. If only one alignment/HMM is given, it is compared to itself, and the best off-diagonal alignment plus all further non-overlapping alignments above the significance threshold are shown.

Remmert M, Biegert A, Hauser A, and Soding J. HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 9:173-175 (2011).

(C) Johannes Soeding, Michael Remmert, Andreas Biegert, Andreas Hauser

-i <file>
input/query: single sequence or multiple sequence alignment (MSA) in a3m, a2m, or FASTA format, or HMM in hhm format
-t <file>
input/template: single sequence or multiple sequence alignment (MSA) in a3m, a2m, or FASTA format, or HMM in hhm format

<file> may be 'stdin' or 'stdout' throughout.

Input alignment format:

-M a2m
use A2M/A3M (default): upper case = Match; lower case = Insert;
'-' = Delete; '.' = gaps aligned to inserts (may be omitted)
-M first
use FASTA: columns with residue in 1st sequence are match states
-M [0,100]
use FASTA: columns with fewer than X% gaps are match states
-tags/-notags
do NOT / do neutralize His-, C-myc-, FLAG-tags, and trypsin recognition sequence to background distribution (def=-notags)
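
The two FASTA match-state rules (-M first and -M [0,100]) can be illustrated with a minimal Python sketch. It is not hhalign's implementation: it counts gaps per column without sequence weights, which is an assumption, and the function names and the toy alignment are hypothetical.

```python
def match_columns_by_gaps(msa, max_gap_percent):
    """Columns with fewer than max_gap_percent % gaps (cf. -M [0,100])."""
    n_seqs = len(msa)
    matches = []
    for col in range(len(msa[0])):
        # Unweighted gap count; assumption made for this sketch.
        gaps = sum(1 for seq in msa if seq[col] in "-.")
        if 100.0 * gaps / n_seqs < max_gap_percent:
            matches.append(col)
    return matches


def match_columns_by_first(msa):
    """Columns with a residue in the first sequence (cf. -M first)."""
    return [col for col, ch in enumerate(msa[0]) if ch not in "-."]


# Toy aligned FASTA sequences (hypothetical data).
msa = [
    "MK-LV",
    "MKALV",
    "M--LV",
    "MK-L-",
]
print(match_columns_by_gaps(msa, 50))
print(match_columns_by_first(msa))
```

Columns that are not selected would be treated as insert states; lowering the threshold makes the match-state assignment stricter.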

Output options:

-o <file>
write results in standard format to file (default=<infile.hhr>)
-oa3m <file>
write query alignment in a3m or PSI-BLAST format (-opsi) to file (default=none)
-aa3m <file>
append query alignment in a3m (-aa3m) or PSI-BLAST format (-apsi) to file (default=none)
-Ofas <file>
write pairwise alignments in FASTA xor A2M (-Oa2m) xor A3M (-Oa3m) format
-add_cons
generate consensus sequence as master sequence of query MSA (default=don't)
-hide_cons
don't show consensus sequence in alignments (default=show)
-hide_pred
don't show predicted 2ndary structure in alignments (default=show)
-hide_dssp
don't show DSSP 2ndary structure in alignments (default=show)
-show_ssconf
show confidences for predicted 2ndary structure in alignments
-seq <int>
max. number of query/template sequences displayed (default=1)
-aliw <int>
number of columns per line in alignment list (default=80)
-p [0,100]
minimum probability in summary and alignment list (default=0)
-E [0,inf[
maximum E-value in summary and alignment list (default=1E+06)
-Z <int>
maximum number of lines in summary hit list (default=100)
-z <int>
minimum number of lines in summary hit list (default=1)
-B <int>
maximum number of alignments in alignment list (default=100)
-b <int>
minimum number of alignments in alignment list (default=1)

Filter options applied to query MSA, template MSA, and result MSA:

-id [0,100]
maximum pairwise sequence identity (def=90)
-diff [0,inf[
filter MSAs by selecting the most diverse set of sequences, keeping at least this many seqs in each MSA block of length 50. Zero and non-numerical values turn off the filtering. (def=100)
-cov [0,100]
minimum coverage with master sequence (%) (def=0)
-qid [0,100]
minimum sequence identity with master sequence (%) (def=0)
-qsc [0,100]
minimum score per column with master sequence (default=-20.0)
-mark
do not filter out sequences marked by ">@" in their name line

HMM-HMM alignment options:

-norealign
do NOT realign displayed hits with MAC algorithm (def=realign)
-mact [0,1[
posterior prob threshold for MAC realignment controlling greediness at alignment ends: 0:global >0.1:local (default=0.35)
-glob/-loc
use global/local alignment mode for searching/ranking (def=local)
-realign
realign displayed hits with max. accuracy (MAC) algorithm
-excl <range>
exclude query positions from the alignment, e.g. '1-33,97-168'
-template_excl <range>
exclude template positions from the alignment, e.g. '1-33,97-168'
-ovlp <int>
banded alignment: forbid <ovlp> largest diagonals |i-j| of DP matrix (def=0)
-alt <int>
show up to this many alternative alignments with raw score > smin (def=1)
-smin <float>
minimum raw score for alternative alignments (def=20.0)
-shift [-1,1]
profile-profile score offset (def=-0.03)
-corr [0,1]
weight of term for pair correlations (def=0.10)
-sc <int>
amino acid score (tja: template HMM at column j) (def=1)
0 = log2 Sum(tja*qia/pa) (pa: aa background frequencies)
1 = log2 Sum(tja*qia/pqa) (pqa = 1/2*(pa+ta))
2 = log2 Sum(tja*qia/ta) (ta: av. aa freqs in template)
3 = log2 Sum(tja*qia/qa) (qa: av. aa freqs in query)
5 = local amino acid composition correction
-ssm {0,..,4}
secondary structure scoring [default=2]
0: no ss scoring
1,2: ss scoring after or during alignment
3,4: ss scoring after or during alignment, predicted vs. predicted
-ssw [0,1]
weight of ss score (def=0.11)
-ssa [0,1]
ss confusion matrix = (1-ssa)*I + ssa*psipred-confusion-matrix (def=1.00)
-wg
use global sequence weighting for realignment!
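
The column score formulas listed under -sc can be written out as a short Python sketch. It only restates the formulas for modes 0 and 1 as given above; the 4-letter toy alphabet, the example frequencies, and the function names are assumptions for illustration and do not reflect hhalign's internal code.

```python
import math


def column_score(q_col, t_col, denom):
    """log2 of Sum_a q[a] * t[a] / denom[a], summed over the alphabet."""
    return math.log2(sum(q * t / d for q, t, d in zip(q_col, t_col, denom)))


def mixed_background(pa, ta):
    """pqa = 1/2 * (pa + ta), the denominator listed for -sc 1."""
    return [0.5 * (p + t) for p, t in zip(pa, ta)]


# Toy 4-letter alphabet (an assumption; real profiles use 20 amino acids).
pa = [0.25, 0.25, 0.25, 0.25]      # background frequencies
q_col = [0.70, 0.10, 0.10, 0.10]   # query column i (qia)
t_col = [0.70, 0.10, 0.10, 0.10]   # template column j (tja)
ta = [0.40, 0.20, 0.20, 0.20]      # hypothetical average template frequencies

score0 = column_score(q_col, t_col, pa)                        # -sc 0
score1 = column_score(q_col, t_col, mixed_background(pa, ta))  # -sc 1
print(f"-sc 0: {score0:.3f}  -sc 1: {score1:.3f}")
```

Two columns that both equal the background distribution score 0 under mode 0; columns that agree on an enriched residue score above 0.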

Gap cost options:

-gapb [0,inf[
Transition pseudocount admixture (def=1.00)
-gapd [0,inf[
Transition pseudocount admixture for open gap (default=0.15)
-gape [0,1.5]
Transition pseudocount admixture for extend gap (def=1.00)
-gapf ]0,inf]
factor to increase/reduce the gap open penalty for deletes (def=0.60)
-gapg ]0,inf]
factor to increase/reduce the gap open penalty for inserts (def=0.60)
-gaph ]0,inf]
factor to increase/reduce the gap extend penalty for deletes (def=0.60)
-gapi ]0,inf]
factor to increase/reduce the gap extend penalty for inserts (def=0.60)
-egq [0,inf[
penalty (bits) for end gaps aligned to query residues (def=0.00)
-egt [0,inf[
penalty (bits) for end gaps aligned to template residues (def=0.00)

Pseudocount (pc) options:

Context specific hhm pseudocounts:
-pc_hhm_contxt_mode {0,..,3}
position dependence of pc admixture 'tau' (pc mode, default=2)
0: no pseudo counts: tau = 0
1: constant: tau = a
2: diversity-dependent: tau = a/(1+((Neff[i]-1)/b)^c)
3: CSBlast admixture: tau = a(1+b)/(Neff[i]+b)
(Neff[i]: number of effective seqs in local MSA around column i)
-pc_hhm_contxt_a [0,1]
overall pseudocount admixture (def=0.9)
-pc_hhm_contxt_b [1,inf[
Neff threshold value for mode 2 (def=4.0)
-pc_hhm_contxt_c [0,3]
extinction exponent c for mode 2 (def=1.0)
Context independent hhm pseudocounts (used for templates; used for query if contxt file is not available):
-pc_hhm_nocontxt_mode {0,..,3}
position dependence of pc admixture 'tau' (pc mode, default=2)
0: no pseudo counts: tau = 0
1: constant: tau = a
2: diversity-dependent: tau = a/(1+((Neff[i]-1)/b)^c)
(Neff[i]: number of effective seqs in local MSA around column i)
-pc_hhm_nocontxt_a [0,1]
overall pseudocount admixture (def=1.0)
-pc_hhm_nocontxt_b [1,inf[
Neff threshold value for mode 2 (def=1.5)
-pc_hhm_nocontxt_c [0,3]
extinction exponent c for mode 2 (def=1.0)
Context-specific pseudo-counts:
-nocontxt
use substitution-matrix instead of context-specific pseudocounts
-contxt <file>
context file for computing context-specific pseudocounts (default=./data/context_data.crf)
-csw [0,inf]
weight of central position in cs pseudocount mode (def=1.6)
-csb [0,1]
weight decay parameter for positions in cs pc mode (def=0.9)
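
The dependence of the pseudocount admixture tau on Neff[i] for the pc modes listed above can be sketched in Python. The formulas are taken from the mode descriptions; the function name and the sample Neff values are hypothetical, and Neff >= 1 is assumed.

```python
def pc_admixture(mode, neff, a, b=1.0, c=1.0):
    """Pseudocount admixture tau for one column, per the listed pc modes."""
    if mode == 0:   # no pseudocounts
        return 0.0
    if mode == 1:   # constant
        return a
    if mode == 2:   # diversity-dependent
        return a / (1.0 + ((neff - 1.0) / b) ** c)
    if mode == 3:   # CSBlast admixture
        return a * (1.0 + b) / (neff + b)
    raise ValueError(f"unknown pc mode: {mode}")


# Context-specific defaults from the option list: a=0.9, b=4.0, c=1.0.
for neff in (1.0, 5.0, 10.0):
    tau = pc_admixture(2, neff, a=0.9, b=4.0, c=1.0)
    print(f"Neff={neff:4.1f}  tau={tau:.3f}")
```

In mode 2, tau equals a for a single effective sequence and falls as the alignment becomes more diverse, so pseudocounts matter less where the MSA itself carries more information.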

Other options:

-v <int>
verbose mode: 0: no screen output, 1: only warnings, 2: verbose (def=2)
-atab <file>
write all alignments in tabular layout to file
-maxres <int>
max number of HMM columns (def=20001)
-maxmem [1,inf[
limit memory for realignment (in GB) (def=3.0)

Example: hhalign -i T0187.a3m -t d1hz4a_.hhm -o result.hhr
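
When hhalign is driven from a script, the example invocation above can be assembled as an argument list. The following Python sketch uses only options documented on this page; the helper name is hypothetical, and the command is run only if hhalign is on PATH and both input files exist.

```python
import os
import shutil
import subprocess


def build_hhalign_command(query, template, output, extra=()):
    """Assemble an hhalign argument list from documented options."""
    return ["hhalign", "-i", query, "-t", template, "-o", output, *extra]


# Same files as the example above, plus global alignment mode (-glob).
cmd = build_hhalign_command(
    "T0187.a3m", "d1hz4a_.hhm", "result.hhr", extra=["-glob"]
)
print(" ".join(cmd))

# Run only when the binary and both inputs are actually available.
inputs_present = all(os.path.exists(p) for p in ("T0187.a3m", "d1hz4a_.hhm"))
if shutil.which("hhalign") is not None and inputs_present:
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.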

February 2019 hhalign 3.0~beta3+dfsg