ALF(1) ALF(1)

NAME

alf - Alignment free sequence comparison

SYNOPSIS

alf [OPTIONS] -i IN.FASTA [-o OUT.TXT]

DESCRIPTION

Compute the pairwise similarity of the sequences in IN.FASTA using alignment-free methods and write a tab-delimited matrix of pairwise scores to OUT.TXT.

OPTIONS

Display the help message.
Display version information.
When given, details about the progress are printed to the screen.

Input / Output:

Name of the multi-FASTA input file. Valid filetypes are: .sam[.*], .raw[.*], .gbk[.*], .frn[.*], .fq[.*], .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf for transparent (de)compression.
Name of the file to which the tab-delimited matrix of pairwise scores will be written. Default is to write to stdout. Valid filetype is: .alf[.*], where * is the extension tsv.

General Algorithm Parameters:

Select method to use. One of N2, D2, D2Star, and D2z. Default: N2.
Size of the k-mers. Default: 4.
Order of background Markov Model. Default: 1.
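
To illustrate what a k-mer based, alignment-free score looks like, here is a minimal sketch of the plain D2 statistic: the sum, over all words of length k, of the product of the word counts in the two sequences. This is not ALF's implementation. The function names kmer_counts and d2_score are hypothetical, and the background Markov model used by the other methods is not modelled here.

```python
from collections import Counter


def kmer_counts(seq, k=4):
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_score(seq_a, seq_b, k=4):
    """Plain D2: sum over shared words of count_a(word) * count_b(word)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words that are absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2_score("ACGTACGT", "ACGTTTTT", k=4))
```

The default k=4 in the sketch mirrors the default k-mer size listed above.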

N2 Algorithm Parameters:

Which strand to score. Use both_strands to score both strands simultaneously. One of input, both_strands, mean, min, and max. Default: input.
Number of mismatches, either 0 or 1. When 1 is used, N2 also counts the k-mer neighbours that differ from a word by one mismatch. Default: 0.
Real-valued weight of counts for words with mismatches. Default: 0.1.
Print k-mer weights for every sequence to this file if given. Valid filetype is: .txt.
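
The mismatch and mismatch-weight parameters can be pictured with the following sketch, which adds a weighted contribution to every one-mismatch neighbour of each observed word. It only illustrates the idea of neighbourhood counts under stated assumptions: a DNA alphabet of A, C, G, T, and hypothetical names one_mismatch_neighbours and neighbourhood_counts. How N2 turns such counts into a final score is not described on this page and is not reproduced here.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: plain DNA alphabet


def one_mismatch_neighbours(word):
    """Yield every word that differs from `word` at exactly one position."""
    for i, base in enumerate(word):
        for alt in ALPHABET:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def neighbourhood_counts(seq, k=4, mismatch_weight=0.1):
    """Exact k-mer counts plus weighted counts for one-mismatch neighbours."""
    seq = seq.upper()
    exact = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    weighted = Counter()
    for word, n in exact.items():
        weighted[word] += n  # exact occurrences count fully
        for neighbour in one_mismatch_neighbours(word):
            weighted[neighbour] += mismatch_weight * n  # down-weighted
    return weighted


if __name__ == "__main__":
    counts = neighbourhood_counts("ACGT", k=4, mismatch_weight=0.1)
    print(len(counts), counts["ACGT"])
```

The default weight of 0.1 in the sketch mirrors the default mismatch weight listed above.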

CONTACT AND REFERENCES

Jonathan Goeke <goeke@molgen.mpg.de>
Jonathan Goeke, Marcel H. Schulz, Julia Lasserre, and Martin Vingron. Estimation of Pairwise Sequence Similarity of Mammalian Enhancers with Word Neighbourhood Counts. Bioinformatics (2012).
http://www.seqan.de/projects/alf
alf 1.1.10