NAME
fastahack - indexing and sequence extraction from FASTA files
SYNOPSIS
fastahack [options] <fasta reference>
DESCRIPTION
fastahack is a small application for indexing and extracting sequences and
subsequences from FASTA files. The included Fasta.cpp library provides a FASTA
reader and indexer that can be embedded into applications which would benefit
from directly reading subsequences from FASTA files. The library automatically
handles index file generation and use.
Features:
- FASTA index (.fai) generation for FASTA files
- Sequence extraction
- Subsequence extraction
- Sequence statistics (currently only entropy is provided)
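The entropy statistic listed above can be illustrated with a minimal sketch. It assumes entropy is computed over single-character frequencies and reported in bits (log base 2); the tool's exact definition may differ, and the function name shannon_entropy is hypothetical rather than part of fastahack.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy of a sequence, in bits per symbol.

    Assumption: symbols are single characters, so "ACGT" has four
    equally likely symbols and "AAAA" has one.
    """
    if not sequence:
        return 0.0
    total = len(sequence)
    counts = Counter(sequence)
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

Under these assumptions a homopolymer such as "AAAA" scores 0 bits, and a sequence with the four bases in equal proportion scores 2 bits, so low values flag low-complexity regions.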
Sequence and subsequence extraction use fseek64 to jump directly to the
requested bytes in the file, which keeps extraction fast and avoids
RAM-intensive loading of the whole file. This makes fastahack a useful tool for
bioinformaticists who need to quickly extract many subsequences from a
reference FASTA sequence.
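The following sketch shows how an index makes seek-based extraction possible. It assumes the common five-column, tab-separated .fai layout (sequence name, sequence length, byte offset of the first base, bases per line, bytes per line); the names FaiEntry, load_fai and fetch are hypothetical and are not the Fasta.cpp API.

```python
from typing import Dict, NamedTuple


class FaiEntry(NamedTuple):
    name: str
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases per line
    line_width: int  # bytes per line, including the line terminator


def load_fai(path: str) -> Dict[str, FaiEntry]:
    """Read an index file into a dict keyed by sequence name."""
    entries = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            name, *numbers = fields[:5]
            entries[name] = FaiEntry(name, *map(int, numbers))
    return entries


def _byte_position(entry: FaiEntry, zero_based: int) -> int:
    """File position of one base, accounting for line terminators."""
    full_lines, remainder = divmod(zero_based, entry.line_bases)
    return entry.offset + full_lines * entry.line_width + remainder


def fetch(fasta_path: str, entry: FaiEntry, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) without loading the file."""
    if not 1 <= start <= end <= entry.length:
        raise ValueError("coordinates outside the sequence")
    begin = _byte_position(entry, start - 1)
    finish = _byte_position(entry, end - 1)
    with open(fasta_path, "rb") as handle:
        handle.seek(begin)
        raw = handle.read(finish - begin + 1)
    # The byte range may span line breaks; drop them from the result.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

Only the bytes covering the requested range are read, so the cost of a lookup depends on the length of the region rather than on the size of the reference.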
OPTIONS
-i, --index
    generate the FASTA index <fasta reference>.fai
-r, --region REGION
    print the specified region
-c, --stdin
    read a stream of line-delimited region specifiers on stdin and print the
    corresponding sequence for each on stdout
-e, --entropy
    print the Shannon entropy of the specified region
-d, --dump
    print the FASTA file in the form 'seq_name <tab> sequence'
REGION is of the form
    <seq>
    <seq>:<start>[sep]<end>
    <seq1>:<start>[sep]<seq2>:<end>
where start and end are 1-based, and the region includes the end position.
[sep] is "-" or "..".
Specifying a sequence name alone returns the entire sequence, specifying a
range returns that range, and specifying a single coordinate, e.g.
<seq>:<start>, returns just that base.
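The region syntax described above can be sketched as a small parser. This is an illustration only, not fastahack's own parsing code: the names Region and parse_region are hypothetical, and the two-sequence form <seq1>:<start>[sep]<seq2>:<end> is deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    seq: str
    start: Optional[int]  # 1-based; None means the whole sequence
    end: Optional[int]    # 1-based, inclusive


# <start>, <start>-<end> or <start>..<end>
_COORDS = re.compile(r"^(\d+)(?:(?:-|\.\.)(\d+))?$")


def parse_region(spec: str) -> Region:
    """Parse <seq>, <seq>:<start> or <seq>:<start>[sep]<end>."""
    spec = spec.strip()
    # Split on the last colon so that names containing ':' still work.
    name, colon, coords = spec.rpartition(":")
    match = _COORDS.match(coords) if colon else None
    if not match or not name:
        # No usable coordinates: treat the whole string as a sequence name.
        return Region(spec, None, None)
    start = int(match.group(1))
    # A single coordinate selects exactly one base.
    end = int(match.group(2)) if match.group(2) else start
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {spec!r}")
    return Region(name, start, end)
```

Because the coordinates are 1-based and inclusive, a caller working with 0-based, half-open slices would use sequence[start - 1:end].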
AUTHOR
This software was written by Erik Garrison <erik.garrison@bc.edu>.
This manpage was written by Andreas Tille for the Debian distribution and may
be used by others.