


fastahack - indexing and sequence extraction from FASTA files


fastahack [options] <fasta reference>


fastahack is a small application for indexing and extracting sequences and subsequences from FASTA files. The included Fasta.cpp library provides a FASTA reader and indexer that can be embedded into applications which would benefit from directly reading subsequences from FASTA files. The library automatically handles index file generation and use.


FASTA index (.fai) generation for FASTA files
Sequence extraction
Subsequence extraction
Sequence statistics (currently only entropy is provided)
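The entropy statistic is Shannon entropy. As a rough illustration only, the following Python sketch assumes the entropy is taken over single-symbol frequencies and reported in bits per symbol; the function name shannon_entropy is hypothetical, and fastahack's exact definition may differ.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy, in bits per symbol, of the symbol frequencies.

    Assumption: symbols are single characters (bases); this is an
    illustrative definition, not necessarily the one fastahack uses.
    """
    if not sequence:
        return 0.0
    counts = Counter(sequence)
    total = len(sequence)
    # H = -sum(p * log2(p)) over the observed symbols.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this definition a homopolymer run scores 0, and a sequence using all four bases equally often scores 2 bits per symbol.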

Sequence and subsequence extraction use fseek64 to jump directly to the requested bytes, so extraction is fast and does not require loading the whole file into RAM. This makes fastahack a useful tool for bioinformaticians who need to quickly extract many subsequences from a reference FASTA file.
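The index-then-seek idea can be sketched in Python as follows. This is not fastahack's code: build_index and fetch are hypothetical names, and the sketch assumes each record is described by the values commonly stored in a .fai file (sequence length, byte offset of the first base, bases per line, bytes per line) and that all sequence lines in a record, except possibly the last, have the same width.

```python
def build_index(fasta_path):
    """Return {name: (length, offset, line_bases, line_width)}.

    Assumption: within a record, every sequence line except the last
    has the same number of bases.
    """
    index = {}
    with open(fasta_path, "rb") as fh:
        name = None
        pos = 0  # byte offset of the line currently being read
        for raw in fh:
            if raw.startswith(b">"):
                name = raw[1:].split()[0].decode()
                # Sequence data begins right after the header line.
                index[name] = [0, pos + len(raw), 0, 0]
            elif name is not None:
                bases = len(raw.rstrip(b"\r\n"))
                entry = index[name]
                if entry[2] == 0:
                    # The first sequence line fixes the line geometry.
                    entry[2], entry[3] = bases, len(raw)
                entry[0] += bases
            pos += len(raw)
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases start..end (1-based, inclusive) by seeking."""
    length, offset, line_bases, line_width = index[name]
    if not (1 <= start <= end <= length):
        raise ValueError("region out of range")

    def byte_of(base0):
        # Each full line before the base contributes line_width bytes,
        # which accounts for the newline characters.
        return offset + (base0 // line_bases) * line_width + base0 % line_bases

    byte_start = byte_of(start - 1)
    byte_end = byte_of(end - 1)
    with open(fasta_path, "rb") as fh:
        fh.seek(byte_start)
        chunk = fh.read(byte_end - byte_start + 1)
    # Drop the line breaks that fall inside the requested span.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()
```

Only the bytes covering the requested region are read, so the cost of one extraction depends on the region size rather than on the size of the FASTA file.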


-i, --index
generate fasta index <fasta reference>.fai
-r, --region REGION
print the specified region
-c, --stdin
read a stream of line-delimited region specifiers on stdin and print the corresponding sequence for each on stdout
-e, --entropy
print the Shannon entropy of the specified region
-d, --dump
print the fasta file in the form 'seq_name <tab> sequence'

REGION is of the form

<seq>, <seq>:<start>[sep]<end>, <seq1>:<start>[sep]<seq2>:<end>

where start and end are 1-based, and the region includes the end position. [sep] is "-" or ".."

Specifying a sequence name alone returns the entire sequence, specifying a range returns that range, and specifying a single coordinate, e.g. <seq>:<start>, returns just that base.
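The single-sequence REGION forms can be parsed along these lines. The Python sketch below is illustrative only: parse_region is a hypothetical name, it does not handle the <seq1>:<start>[sep]<seq2>:<end> form, and it assumes the coordinates follow the last ":" in the string.

```python
import re

# <start>, optionally followed by "-" or ".." and <end>.
_COORDS = re.compile(r"^(\d+)(?:(?:-|\.\.)(\d+))?$")


def parse_region(region):
    """Parse '<seq>', '<seq>:<start>' or '<seq>:<start>[sep]<end>'.

    Returns (name, start, end) with 1-based inclusive coordinates.
    start and end are None when only a sequence name is given.
    """
    name, colon, coords = region.rpartition(":")
    if not colon:
        # No ":" present, so the whole string is the sequence name.
        return region, None, None
    match = _COORDS.match(coords)
    if match is None or not name:
        raise ValueError(f"malformed region: {region!r}")
    start = int(match.group(1))
    # A single coordinate selects exactly one base.
    end = int(match.group(2)) if match.group(2) else start
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    return name, start, end
```

Splitting at the last ":" lets sequence names that contain "-" parse correctly, since the separator is only looked for in the coordinate part.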


This software was written by Erik Garrison.

This manpage was written by Andreas Tille for the Debian distribution, but may be used by others.

June 2016 fastahack 0.0+20160309