NAME

fastahack - indexing and sequence extraction from FASTA files

SYNOPSIS

fastahack [options] <fasta reference>

DESCRIPTION

fastahack is a small application for indexing and extracting sequences and subsequences from FASTA files. The included Fasta.cpp library provides a FASTA reader and indexer that can be embedded into applications which would benefit from directly reading subsequences from FASTA files. The library automatically handles index file generation and use.

Features:

: FASTA index (.fai) generation for FASTA files
: Sequence extraction
: Subsequence extraction
: Sequence statistics (currently only entropy is provided)

Sequence and subsequence extraction use fseek64 to jump directly to the requested bases, which keeps extraction fast and avoids loading the whole file into RAM. This makes fastahack a useful tool for bioinformaticians who need to quickly extract many subsequences from a reference FASTA file.
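To illustrate why an index makes seek-based extraction cheap, here is a minimal Python sketch. It is not fastahack's implementation: the function names build_index and fetch are hypothetical, and the sketch assumes the common .fai layout (sequence length, byte offset of the first base, bases per line, bytes per line) and that every line of a record except the last has the same width.

```python
import os
import tempfile


def build_index(fasta_path):
    """Scan the FASTA file once and record the .fai-style fields per sequence."""
    index = {}
    name = None
    with open(fasta_path, "rb") as fh:
        while True:
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                # The sequence name is the first word of the header line.
                name = line[1:].split()[0].decode()
                # fh.tell() now points at the first base of this record.
                index[name] = {"length": 0, "offset": fh.tell(),
                               "linebases": 0, "linewidth": 0}
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry["linebases"] == 0:
                    entry["linebases"] = bases      # bases per full line
                    entry["linewidth"] = len(line)  # bytes per full line, newline included
                entry["length"] += bases
    return index


def byte_position(entry, base0):
    """Map a 0-based base position to its byte offset in the file."""
    full_lines = base0 // entry["linebases"]
    return entry["offset"] + full_lines * entry["linewidth"] + base0 % entry["linebases"]


def fetch(fasta_path, index, name, start, end):
    """Return bases start..end (1-based, inclusive) by seeking into the file."""
    entry = index[name]
    first = byte_position(entry, start - 1)
    last = byte_position(entry, end - 1)
    with open(fasta_path, "rb") as fh:
        fh.seek(first)
        raw = fh.read(last - first + 1)
    # The byte range may span line breaks; drop them to get the bases only.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


def demo():
    """Write a tiny two-record FASTA file and extract a range that spans a line break."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fa")
        with open(path, "wb") as out:
            out.write(b">chr1 demo\nACGTACGTAC\nGGGTTTAAAC\nCC\n>chr2\nTTTT\n")
        index = build_index(path)
        return fetch(path, index, "chr1", 9, 12)


print(demo())  # prints ACGG
```

Only the index scan reads the whole file; each later extraction costs one seek and one short read, which is why the approach suits workloads that extract many small regions.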

OPTIONS

-i, --index: generate the FASTA index <fasta reference>.fai
-r, --region REGION: print the specified region
-c, --stdin: read a stream of line-delimited region specifiers on stdin and print the corresponding sequence for each on stdout
-e, --entropy: print the Shannon entropy of the specified region
-d, --dump: print the FASTA file in the form 'seq_name <tab> sequence'
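For readers unfamiliar with the statistic reported by --entropy, the following Python sketch computes the Shannon entropy of a sequence from its per-base frequencies. The function name shannon_entropy is hypothetical, and the choice of log base 2 (bits per base) and of single bases as symbols are assumptions; the manual does not state which conventions fastahack itself uses.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Return the Shannon entropy of `sequence` in bits per symbol."""
    if not sequence:
        return 0.0
    counts = Counter(sequence)
    total = len(sequence)
    # H = sum over symbols of p * log2(1 / p), where p is the symbol frequency.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("AAAA"))  # 0.0: a single repeated base carries no information
print(shannon_entropy("ACGT"))  # 2.0: four equally frequent bases
```

Low values indicate repetitive, low-complexity regions; with a four-letter alphabet the maximum under these assumptions is 2 bits per base.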

REGION is of the form

: <seq>, <seq>:<start>[sep]<end>, <seq1>:<start>[sep]<seq2>:<end>

where start and end are 1-based, and the region includes the end position. [sep] is "-" or ".."

Specifying a sequence name alone returns the entire sequence, specifying a range returns that range, and specifying a single coordinate, e.g. <seq>:<start>, returns just that base.
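The region rules above can be sketched as a small Python parser. This is an illustration only, not fastahack's parser: the names Region and parse_region are hypothetical, the sketch assumes sequence names contain no ":" character, and it covers only the <seq>, <seq>:<start> and <seq>:<start>[sep]<end> forms (the <seq1>:<start>[sep]<seq2>:<end> form is rejected).

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    seq: str
    start: Optional[int]  # 1-based; None means "the whole sequence"
    end: Optional[int]    # 1-based, inclusive


# <seq>, <seq>:<start>, or <seq>:<start>[sep]<end> where [sep] is "-" or ".."
_REGION = re.compile(
    r"^(?P<seq>[^:]+)(?::(?P<start>\d+)(?:(?:-|\.\.)(?P<end>\d+))?)?$"
)


def parse_region(text):
    """Parse a region specifier into a Region tuple, or raise ValueError."""
    match = _REGION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised region: {text!r}")
    seq = match.group("seq")
    if match.group("start") is None:
        return Region(seq, None, None)
    start = int(match.group("start"))
    # A single coordinate selects exactly one base, so end equals start.
    end = int(match.group("end")) if match.group("end") else start
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {text!r}")
    return Region(seq, start, end)


print(parse_region("chr1"))           # Region(seq='chr1', start=None, end=None)
print(parse_region("chr1:100..200"))  # Region(seq='chr1', start=100, end=200)
print(parse_region("chr1:5"))         # Region(seq='chr1', start=5, end=5)
```

Because coordinates are 1-based and the end is inclusive, a parsed region with explicit coordinates corresponds to the Python slice sequence[start - 1:end].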

AUTHOR

This software was written by Erik Garrison <erik.garrison@bc.edu>.

This manpage was written by Andreas Tille for the Debian distribution and may be used wherever the program is used.

June 2016

fastahack 0.0+20160309
