GT-EXTRACTSEQ(1) | GenomeTools Manual | GT-EXTRACTSEQ(1) |
NAME¶
gt-extractseq - Extract sequences from given sequence file(s) or fastaindex.
SYNOPSIS¶
gt extractseq [option ...] [sequence_file(s)] | fastaindex
DESCRIPTION¶
-frompos [value]
extract sequence from this position counting from 1 on
(default: 0)
-topos [value]
extract sequence up to this position counting from 1 on
(default: 0)
-match [string]
extract all sequences whose description matches the given
pattern. The given pattern must be a valid extended regular expression.
(default: undefined)
-keys [filename]
extract substrings for keys in specified file (default:
undefined)
-width [value]
set output width for FASTA sequence printing (0 disables
formatting) (default: 0)
-o [filename]
redirect output to specified file (default:
undefined)
-gzip [yes|no]
write gzip compressed output file (default: no)
-bzip2 [yes|no]
write bzip2 compressed output file (default: no)
-force [yes|no]
force writing to output file (default: no)
-help
display help and exit
-version
display version information and exit
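As an illustration of what -width controls, the following minimal Python sketch wraps a sequence into lines of a fixed width. It is not the GenomeTools implementation; the function name format_sequence is hypothetical, and it assumes that a width of 0 means the sequence is written unwrapped on a single line.

```python
def format_sequence(sequence: str, width: int = 0) -> str:
    """Wrap a sequence into lines of at most `width` characters.

    Assumption: a width of 0 (or less) leaves the sequence on one line,
    mirroring "0 disables formatting".
    """
    if width <= 0:
        return sequence
    # Cut the sequence into consecutive chunks of `width` characters.
    chunks = (sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(chunks)
```

For example, format_sequence("ACGTACGTAC", 4) yields three lines: ACGT, ACGT, and AC.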
The option -keys allows one to extract substrings or sequences from the given
sequence file or from a fasta index. The substrings to be extracted are
specified in a key file given as argument to this option. The key file must
contain lines of the form
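k
or
k i j
where k is a key and the optional numbers i and j give the first and the last
position of the substring to be extracted. The key is the string between the
first and the second occurrence of the symbol | in a fasta header. For example,
the fasta header
>tr|A0AQI4|A0AQI4_9ARCH Putative ammonia monooxygenase (Fragment)
has the key A0AQI4. If i and j are both specified, the header of the extracted
substring in the output begins with
>k i j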
If the sequence input comes from sequence files, the following holds:
• duplicated lines in the input file lead to only one sequence in the output
• the sequences are output according to the order in the original sequence files
• the formatting of the output can be controlled by the options -width, -o,
-gzip, and -bzip2
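The key-file format and the way a key is located in a fasta header can be sketched in Python as follows. This is only an illustration, not GenomeTools code: the names KeyRequest, parse_key_line, and key_of_header are hypothetical, fields are assumed to be separated by whitespace, and i and j are assumed to be 1-based positions with i not greater than j.

```python
from typing import NamedTuple, Optional


class KeyRequest(NamedTuple):
    key: str
    first: Optional[int] = None  # 1-based first position, if given
    last: Optional[int] = None   # 1-based last position, if given


def parse_key_line(line: str) -> KeyRequest:
    """Parse one key-file line of the form 'k' or 'k i j'."""
    fields = line.split()
    if len(fields) == 1:
        return KeyRequest(fields[0])
    if len(fields) == 3:
        first, last = int(fields[1]), int(fields[2])
        # Assumption: positions are positive and first <= last.
        if first < 1 or last < first:
            raise ValueError(f"bad range in key line: {line!r}")
        return KeyRequest(fields[0], first, last)
    raise ValueError(f"expected 'k' or 'k i j', got: {line!r}")


def key_of_header(header: str) -> Optional[str]:
    """Return the text between the first two '|' symbols, or None."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else None
```

Applied to the header >tr|A0AQI4|A0AQI4_9ARCH Putative ammonia monooxygenase (Fragment), key_of_header returns A0AQI4, which is the string a key-file line would have to contain to select that sequence.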
If the sequence input comes from a fasta index (see below), the following holds:
• option -width is required
• options -o, -gzip, and -bzip2 do not work
• the sequences are output in the order the corresponding keys appear in the
key file
If the argument list ends with a single filename, say fastaindex, then the tool
checks whether a file fastaindex.kys exists. This file is part of the fasta
index, which is constructed by calling the suffixerator tool as follows:
gt suffixerator -protein -ssp -tis -des -sds -kys -indexname fastaindex \
  -db inputfile1 [inputfile2 ..]
This creates several files:
• a file fastaindex.esq representing the sequence.
• a file fastaindex.ssp specifying the sequence separator positions.
• a file fastaindex.des showing the fasta headers line by line.
• a file fastaindex.sds giving the sequence header delimiter positions.
• a file fastaindex.kys containing the keys in the fasta files.
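Before passing an index name to gt extractseq, it can be useful to confirm that the files listed above are present. The following Python sketch does that under the assumption that each file is named by appending the suffix to the index name; the helper name missing_index_files is hypothetical and not part of GenomeTools.

```python
from pathlib import Path
from typing import List

# Suffixes of the index files listed above.
INDEX_SUFFIXES = ("esq", "ssp", "des", "sds", "kys")


def missing_index_files(indexname: str) -> List[str]:
    """Return the expected index files that do not exist on disk."""
    expected = [f"{indexname}.{suffix}" for suffix in INDEX_SUFFIXES]
    return [name for name in expected if not Path(name).is_file()]
```

An empty result means all five files were found; it says nothing about whether their contents are valid.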
For the suffixerator command to work, the keys of the form |key| in the fasta
header must satisfy the following constraints:
• they all have to be of the same length, not longer than 128, and not shorter
than 1
• they have to appear in lexicographic order
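These constraints can be checked before building the index. The Python sketch below is one possible check, not the one suffixerator performs: the function name key_problems is hypothetical, the keys are assumed to have been extracted from the fasta headers already, and "lexicographic order" is assumed to mean non-decreasing order under plain string comparison.

```python
from typing import List, Sequence


def key_problems(keys: Sequence[str], max_len: int = 128) -> List[str]:
    """Return a list of violated constraints (empty if none are found)."""
    problems: List[str] = []
    if not keys:
        return problems
    lengths = {len(key) for key in keys}
    if len(lengths) > 1:
        problems.append("keys differ in length")
    if any(n < 1 or n > max_len for n in lengths):
        problems.append(f"key length outside 1..{max_len}")
    # Assumption: order is judged by plain string comparison, ties allowed.
    if any(a > b for a, b in zip(keys, keys[1:])):
        problems.append("keys are not in lexicographic order")
    return problems
```

For instance, key_problems(["B1", "A1"]) reports only the ordering problem, since both keys have the same valid length.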
REPORTING BUGS¶
Report bugs to <gt-users@genometools.org>.
09/05/2014 | GenomeTools 1.5.3 |