gt-extractseq - Extract sequences from given sequence file(s) or fastaindex.
gt extractseq [option ...] [sequence_file(s)] | fastaindex
The option -keys allows one to extract substrings or sequences from the given sequence file or from a fasta index. The substrings to be extracted are specified in a key file given as argument to this option. The key file must contain lines of the form
k i j
where k is a string (the key) and the optional i and j are positive integers such that i<=j. The numbers i and j specify the first and the last position of the substring to be extracted. Positions are counted from 1. If k is identical to the string between the first and second occurrence of the symbol | in a fasta header, then the fasta header and the corresponding sequence are output. For example, in the fasta header
>tr|A0AQI4|A0AQI4_9ARCH Putative ammonia monooxygenase (Fragment)
the fasta key is A0AQI4. If i and j are both specified, then only the corresponding substring is output, in fasta format. In that case the header of the fasta formatted sequence in the output begins with
>k i j
followed by the original fasta header.
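The key matching and substring rules described above can be illustrated with a minimal Python sketch. This is not the GenomeTools implementation; the function names parse_key_line, fasta_key and extract are hypothetical, and the exact layout of the output header (a single space between ">k i j" and the original header text) is an assumption.

```python
from typing import Optional, Tuple


def parse_key_line(line: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Parse one key-file line of the form 'k' or 'k i j' (hypothetical helper)."""
    fields = line.split()
    if len(fields) == 1:
        return fields[0], None, None
    if len(fields) == 3:
        key, i, j = fields[0], int(fields[1]), int(fields[2])
        # Positions are counted from 1 and must satisfy i <= j.
        if not 1 <= i <= j:
            raise ValueError(f"need 1 <= i <= j, got i={i}, j={j}")
        return key, i, j
    raise ValueError(f"expected 'k' or 'k i j', got: {line!r}")


def fasta_key(header: str) -> Optional[str]:
    """Return the string between the first and second '|' of a header, if any."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else None


def extract(header: str, sequence: str, key: str,
            i: Optional[int] = None,
            j: Optional[int] = None) -> Optional[Tuple[str, str]]:
    """Return (header, sequence) for a matching key, or None if it does not match."""
    if fasta_key(header) != key:
        return None
    if i is None or j is None:
        # No positions given: the whole entry is output unchanged.
        return header, sequence
    # Assumed output layout: '>k i j' followed by the original header text.
    new_header = f">{key} {i} {j} {header.lstrip('>')}"
    # 1-based inclusive positions i..j correspond to the slice [i-1:j].
    return new_header, sequence[i - 1:j]
```

For the example header above, fasta_key returns A0AQI4, and a key-file line "A0AQI4 2 4" selects the second to fourth residue of that sequence.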
If the sequence input consists of fasta files, then the following holds:
If the sequence input comes from a fasta index (see below), the following holds:
If the argument list ends with a single filename, say fastaindex, then it is checked whether there is a file fastaindex.kys. This file is part of the fasta index, which is constructed by calling the suffixerator tool as follows:
gt suffixerator -protein -ssp -tis -des -sds -kys -indexname fastaindex \
-db inputfile1 [inputfile2 ..]
This reads the protein sequence files given to the option -db and creates several files:
For the suffixerator command to work, the keys of the form |key| in the fasta header must satisfy the following constraints:
Report bugs to https://github.com/genometools/genometools/issues.