'\" t
.\"     Title: gt-extractfeat
.\"    Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
.\" Generator: DocBook XSL Stylesheets vsnapshot
.\"      Date: 07/22/2020
.\"    Manual: GenomeTools Manual
.\"    Source: GenomeTools 1.6.1
.\"  Language: English
.\"
.TH "GT\-EXTRACTFEAT" "1" "07/22/2020" "GenomeTools 1\&.6\&.1" "GenomeTools Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
gt-extractfeat \- Extract features given in GFF3 file from sequence file\&.
.SH "SYNOPSIS"
.sp
\fBgt extractfeat\fR [option \&...]
[GFF3_file]
.SH "DESCRIPTION"
.PP
\fB\-type\fR [\fIstring\fR]
.RS 4
set type of features to extract (default: undefined)
.RE
.PP
\fB\-join\fR [\fIyes|no\fR]
.RS 4
join feature sequences in the same subgraph into a single one (default: no)
.RE
.PP
\fB\-translate\fR [\fIyes|no\fR]
.RS 4
translate the features (of a DNA sequence) into protein (default: no)
.RE
.PP
\fB\-seqid\fR [\fIyes|no\fR]
.RS 4
add sequence ID of extracted features to FASTA descriptions (default: no)
.RE
.PP
\fB\-target\fR [\fIyes|no\fR]
.RS 4
add target ID(s) of extracted features to FASTA descriptions (default: no)
.RE
.PP
\fB\-coords\fR [\fIyes|no\fR]
.RS 4
add location of extracted features to FASTA descriptions (default: no)
.RE
.PP
\fB\-retainids\fR [\fIyes|no\fR]
.RS 4
use ID attributes of extracted features as FASTA descriptions (default: no)
.RE
.PP
\fB\-gcode\fR [\fIvalue\fR]
.RS 4
specify genetic code to use (default: 1)
.RE
.PP
\fB\-seqfile\fR [\fIfilename\fR]
.RS 4
set the sequence file from which to take the sequences (default: undefined)
.RE
.PP
\fB\-encseq\fR [\fIfilename\fR]
.RS 4
set the encoded sequence indexname from which to take the sequences (default: undefined)
.RE
.PP
\fB\-seqfiles\fR
.RS 4
set the sequence files from which to extract the features; use
\fI\-\-\fR
to terminate the list of sequence files
.RE
.PP
\fB\-matchdesc\fR [\fIyes|no\fR]
.RS 4
search the sequence descriptions from the input files for the desired sequence IDs (in GFF3), reporting the first match (default: no)
.RE
.PP
\fB\-matchdescstart\fR [\fIyes|no\fR]
.RS 4
exactly match the sequence descriptions from the input files for the desired sequence IDs (in GFF3) from the beginning to the first whitespace (default: no)
.RE
.PP
\fB\-usedesc\fR [\fIyes|no\fR]
.RS 4
use sequence descriptions to map the sequence IDs (in GFF3) to actual sequence entries\&.
If a description contains a sequence range (e\&.g\&., III:1000001\&.\&.2000000), the first part is used as sequence ID (\fIIII\fR) and the first range position as offset (\fI1000001\fR) (default: no)
.RE
.PP
\fB\-regionmapping\fR [\fIstring\fR]
.RS 4
set file containing sequence\-region to sequence file mapping (default: undefined)
.RE
.PP
\fB\-v\fR [\fIyes|no\fR]
.RS 4
be verbose (default: no)
.RE
.PP
\fB\-width\fR [\fIvalue\fR]
.RS 4
set output width for FASTA sequence printing (0 disables formatting) (default: 0)
.RE
.PP
\fB\-o\fR [\fIfilename\fR]
.RS 4
redirect output to specified file (default: undefined)
.RE
.PP
\fB\-gzip\fR [\fIyes|no\fR]
.RS 4
write gzip compressed output file (default: no)
.RE
.PP
\fB\-bzip2\fR [\fIyes|no\fR]
.RS 4
write bzip2 compressed output file (default: no)
.RE
.PP
\fB\-force\fR [\fIyes|no\fR]
.RS 4
force writing to output file (default: no)
.RE
.PP
\fB\-help\fR
.RS 4
display help and exit
.RE
.PP
\fB\-version\fR
.RS 4
display version information and exit
.RE
.sp
Genetic code numbers for option \fI\-gcode\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
1: Standard
2: Vertebrate Mitochondrial
3: Yeast Mitochondrial
4: Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma
5: Invertebrate Mitochondrial
6: Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
9: Echinoderm Mitochondrial; Flatworm Mitochondrial
10: Euplotid Nuclear
11: Bacterial, Archaeal and Plant Plastid
12: Alternative Yeast Nuclear
13: Ascidian Mitochondrial
14: Alternative Flatworm Mitochondrial
15: Blepharisma Macronuclear
16: Chlorophycean Mitochondrial
21: Trematode Mitochondrial
22: Scenedesmus obliquus Mitochondrial
23: Thraustochytrium Mitochondrial
24: Pterobranchia Mitochondrial
25: Candidate Division SR1 and Gracilibacteria
.fi
.if n \{\
.RE
.\}
.sp
File format for option \fI\-regionmapping\fR:
.sp
The file supplied to option \-regionmapping defines a \(lqmapping\(rq\&.
A mapping assigns each sequence\-region entry given in the \fIGFF3_file\fR to the sequence file containing the corresponding sequence\&. Mappings can be defined in one of the following two forms:
.sp
.if n \{\
.RS 4
.\}
.nf
mapping = {
  chr1 = "hs_ref_chr1\&.fa\&.gz",
  chr2 = "hs_ref_chr2\&.fa\&.gz"
}
.fi
.if n \{\
.RE
.\}
.sp
or
.sp
.if n \{\
.RS 4
.\}
.nf
function mapping(sequence_region)
  return "hs_ref_"\&.\&.sequence_region\&.\&."\&.fa\&.gz"
end
.fi
.if n \{\
.RE
.\}
.sp
The first form defines a Lua (http://www\&.lua\&.org) table named \(lqmapping\(rq which maps each sequence region to the corresponding sequence file\&. The second form defines a Lua function \(lqmapping\(rq, which must return the sequence file name when it is called with the sequence_region as its argument\&.
.SH "REPORTING BUGS"
.sp
Report bugs to https://github\&.com/genometools/genometools/issues\&.
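.SH "EXAMPLES"
.sp
The two mapping forms described for option \fI\-regionmapping\fR can in principle be combined\&. The following is a minimal sketch, not taken from the GenomeTools distribution: it assumes that the mapping file is evaluated as ordinary Lua, so that a local helper table may be defined next to the \(lqmapping\(rq function\&. The table name \(lqoverrides\(rq and all file names are hypothetical\&.
.sp
.if n \{\
.RS 4
.\}
.nf
\-\- Hypothetical exceptions to the naming pattern used below\&.
local overrides = {
  chrM = "hs_ref_mito\&.fa\&.gz"
}

function mapping(sequence_region)
  \-\- Prefer an explicit entry; otherwise derive the file name\&.
  return overrides[sequence_region] or "hs_ref_"\&.\&.sequence_region\&.\&."\&.fa\&.gz"
end
.fi
.if n \{\
.RE
.\}
.sp
Under these assumptions, a sequence region named chrM would be looked up in hs_ref_mito\&.fa\&.gz, while chr1 would fall through to the pattern and be looked up in hs_ref_chr1\&.fa\&.gz\&.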