'\" t
.\"     Title: gt-extractfeat
.\"    Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
.\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
.\"      Date: 07/22/2020
.\"    Manual: GenomeTools Manual
.\"    Source: GenomeTools 1.6.1
.\"  Language: English
.\"
.TH "GT\-EXTRACTFEAT" "1" "07/22/2020" "GenomeTools 1\&.6\&.1" "GenomeTools Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
gt-extractfeat \- Extract features given in a GFF3 file from a sequence file\&.
.SH "SYNOPSIS"
.sp
\fBgt extractfeat\fR [option \&...] [GFF3_file]
.SH "DESCRIPTION"
.PP
\fB\-type\fR [\fIstring\fR]
.RS 4
set type of features to extract (default: undefined)
.RE
.PP
\fB\-join\fR [\fIyes|no\fR]
.RS 4
join feature sequences in the same subgraph into a single one (default: no)
.RE
.PP
\fB\-translate\fR [\fIyes|no\fR]
.RS 4
translate the features (of a DNA sequence) into protein (default: no)
.RE
.PP
\fB\-seqid\fR [\fIyes|no\fR]
.RS 4
add sequence ID of extracted features to FASTA descriptions (default: no)
.RE
.PP
\fB\-target\fR [\fIyes|no\fR]
.RS 4
add target ID(s) of extracted features to FASTA descriptions (default: no)
.RE
.PP
\fB\-coords\fR [\fIyes|no\fR]
.RS 4
add location of extracted features to FASTA descriptions (default: no)
.RE
.PP
\fB\-retainids\fR [\fIyes|no\fR]
.RS 4
use ID attributes of extracted features as FASTA descriptions (default: no)
.RE
.PP
\fB\-gcode\fR [\fIvalue\fR]
.RS 4
specify genetic code to use (default: 1)
.RE
.PP
\fB\-seqfile\fR [\fIfilename\fR]
.RS 4
set the sequence file from which to take the sequences (default: undefined)
.RE
.PP
\fB\-encseq\fR [\fIfilename\fR]
.RS 4
set the encoded sequence indexname from which to take the sequences (default: undefined)
.RE
.PP
\fB\-seqfiles\fR
.RS 4
set the sequence files from which to extract the features; use
\fI\-\-\fR
to terminate the list of sequence files
.RE
.PP
\fB\-matchdesc\fR [\fIyes|no\fR]
.RS 4
search the sequence descriptions from the input files for the desired sequence IDs (in GFF3), reporting the first match (default: no)
.RE
.PP
\fB\-matchdescstart\fR [\fIyes|no\fR]
.RS 4
match the desired sequence IDs (in GFF3) exactly against the sequence descriptions from the input files, comparing each description from its beginning to the first whitespace (default: no)
.RE
.PP
\fB\-usedesc\fR [\fIyes|no\fR]
.RS 4
use sequence descriptions to map the sequence IDs (in GFF3) to actual sequence entries\&. If a description contains a sequence range (e\&.g\&., III:1000001\&.\&.2000000), the first part is used as sequence ID (\fIIII\fR) and the first range position as offset (\fI1000001\fR) (default: no)
.RE
.PP
\fB\-regionmapping\fR [\fIstring\fR]
.RS 4
set file containing sequence\-region to sequence file mapping (default: undefined)
.RE
.PP
\fB\-v\fR [\fIyes|no\fR]
.RS 4
be verbose (default: no)
.RE
.PP
\fB\-width\fR [\fIvalue\fR]
.RS 4
set output width for FASTA sequence printing (0 disables formatting) (default: 0)
.RE
.PP
\fB\-o\fR [\fIfilename\fR]
.RS 4
redirect output to specified file (default: undefined)
.RE
.PP
\fB\-gzip\fR [\fIyes|no\fR]
.RS 4
write gzip compressed output file (default: no)
.RE
.PP
\fB\-bzip2\fR [\fIyes|no\fR]
.RS 4
write bzip2 compressed output file (default: no)
.RE
.PP
\fB\-force\fR [\fIyes|no\fR]
.RS 4
force writing to output file (default: no)
.RE
.PP
\fB\-help\fR
.RS 4
display help and exit
.RE
.PP
\fB\-version\fR
.RS 4
display version information and exit
.RE
.sp
Genetic code numbers for option \fI\-gcode\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
1: Standard
2: Vertebrate Mitochondrial
3: Yeast Mitochondrial
4: Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma
5: Invertebrate Mitochondrial
6: Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
9: Echinoderm Mitochondrial; Flatworm Mitochondrial
10: Euplotid Nuclear
11: Bacterial, Archaeal and Plant Plastid
12: Alternative Yeast Nuclear
13: Ascidian Mitochondrial
14: Alternative Flatworm Mitochondrial
15: Blepharisma Macronuclear
16: Chlorophycean Mitochondrial
21: Trematode Mitochondrial
22: Scenedesmus obliquus Mitochondrial
23: Thraustochytrium Mitochondrial
24: Pterobranchia Mitochondrial
25: Candidate Division SR1 and Gracilibacteria
.fi
.if n \{\
.RE
.\}
.sp
File format for option \fI\-regionmapping\fR:
.sp
The file supplied to option \-regionmapping defines a \(lqmapping\(rq\&. A mapping assigns each sequence\-region entry given in the \fIGFF3_file\fR to the sequence file containing the corresponding sequence\&. Mappings can be defined in one of the following two forms:
.sp
.if n \{\
.RS 4
.\}
.nf
mapping = {
  chr1  = "hs_ref_chr1\&.fa\&.gz",
  chr2  = "hs_ref_chr2\&.fa\&.gz"
}
.fi
.if n \{\
.RE
.\}
.sp
or
.sp
.if n \{\
.RS 4
.\}
.nf
function mapping(sequence_region)
  return "hs_ref_"\&.\&.sequence_region\&.\&."\&.fa\&.gz"
end
.fi
.if n \{\
.RE
.\}
.sp
The first form defines a Lua (http://www\&.lua\&.org) table named \(lqmapping\(rq which maps each sequence region to the corresponding sequence file\&. The second one defines a Lua function \(lqmapping\(rq, which has to return the sequence file name when it is called with the sequence_region as argument\&.
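.sp
The function form can also consult a table for regions that do not follow a common naming pattern\&. The following is a minimal sketch, not taken from the GenomeTools distribution: it assumes the mapping file is evaluated as ordinary Lua, so that a local helper table is permitted, and the table name \fIspecial\fR, the region \fIchrM\fR and all file names are hypothetical\&.
.sp
.if n \{\
.RS 4
.\}
.nf
\-\- hypothetical exceptions to the common naming pattern
local special = {
  chrM = "hs_ref_chrMT\&.fa\&.gz"
}

\-\- look up exceptions first, otherwise build the name from the region
function mapping(sequence_region)
  return special[sequence_region] or
         ("hs_ref_"\&.\&.sequence_region\&.\&."\&.fa\&.gz")
end
.fi
.if n \{\
.RE
.\}
.sp
In this sketch, a sequence region named chr1 would resolve to hs_ref_chr1\&.fa\&.gz, while chrM would resolve to the file listed in the helper table\&.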
.SH "REPORTING BUGS"
.sp
Report bugs to https://github\&.com/genometools/genometools/issues\&.