.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8. .TH GFFREAD "1" "June 2019" "gffread 0.11.2" "User Commands" .SH NAME gffread \- GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction .SH SYNOPSIS .B gffread [\-g | ][\-s ] [\-o ] [\-t ] [\-r [[]:].. [\-R]] [\-CTVNJMKQAFPGUBHZWTOLE] [\-w ] [\-x ] [\-y ] [\-i ] [\-\-sort\-by ] .SH DESCRIPTION .IP Filter and convert GFF3/GTF2 records, extract corresponding sequences etc. By default (i.e. without \fB\-O\fR) only process transcripts, ignore other features. .IP is a GFF file, use '\-' for stdin .SH OPTIONS .TP \fB\-i\fR discard transcripts having an intron larger than .TP \fB\-l\fR discard transcripts shorter than bases .TP \fB\-r\fR only show transcripts overlapping coordinate range .. (on chromosome/contig , strand if provided) .TP \fB\-R\fR for \fB\-r\fR option, discard all transcripts that are not fully contained within the given range .TP \fB\-U\fR discard single\-exon transcripts .TP \fB\-C\fR coding only: discard mRNAs that have no CDS features .HP \fB\-\-nc\fR non\-coding only: discard mRNAs that have CDS features .HP \fB\-\-ignore\-locus\fR : discard locus features and attributes found in the input .TP \fB\-A\fR use the description field from and add it as the value for a 'descr' attribute to the GFF record .TP \fB\-s\fR is a tab\-delimited file providing this info for each of the mapped sequences: (useful for \fB\-A\fR option with mRNA/EST/protein mappings) .PP Sorting: (by default, chromosomes are kept in the order they were found) .HP \fB\-\-sort\-alpha\fR : chromosomes (reference sequences) are sorted alphabetically .HP \fB\-\-sort\-by\fR : sort the reference sequences by the order in which their .IP names are given in the file .SS "Misc options:" .TP \fB\-F\fR attempt to preserve all GFF attributes preservation .HP \fB\-\-keep\-exon\-attrs\fR : for \fB\-F\fR option, do not attempt to reduce redundant .IP exon/CDS attributes .TP \fB\-G\fR do not keep exon 
attributes, move them to the transcript feature (for GFF3 output) .HP \fB\-\-keep\-genes\fR : in transcript\-only mode (default), also preserve gene records .HP \fB\-\-keep\-comments\fR: for GFF3 input/output, try to preserve comments .TP \fB\-O\fR process other non\-transcript GFF records (by default non\-transcript records are ignored) .TP \fB\-V\fR discard any mRNAs with CDS having in\-frame stop codons (requires \fB\-g\fR) .TP \fB\-H\fR for \fB\-V\fR option, check and adjust the starting CDS phase if the original phase leads to a translation with an in\-frame stop codon .TP \fB\-B\fR for \fB\-V\fR option, single\-exon transcripts are also checked on the opposite strand (requires \fB\-g\fR) .TP \fB\-P\fR add transcript level GFF attributes about the coding status of each transcript, including partialness or in\-frame stop codons (requires \fB\-g\fR) .HP \fB\-\-add\-hasCDS\fR : add a "hasCDS" attribute with value "true" for transcripts .IP that have CDS features .HP \fB\-\-adj\-stop\fR stop codon adjustment: enables \fB\-P\fR and performs automatic .IP adjustment of the CDS stop coordinate if premature or downstream .TP \fB\-N\fR discard multi\-exon mRNAs that have any intron with a non\-canonical splice site consensus (i.e. not GT\-AG, GC\-AG or AT\-AC) .TP \fB\-J\fR discard any mRNAs that either lack initial START codon or the terminal STOP codon, or have an in\-frame stop codon (i.e. 
only print mRNAs with a complete CDS) .HP \fB\-\-no\-pseudo\fR: filter out records matching the 'pseudo' keyword .HP \fB\-\-in\-bed\fR: input should be parsed as BED format (automatic if the input .IP filename ends with .bed*) .HP \fB\-\-in\-tlf\fR: input GFF\-like one\-line\-per\-transcript format without exon/CDS .IP features (see \fB\-\-tlf\fR option below); automatic if the input filename ends with .tlf) .SS "Clustering:" .HP \fB\-M\fR/\-\-merge : cluster the input transcripts into loci, discarding .IP "duplicated" transcripts (those with the same exact introns and fully contained or equal boundaries) .HP \fB\-d\fR : for \fB\-M\fR option, write duplication info to file .HP \fB\-\-cluster\-only\fR: same as \fB\-M\fR/\-\-merge but without discarding any of the .IP "duplicate" transcripts, only create "locus" features .TP \fB\-K\fR for \fB\-M\fR option: also discard as redundant the shorter, fully contained .IP transcripts (intron chains matching a part of the container) .TP \fB\-Q\fR for \fB\-M\fR option, no longer require boundary containment when assessing redundancy (can be combined with \fB\-K\fR); only introns have to match for multi\-exon transcripts, and >=80% overlap for single\-exon transcripts .TP \fB\-Y\fR for \fB\-M\fR option, enforce \fB\-Q\fR but also discard overlapping single\-exon transcripts, even on the opposite strand (can be combined with \fB\-K\fR) .SS "Output options:" .HP \fB\-\-force\-exons\fR: make sure that the lowest level GFF features are considered .IP "exon" features .HP \fB\-\-gene2exon\fR: for single\-line genes not parenting any transcripts, add an .IP exon feature spanning the entire gene (treat it as a transcript) .TP \fB\-D\fR decode url encoded characters within attributes .TP \fB\-Z\fR merge very close exons into a single exon (when intron size<4) .TP \fB\-g\fR full path to a multi\-fasta file with the genomic sequences for all input mappings, OR a directory with single\-fasta files (one per genomic sequence, with file names 
matching sequence names) .TP \fB\-w\fR write a fasta file with spliced exons for each GFF transcript .TP \fB\-x\fR write a fasta file with spliced CDS for each GFF transcript .TP \fB\-y\fR write a protein fasta file with the translation of CDS for each record .TP \fB\-W\fR for \fB\-w\fR and \fB\-x\fR options, write in the FASTA defline the exon coordinates projected onto the spliced sequence; for \fB\-y\fR option, write transcript attributes in the FASTA defline .TP \fB\-S\fR for \fB\-y\fR option, use '*' instead of '.' as stop codon translation .TP \fB\-L\fR Ensembl GTF to GFF3 conversion (implies \fB\-F\fR; should be used with \fB\-m\fR) .TP \fB\-m\fR is a name mapping table for converting reference sequence names, having this 2\-column format: WARNING: all GFF records on reference sequences whose original IDs are not found in the 1st column of this table will be discarded! .TP \fB\-t\fR use in the 2nd column of each GFF/GTF output line .TP \fB\-o\fR print the GFF records to (those that passed any given filters). Use \fB\-o\-\fR to enable printing of to stdout .TP \fB\-T\fR for \fB\-o\fR, output will be GTF instead of GFF3 .HP \fB\-\-bed\fR for \fB\-o\fR, output BED format instead of GFF3 .HP \fB\-\-tlf\fR for \fB\-o\fR, output "transcript line format" which is like GFF .IP but exons, CDS features and related data are stored as GFF attributes in the transcript feature line, like this: .IP exoncount=N;exons=;CDSphase=;CDS= .IP is a comma\-delimited list of exon_start\-exon_end coordinates; is CDS_start:CDS_end coordinates or a list like ; .HP \fB\-v\fR,\-E expose (warn about) duplicate transcript IDs and other potential .IP problems with the given GFF/GTF records .SH AUTHOR This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
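.PP
The attribute layout written by \fB\-\-tlf\fR (and read by \fB\-\-in\-tlf\fR) can be illustrated with a short Python sketch. It is not part of gffread and rests on assumptions: attributes are ';'-separated key=value pairs, "exons" holds comma-delimited start-end pairs, and "CDS" holds either a single start:end pair or a list in the same form as "exons". The function names are hypothetical.

```python
# Sketch only, not gffread code: parse the attribute column of a --tlf line.
# Assumed layout: ';'-separated key=value pairs; "exons" = "s1-e1,s2-e2,...";
# "CDS" = "start:end" or a list in the same form as "exons".


def _parse_ranges(text):
    """Turn 's1-e1,s2-e2' into [(s1, e1), (s2, e2)] as integers."""
    ranges = []
    for item in text.split(","):
        start, end = item.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def parse_tlf_attrs(attr_column):
    """Return a dict with exoncount, exons, CDSphase and CDS when present."""
    attrs = {}
    for field in attr_column.strip().strip(";").split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key.strip()] = value.strip()

    result = {}
    if "exoncount" in attrs:
        result["exoncount"] = int(attrs["exoncount"])
    if "exons" in attrs:
        result["exons"] = _parse_ranges(attrs["exons"])
    if "CDSphase" in attrs:
        phase = attrs["CDSphase"]
        # keep non-numeric phase values (e.g. ".") as plain strings
        result["CDSphase"] = int(phase) if phase.isdigit() else phase
    if "CDS" in attrs:
        cds = attrs["CDS"]
        if ":" in cds:
            # a single CDS_start:CDS_end pair
            start, end = cds.split(":")
            result["CDS"] = [(int(start), int(end))]
        else:
            # same comma-delimited start-end form as "exons"
            result["CDS"] = _parse_ranges(cds)
    return result


if __name__ == "__main__":
    print(parse_tlf_attrs("ID=tx1;exoncount=2;exons=100-200,300-450;CDSphase=0;CDS=150:400"))
```

Returning every coordinate set as a list of tuples lets callers treat the "start:end" and list forms of CDS the same way.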
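.PP
For \fB\-W\fR, projecting exon coordinates onto the spliced sequence amounts to laying the exons end to end. The Python sketch below shows that arithmetic under stated assumptions: exon intervals are 1-based, inclusive and non-overlapping, and their order is reversed for minus-strand transcripts. It does not reproduce the exact defline format gffread writes, and project_exons is a hypothetical name.

```python
# Sketch only: map genomic exon intervals onto 1-based spliced-sequence intervals.
# Assumes 1-based inclusive, non-overlapping exons; not gffread's own code.


def project_exons(exons, strand="+"):
    """Return the (start, end) of each exon on the spliced transcript, 5' to 3'."""
    ordered = sorted(exons)
    if strand == "-":
        # on the minus strand the transcript begins at the highest genomic coordinate
        ordered = ordered[::-1]
    projected = []
    offset = 0
    for start, end in ordered:
        length = end - start + 1
        projected.append((offset + 1, offset + length))
        offset += length
    return projected


if __name__ == "__main__":
    print(project_exons([(100, 200), (300, 450)], strand="+"))
    print(project_exons([(100, 200), (300, 450)], strand="-"))
```

With exons 100-200 and 300-450 on the plus strand, the projected intervals are 1-101 and 102-252; on the minus strand they become 1-151 and 152-252.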
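.PP
The \fB\-Z\fR rule (merge exons when the intron between them is shorter than 4 bases) can be sketched as follows. This is an illustration only: it assumes 1-based inclusive coordinates and takes the intron size to be next_start - prev_end - 1; gffread may define the gap differently. merge_close_exons is a hypothetical name.

```python
# Sketch only: merge exons separated by a very short gap, as described for -Z.
# Assumption: intron size = next_start - prev_end - 1 (1-based inclusive coords).


def merge_close_exons(exons, min_intron=4):
    """Merge neighbouring exons whose intervening intron is shorter than min_intron."""
    merged = []
    for start, end in sorted(exons):
        if merged and start - merged[-1][1] - 1 < min_intron:
            # extend the previous exon over the tiny intron
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    print(merge_close_exons([(100, 200), (203, 300), (400, 500)]))
```

Here the 2-base gap between 200 and 203 is absorbed, giving exons 100-300 and 400-500, while the 99-base intron is kept.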