'\" t .\" Title: gthbssmtrain .\" Author: [see the "AUTHOR(S)" section] .\" Generator: Asciidoctor 2.0.12 .\" Date: .\" Manual: \ \& .\" Source: \ \& .\" Language: English .\" .TH "GTHBSSMTRAIN" "1" "" "\ \&" "\ \&" .ie \n(.g .ds Aq \(aq .el .ds Aq ' .ss \n[.ss] 0 .nh .ad l .de URL \fI\\$2\fP <\\$1>\\$3 .. .als MTO URL .if \n[.g] \{\ . mso www.tmac . am URL . ad l . . . am MTO . ad l . . . LINKSTYLE blue R < > .\} .SH "NAME" gthbssmtrain \- train splice site model .SH "SYNOPSIS" .sp \fBgthbssmtrain\fP [option ...] GFF3_file .SH "DESCRIPTION" .sp Create BSSM training data from annotation given in GFF3_file. .SH "OPTIONS" .sp \fB\-outdir\fP .RS 4 set name of output directory to which the training files are written default: training_data .RE .sp \fB\-gcdonor\fP .RS 4 extract training data for GC donor sites default: yes .RE .sp \fB\-filtertype\fP .RS 4 set type of features to used for filtering (usually \(aqexon\(aq or \(aqCDS\(aq) default: exon .RE .sp \fB\-goodexoncount\fP .RS 4 set the minimum number of good exons a feature must have to be included into the training data default: 1 .RE .sp \fB\-cutoff\fP .RS 4 set the minimum score an exon must have to count towards the ``good exon count\(aq\(aq (exons without a score count as good) default: 1.00 .RE .sp \fB\-extracttype\fP .RS 4 set type of features to be extracted as exons (usually \(aqexon\(aq or \(aqCDS\(aq) default: CDS .RE .sp \fB\-seqfile\fP .RS 4 set the sequence file from which to take the sequences default: undefined .RE .sp \fB\-encseq\fP .RS 4 set the encoded sequence indexname from which to take the sequences default: undefined .RE .sp \fB\-seqfiles\fP .RS 4 set the sequence files from which to extract the features use \(aq\-\-\(aq to terminate the list of sequence files .RE .sp \fB\-matchdesc\fP .RS 4 search the sequence descriptions from the input files for the desired sequence IDs (in GFF3), reporting the first match default: no .RE .sp \fB\-matchdescstart\fP .RS 4 exactly match the sequence descriptions from the input files for the desired sequence IDs (in GFF3) from the beginning to the first whitespace default: no .RE .sp \fB\-usedesc\fP .RS 4 use sequence descriptions to map the sequence IDs (in GFF3) to actual sequence entries. If a description contains a sequence range (e.g., III:1000001..2000000), the first part is used as sequence ID (\(aqIII\(aq) and the first range position as offset (\(aq1000001\(aq) default: no .RE .sp \fB\-regionmapping\fP .RS 4 set file containing sequence\-region to sequence file mapping default: undefined .RE .sp \fB\-seed\fP .RS 4 set seed for random number generator manually 0 generates a seed from the current time and the process id default: 0 .RE .sp \fB\-v\fP .RS 4 be verbose default: no .RE .sp \fB\-gzip\fP .RS 4 write gzip compressed output files default: no .RE .sp \fB\-bzip2\fP .RS 4 write bzip2 compressed output files default: no .RE .sp \fB\-force\fP .RS 4 force writing to output files default: no .RE .sp \fB\-help\fP .RS 4 display help and exit .RE .sp \fB\-version\fP .RS 4 display version information and exit .RE