'\" t
.\"     Title: gthbssmtrain
.\"    Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.12
.\"      Date: 
.\"    Manual: \ \&
.\"    Source: \ \&
.\"  Language: English
.\"
.TH "GTHBSSMTRAIN" "1" "" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
.  mso www.tmac
.  am URL
.    ad l
.  .
.  am MTO
.    ad l
.  .
.  LINKSTYLE blue R < >
.\}
.SH "NAME"
gthbssmtrain \- train splice site model
.SH "SYNOPSIS"
.sp
\fBgthbssmtrain\fP [option ...] GFF3_file
.SH "DESCRIPTION"
.sp
Create BSSM training data from the annotation given in GFF3_file.
.SH "OPTIONS"
.sp
\fB\-outdir\fP
.RS 4
set name of output directory to which the training files are
written
.br
default: training_data
.RE
.sp
\fB\-gcdonor\fP
.RS 4
extract training data for GC donor sites
.br
default: yes
.RE
.sp
\fB\-filtertype\fP
.RS 4
set type of features to use for filtering (usually \(aqexon\(aq or
\(aqCDS\(aq)
.br
default: exon
.RE
.sp
\fB\-goodexoncount\fP
.RS 4
set the minimum number of good exons a feature must have to be
included in the training data
.br
default: 1
.RE
.sp
\fB\-cutoff\fP
.RS 4
set the minimum score an exon must have to count towards the
\(lqgood exon count\(rq (exons without a score count as good)
.br
default: 1.00
.RE
.sp
\fB\-extracttype\fP
.RS 4
set type of features to be extracted as exons (usually \(aqexon\(aq or
\(aqCDS\(aq)
.br
default: CDS
.RE
.sp
\fB\-seqfile\fP
.RS 4
set the sequence file from which to take the sequences
.br
default: undefined
.RE
.sp
\fB\-encseq\fP
.RS 4
set the index name of the encoded sequence from which to take the
sequences
.br
default: undefined
.RE
.sp
\fB\-seqfiles\fP
.RS 4
set the sequence files from which to extract the features
.br
use \(aq\-\-\(aq to terminate the list of sequence files
.RE
.sp
\fB\-matchdesc\fP
.RS 4
search the sequence descriptions from the input files for the
desired sequence IDs (in GFF3), reporting the first match
.br
default: no
.RE
.sp
\fB\-matchdescstart\fP
.RS 4
exactly match the sequence descriptions from the input files for
the desired sequence IDs (in GFF3) from the beginning to the
first whitespace
.br
default: no
.RE
.sp
\fB\-usedesc\fP
.RS 4
use sequence descriptions to map the sequence IDs (in GFF3) to
actual sequence entries.
If a description contains a sequence range (e.g.,
III:1000001..2000000), the first part is used as sequence ID
(\(aqIII\(aq) and the first range position as offset (\(aq1000001\(aq).
.br
default: no
.RE
.sp
\fB\-regionmapping\fP
.RS 4
set file containing sequence\-region to sequence file mapping
.br
default: undefined
.RE
.sp
\fB\-seed\fP
.RS 4
set seed for random number generator manually
.br
0 generates a seed from the current time and the process ID
.br
default: 0
.RE
.sp
\fB\-v\fP
.RS 4
be verbose
.br
default: no
.RE
.sp
\fB\-gzip\fP
.RS 4
write gzip compressed output files
.br
default: no
.RE
.sp
\fB\-bzip2\fP
.RS 4
write bzip2 compressed output files
.br
default: no
.RE
.sp
\fB\-force\fP
.RS 4
force writing to output files
.br
default: no
.RE
.sp
\fB\-help\fP
.RS 4
display help and exit
.RE
.sp
\fB\-version\fP
.RS 4
display version information and exit
.RE
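.SH "EXAMPLES"
.sp
The invocations below are a minimal sketch built only from the synopsis and
the options listed above. The file names annotation.gff3, genome.fa, chr1.fa
and chr2.fa and the directory name bssm_training are hypothetical
placeholders, not names the tool expects.
.sp
Write the training files to a custom directory, taking the sequences from a
single sequence file, with verbose output:
.sp
.if n .RS 4
.nf
.fam C
gthbssmtrain \-v \-outdir bssm_training \-seqfile genome.fa annotation.gff3
.fam
.fi
.if n .RE
.sp
Pass several sequence files, terminating the list with \(aq\-\-\(aq so that
the GFF3 file is not read as another sequence file:
.sp
.if n .RS 4
.nf
.fam C
gthbssmtrain \-seqfiles chr1.fa chr2.fa \-\- annotation.gff3
.fam
.fi
.if n .RE
.sp
Write gzip compressed output files and force writing to the output files:
.sp
.if n .RS 4
.nf
.fam C
gthbssmtrain \-gzip \-force \-seqfile genome.fa annotation.gff3
.fam
.fi
.if n .RE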