'\" t
.\"     Title: gt-shredder
.\"    Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
.\" Generator: DocBook XSL Stylesheets vsnapshot
.\"      Date: 07/22/2020
.\"    Manual: GenomeTools Manual
.\"    Source: GenomeTools 1.6.1
.\"  Language: English
.\"
.TH "GT\-SHREDDER" "1" "07/22/2020" "GenomeTools 1\&.6\&.1" "GenomeTools Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
gt-shredder \- Shred sequence file(s) into consecutive pieces of random length\&.
.SH "SYNOPSIS"
.sp
\fBgt shredder\fR [option \&...] [sequence_file \&...]
.SH "DESCRIPTION"
.PP
\fB\-coverage\fR [\fIvalue\fR]
.RS 4
set the number of times the sequence_file is shredded (default: 1)
.RE
.PP
\fB\-minlength\fR [\fIvalue\fR]
.RS 4
set the minimum length of the shredded fragments (default: 300)
.RE
.PP
\fB\-maxlength\fR [\fIvalue\fR]
.RS 4
set the maximum length of the shredded fragments (default: 700)
.RE
.PP
\fB\-overlap\fR [\fIvalue\fR]
.RS 4
set the overlap between consecutive pieces (default: 0)
.RE
.PP
\fB\-sample\fR [\fIvalue\fR]
.RS 4
take samples of the generated sequence pieces with the given probability (default: 1\&.000000)
.RE
.PP
\fB\-clipdesc\fR [\fIyes|no\fR]
.RS 4
clip descriptions after the first space (other whitespace such as \fI\et\fR or \fI\en\fR is not recognized) and add offset and length to ensure a unique identifier (default: no)
.RE
.PP
\fB\-width\fR [\fIvalue\fR]
.RS 4
set output width for FASTA sequence printing (0 disables formatting) (default: 0)
.RE
.PP
\fB\-o\fR [\fIfilename\fR]
.RS 4
redirect output to specified file (default: undefined)
.RE
.PP
\fB\-gzip\fR [\fIyes|no\fR]
.RS 4
write gzip compressed output file (default: no)
.RE
.PP
\fB\-bzip2\fR [\fIyes|no\fR]
.RS 4
write bzip2 compressed output file (default: no)
.RE
.PP
\fB\-force\fR [\fIyes|no\fR]
.RS 4
force writing to output file (default: no)
.RE
.PP
\fB\-help\fR
.RS 4
display help and exit
.RE
.PP
\fB\-version\fR
.RS 4
display version information and exit
.RE
.sp
Each sequence given in \fIsequence_file\fR is shredded into consecutive pieces of random length (between \fI\-minlength\fR and \fI\-maxlength\fR) until it is consumed\&. As a result, the last fragment of a given sequence can be shorter than the argument to option \fI\-minlength\fR\&. To get rid of such fragments use gt seqfilter (see example below)\&.
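.sp
The shredding procedure described above can be sketched in Python\&. This is a minimal illustration, not the GenomeTools implementation: the function name shred, its signature, the uniform sampling of piece lengths, and the way the overlap shifts the start of the next piece are all assumptions\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import random


def shred(sequence, minlength=300, maxlength=700, overlap=0, rng=None):
    """Cut sequence into consecutive pieces of random length.

    Hypothetical helper, not part of GenomeTools. Returns a list of
    (offset, piece) tuples. The last piece is whatever remains of the
    sequence, so it may be shorter than minlength.
    """
    if not 0 < minlength <= maxlength:
        raise ValueError("need 0 < minlength <= maxlength")
    if not 0 <= overlap < minlength:
        raise ValueError("need 0 <= overlap < minlength")
    if rng is None:
        rng = random.Random()
    pieces = []
    start = 0
    while start < len(sequence):
        # Assumption: piece lengths are drawn uniformly, bounds included.
        length = rng.randint(minlength, maxlength)
        pieces.append((start, sequence[start:start + length]))
        if start + length >= len(sequence):
            break  # the sequence is consumed
        # Assumption: the next piece starts overlap characters before
        # the end of the current piece.
        start += length - overlap
    return pieces
```
.fi
.if n \{\
.RE
.\}
.sp
Under these assumptions, only the final tuple returned for a sequence can hold a piece shorter than minlength, which is the fragment that a subsequent gt seqfilter call would remove\&.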
.SH "EXAMPLES"
.sp
Shred a given BAC:
.sp
.if n \{\
.RS 4
.\}
.nf
$ gt shredder U89959_genomic\&.fas > fragments\&.fas
.fi
.if n \{\
.RE
.\}
.sp
Shred an EST collection into pieces between 50 and 100 bp and get rid of all (terminal) fragments shorter than 50 bp:
.sp
.if n \{\
.RS 4
.\}
.nf
$ gt shredder \-minlength 50 \-maxlength 100 U89959_ests\&.fas \e
  | gt seqfilter \-minlength 50 \- > fragments\&.fas
# 130 out of 1260 sequences have been removed (10\&.317%)
.fi
.if n \{\
.RE
.\}
.sp
Shred an EST collection and show only a random 10% of the resulting fragments:
.sp
.if n \{\
.RS 4
.\}
.nf
$ gt shredder \-sample 0\&.1 U89959_ests\&.fas
.fi
.if n \{\
.RE
.\}
.SH "REPORTING BUGS"
.sp
Report bugs to https://github\&.com/genometools/genometools/issues\&.