'\" t
.\"     Title: COMPSEQ
.\"    Author: Debian Med Packaging Team
.\" Generator: DocBook XSL Stylesheets v1.76.1
.\"      Date: 05/11/2012
.\"    Manual: EMBOSS Manual for Debian
.\"    Source: EMBOSS 6.4.0
.\"  Language: English
.\"
.TH "COMPSEQ" "1e" "05/11/2012" "EMBOSS 6.4.0" "EMBOSS Manual for Debian"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
compseq \- Calculate the composition of unique words in sequences
.SH "SYNOPSIS"
.HP \w'\fBcompseq\fR\ 'u
\fBcompseq\fR \fB\-sequence\ \fR\fB\fIseqall\fR\fR [\fB\-infile\ \fR\fB\fIinfile\fR\fR] \fB\-word\ \fR\fB\fIinteger\fR\fR [\fB\-frame\ \fR\fB\fIinteger\fR\fR] \fB\-ignorebz\ \fR\fB\fIboolean\fR\fR \fB\-reverse\ \fR\fB\fIboolean\fR\fR [\fB\-calcfreq\ \fR\fB\fIboolean\fR\fR] \fB\-outfile\ \fR\fB\fIoutfile\fR\fR [\fB\-zerocount\ \fR\fB\fIboolean\fR\fR]
.HP \w'\fBcompseq\fR\ 'u
\fBcompseq\fR \fB\-help\fR
.SH "DESCRIPTION"
.PP
\fBcompseq\fR
is a command line program from EMBOSS (\(lqthe European Molecular Biology Open Software Suite\(rq)\&. It is part of the "Nucleic:Composition,Protein:Composition" command group(s)\&.
.SH "OPTIONS"
.SS "Input section"
.PP
\fB\-sequence\fR \fIseqall\fR
.RS 4
.RE
.PP
\fB\-infile\fR \fIinfile\fR
.RS 4
This is a file previously produced by \*(Aqcompseq\*(Aq that can be used to set the expected frequencies of words in this analysis\&. The word size in the current run must be the same as the one in this results file\&. Use a file produced from protein sequences if you are counting protein sequence word frequencies, and a file produced from nucleotide sequences if you are analysing a nucleotide sequence\&.
.RE
.SS "Required section"
.PP
\fB\-word\fR \fIinteger\fR
.RS 4
This is the size of word (n\-mer) to count\&. Thus if you want to count codon frequencies for a nucleotide sequence, you should enter 3 here\&. Default value: 2
.RE
.SS "Additional section"
.PP
\fB\-frame\fR \fIinteger\fR
.RS 4
The normal behaviour of \*(Aqcompseq\*(Aq is to count the frequencies of all words that occur by moving a window of length \*(Aqword\*(Aq up by one each time\&. This option allows you to move the window up by the length of the word each time, skipping over the intervening words\&. To count only those words that occur in a single frame of the word, set this value to a number other than zero\&. If you set it to 1 it will count only the words in frame 1, if you set it to 2 it will count only the words in frame 2, and so on\&.
.RE
.PP
\fB\-ignorebz\fR \fIboolean\fR
.RS 4
The amino acid code B represents Asparagine or Aspartic acid and the code Z represents Glutamine or Glutamic acid\&. These codes are not commonly used, and you may wish not to count words containing them, noting them only in the count of \*(AqOther\*(Aq words\&. Default value: Y
.RE
.PP
\fB\-reverse\fR \fIboolean\fR
.RS 4
Set this to be true if you also wish to count words in the reverse complement of a nucleic sequence\&.
Default value: N
.RE
.PP
\fB\-calcfreq\fR \fIboolean\fR
.RS 4
If this is set true then the expected frequencies of words are calculated from the observed frequency of single bases or residues in the sequences\&. If you are reporting a word size of 1 (single bases or residues) then there is no point in using this option, because the calculated expected frequency will be equal to the observed frequency\&. Calculating the expected frequencies like this gives an approximation of the expected frequencies that you might get by using an input file of frequencies produced by a previous run of this program\&. If an input file of expected word frequencies has been specified then the values from that file will be used instead of this calculation from the sequence, even if \*(Aqcalcfreq\*(Aq is set to be true\&. Default value: N
.RE
.SS "Output section"
.PP
\fB\-outfile\fR \fIoutfile\fR
.RS 4
This is the results file\&.
.RE
.PP
\fB\-zerocount\fR \fIboolean\fR
.RS 4
You can make the output results file much smaller if you do not display the words with a zero count\&. Default value: Y
.RE
.SH "BUGS"
.PP
Bugs can be reported to the Debian Bug Tracking system (http://bugs\&.debian\&.org/emboss), or directly to the EMBOSS developers (http://sourceforge\&.net/tracker/?group_id=93650&atid=605031)\&.
.SH "SEE ALSO"
.PP
compseq is fully documented via the
\fBtfm\fR(1)
system\&.
.SH "AUTHOR"
.PP
\fBDebian Med Packaging Team\fR <\&debian\-med\-packaging@lists\&.alioth\&.debian\&.org\&>
.RS 4
Wrote the script used to autogenerate this manual page\&.
.RE
.SH "COPYRIGHT"
.br
.PP
This manual page was autogenerated from an Ajax Command Definition of the EMBOSS package\&. It can be redistributed under the same terms as EMBOSS itself\&.
.sp