'\" t
.\"     Title: COMPSEQ
.\"    Author: Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>
.\" Generator: DocBook XSL Stylesheets v1.76.1 <http://docbook.sf.net/>
.\"      Date: 05/11/2012
.\"    Manual: EMBOSS Manual for Debian
.\"    Source: EMBOSS 6.4.0
.\"  Language: English
.\"
.TH "COMPSEQ" "1e" "05/11/2012" "EMBOSS 6.4.0" "EMBOSS Manual for Debian"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
compseq \- Calculate the composition of unique words in sequences
.SH "SYNOPSIS"
.HP \w'\fBcompseq\fR\ 'u
\fBcompseq\fR \fB\-sequence\ \fR\fB\fIseqall\fR\fR [\fB\-infile\ \fR\fB\fIinfile\fR\fR] \fB\-word\ \fR\fB\fIinteger\fR\fR [\fB\-frame\ \fR\fB\fIinteger\fR\fR] \fB\-ignorebz\ \fR\fB\fIboolean\fR\fR \fB\-reverse\ \fR\fB\fIboolean\fR\fR [\fB\-calcfreq\ \fR\fB\fIboolean\fR\fR] \fB\-outfile\ \fR\fB\fIoutfile\fR\fR [\fB\-zerocount\ \fR\fB\fIboolean\fR\fR]
.HP \w'\fBcompseq\fR\ 'u
\fBcompseq\fR \fB\-help\fR
.SH "DESCRIPTION"
.PP
\fBcompseq\fR
is a command line program from EMBOSS (\(lqthe European Molecular Biology Open Software Suite\(rq)\&. It is part of the "Nucleic:Composition,Protein:Composition" command group(s)\&.
.SH "OPTIONS"
.SS "Input section"
.PP
\fB\-sequence\fR \fIseqall\fR
.RS 4
The input sequence(s) to analyse\&.
.RE
.PP
\fB\-infile\fR \fIinfile\fR
.RS 4
This is a file previously produced by \*(Aqcompseq\*(Aq that can be used to set the expected frequencies of words in this analysis\&. The word size in the current run must be the same as the one used to produce this file\&. Use a file produced from protein sequences if you are counting protein word frequencies, and one produced from nucleotide sequences if you are analysing a nucleotide sequence\&.
.RE
.SS "Required section"
.PP
\fB\-word\fR \fIinteger\fR
.RS 4
This is the size of word (n\-mer) to count\&. For example, to count codon frequencies in a nucleotide sequence, enter 3 here\&. Default value: 2
.RE
.SS "Additional section"
.PP
\fB\-frame\fR \fIinteger\fR
.RS 4
By default, \*(Aqcompseq\*(Aq counts every word by sliding a window of length \*(Aqword\*(Aq along the sequence one position at a time\&. This option instead advances the window by the length of the word each time, skipping over the intervening words, so that only the words in a single frame are counted\&. Set it to a number other than zero to select the frame: 1 counts only the words in frame 1, 2 counts only the words in frame 2, and so on\&.
.RE
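.RS 4
.PP
The windowing described for \fB\-frame\fR can be sketched in Python\&. This is an illustration only, not the EMBOSS implementation: the function name count_words is hypothetical, and mapping frame N to a start offset of N\-1 is an assumption made for the sketch\&.
.sp
.nf
```python
from collections import Counter


def count_words(seq, word=2, frame=0):
    """Count words of length `word` in `seq`.

    frame == 0: slide the window one position at a time (overlapping words).
    frame >= 1: start at offset frame - 1 and step by `word` (one frame only).
    The 1-based frame numbering is an assumption made for this sketch.
    """
    if word < 1:
        raise ValueError("word must be at least 1")
    if frame < 0 or frame > word:
        raise ValueError("frame must be between 0 and word")
    start, step = (0, 1) if frame == 0 else (frame - 1, word)
    counts = Counter()
    for i in range(start, len(seq) - word + 1, step):
        counts[seq[i:i + word]] += 1
    return counts


# Every overlapping 3-mer versus only the 3-mers in frame 1.
all_words = count_words("ATGATGA", word=3)
frame_one = count_words("ATGATGA", word=3, frame=1)
```
.fi
.RE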
.PP
\fB\-ignorebz\fR \fIboolean\fR
.RS 4
The amino acid code B represents Asparagine or Aspartic acid, and the code Z represents Glutamine or Glutamic acid\&. These codes are not commonly used, so you may prefer not to count words containing them individually and instead to include them in the count of \*(AqOther\*(Aq words\&. Default value: Y
.RE
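.RS 4
.PP
The pooling described for \fB\-ignorebz\fR might look like the following Python sketch\&. The function name count_protein_words is hypothetical, and the exact way the real program tallies \*(AqOther\*(Aq words is not specified here, so treat this as an assumption\&.
.sp
.nf
```python
from collections import Counter


def count_protein_words(seq, word=2, ignorebz=True):
    """Count overlapping protein words, pooling B/Z words under "Other"."""
    counts = Counter()
    for i in range(len(seq) - word + 1):
        w = seq[i:i + word]
        if ignorebz and ("B" in w or "Z" in w):
            # Words containing the ambiguity codes are not listed separately.
            counts["Other"] += 1
        else:
            counts[w] += 1
    return counts


pooled = count_protein_words("MKBLKZMK", word=2)
separate = count_protein_words("MKBLKZMK", word=2, ignorebz=False)
```
.fi
.RE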
.PP
\fB\-reverse\fR \fIboolean\fR
.RS 4
Set this to true if you also wish to count words in the reverse complement of a nucleic acid sequence\&. Default value: N
.RE
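.RS 4
.PP
As a rough illustration of what counting on both strands means, the Python sketch below counts overlapping words in a sequence and in its reverse complement\&. The names reverse_complement and count_both_strands are hypothetical, and only the unambiguous bases A, C, G and T are complemented; this is an assumption of the sketch, not a description of how EMBOSS handles ambiguity codes\&.
.sp
.nf
```python
from collections import Counter

# Complement table for unambiguous DNA bases only (sketch assumption).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_both_strands(seq, word=2):
    """Count overlapping words in `seq` and in its reverse complement."""
    counts = Counter()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - word + 1):
            counts[strand[i:i + word]] += 1
    return counts


both = count_both_strands("AAGC", word=2)
```
.fi
.RE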
.PP
\fB\-calcfreq\fR \fIboolean\fR
.RS 4
If this is set to true, the expected frequencies of words are calculated from the observed frequencies of single bases or residues in the sequences\&. With a word size of 1 (single bases or residues) there is no point in using this option, because the calculated expected frequency will equal the observed frequency\&. Expected frequencies calculated this way only approximate those you might get from an input file of frequencies produced by a previous run of this program\&. If an input file of expected word frequencies has been specified, the values from that file are used instead of this calculation, even if \*(Aqcalcfreq\*(Aq is set to true\&. Default value: N
.RE
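.RS 4
.PP
One simple way to derive an expected word frequency from single\-residue frequencies is to multiply the observed frequency of each residue in the word\&. The Python sketch below does that; the formula and the function name expected_frequency are assumptions made for illustration, since this page does not state the exact calculation used by the program\&.
.sp
.nf
```python
from collections import Counter
from math import prod


def expected_frequency(seq, candidate):
    """Expected frequency of `candidate`, assuming independent residues.

    Each residue contributes its observed single-residue frequency in `seq`.
    """
    if not seq:
        raise ValueError("seq must not be empty")
    singles = Counter(seq)
    total = len(seq)
    return prod(singles[ch] / total for ch in candidate)


seq = "AACG"
exp_aa = expected_frequency(seq, "AA")  # product of two single-base frequencies
exp_a = expected_frequency(seq, "A")    # word size 1: same as observed frequency
obs_a = seq.count("A") / len(seq)
```
.fi
.RE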
.SS "Output section"
.PP
\fB\-outfile\fR \fIoutfile\fR
.RS 4
This is the results file\&.
.RE
.PP
\fB\-zerocount\fR \fIboolean\fR
.RS 4
You can make the output results file much smaller if you do not display the words with a zero count\&. Default value: Y
.RE
.SH "BUGS"
.PP
Bugs can be reported to the Debian Bug Tracking system (http://bugs\&.debian\&.org/emboss), or directly to the EMBOSS developers (http://sourceforge\&.net/tracker/?group_id=93650&atid=605031)\&.
.SH "SEE ALSO"
.PP
compseq is fully documented via the
\fBtfm\fR(1)
system\&.
.SH "AUTHOR"
.PP
\fBDebian Med Packaging Team\fR <\&debian\-med\-packaging@lists\&.alioth\&.debian\&.org\&>
.RS 4
Wrote the script used to autogenerate this manual page\&.
.RE
.SH "COPYRIGHT"
.br
.PP
This manual page was autogenerated from an Ajax Command Definition of the EMBOSS package\&. It can be redistributed under the same terms as EMBOSS itself\&.
.sp