'\" t
.\" Title: DOMAINNR
.\" Author: Debian Med Packaging Team
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 08/11/2010
.\" Manual: EMBOSS Manual for Debian
.\" Source: DOMAINATRIX 0.1.0+20100721
.\" Language: English
.\"
.TH "DOMAINNR" "1e" "08/11/2010" "DOMAINATRIX 0.1.0+20100721" "EMBOSS Manual for Debian"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
domainnr \- Removes redundant domains from a DCF file\&.
.SH "SYNOPSIS"
.HP \w'\fBdomainnr\fR\ 'u
\fBdomainnr\fR \fB\-dcfinfile\ \fR\fB\fIinfile\fR\fR [\fB\-datafile\ \fR\fB\fImatrixf\fR\fR] \fB\-retain\ \fR\fB\fItoggle\fR\fR \fB\-node\ \fR\fB\fIlist\fR\fR \fB\-mode\ \fR\fB\fIlist\fR\fR \fB\-threshold\ \fR\fB\fIfloat\fR\fR \fB\-threshlow\ \fR\fB\fIfloat\fR\fR \fB\-threshup\ \fR\fB\fIfloat\fR\fR [\fB\-gapopen\ \fR\fB\fIfloat\fR\fR] [\fB\-gapextend\ \fR\fB\fIfloat\fR\fR] \fB\-dcfoutfile\ \fR\fB\fIoutfile\fR\fR \fB\-redoutfile\ \fR\fB\fIoutfile\fR\fR \fB\-logfile\ \fR\fB\fIoutfile\fR\fR
.HP \w'\fBdomainnr\fR\ 'u
\fBdomainnr\fR \fB\-help\fR
.SH "DESCRIPTION"
.PP
\fBdomainnr\fR is a command line program from EMBOSS (\(lqthe European Molecular Biology Open Software Suite\(rq)\&. It is part of the "Utils:Database creation" command group(s)\&.
.SH "OPTIONS"
.SS "Input section"
.PP
\fB\-dcfinfile\fR \fIinfile\fR
.RS 4
This option specifies the name of the input DCF file (domain classification file)\&. A \*(Aqdomain classification file\*(Aq contains classification and other data for domains from SCOP or CATH, in DCF format (EMBL\-like)\&. Such files are generated by SCOPPARSE and CATHPARSE\&. Domain sequence information can be added to the file by using DOMAINSEQS\&.
.RE
.PP
\fB\-datafile\fR \fImatrixf\fR
.RS 4
This option specifies the residue substitution matrix used for sequence comparison\&.
Default value: EBLOSUM62
.RE
.PP
\fB\-retain\fR \fItoggle\fR
.RS 4
This option specifies whether to write redundant domains to a separate file\&. If it is selected, redundant domains are written to the file named by \fB\-redoutfile\fR\&.
Default value: N
.RE
.SS "Required section"
.PP
\fB\-node\fR \fIlist\fR
.RS 4
This option specifies the node at which redundancy is removed\&. Redundancy can be removed at any specified node in the SCOP or CATH hierarchies\&. For example, if \*(AqClass\*(Aq is selected, the entries retained within each Class will be non\-redundant\&.
Default value: 1
.RE
.PP
\fB\-mode\fR \fIlist\fR
.RS 4
This option specifies whether to remove redundancy at a single threshold % sequence similarity or to remove redundancy outside a range of acceptable % similarity\&. All pairwise sequence alignments are calculated for each domain family in turn using the EMBOSS implementation of the Needleman and Wunsch global alignment algorithm\&. Redundant sequences are then removed in one of two modes: (i) if a pair of proteins has a percentage sequence similarity greater than a user\-specified threshold (\fB\-threshold\fR), the shorter sequence is discarded; (ii) if a pair of proteins has a percentage sequence similarity that lies outside a user\-specified acceptable range (\fB\-threshlow\fR to \fB\-threshup\fR), the shorter sequence is discarded\&.
Default value: 1
.RE
.PP
\fB\-threshold\fR \fIfloat\fR
.RS 4
This option specifies the % sequence identity redundancy threshold used in mode (i)\&. If a pair of proteins exceeds this threshold, the shorter sequence is discarded\&.
Default value: 95\&.0
.RE
.PP
\fB\-threshlow\fR \fIfloat\fR
.RS 4
This option specifies the lower limit of the acceptable % sequence identity range used in mode (ii)\&. If a pair of proteins has a percentage sequence similarity below this limit, the shorter sequence is discarded\&.
Default value: 30\&.0
.RE
.PP
\fB\-threshup\fR \fIfloat\fR
.RS 4
This option specifies the upper limit of the acceptable % sequence identity range used in mode (ii)\&. If a pair of proteins has a percentage sequence similarity above this limit, the shorter sequence is discarded\&.
Default value: 90\&.0
.RE
.SS "Additional section"
.PP
\fB\-gapopen\fR \fIfloat\fR
.RS 4
This option specifies the gap insertion penalty\&. This is the score taken away when a gap is created\&. The best value depends on the choice of comparison matrix\&.
The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences\&.
Default value: 10
.RE
.PP
\fB\-gapextend\fR \fIfloat\fR
.RS 4
This option specifies the gap extension penalty\&. This is added to the standard gap penalty for each base or residue in the gap, and is how long gaps are penalized\&. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap penalty\&.
Default value: 0\&.5
.RE
.SS "Advanced section"
.SS "Output section"
.PP
\fB\-dcfoutfile\fR \fIoutfile\fR
.RS 4
This option specifies the name of the output non\-redundant DCF file (domain classification file)\&. A \*(Aqdomain classification file\*(Aq contains classification and other data for domains from SCOP or CATH, in DCF format (EMBL\-like)\&. Such files are generated by SCOPPARSE and CATHPARSE\&. Domain sequence information can be added to the file by using DOMAINSEQS\&.
Default value: test\&.scop
.RE
.PP
\fB\-redoutfile\fR \fIoutfile\fR
.RS 4
This option specifies the name of the output DCF file (domain classification file) for redundant sequences\&. A \*(Aqdomain classification file\*(Aq contains classification and other data for domains from SCOP or CATH, in DCF format (EMBL\-like)\&. Such files are generated by SCOPPARSE and CATHPARSE\&. Domain sequence information can be added to the file by using DOMAINSEQS\&.
.RE
.PP
\fB\-logfile\fR \fIoutfile\fR
.RS 4
This option specifies the name of the log file for the build\&. The log file contains messages about any errors that arise while domainnr runs\&.
Default value: domainnr\&.log
.RE
.SH "BUGS"
.PP
Bugs can be reported to the Debian Bug Tracking system (http://bugs\&.debian\&.org/emboss), or directly to the EMBOSS developers (http://sourceforge\&.net/tracker/?group_id=93650&atid=605031)\&.
.SH "SEE ALSO"
.PP
domainnr is fully documented via the \fBtfm\fR(1) system\&.
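.SH "NOTES"
.PP
The two redundancy rules described under \fB\-mode\fR can be sketched in Python as follows\&. This is an illustrative sketch, not the EMBOSS implementation\&. The function names \fBis_redundant\fR and \fBremove_redundant\fR are hypothetical\&. The order in which pairs are visited and the tie\-break for sequences of equal length are assumptions\&. The percentage similarity of each pair is taken as given rather than computed by Needleman and Wunsch alignment\&.
.sp
.RS 4
.nf
# Hypothetical sketch of the domainnr redundancy rules; not EMBOSS code.
# Percentage similarities are assumed to be precomputed for every pair.
from itertools import combinations


def is_redundant(similarity, mode,
                 threshold=95.0, threshlow=30.0, threshup=90.0):
    """Return True if a pair with this % similarity counts as redundant."""
    if mode == 1:
        # Mode (i): redundant if similarity exceeds the single threshold.
        return similarity > threshold
    # Mode (ii): redundant if similarity lies outside [threshlow, threshup].
    return similarity < threshlow or similarity > threshup


def remove_redundant(lengths, similarities, mode, **thresholds):
    """Discard the shorter member of each redundant pair.

    lengths maps domain id to sequence length; similarities maps an
    (id, id) tuple to % similarity. Pairs are visited in sorted id order
    and pairs involving an already discarded domain are skipped (an
    assumed order). On equal lengths the second id is discarded.
    Returns (kept, discarded) as sorted lists of ids.
    """
    discarded = set()
    for a, b in combinations(sorted(lengths), 2):
        if a in discarded or b in discarded:
            continue
        sim = similarities.get((a, b), similarities.get((b, a)))
        if sim is not None and is_redundant(sim, mode, **thresholds):
            # Keep the longer sequence, discard the shorter one.
            discarded.add(a if lengths[a] < lengths[b] else b)
    kept = sorted(set(lengths) - discarded)
    return kept, sorted(discarded)
.fi
.RE
.PP
Consider domains d1 (length 100), d2 (length 90) and d3 (length 150), where d1 and d2 are 97% similar and d1 and d3 are 20% similar\&. Under this sketch, mode 1 with the default threshold discards only d2\&. Mode 2 with the default range of 30 to 90% also discards d1, because its similarity to d3 falls below \fB\-threshlow\fR\&.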
.SH "AUTHOR"
.PP
\fBDebian Med Packaging Team\fR <\&debian\-med\-packaging@lists\&.alioth\&.debian\&.org\&>
.RS 4
Wrote the script used to autogenerate this manual page\&.
.RE
.SH "COPYRIGHT"
.br
.PP
This manual page was autogenerated from an Ajax Control Definition of the EMBOSS package\&. It can be redistributed under the same terms as EMBOSS itself\&.
.sp