'\" t
.\" Title: SEQNR
.\" Author: Debian Med Packaging Team
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 08/11/2010
.\" Manual: EMBOSS Manual for Debian
.\" Source: DOMSEARCH 0.1.0++20100721
.\" Language: English
.\"
.TH "SEQNR" "1e" "08/11/2010" "DOMSEARCH 0.1.0++20100721" "EMBOSS Manual for Debian"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
seqnr \- Removes redundancy from DHF files\&.
.SH "SYNOPSIS"
.HP \w'\fBseqnr\fR\ 'u
\fBseqnr\fR \fB\-dhfinpath\ \fR\fB\fIdirlist\fR\fR \fB\-dosing\ \fR\fB\fItoggle\fR\fR \fB\-singletsdir\ \fR\fB\fIdirectory\fR\fR \fB\-dosets\ \fR\fB\fItoggle\fR\fR \fB\-insetsdir\ \fR\fB\fIdirectory\fR\fR [\fB\-matrix\ \fR\fB\fImatrixf\fR\fR] \fB\-mode\ \fR\fB\fIlist\fR\fR \fB\-thresh\ \fR\fB\fIfloat\fR\fR \fB\-threshlow\ \fR\fB\fIfloat\fR\fR \fB\-threshup\ \fR\fB\fIfloat\fR\fR [\fB\-gapopen\ \fR\fB\fIfloat\fR\fR] [\fB\-gapextend\ \fR\fB\fIfloat\fR\fR] \fB\-dhfoutdir\ \fR\fB\fIoutdir\fR\fR \fB\-dored\ \fR\fB\fItoggle\fR\fR \fB\-redoutdir\ \fR\fB\fIoutdir\fR\fR \fB\-logfile\ \fR\fB\fIoutfile\fR\fR
.HP \w'\fBseqnr\fR\ 'u
\fBseqnr\fR \fB\-help\fR
.SH "DESCRIPTION"
.PP
\fBseqnr\fR
is a command line program from EMBOSS (\(lqthe European Molecular Biology Open Software Suite\(rq)\&. It is part of the "Utils:Database creation" command group(s)\&.
.SH "OPTIONS"
.SS "Input section"
.PP
\fB\-dhfinpath\fR \fIdirlist\fR
.RS 4
This option specifies the location of DHF files (domain hits files) (input)\&. A \*(Aqdomain hits file\*(Aq contains database hits (sequences) with domain classification information, in the DHF format (FASTA or EMBL\-like)\&. The hits are relatives of a SCOP or CATH family and are found by searching a sequence database\&. Files containing hits retrieved by PSIBLAST are generated by using SEQSEARCH\&. Default value: \&./
.RE
.PP
\fB\-dosing\fR \fItoggle\fR
.RS 4
This option specifies whether to use singlet sequences (e\&.g\&. DHF files) to filter input\&. Optionally, up to two further directories of sequences may be read: these are considered in the redundancy calculation but never appear in the output files\&. Default value: Y
.RE
.PP
\fB\-singletsdir\fR \fIdirectory\fR
.RS 4
This option specifies the location of singlet filter sequences (e\&.g\&. DHF files) (input)\&. A \*(Aqdomain hits file\*(Aq contains database hits (sequences) with domain classification information, in the DHF format (FASTA or EMBL\-like)\&.
The hits are relatives of a SCOP or CATH family and are found by searching a sequence database\&. Files containing hits retrieved by PSIBLAST are generated by using SEQSEARCH\&. Default value: \&./
.RE
.PP
\fB\-dosets\fR \fItoggle\fR
.RS 4
This option specifies whether to use sets of sequences (e\&.g\&. DHF files) to filter input\&. Optionally, up to two further directories of sequences may be read: these are considered in the redundancy calculation but never appear in the output files\&. Default value: Y
.RE
.PP
\fB\-insetsdir\fR \fIdirectory\fR
.RS 4
This option specifies the location of sets of filter sequences (e\&.g\&. DAF files) (input)\&. A \*(Aqdomain alignment file\*(Aq contains a sequence alignment of domains belonging to the same SCOP or CATH family\&. The file is in clustal format annotated with domain family classification information\&. The files generated by using SCOPALIGN will contain a structure\-based sequence alignment of domains of known structure only\&. Such alignments can be extended with sequence relatives (of unknown structure) by using SEQALIGN\&. Default value: \&./
.RE
.PP
\fB\-matrix\fR \fImatrixf\fR
.RS 4
This option specifies the residue substitution matrix that is used for sequence comparison\&. Default value: EBLOSUM62
.RE
.SS "Required section"
.PP
\fB\-mode\fR \fIlist\fR
.RS 4
This option specifies whether to remove redundancy at a single threshold of % sequence similarity or to remove redundancy outside an acceptable range of % sequence similarity\&. All permutations of pair\-wise sequence alignments are calculated for each set of input sequences in turn using the EMBOSS implementation of the Needleman and Wunsch global alignment algorithm\&. Redundant sequences are removed in one of two modes as follows: (i) If a pair of proteins achieves greater than a threshold percentage sequence similarity (specified by the user), the shorter sequence is discarded\&.
(ii) If a pair of proteins has a percentage sequence similarity that lies outside an acceptable range (specified by the user), the shorter sequence is discarded\&. Default value: 1
.RE
.PP
\fB\-thresh\fR \fIfloat\fR
.RS 4
This option specifies the % sequence identity redundancy threshold\&. The % sequence identity redundancy threshold determines the redundancy calculation\&. If a pair of proteins achieves greater than this threshold, the shorter sequence is discarded\&. Default value: 95\&.0
.RE
.PP
\fB\-threshlow\fR \fIfloat\fR
.RS 4
This option specifies the % sequence identity redundancy threshold (lower limit)\&. The % sequence identity redundancy threshold determines the redundancy calculation\&. If a pair of proteins has a percentage sequence similarity that lies outside an acceptable range, the shorter sequence is discarded\&. Default value: 30\&.0
.RE
.PP
\fB\-threshup\fR \fIfloat\fR
.RS 4
This option specifies the % sequence identity redundancy threshold (upper limit)\&. The % sequence identity redundancy threshold determines the redundancy calculation\&. If a pair of proteins has a percentage sequence similarity that lies outside an acceptable range, the shorter sequence is discarded\&. Default value: 90\&.0
.RE
.SS "Additional section"
.PP
\fB\-gapopen\fR \fIfloat\fR
.RS 4
This option specifies the gap insertion penalty\&. The gap insertion penalty is the score taken away when a gap is created\&. The best value depends on the choice of comparison matrix\&. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences\&. Default value: 10
.RE
.PP
\fB\-gapextend\fR \fIfloat\fR
.RS 4
This option specifies the gap extension penalty\&. The gap extension penalty is added to the standard gap penalty for each base or residue in the gap\&. This is how long gaps are penalized\&.
Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap insertion penalty\&. Default value: 0\&.5
.RE
.SS "Output section"
.PP
\fB\-dhfoutdir\fR \fIoutdir\fR
.RS 4
This option specifies the location of DHF files (domain hits files) of non\-redundant sequences (output)\&. A \*(Aqdomain hits file\*(Aq contains database hits (sequences) with domain classification information, in the DHF format (FASTA or EMBL\-like)\&. The hits are relatives of a SCOP or CATH family and are found by searching a sequence database\&. Files containing hits retrieved by PSIBLAST are generated by using SEQSEARCH\&. Default value: \&./
.RE
.PP
\fB\-dored\fR \fItoggle\fR
.RS 4
This option specifies whether to retain redundant sequences\&. If this option is set, a DHF file (domain hits file) of redundant sequences is written\&. Default value: N
.RE
.PP
\fB\-redoutdir\fR \fIoutdir\fR
.RS 4
This option specifies the location of DHF files (domain hits files) of redundant sequences (output)\&. A \*(Aqdomain hits file\*(Aq contains database hits (sequences) with domain classification information, in the DHF format (FASTA or EMBL\-like)\&. The hits are relatives of a SCOP or CATH family and are found by searching a sequence database\&. Files containing hits retrieved by PSIBLAST are generated by using SEQSEARCH\&. Default value: \&./
.RE
.PP
\fB\-logfile\fR \fIoutfile\fR
.RS 4
This option specifies the name of the SEQNR log file (output)\&. The log file contains messages about any errors arising while SEQNR ran\&. Default value: seqnr\&.log
.RE
.SH "BUGS"
.PP
Bugs can be reported to the Debian Bug Tracking system (http://bugs\&.debian\&.org/emboss), or directly to the EMBOSS developers (http://sourceforge\&.net/tracker/?group_id=93650&atid=605031)\&.
.SH "SEE ALSO"
.PP
seqnr is fully documented via the
\fBtfm\fR(1)
system\&.
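.SH "EXAMPLES"
.PP
The two redundancy modes described under \fB\-mode\fR can be sketched in Python as follows\&. This is a minimal illustration under stated assumptions, not the seqnr implementation: the names is_redundant, remove_redundancy and toy_similarity are hypothetical, the strict comparisons at the threshold boundaries are assumed, and toy_similarity is only a stand\-in for the Needleman and Wunsch global alignment score that seqnr computes\&.
.PP
.nf
```python
from itertools import combinations


def is_redundant(similarity, mode=1, thresh=95.0, threshlow=30.0, threshup=90.0):
    """Return True if a pair with this % similarity counts as redundant."""
    if mode == 1:
        # Mode 1: redundant above a single threshold.
        return similarity > thresh
    if mode == 2:
        # Mode 2: redundant outside the acceptable range.
        return similarity < threshlow or similarity > threshup
    raise ValueError("mode must be 1 or 2")


def toy_similarity(a, b):
    """Hypothetical stand-in: % of matching positions over the longer length."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def remove_redundancy(sequences, similarity_fn, **options):
    """Split sequences into (kept, removed) by pairwise comparison."""
    discarded = set()
    for i, j in combinations(range(len(sequences)), 2):
        # Assumption: a sequence already discarded is not compared again.
        if i in discarded or j in discarded:
            continue
        if is_redundant(similarity_fn(sequences[i], sequences[j]), **options):
            # Discard the shorter sequence; on a tie, drop the later one.
            shorter = i if len(sequences[i]) < len(sequences[j]) else j
            discarded.add(shorter)
    kept = [s for k, s in enumerate(sequences) if k not in discarded]
    removed = [s for k, s in enumerate(sequences) if k in discarded]
    return kept, removed
```
.fi
.PP
Calling remove_redundancy(sequences, toy_similarity) applies the single\-threshold test with the documented default of 95\&.0, while passing mode=2 applies the range test with the documented defaults of 30\&.0 and 90\&.0\&. The kept and removed lists loosely correspond to the contents of \fB\-dhfoutdir\fR and \fB\-redoutdir\fR\&.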
.SH "AUTHOR"
.PP
\fBDebian Med Packaging Team\fR <\&debian\-med\-packaging@lists\&.alioth\&.debian\&.org\&>
.RS 4
Wrote the script used to autogenerate this manual page\&.
.RE
.SH "COPYRIGHT"
.br
.PP
This manual page was autogenerated from an Ajax Command Definition of the EMBOSS package\&. It can be redistributed under the same terms as EMBOSS itself\&.
.sp