READSEQ(1) General Commands Manual READSEQ(1)

NAME

readseq - Reads and writes nucleic/protein sequences in various formats
 

SYNOPSIS

readseq [-options] in.seq > out.seq
 

DESCRIPTION

This manual page briefly documents the readseq command. It was written for the Debian GNU/Linux distribution because the original program does not have a manual page. Instead, the program has documentation in text form; see below.
readseq reads and writes biosequences (nucleic/protein) in various formats. Data files may have multiple sequences. readseq is particularly useful as it automatically detects many sequence formats, and interconverts among them.
 

FORMATS

Formats which readseq currently understands:
 

* IG/Stanford, used by Intelligenetics and others

* GenBank/GB, genbank flatfile format

* NBRF format

* EMBL, EMBL flatfile format

* GCG, single sequence format of GCG software

* DNAStrider, used by a common Mac program

* Fitch format, limited use

* Pearson/Fasta, a common format used by Fasta programs and others

* Zuker format, limited use. Input only.

* Olsen, format printed by Olsen VMS sequence editor. Input only.

* Phylip3.2, sequential format for Phylip programs

* Phylip, interleaved format for Phylip programs (v3.3, v3.4)

* Plain/Raw, sequence data only (no name, document, numbering)

+ MSF multi sequence format used by GCG software

+ PAUP's multiple sequence (NEXUS) format

+ PIR/CODATA format used by PIR

+ ASN.1 format used by NCBI

+ Pretty print with various options for nice looking output. Output only.

+ LinAll format, limited use (LinAll and ConStruct programs)

+ Vienna format used by ViennaRNA programs
 
See the included "Formats" file for details on the file formats.
 

OPTIONS

 
-help
Show summary of options.
 
-a[ll]
Select All sequences
 
-c[aselower]
Change to lower case
 
-C[ASEUPPER]
Change to UPPER CASE
 
-degap[=-]
Remove gap symbols
 
-i[tem=2,3,4]
Select Item number(s) from several
 
-l[ist]
List sequences only
 
-o[utput=]out.seq
Redirect Output
 
-p[ipe]
Pipe (command line, <stdin, >stdout)
 
-r[everse]
Change to Reverse-complement
 
-v[erbose]
Verbose progress
 
-f[ormat=]#
Format number for output, or

-f[ormat=]Name
Format name for output:

 1. IG/Stanford         11. Phylip3.2
 2. GenBank/GB          12. Phylip
 3. NBRF                13. Plain/Raw
 4. EMBL                14. PIR/CODATA
 5. GCG                 15. MSF
 6. DNAStrider          16. ASN.1
 7. Fitch               17. PAUP/NEXUS
 8. Pearson/Fasta       18. Pretty (out-only)
 9. Zuker (in-only)     19. LinAll
10. Olsen (in-only)     20. Vienna
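
Because -format accepts either a number or a name, scripts that pass the number can be hard to read later. The sketch below is a hypothetical shell helper (readseq_format_name is not part of readseq) that translates a format number into the name listed in the table above. It only restates that table and does not call readseq itself.

```shell
# Hypothetical helper, not shipped with readseq: translate an output
# format number into the name given in the table of this manual page.
readseq_format_name() {
    case "$1" in
        1)  echo "IG/Stanford" ;;
        2)  echo "GenBank/GB" ;;
        3)  echo "NBRF" ;;
        4)  echo "EMBL" ;;
        5)  echo "GCG" ;;
        6)  echo "DNAStrider" ;;
        7)  echo "Fitch" ;;
        8)  echo "Pearson/Fasta" ;;
        9)  echo "Zuker" ;;       # input only
        10) echo "Olsen" ;;       # input only
        11) echo "Phylip3.2" ;;
        12) echo "Phylip" ;;
        13) echo "Plain/Raw" ;;
        14) echo "PIR/CODATA" ;;
        15) echo "MSF" ;;
        16) echo "ASN.1" ;;
        17) echo "PAUP/NEXUS" ;;
        18) echo "Pretty" ;;      # output only
        19) echo "LinAll" ;;
        20) echo "Vienna" ;;
        *)  echo "unknown format number: $1" >&2; return 1 ;;
    esac
}

readseq_format_name 8
```

The final line prints the name for format 8. Numbers outside 1..20 produce a message on standard error and a non-zero return status.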
 
Pretty format options:
 
-wid[th]=#
Sequence line width
 
-tab=#
Left indent
 
-col[space]=#
Column space within sequence line on output
 
-gap[count]
Count gap chars in sequence numbers
 
-nameleft, -nameright[=#]
Name on left/right side [=max width]
 
-nametop
Name at top/bottom
 
-numleft, -numright
Seq index on left/right side
 
-numtop, -numbot
Index on top/bottom
 
-match[=.]
Use match base for 2..n species
 
-inter[line=#]
Blank line(s) between sequence blocks
 
 

EXAMPLES


readseq

-- for interactive use
 

readseq my.1st.seq my.2nd.seq -all -format=genbank -output=my.gb

-- convert all of two input files to one genbank format output file
 

readseq my.seq -all -form=pretty -nameleft=3 -numleft -numright -numtop -match

-- write a file to standard output in pretty format
 

readseq my.seq -item=9,8,3,2 -degap -CASE -rev -f=msf -out=my.rev

-- select 4 items from input, degap, reverse-complement, and uppercase them, writing MSF output to my.rev
 

cat *.seq | readseq -pipe -all -format=asn > bunch-of.asn

-- pipe a bunch of data through readseq, converting all to ASN.1
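
The examples above convert one batch at a time. To convert several files separately, each to its own output file, a small wrapper can help. The following is a minimal sketch, not part of readseq: the function name convert_to_fasta, the DRY_RUN variable, and the .fasta output suffix are all assumptions made for illustration. It uses only options documented above (-all, -format with number 8 for Pearson/Fasta, and -output), and by default it only prints the commands it would run.

```shell
# Hypothetical wrapper: build one readseq command per input file.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to run readseq.
# Assumes each input file name has an extension such as ".seq".
convert_to_fasta() {
    for f in "$@"; do
        out="${f%.*}.fasta"    # strip the last extension, append .fasta
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "readseq $f -all -format=8 -output=$out"
        else
            readseq "$f" -all -format=8 -output="$out"
        fi
    done
}

convert_to_fasta my.1st.seq my.2nd.seq
```

Reviewing the printed commands before setting DRY_RUN=0 is a cheap way to check the output file names, since readseq is then asked to write to them directly.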
 

SEE ALSO

The programs are fully documented in text form. See the files in /usr/share/doc/readseq.
 

AUTHOR

This manual page was written by Stephane Bortzmeyer <bortzmeyer@debian.org>, for the Debian GNU/Linux system (but may be used by others).