NAME
convert_project - convert assembly and sequencing file types
DESCRIPTION
This program is part of the MIRA assembler package. It is used to convert
project file types into other types. Please check out the documentation below
for more detailed information about convert_project.
SYNOPSIS
- convert_project
- [-f <fromtype>] [-t <totype> [-t <totype> ...]]
[-aChimMsuZ] [-AcflnNoqrtvxXyz {...}] {infile} {outfile} [<totype>
<totype> ...]
OPTIONS
- -f <fromtype>
- load this type of project files, where fromtype is:
- caf
- a complete assembly or single sequences from CAF
- maf
- a complete assembly or single sequences from MAF
- fasta
- sequences from a FASTA file
- fastq
- sequences from a FASTQ file
- gbf
- sequences from a GBF file
- phd
- sequences from a PHD file
- fofnexp
- sequences in EXP files from file of filenames
- -t <totype>
- write the sequences/assembly to this type (multiple mentions of -t
are allowed):
- ace
- sequences or complete assembly to ACE
- caf
- sequences or complete assembly to CAF
- maf
- sequences or complete assembly to MAF
- sam
- complete assembly to SAM
- samnbb
- like above, but leaving out reference (backbones) in mapping
assemblies
- gbf
- sequences or consensus to GBF
- gff3
- consensus to GFF3
- wig
- assembly coverage info to wiggle file
- gcwig
- assembly gc content info to wiggle file
- fasta
- sequences or consensus to FASTA file (qualities to .qual)
- fastq
- sequences or consensus to FASTQ file
- exp
- sequences or complete assembly to EXP files in directories. Complete
assemblies are suited for gap4 import as directed assembly. Note: using
caf2gap to import into gap4 is recommended, though.
- text
- complete assembly to text alignment (only when -f is caf, maf or gbf)
- html
- complete assembly to HTML (only when -f is caf, maf or gbf)
- tcs
- complete assembly to tcs
- hsnp
- surrounding of SNP tags (SROc, SAOc, SIOc) to HTML (only when -f is
caf, maf or gbf)
- asnp
- analysis of SNP tags (only when -f is caf, maf or gbf)
- cstats
- contig statistics file like from MIRA (only when source contains
contigs)
- crlist
- contig read list file like from MIRA (only when source contains
contigs)
- maskedfasta
- reads where sequencing vector is masked out (with X) to FASTA file
(qualities to .qual)
- scaf
- sequences or complete assembly to single sequences CAF
- -a
- Append to target files instead of rewriting
- -A <string>
- String with MIRA parameters to be parsed. Useful when setting parameters
affecting consensus calling like -CO:mrpg etc. E.g.: -A
"454_SETTINGS -CO:mrpg=3"
- -b
- Blind data. Replaces all bases in reads/contigs with a 'c'.
- -C
- Perform hard clip to reads. When reading formats which define clipping
points, will save only the unclipped part into the result file. Applies
only to files/formats which do not contain contigs.
- -d
- Delete gap-only columns. When output is contigs: delete columns that are
entirely gaps (like after having deleted reads during editing in gap4 or
similar). When output is reads: delete gaps in reads.
- -F
- Filter to read groups. Special use case, do not use yet.
- -m
- Make contigs (only for -t = caf or maf). Encase single reads as
contig singlets into the CAF/MAF file.
- -n <filename>
- when given, selects only reads or contigs given by name in that file.
- -i
- when -n is used, inverts the selection
- -o
- fastq quality Offset (only for -f = 'fastq'). Offset of quality
values in the FASTQ file. The default of 0 tries to recognise the offset
automatically.
- -Q <quality>
- Set default quality for bases in file types without quality values.
Furthermore, do not stop if expected quality files are missing (e.g.
'.fasta').
- -R <name>
- Rename contigs/singlets/reads with given name string to which a counter is
appended. Known bug: will create duplicate names if input contains
contigs/singlets as well as free reads, i.e. reads in neither contigs
nor singlets.
- -S <name>
- (name)Scheme for renaming reads, important for paired-ends. Only 'solexa'
is currently supported.
Beware: CAF and MAF can also contain just reads.
- -M
- Do not extract contigs (or their consensus), but the sequence of the reads
they are composed of.
- -N <filename>
- like -n, but sorts output according to order given in file.
- -r [cCqf]
- Recalculate consensus and / or consensus quality values and / or SNP
feature tags.
'c' recalc cons & cons qualities (with IUPAC)
'C' recalc cons & cons qualities (forcing non-IUPAC)
'q' recalc consensus qualities only
'f' recalc SNP features
Note: only the last of cCq is relevant; f works as a switch and can be
combined with cCq (e.g. "-r C -r f").
Note: if the CAF/MAF contains multiple strains, recalculation of cons
& cons qualities is forced; you can only influence whether IUPACs are
used or not.
- -s
- split output into multiple files instead of creating a single file
- -u
- 'fillUp strain genomes'. Fill holes in the genome of one strain (N or @)
with sequence from a consensus of other strains. Takes effect only with
-r and -t gbf or fasta/q. In FASTA/Q: bases filled up are in
lower case. In GBF: bases filled up are in upper case.
- -q <integer>
- Defines minimum quality a consensus base of a strain must have; consensus
bases below this will be 'N'. Default: 0. Only used with -r, and when
-f is caf/maf and -t is fasta or gbf.
- -v
- Print version number and exit
- -x <integer>
- Minimum contig or read length. When loading, discard all contigs / reads
with a length less than this value. Default: 0 (=switched off). Note: not
applied to reads in contigs!
- -X <integer>
- Similar to -x but applies only to reads and then to the clipped
length.
- -y <integer>
- Minimum average contig coverage. When loading, discard all contigs with an
average coverage less than this value. Default: 1.
- -z <integer>
- Minimum number of reads in contig. When loading, discard all contigs with a
number of reads less than this value. Default: 0 (=switched off).
- -l <integer>
- when output as text or HTML: number of bases shown in one alignment line.
Default: 60.
- -c <character>
- when output as text or HTML: character used to pad endgaps. Default: ' '
(blank)
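The threshold options -x and -y both discard contigs whose value is less than the given number. The sketch below only illustrates that rule on a hypothetical three-column table (name, length, average coverage). The table format and the helper name preview_filter are assumptions made for illustration; convert_project does not read or write such a table.

```shell
# Illustration only: preview which contigs would survive "-x MINLEN -y MINCOV".
# Input on stdin is a hypothetical table: "name length avg_coverage" per line.
preview_filter() {
    minlen=$1
    mincov=$2
    # A contig is kept when neither value is below its threshold.
    awk -v minlen="$minlen" -v mincov="$mincov" \
        '$2 >= minlen && $3 >= mincov { print $1 }'
}

printf '%s\n' 'c1 2500 12.4' 'c2 1800 30.0' 'c3 4000 9.9' | preview_filter 2000 10
# prints: c1
```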
Aliases: caf2html, exp2fasta, etc. Any combination of
"<validfromtype>2<validtotype>" can be used as program
name (also using links) so that convert_project automatically sets
-f and -t accordingly.
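One way to set up such an alias is a symbolic link named after the two types. This is a minimal sketch: make_alias is a hypothetical helper, and the path to the convert_project binary and the link directory are assumptions that depend on the local installation.

```shell
# Sketch: create a "<fromtype>2<totype>" symlink pointing at convert_project.
# Arguments: path to the binary, fromtype, totype, and optionally the
# directory in which to create the link (defaults to the current directory).
make_alias() {
    target=$1
    from=$2
    to=$3
    dir=${4:-.}
    ln -s "$target" "$dir/${from}2${to}"
}

# Example call, assuming convert_project is on PATH and $HOME/bin exists:
# make_alias "$(command -v convert_project)" caf html "$HOME/bin"
```

Invoking the resulting caf2html link should then behave as described above, with -f and -t derived from the link name.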
EXAMPLES
- convert_project source.maf dest.sam
- convert_project source.caf dest.fasta wig ace
- convert_project -x 2000 -y 10 source.caf dest.caf
- caf2html -l 100 -c . source.caf dest
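When several output types are wanted, the SYNOPSIS also allows repeating -t explicitly. The following sketch assembles such a command line and prints it instead of running it, so it can be inspected first. build_cmd is a hypothetical helper and the file names are placeholders; only the -f and -t options documented above are used.

```shell
# Sketch: print a convert_project command line with one -t per output type.
# Arguments: fromtype, infile, outfile, then one or more totypes.
build_cmd() {
    from=$1
    infile=$2
    outfile=$3
    shift 3
    cmd="convert_project -f $from"
    for totype in "$@"; do
        cmd="$cmd -t $totype"
    done
    printf '%s %s %s\n' "$cmd" "$infile" "$outfile"
}

build_cmd caf source.caf dest fasta wig ace
# prints: convert_project -f caf -t fasta -t wig -t ace source.caf dest
```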
SEE ALSO
A more extensive documentation is provided in the mira-doc package and can be
found at /usr/share/doc/mira-assembler/DefinitiveGuideToMIRA.html.
You can also subscribe to one of the MIRA mailing lists at
- http://www.chevreux.org/mira_mailinglists.html
After subscribing, mail general questions to the MIRA talk mailing list:
- mira_talk@freelists.org
BUGS
To report bugs or ask for features, please use the new ticketing system at:
- http://sourceforge.net/apps/trac/mira-assembler/
AUTHOR
The author of the mira code is Bastien Chevreux <bach@chevreux.org>.
This manual page was written by Andreas Tille <tille@debian.org> but can
be freely used for any other distribution.