NAME¶
tigr-glimmer — Find/Score potential genes in genome-file using the
probability model in icm-file
SYNOPSIS¶
tigr-glimmer3 [genome-file] [icm-file] [[options]]
DESCRIPTION¶
tigr-glimmer is a system for finding genes in microbial DNA, especially
the genomes of bacteria and archaea. tigr-glimmer (Gene Locator and
Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to
identify coding regions and distinguish them from noncoding DNA. The IMM
approach, described in our Nucleic Acids Research paper on tigr-glimmer 1.0
and in our subsequent paper on tigr-glimmer 2.0, combines Markov models from
1st through 8th order, weighting each model according to its predictive
power. tigr-glimmer 1.0 and 2.0 use 3-periodic nonhomogeneous Markov models
in their IMMs.
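The idea of weighting several Markov orders by their predictive power can be
illustrated with a toy sketch. It is not the algorithm used by tigr-glimmer:
the function names, the full_weight_at parameter and the count-based weighting
rule are assumptions made for illustration, and the 3-periodic (codon
position) aspect is left out entirely.

```python
from collections import defaultdict


def train_counts(sequences, max_order):
    """Count how often each base follows each context of length 0..max_order."""
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order + 1)]
    for seq in sequences:
        for i, base in enumerate(seq):
            for k in range(min(i, max_order) + 1):
                context = seq[i - k:i]  # the k bases immediately before position i
                counts[k][context][base] += 1
    return counts


def imm_probability(counts, context, base, full_weight_at=40):
    """Blend estimates from order 0 up to the longest usable context.

    ASSUMPTION: weight = min(1, observations / full_weight_at) is an
    illustrative rule only, not the weighting used by the real program.
    """
    prob = 0.25  # start from a uniform distribution over A, C, G, T
    max_k = min(len(context), len(counts) - 1)
    for k in range(max_k + 1):
        ctx = context[len(context) - k:]  # last k bases of the context
        seen = counts[k].get(ctx)
        if not seen:
            break  # context never observed: keep the lower-order estimate
        total = sum(seen.values())
        weight = min(1.0, total / full_weight_at)
        estimate = seen.get(base, 0) / total
        # Higher orders refine the running estimate in proportion to their weight.
        prob = weight * estimate + (1.0 - weight) * prob
    return prob
```

Rarely seen long contexts get little weight, so the estimate falls back
smoothly toward shorter contexts instead of trusting sparse counts.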
tigr-glimmer is the primary microbial gene finder at TIGR, and has been
used to annotate the complete genomes of B. burgdorferi (Fraser et al.,
Nature, Dec. 1997), T. pallidum (Fraser et al., Science, July 1998), T.
maritima, D. radiodurans, M. tuberculosis, and non-TIGR projects including C.
trachomatis, C. pneumoniae, and others. Its analyses of some of these genomes
and others are available at the TIGR microbial database site.
A special version of tigr-glimmer designed for small eukaryotes, GlimmerM,
was used to find the genes in chromosome 2 of the malaria parasite,
P. falciparum. GlimmerM is described in S.L. Salzberg, M. Pertea, A.L.
Delcher, M.J. Gardner, and H. Tettelin, "Interpolated Markov models for
eukaryotic gene finding," Genomics 59 (1999), 24-31. The GlimmerM site
(http://www.tigr.org/software/glimmerm/) includes information on how to
download the GlimmerM system.
The tigr-glimmer system consists of two main programs. The first of these
is the training program, build-imm. This program takes an input set of
sequences and builds and outputs the IMM for them. These sequences can be
complete genes or just partial orfs. For a new genome, this training data can
consist of those genes with strong database hits as well as very long open
reading frames that are statistically almost certain to be genes. The second
program is glimmer, which uses this IMM to identify putative genes in an
entire genome. tigr-glimmer automatically resolves conflicts between
most overlapping genes by choosing one of them. It also identifies genes that
are suspected to truly overlap, and flags these for closer inspection by the
user. These "suspect" gene candidates have been a very small percentage of
the total for all the genomes analyzed thus far.
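Collecting very long open reading frames as training data can be sketched as
follows. This is a minimal illustration, not the procedure used by the
programs named on this page. It assumes the standard stop codons TAA, TAG and
TGA, treats the sequence as linear, and does not require a start codon. The
function name long_orfs and the min_length parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # ASSUMPTION: standard stop codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def long_orfs(seq, min_length):
    """Yield (strand, start, end, orf) for stop-free stretches >= min_length bases.

    Coordinates are 0-based, end-exclusive, and relative to the strand scanned.
    """
    seq = seq.upper()
    for strand, dna in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = frame
            # Walk the frame codon by codon; a stop codon closes the current stretch.
            for pos in range(frame, len(dna) - 2, 3):
                if dna[pos:pos + 3] in STOP_CODONS:
                    if pos - orf_start >= min_length:
                        yield strand, orf_start, pos, dna[orf_start:pos]
                    orf_start = pos + 3
            # Stretch running to the last complete codon of the sequence.
            end = frame + ((len(dna) - frame) // 3) * 3
            if end - orf_start >= min_length:
                yield strand, orf_start, end, dna[orf_start:end]
```

A large min_length keeps only stretches that are unlikely to be stop-free by
chance, in line with the description of the training data above.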
OPTIONS¶
- -C n
- Use n as GC percentage of independent model.
Note: n should be a percentage, e.g., -C 45.2
- -f
- Use ribosome-binding energy to choose start codon
- +f
- Use first codon in orf as start codon
- -g n
- Set minimum gene length to n
- -i filename
- Use filename to select regions of bases
that are off limits, so that no bases within that area will be
examined
- -l
- Assume linear rather than circular genome, i.e., no
wraparound
- -L filename
- Use filename to specify a list of orfs that should be
scored separately, with no overlap rules
- -M
- Input is a multifasta file of separate genes to be scored
separately, with no overlap rules
- -o n
- Set minimum overlap length to n. Overlaps shorter than this
are ignored.
- -p n
- Set minimum overlap percentage to n%. Overlaps shorter than
this percentage of *both* strings are ignored.
- -q n
- Set the maximum length of an orf that can be rejected because of
the independent probability score column to (n - 1)
- -r
- Don't use independent probability score column
- +r
- Use independent probability score column
- -s s
- Use string s as the ribosome binding pattern to find start
codons.
- +S
- Use the stricter independent intergenic model that doesn't
give probabilities to in-frame stop codons. (This option is obsolete, since
this is now the only behaviour.)
- -t n
- Set threshold score for calling as gene to n. If the
in-frame score >= n, then the region is given a number and considered a
potential gene.
- -w n
- Use "weak" scores on tentative genes n or longer.
Weak scores ignore the independent probability score.
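As a usage illustration, the sketch below assembles a command line in the
order given in the SYNOPSIS, using only the -g, -t and -l options listed
above. The helper name glimmer_command, the file names and the value 90 are
hypothetical. The command is only printed, not executed, and the options
should be checked against the installed version before use.

```python
import shlex


def glimmer_command(genome_file, icm_file, min_gene_len=None,
                    threshold=None, linear=False, program="tigr-glimmer3"):
    """Build an argument list in SYNOPSIS order: genome-file, icm-file, options."""
    args = [program, genome_file, icm_file]
    if min_gene_len is not None:
        args += ["-g", str(min_gene_len)]  # -g n: minimum gene length
    if threshold is not None:
        args += ["-t", str(threshold)]     # -t n: threshold score for calling a gene
    if linear:
        args.append("-l")                  # -l: linear genome, no wraparound
    return args


# Hypothetical file names and values, for illustration only.
cmd = glimmer_command("genome.fasta", "model.icm", min_gene_len=90, linear=True)
print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string avoids quoting
problems if the list is later passed to a process launcher.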
SEE ALSO¶
tigr-adjust (1), tigr-anomaly (1), tigr-build-icm (1), tigr-check (1),
tigr-codon-usage (1), tigr-compare-lists (1), tigr-extract (1),
tigr-generate (1), tigr-get-len (1), tigr-get-putative (1), tigr-glimmer3 (1),
tigr-long-orfs (1)
http://www.tigr.org/software/glimmer/
Please see the readme in /usr/share/doc/glimmer for a description of how to use
Glimmer.
AUTHOR¶
This manual page was quickly copied from the glimmer web site by Steffen Moeller
<moeller@debian.org> for the Debian system.