NAME¶
compstruct - calculate accuracy of RNA secondary structure predictions
SYNOPSIS¶
compstruct [options] trusted_file test_file
DESCRIPTION¶
compstruct evaluates the accuracy of RNA secondary structure predictions
on a per-base-pair basis. The
trusted_file contains one or more
sequences with trusted (known) RNA secondary structure annotation. The
test_file contains the same sequences, in the same order, with
predicted RNA secondary structure annotation.
compstruct reads the
structures, compares them, and calculates both the sensitivity (the fraction
of true base pairs that are correctly predicted) and the specificity (positive
predictive value, the fraction of predicted base pairs that are true). Results
are reported for each individual sequence, and in summary for all sequences
together.
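As a rough illustration of the two measures, the sketch below computes them from sets of base pairs under the default exact-match rule. It is not taken from the compstruct source; the function name pair_accuracy and the representation of a structure as a set of (i, j) tuples are assumptions made for this example.

```python
def pair_accuracy(trusted, predicted):
    """Return (sensitivity, ppv) for two sets of (i, j) base pairs.

    Hypothetical helper: a pair counts as correct only on an exact match.
    """
    correct = len(trusted & predicted)
    # Guard against empty structures to avoid division by zero.
    sensitivity = correct / len(trusted) if trusted else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Toy data: four trusted pairs, three predicted pairs, two in common.
trusted = {(1, 20), (2, 19), (3, 18), (4, 17)}
predicted = {(1, 20), (2, 19), (3, 17)}

sens, ppv = pair_accuracy(trusted, predicted)
print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

Here two of the four trusted pairs are recovered, and two of the three predicted pairs are true.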
Both files must contain secondary structure annotation in WUSS notation. Only
SELEX and Stockholm formats support structure markup at present.
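To make the input concrete, here is a minimal sketch of turning a WUSS structure line into a set of base pairs. It assumes the usual WUSS conventions, which this page does not spell out: the bracket pairs <>, (), [] and {} mark nested base pairs, an uppercase letter paired with the same lowercase letter (A...a) marks a pseudoknot, and every other character marks an unpaired position. The function name wuss_to_pairs and its include_pseudoknots parameter are hypothetical and are not part of compstruct.

```python
OPEN_TO_CLOSE = {"<": ">", "(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPEN_TO_CLOSE.values())


def wuss_to_pairs(structure, include_pseudoknots=False):
    """Convert a WUSS string into a set of 1-based (i, j) base pairs."""
    pairs = set()
    stack = []   # (position, bracket) of unmatched opening brackets
    knots = {}   # pseudoknot letter -> positions of unmatched uppercase marks
    for pos, ch in enumerate(structure, start=1):
        if ch in OPEN_TO_CLOSE:
            stack.append((pos, ch))
        elif ch in CLOSERS:
            # The closing bracket must match the most recent opener.
            if not stack or OPEN_TO_CLOSE[stack[-1][1]] != ch:
                raise ValueError(f"unbalanced bracket at position {pos}")
            i, _ = stack.pop()
            pairs.add((i, pos))
        elif ch.isalpha() and include_pseudoknots:
            if ch.isupper():
                knots.setdefault(ch, []).append(pos)
            else:
                opened = knots.get(ch.upper())
                if not opened:
                    raise ValueError(f"unbalanced pseudoknot at position {pos}")
                pairs.add((opened.pop(), pos))
        # Any other character is treated as an unpaired position.
    if stack or any(knots.values()):
        raise ValueError("structure has unclosed base pairs")
    return pairs


nested = wuss_to_pairs("<<<___>>>")
knotted = wuss_to_pairs("<<AA__>>aa", include_pseudoknots=True)
print(sorted(nested))
print(sorted(knotted))
```

With include_pseudoknots left at False, letter-annotated positions are simply treated as unpaired, which mirrors the default behavior of ignoring pseudoknots.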
The default definition of a correctly predicted base pair is that a true pair
(i,j) must exactly match a predicted pair (i,j).
Mathews, Zuker, Turner and colleagues (see: Mathews et al., JMB 288:911-940,
1999) use a more relaxed definition. Mathews defines "correct" as
follows: a true pair (i,j) is correctly predicted if any of the following
pairs are predicted: (i,j), (i+1,j), (i-1,j), (i,j+1), or (i,j-1). This rule
allows for "slipped helices" that are off by one base. The -m option
activates this rule for both sensitivity and specificity. For specificity,
the rule is reversed: predicted pair (i,j) is considered to be true if the
true structure contains one of the five pairs (i,j), (i+1,j), (i-1,j),
(i,j+1), or (i,j-1).
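The relaxed rule can be sketched as follows. This is an illustration of the rule as described above, not the compstruct implementation; the names relaxed_match and mathews_accuracy, and the use of sets of (i, j) tuples, are assumptions.

```python
def relaxed_match(pair, others):
    """True if `pair`, or a one-base slip of either partner, is in `others`."""
    i, j = pair
    candidates = [(i, j), (i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return any(c in others for c in candidates)


def mathews_accuracy(trusted, predicted):
    """Return (sensitivity, ppv) under the relaxed rule."""
    # Sensitivity: look up each true pair among the predicted pairs.
    true_found = sum(relaxed_match(p, predicted) for p in trusted)
    # Specificity: the rule is reversed, each predicted pair is looked up
    # among the true pairs.
    pred_true = sum(relaxed_match(p, trusted) for p in predicted)
    sens = true_found / len(trusted) if trusted else 0.0
    ppv = pred_true / len(predicted) if predicted else 0.0
    return sens, ppv


trusted = {(1, 20), (2, 19), (3, 18), (4, 17)}
# (3, 17) is a slipped version of both (3, 18) and (4, 17);
# (8, 12) has no counterpart in the trusted structure.
predicted = {(1, 20), (2, 19), (3, 17), (8, 12)}

sens, ppv = mathews_accuracy(trusted, predicted)
print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

Note that, read literally, the rule lets a single slipped prediction such as (3, 17) satisfy more than one true pair, so relaxed scores can be noticeably higher than exact-match scores.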
OPTIONS¶
- -h
- Print brief help; includes version number and summary of all options,
including expert options.
- -m
- Use the Mathews relaxed accuracy rule (see above), instead of requiring
exact prediction of base pairs.
- -p
- Count pseudoknotted base pairs towards the accuracy, in either trusted or
predicted structures. By default, pseudoknots are ignored.
- Normally, only the trusted_file would have pseudoknot annotation,
since most RNA secondary structure prediction programs do not predict
pseudoknots. Using the -p option allows you to penalize the
prediction program for not predicting known pseudoknots. In a case where
both the trusted_file and the test_file have pseudoknot
annotation, the -p option lets you count pseudoknots in evaluating
the prediction accuracy. Beware, however, of the case where you use a
pseudoknot-capable prediction program to generate the test_file
but the trusted_file does not have pseudoknot annotation. In this
case, -p will penalize any predicted pseudoknots when it calculates
specificity, even if they are right, because they do not appear in the
trusted annotation. This is probably not what you want.
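The effect of counting pseudoknots can be seen in a small sketch. It assumes exact-match scoring over sets of (i, j) pairs and uses made-up toy data; the helper name accuracy is hypothetical, and the code only mimics what the -p option is described as doing.

```python
def accuracy(trusted, predicted):
    """Return (sensitivity, ppv) using exact matching of (i, j) pairs."""
    correct = len(trusted & predicted)
    sens = correct / len(trusted) if trusted else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sens, ppv


nested_true = {(1, 30), (2, 29), (3, 28)}   # nested pairs in the trusted file
knot_true = {(10, 40), (11, 39)}            # pseudoknotted pairs in the trusted file
predicted = {(1, 30), (2, 29), (3, 28)}     # predictor reports no pseudoknots

# Default behavior: pseudoknotted pairs are left out of the comparison.
default_scores = accuracy(nested_true, predicted)

# As with -p: pseudoknotted pairs count, so missing them lowers sensitivity.
with_p_scores = accuracy(nested_true | knot_true, predicted)

print("default:", default_scores)
print("with pseudoknots counted:", with_p_scores)
```

In this toy case the prediction is perfect by default, but only three of five true pairs are found once the pseudoknotted pairs are counted.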
EXPERT OPTIONS¶
- --informat <s>
- Specify that the two sequence files are in format <s>. In this
case, both files must be in the same format. The default is to
autodetect the file formats, in which case they could be different (one
SELEX, one Stockholm).
- --quiet
- Don't print any verbose header information.
SEE ALSO¶
afetch(1),
alistat(1),
compalign(1),
revcomp(1),
seqsplit(1),
seqstat(1),
sfetch(1),
shuffle(1),
sindex(1),
sreformat(1),
stranslate(1),
weight(1).
AUTHOR¶
Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington
University School of Medicine. Freely distributed under the GNU General Public
License (GPL). See COPYING in the source code distribution for more details, or
contact me.
Sean Eddy
HHMI/Department of Genetics
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
St Louis, MO 63108 USA
Phone: 1-314-362-7666
FAX : 1-314-362-2157
Email: eddy@genetics.wustl.edu