NAME¶
Marpa::R2::Semantics::Order - How the SLIF ranks ambiguous parses
Description¶
Marpa allows ambiguous parses. While an unambiguous parse can produce at most
one parse tree and one parse result, an ambiguous parse will produce a parse
series. A parse series is a sequence of parse trees, each of which will have
its own parse result.
This document describes ways of controlling the order in which the SLIF
recognizer's "value()" method evaluates the parse trees of an
ambiguous parse. It also describes ways to exclude selected parse trees from
the parse series.
Duplicate parses are eliminated¶
When evaluating the parse trees in a parse series, Marpa never evaluates the
same parse tree twice. What this means probably matches the programmer's
intuition of what it should mean. Marpa considers two parse trees to be the
same if they are
semantic equivalents.
Two parse trees are semantic equivalents if and only if a recursive, top-down
evaluation of each applies the same rules in the same order at the same G1
locations. If the semantics are deterministic, and if two parse trees are
semantic equivalents according to this definition, the two parse trees will
always produce the same parse result.
The two parse trees are called semantic equivalents, because from the point of
view of a deterministic semantics they are indistinguishable. When the Marpa
documentation refers to duplicate parses, unless otherwise stated, it means
that the two are semantic equivalents.
Default parse order¶
By calling the recognizer's "value()" method repeatedly, Marpa can
produce all the parse results in the current parse series. The default is for
the parse results to be returned in an
arbitrary parse order. This
corresponds to the ""none"" value of the recognizer's
"ranking_method" named argument.
Traversal of the parse trees in arbitrary parse order will be always be
well-behaved in the sense that no two parse trees will be semantic duplicates,
and no unique (semantic non-duplicate) parse tree will be omitted in it. No
other property of arbitrary parse order is guaranteed. For example, the order
may change each time the parse series is traversed.
Ranking methods¶
SLIF recognizer objects have a "ranking_method" named argument, whose
value can be the name of a ranking method, or ""none"",
indicating that the default ranking method is to be used.
The "rule" ranking method¶
The rule method ranks alternative parses according to their rule alternatives.
Every rule alternative has a
numeric rank. A rule's rank can be
specified using the the "rank" adverb argument for that RHS
alternative. Rule ranks must be integers. They may be negative. If no numeric
rank is specified, the numeric rank is 0.
The "high_rule_only" ranking method¶
The "high_rule_only" ranking method is similar to the "rule"
ranking method, except that, at every choice point, it discards all of the
choices which have a rank lower than that of the highest ranked choice.
The "high_rule_only" ranking method can reduce the ambiguity of a
parse, but it does not necessarily do so. This is because, at each choice
point among the parse trees, it is possible that several of the choices, or
all of them, will have the same rank as the highest ranked choice.
Rule ranking¶
A parse series is kept in a structure called a
parse bocage. The parse
bocage is a tree-like structure, whose root node is the common root of all the
parse trees of the parse series. In an unambiguous parse, there will be only
one parse tree, and the parse bocage will be equivalent to that parse tree. In
an ambiguous parse, there will be
choice points in the parse bocage. At
the choice points, there will be two or more
alternatives -- choices
which result in different parse trees.
When ranking, the logic traverses the parse bocage, looking for choice points.
From the point of view of the individual parse trees, this traversal will be
top-down and left-to-right. At the choice points, the choices are ranked as
follows:
- •
- Different numeric ranks:
If the two choices have different numeric ranks, they must also have
different rule alternatives. The choice whose rule alternative has the
higher numeric rank will rank high.
- •
- Same rule alternative:
If the two choices have the same rule alternative, they rank as described
under "Null variant ranking".
- •
- Same numeric rank, different rule alternatives:
Two different rule alternatives can have the same numeric rank. If the two
choices are for rule alternatives that are different, but that have the
same numeric rank, the relative order of the two choices is
arbitrary.
Note that, in the above, the logic is the same regardless of the DSL rule to
which the rule alternatives belong. Different rule alternatives can, in the
case of a prioritized rule, belong to the same DSL rule. But two rule
alternatives may also be different because they are from two different DSL
rules.
Null variant ranking¶
Some rules have a RHS which contains
proper nullables: symbols which may
be nulled, but which are not nulling symbols. (Nulling symbols are symbols
which are
always nulled.)
When a rule alternative contains proper nullables, each instance of that rule
creates a
nulling variant. A
nulling variant is a specific
pattern of null and non-null symbols in a rule instance's RHS. In many cases,
this creates an ambiguity -- different nulling variants can match the same
substring in the input. In ambiguous parsings of this kind, some applications
may want to rank nulling variants that start with non-null symbols higher.
Other applications may want to do the opposite -- to rank nulling variants
that start with null symbols higher.
The "null-ranking" adverb for RHS alternatives specifies which nulling
variants are ranked high or low. If the "null-ranking" is
""low"", then the closer a nulling variant places its
visible (non-null) symbols to the start of the rule instance, the
higher it ranks. A null ranking of "low" is the default. If the
"null-ranking" is ""high"", then the closer a
nulling variant places its
null symbols to the start of the rule
instance, the higher it ranks. In ranking nulling variants with more than one
proper nullable, major-to-minor is left-to-right.
A general approach to sorting parses¶
The most general way to sort Marpa parses is for the application to take
control. The application can set up the Marpa semantic actions so that the
parse result of every parse tree is a "<rank, true_value>"
duple. The duples can then be sorted by "rank". Once the results are
sorted, the "rank" element of the duple can be discarded. (Those
familiar with the Schwartzian transform may note a resemblance. In Perl,
duples can be implemented as references to arrays of 2 elements.)
The user needs to be careful. In theory, ambiguity can cause an exponential
explosion in the number of results. In practice, ambiguity tends to get out of
hand very easily. Producing and sorting all the parses can take a very long
time.
This section is a restatement of earlier material in more formal language. It is
recorded here for those who find it helpful. Most readers will want to ignore
this section.
Call the set of parse trees, "T". Semantic equivalence is an
equivalence relation on "T". Call this relation "~". Call
"E", the quotient set of "T" by "~". In this
document, the term
arbitrary parse order is used to mean an arbitrary
choice among the relations which are strict total orders of "E".
Copyright and License¶
Copyright 2014 Jeffrey Kegler
This file is part of Marpa::R2. Marpa::R2 is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.
Marpa::R2 is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser
General Public License along with Marpa::R2. If not, see
http://www.gnu.org/licenses/.