NAME¶
Marpa::R2::Semantics - How the SLIF evaluates parses
Synopsis¶
use Marpa::R2;
my $grammar = Marpa::R2::Scanless::G->new(
{ bless_package => 'My_Nodes',
source => \(<<'END_OF_SOURCE'),
:default ::= action => [values] bless => ::lhs
lexeme default = action => [ start, length, value ]
bless => ::name latm => 1
:start ::= Script
Script ::= Expression+ separator => comma
comma ~ [,]
Expression ::=
Number bless => primary
| '(' Expression ')' bless => paren assoc => group
|| Expression '**' Expression bless => exponentiate assoc => right
|| Expression '*' Expression bless => multiply
| Expression '/' Expression bless => divide
|| Expression '+' Expression bless => add
| Expression '-' Expression bless => subtract
Number ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
# allow comments
:discard ~ <hash comment>
<hash comment> ~ <terminated hash comment> | <unterminated
final hash comment>
<terminated hash comment> ~ '#' <hash comment body> <vertical space char>
<unterminated final hash comment> ~ '#' <hash comment body>
<hash comment body> ~ <hash comment char>*
<vertical space char> ~ [\x{A}\x{B}\x{C}\x{D}\x{2028}\x{2029}]
<hash comment char> ~ [^\x{A}\x{B}\x{C}\x{D}\x{2028}\x{2029}]
END_OF_SOURCE
}
);
my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
my $input = '42*2+7/3, 42*(2+7)/3, 2**7-3, 2**(7-3)';
$recce->read(\$input);
my $value_ref = $recce->value();
die "No parse was found\n" if not defined $value_ref;
# Result will be something like "86.33... 126 125 16"
# depending on the floating point precision
my $result = ${$value_ref}->doit();
package My_Nodes;
sub My_Nodes::primary::doit { return $_[0]->[0]->doit() }
sub My_Nodes::Number::doit { return $_[0]->[2] }
sub My_Nodes::paren::doit { my ($self) = @_; $self->[1]->doit() }
sub My_Nodes::add::doit {
my ($self) = @_;
$self->[0]->doit() + $self->[2]->doit();
}
sub My_Nodes::subtract::doit {
my ($self) = @_;
$self->[0]->doit() - $self->[2]->doit();
}
sub My_Nodes::multiply::doit {
my ($self) = @_;
$self->[0]->doit() * $self->[2]->doit();
}
sub My_Nodes::divide::doit {
my ($self) = @_;
$self->[0]->doit() / $self->[2]->doit();
}
sub My_Nodes::exponentiate::doit {
my ($self) = @_;
$self->[0]->doit()**$self->[2]->doit();
}
sub My_Nodes::Script::doit {
my ($self) = @_;
return join q{ }, map { $_->doit() } @{$self};
}
About this document¶
This document describes the semantics for Marpa's primary interface, the SLIF.
What is semantics?¶
A parser is an algorithm that takes a string of symbols (tokens or characters)
and finds a structure in it. Traditionally, that structure is a tree.
Rarely is an application interested only in the tree. Usually the idea is that
the string "means" something: the idea is that the string has a
semantics. Traditionally and most often, the tree is an intermediate
step in producing a value, a value which represents the "meaning" or
"semantics" of the string.
"Evaluating" a tree means finding its semantics. The rest of this
document describes Marpa's methods for evaluating trees. Those of you who have
dealt with other traditional parsers, such as yacc and bison, will find
Marpa's approach familiar.
Instances¶
At the start of evaluation, semantics is associated with instances of rule
alternatives or of lexemes. An
instance is an occurrence in terms of G1
locations. Every instance has two locations: a start location and an end
location.
A rule alternative is the LHS of a rule, together with one of its RHS
alternatives. Unless a rule is a prioritized rule, it has exactly one rule
alternative.
Prioritized rules very often only have one rule alternative, in which case they
are called trivial prioritized rules. But prioritized rules may have many rule
alternatives.
When a rule has only one rule alternative, or when context makes it clear what
is meant, a rule alternative is often simply called a rule. In particular, a
rule alternative instance is almost always called simply a
rule
instance.
Nodes¶
In a parse tree, nodes are points where the tree branches or terminates. Tree
terminations are also called terminals or "leaves".
Every rule instance in a Marpa parse is represented by a branch point (or
"node") in the tree. The topmost node of a tree is its "root
node". (Trees are easiest to draw upside down, so traditionally in
programming, the top of a tree is its root.)
A node, or branch point, "branches" into zero or more "child
nodes". The node just above a child node, the one from which the child
node branches out, is called its parent node.
If the node is for a non-quantified rule instance, the parent node is the LHS of
the rule, and the child nodes are the RHS of the rule alternative. If the node
is for a quantified rule, the parent node is the LHS of the quantified rule,
and the child nodes are the items of the sequence of symbols on the right hand
side. If the node is for a lexeme, the node represents the lexeme's symbol and
there will be no child nodes.
A parent node can have zero or more children. Rule instances with zero children
are nulled rule instances, and are "leaf nodes". Leaf nodes are also
called
terminals. In Marpa's parse trees, every terminal is either a
lexeme or a nulled rule instance.
In Marpa, evaluation only takes place within the structural (G1) subgrammar, and
the descriptions of the behaviors of rule and lexeme instances below applies
only to the G1 subgrammar. L0 rule alternatives and terminal symbols do not
become nodes in the parse tree, and are never evaluated.
The order of node evaluation¶
The nodes of a Marpa parse tree are evaluated recursively, left-to-right and
bottom-up. This means that, when a parent node is evaluated, the values of all
child nodes are known and available for use by the semantics. The final value
of a parse is the value of the top node of the parse tree.
Parse trees and parse series¶
Because Marpa allows ambiguous parsing, each parse can produce a
parse
series -- a series of zero or more parse trees, each with its own parse
result. The first call to the the SLIF recognizer's "value()" method
after the recognizer is created is the start of the first parse series, and
the Parse Series Setup Phase takes place at this point.
The first parse series continues until there is a call to the
"series_restart()" method or until the recognizer is destroyed. An
application is usually interested in only one parse series.
The "series_restart()" method starts a new parse series. The Parse
Series Setup Phase for that parse series will take place during the next call
of the SLIF recognizer's "value()" method.
The Parse Series Setup Phase is one of several phases in which the semantics are
executed. Applications will find that the order in which these phases occurs
"just works". But in some cases the details will matter.
Applications whose behavior might depend on the details include those which
make unusual use of side effects in the semantics; and those which alter their
symbol tables at runtime. A full description of phases of Marpa'a semantic
processing is in a separate document.
Nulled subtrees¶
A nulled subtree is a subtree of the parse tree formed by a nulled node and its
direct and indirect child nodes. (All these child nodes will also be nulled
nodes.) Before evaluation, Marpa prunes all nulled subtrees back to their
topmost nulled node. Of all the ways of dealing with nulled subtrees, this is
the simplest and Marpa's users have found it a natural approach. More detail
on the semantics of nulled symbols and subtrees can be found in a separate
document.
Actions and how Marpa finds them¶
The way in which the SLIF finds the value of a node is called that node's
action. Actions can be explicit or implicit. An explicit action is one
that is explicitly specified by the application, in one of the ways to be
described below. A node's implicit action is the one it performs if it has no
explicit action.
Lexeme actions¶
The implicit action for a lexeme is to return its literal value in the input
stream, as a string. An explicit default action name for lexemes may be set
using the the lexeme default statement.
Rule actions¶
The implicit action for a rule instance is to return a Perl "undef".
An explicit action for a RHS alternative can be specified using the
"action" adverb for the its RHS alternative. A default explicit
action for RHS alternatives can be specified with a default pseudo-rule.
Nulled symbol actions¶
As mentioned, nulled subtrees are pruned back to their topmost symbol. Lexemes
are never nulled, so a nulled symbol is always the LHS of a rule instance, and
the action is determined from the rule alternative, as just described.
A complication arises if the symbol appears on the LHS of more than one nullable
rule alternative. Because the symbol is nulled, the input is no help in
determining which rule alternative to use. The rule alternative whose
semantics are used for a nulled symbol is determined as follows:
- •
- If all nullable rule alternatives have the same semantics, that semantics
is used.
- •
- If one of the nullable rule alternatives is empty (that is, has a
zero-length RHS), then the empty alternative's semantics are used.
- •
- In the remaining case, two or more of the rule alternatives have different
action names, but none of the alternatives has a zero-length RHS. When
this happens, Marpa throws an exception. One easy way to fix the issue, is
to add an empty rule with the intended semantics.
In determining whether the semantics of two nullable rule alternatives are
"the same", the blessing is taken into account. Two rule
alternatives are considered to have different semantics if they are blessed
differently. The SLIF's null semantics are described in more detail in a
separate document.
Blessings¶
Part of a rule alternative's or lexeme's action may be a blessing. A blessing is
the name of a Perl package. In the case of a rule evaluation closure, the
argument containing its child values will be blessed into that package.
Not all actions are rule evaluation closures. An action may be, for example, an
array descriptor action. In cases where the action is not a rule evaluation
closure, the value of the action will be blessed into that package.
Only Perl objects pointed to by references can be blessed. It is a fatal error
to try to use a blessing with an inappropriate action.
Implicitly (that is, if no blessing was explicitly specified), an action is not
blessed. The implicit action itself cannot be blessed -- an attempt to do so
is a fatal error.
Explicit blessings are made using the "bless" adverb. The
"bless" adverb is allowed
- •
- for RHS alternatives;
- •
- for lexemes;
- •
- for the default lexeme statement;
- •
- and for the default pseudo-rule.
A L0 RHS alternative cannot have a "bless" adverb.
The value of a "bless" adverb is called a
blessing. If the
blessing is a Perl word (a string of alphanumerics or underscores), the name
of the class will be formed by prepending the value of the
"bless_package" named argument, followed by a double colon
(""::"").
If the blessing begins with a double colon (""::""), it is a
reserved blessing. The reserved blessings are as follows:
- "::undef"
- The RHS alternatives or lexemes will not be blessed. When this document
states that a RHS alternative or lexeme has a blessing of
"::undef", it means exactly the same thing as when it states
that a RHS alternative or lexeme will not be blessed. For both RHS
alternatives and lexemes, the implicit blessing is
"::undef".
- "::lhs"
- The RHS alternative is blessed into a class whose name is based on the LHS
of the RHS alternative. A blessing of "::lhs" is not allowed for
a lexeme.
The class will be the name of the LHS with whitespace changed to an
underscore. (As a reminder, the whitespace in symbol names will have been
normalized, with leading and trailing whitespace removed, and all other
whitespace sequences changed to a single ASCII space.) When a
"::lhs" blessing value applies to a rule alternative, it is a
fatal error if the LHS contains anything other than alphanumerics and
whitespace. In particular, the LHS cannot already contain an underscore
(""_""). The "::lhs" blessing is most useful
in a default pseudo-rule.
- "::name"
- The lexeme is blessed into a class whose name is based on the name of the
lexeme. The "::name" blessing is not allowed for a RHS
alternative.
The class is derived from the symbol name in the same way, and subject to
the same restrictions, as described above for deriving a class name from
the LHS of a rule alternative. The "::name" reserved blessing is
most useful in the lexeme default statement.
If any rule alternative or lexeme of a SLIF grammar has a blessing other than
"::undef", a "bless_package" is required, and failure to
specify one results in a fatal error.
Explicit actions¶
There are four kinds of explicit action names:
- •
- Array descriptors
- •
- Reserved action names
- •
- Perl identifiers
- •
- Perl names
These are detailed in the sections that follow.
Array descriptor actions¶
lexeme default = action => [ start, length, value ]
bless => ::name latm => 1
If an action is enclosed in square brackets, it is an
array descriptor,
and the value of the lexeme or rule alternative will be an array. Inside the
array descriptor is a comma separated list of zero or more array item
descriptors. The
array item descriptors are keywords that describe how
the array is to be filled out.
If the array descriptor is an empty pair of square brackets
(""[]""), then there are zero array item descriptors, and
the value will be an empty array. Otherwise the array item descriptors are
interpreted as lists and those lists are used to fill out the array.
- "length"
- The "length" array item descriptor puts a single-element list
into the array. That one element will be the length of the rule or lexeme
instance. Length is in characters.
- "lhs"
- The "lhs" array item descriptor puts a single-element list into
the array. That one element will be the LHS symbol ID of the rule. Because
of historical reasons, for a lexeme instance, it will the symbol ID, but
for a nulling symbol it will be a Perl "undef".
- "name"
- The "name" array item descriptor puts a single-element list into
the array. This will always be a string. For a rule whose name is defined,
that one element will be the rule name. For an unnamed rule, it will be
the name of the LHS symbol. For a lexeme, it will be the symbol name of
the lexeme. For a nulling symbol it will be the name of that symbol.
- "rule"
- The "rule" array item descriptor puts a single-element list into
the array. For a rule, that one element will be the rule ID. In other
cases, that one element will be a Perl "undef".
- "start"
- The "start" array item descriptor puts a single-element list
into the array. That one element will be the start location of the rule or
lexeme instance. The start location is an offset in the input string. The
elements of the "length" and "start" item descriptors
are defined such that the end location is always start location plus
length.
- "symbol"
- The "symbol" array item descriptor puts a single-element list
into the array. This will always be the name of a symbol. For a rule, it
will be the name of the LHS symbol. For a lexeme, it will be the symbol
name of the lexeme. For a nulling symbol it will be the name of that
symbol.
- "value"
- For a rule alternative, the "value" array item descriptor puts a
list of zero or more elements into the array. The list will contain the
values of the rule instance's children, in left-to-right order.
For a lexeme, the "value" array item descriptor puts a
single-element list into the array. That one element will be a list
containing a single element, the token value of the lexeme.
- "values"
- The "value" and "values" array item descriptors are
synonyms, and may be used interchangeably for both rules alternatives and
lexemes.
Example¶
The array item descriptors fill out the array in the order in which they appear
in the array descriptor. For example, if we are dealing with a rule, and the
array descriptor is ""[ start, length, value ]"", then the
return value is an reference to an array, whose length will vary, but which
will contain at least two elements. The first element will be the start
location in the input string of this rule instance, and the second will be its
length. The remaining elements will be the values of the rule instance's RHS
children, in lexical order. If the rule instance is nulled, the array will
contain only two elements: start location and length.
Reserved action names¶
If the action value begins with a double colon (""::""), it
is a reserved action. The following are recognized:
- •
- "::array"
"::array" is equivalent to "[values]". This means that,
for both lexeme and rule instances, the actions "[values]",
"[value]" and "::array" will do exactly the same
thing.
- •
- "::first"
The value of the rule instance is that of the rule instance's first child.
If there is no such child, the value is a Perl "undef". It is a
fatal error if a RHS alternative with a "::first" action is
blessed. It is also a fatal error to use a "::first" action with
a lexeme.
- •
- "::undef"
The value of the rule or lexeme instance will be a Perl "undef".
It is a fatal error if a RHS alternative with an "::undef"
action is blessed.
Perl identifiers as action names¶
An action name is considered to be a Perl identifier, if it is a sequence of one
or more alphanumerics and underscores. If the action name is a Perl
identifier, it is treated as the name of a Perl variable. To successfully
resolve to actions, Perl identifiers must be resolved to Perl names, as
described below.
Perl names as action names¶
For this purpose, a Perl name is a series of two or more Perl identifiers
separated by double colons (""::""). Note that, by this
definition, a Perl name cannot start with a double colon. Action names
starting with double colons are always treated as reserved action names.
Action names which are Perl names by this definition are treated as if they were
fully qualified Perl names. Fully qualified Perl names are resolved to
variables in Perl's namespace, as described below.
The semantics package¶
To resolve Perl identifiers to Perl names, a semantics package must be defined.
The semantics package can be defined using the SLIF recognizer's
"semantics_package" named argument, or it can be taken from the
argument to the first "value()" call of the parse series. The
"semantics_package" named argument takes precedence.
If the arguments to the "value()" method are used to specify the
semantics package, within a parse series they must consistently specify the
same package. For details, see the description of SLIF recognizer's
"value()" method.
If the user wants the Perl variables implementing the semantics in the
"main" namespace, she can specify "main" as the semantics
package. But it is usually good practice to keep Perl variables intended for
use by Marpa's semantics in their own namespace, especially if the application
is not small.
Resolving Perl identifiers to Perl names¶
A Perl identifier is resolved to a Perl name by prepending the semantic package,
followed by a double colon (""::""). For a Perl identifier
to resolve successfully to a Perl name, a semantics package must be defined.
For example, if the action name is ""some_var"", the action
name will be regarded as a Perl identifer. If the semantics package is
""My_Actions"", Marpa will convert the action name to
""My_Actions::some_var"", and hand it on for processing as
a fully qualified Perl name.
Resolving Perl names to Perl variables¶
Once Marpa has a fully qualified Perl name, it looks in Perl's symbol tables for
a Perl variable with that name, either the name of a subroutine, or of a
scalar. It is important to note that for the purposes of Perl's symbol tables,
and therefore for the purposes of Marpa's resolution of Perl names, references
are scalars.
If Marpa finds a Perl subroutine with that fully qualified Perl name, the action
name is resolved to that subroutine, which then becomes a
rule evaluation
closure. If Marpa does not find a Perl subroutine with that name, but does
find a Perl scalar with that name, the action name is resolved to that Perl
scalar. (Again, for this purpose a Perl reference is a kind of Perl scalar.)
Executing rule evaluation closures¶
A rule evaluation closure action is always called in scalar context, and its
return value will be used as the value of its node. Arguments to the rule
evaluation closure will be as follows:
- •
- If the rule instance is not nulled and the rule alternative is not
blessed, the second and subsequent arguments are the values of its child
nodes, in lexical order.
- •
- If the rule instance is nulled, there will be only one argument: the
per-parse argument.
- •
- If the rule alternative is blessed, and the rule instance is not nulled,
the closure will always have exactly two arguments. The first will be the
per-parse argument, and the second will be a blessed array that contains
the child values in lexical order. (The grouping of child values into an
array is required in order to allow the blessing to stay in effect.)
Note that, in every case, the first argument of a rule evaluation closure is the
per-parse argument.
Quantified rule nodes¶
Everything just said about rule nodes applies to nodes for quantified rules. But
there is a difference between quantified rules and others, and it a big one if
you are writing a rule evaluation closure.
In other rules, the right hand side is fixed in length, and therefore the number
of child nodes is known in advance. This is not the case with a quantified
rule. The rule evaluation closure for a quantified rule must be capable of
dealing with a variable number of child nodes.
Action context¶
sub do_S {
my ($action_object) = @_;
my $rule_id = $Marpa::R2::Context::rule;
my $slg = $Marpa::R2::Context::slg;
my ( $lhs, @rhs ) =
map { $slg->symbol_display_form($_) } $slg->rule_expand($rule_id);
$action_object->{text} =
"rule $rule_id: $lhs ::= "
. ( join q{ }, @rhs ) . "\n"
. "locations: "
. ( join q{-}, Marpa::R2::Context::location() ) . "\n";
return $action_object;
} ## end sub do_S
In addition to the per-parse argument and their child values, rule evaluation
closures also have access to
context variables.
- •
- $Marpa::R2::Context::slg is set to the SLIF grammar being parsed.
- •
- $Marpa::R2::Context::rule is the ID of the current rule alternative. Given
the rule alternative ID, an application can find its LHS and RHS symbols
using the SLIF grammar's "rule_expand()" method.
- •
- "Marpa::R2::Context::location()" returns the start and end G1
locations of the current rule instance. Note that these are G1 locations,
not input stream locations.
Bailing out of parse evaluation¶
my $bail_message = "This is a bail out message!";
sub do_bail_with_message_if_A {
my ($action_object, $terminal) = @_;
Marpa::R2::Context::bail($bail_message) if $terminal eq 'A';
}
sub do_bail_with_object_if_A {
my ($action_object, $terminal) = @_;
Marpa::R2::Context::bail([$bail_message]) if $terminal eq 'A';
}
Perl scalars as actions¶
If a Perl scalar is the action, it becomes the value of the node, as is.
References are scalars in this context so that, for example, the value of the
node could be a reference to an array.
Another possibility is that the Perl scalar action is a reference to code. What
happens in this case is very different from the case where the action is a
rule evaluation closure. A rule evaluation closure is executed to produce the
value of the node. In contrast, the reference to a subroutine is
NOT
executed -- it becomes the value of the node directly.
Assuming no trickery, such as use of Perl's "local" keyword, takes
place, resolution to a Perl scalar will always resolve to a single, global
scalar. Any modification of this scalar will be seen by other nodes of the
current parse, and by other parses. All this suggests that, as a matter of
good practice, Perl scalar actions be used only as constants.
For example, assume that actions are in a package named "My_Actions",
which contains a hash reference named "empty_hash",
package My_Actions;
our $empty_hash = {};
It can be tempting, in building objects which are hashes, to start with a left
node whose action is "empty_hash" and to add contents to it as the
object is passed up the evaluation tree. But $empty_hash points to a single
hash object. This single hash object will shared by all nodes, with all nodes
seeing each other's changes. Worse, all Marpa parsers which use the same
"My_Actions" namespace will share the same hash object. The correct
way to define an "empty_hash" action that initializes an empty hash
is as a rule evaluation closure that returns "{}".
sub My_Actions::empty_hash { return {}; }
Visibility of Perl object actions¶
Most applications do not manipulate the Perl symbol table at runtime, and do not
make use of Perl's "local" keyword for declarations. Applications
which use the Perl global namespace in conventional ways, and which use the
same names to point to the same variables throughout Marpa execution, can
ignore questions about the visibility of the Perl variables used in actions.
Less conventional applications should be aware that, for resolution from a Perl
name to a Perl variable to take place, that Perl name must refer to the
intended variable, and this variable must be visible, at Parse Series Setup
Time. Parse Series Setup Time occurs during the first call to a recognizer's
"value()" method of the parse series. More details about Parse
Series Setup Time can be found in the document that describes the processing
phases of Marpa's semantics.
The per-parse argument¶
The first argument of every rule evaluation closure is the
per-parse
argument. This is initialized
- •
- To the result returned by the per-parse constructor, if there is a
per-parse constructor.
- •
- Otherwise, to the argument to the SLIF recognizer's "value()"
method, if that argument is defined.
- •
- Otherwise, as a last resort, to an empty hashref.
The per-parse argument is destroyed once the evaluation of the parse tree is
finished. Between creation and destruction, the per-parse argument is not
touched by Marpa's internals -- it is reserved for use by the application.
The primary way of passing data while evaluating a parse tree is purely
functional -- results from child nodes are passed up to parent nodes.
Applications can use the per-parse argument for data which does not
conveniently fit the functional model. Symbol tables are one common example of
data that is best handled outside the functional model.
The per-parse constructor¶
The per-parse constructor is the "new()" method of the semantics
package. If there is no semantics package, or if it has no "new()"
method, there is no per-parse constructor.
The per-parse constructor is called with one argument. This argument is the
argument of the SLIF recognizer's "value()" method, if one was
defined. Otherwise it is the name of the semantics package.
The result returned by the per-parse constructor becomes the per-parse argument.
The per-parse constructor is called in the Tree Setup Phase.
Parse order¶
If a parse is ambiguous, all parses are returned, with no duplication. By
default, the order is arbitrary, but it is also possible to control the order.
Details are in the document on parse order.
Infinite loops¶
Grammars with infinite loops (cycles) are generally regarded as useless in
practical applications. Due to lack of interest, the SLIF does not currently
support them, although Libmarpa itself, Marpa's thin interface and the NAIF
all do. Those interested in knowing more can look at the document on the
NAIF's support of infinitely ambiguous grammars.
Copyright and License¶
Copyright 2014 Jeffrey Kegler
This file is part of Marpa::R2. Marpa::R2 is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.
Marpa::R2 is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser
General Public License along with Marpa::R2. If not, see
http://www.gnu.org/licenses/.