NAME¶
Marpa::R2::NAIF - Marpa named argument interface (NAIF)
Synopsis¶
use Marpa::R2;
my $grammar = Marpa::R2::Grammar->new(
{ start => 'Expression',
actions => 'My_Actions',
default_action => 'first_arg',
rules => [
{ lhs => 'Expression', rhs => [qw/Term/] },
{ lhs => 'Term', rhs => [qw/Factor/] },
{ lhs => 'Factor', rhs => [qw/Number/] },
{ lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
{ lhs => 'Factor',
rhs => [qw/Factor Multiply Factor/],
action => 'do_multiply'
},
],
}
);
$grammar->precompute();
my $recce = Marpa::R2::Recognizer->new( { grammar => $grammar } );
$recce->read( 'Number', 42 );
$recce->read('Multiply');
$recce->read( 'Number', 1 );
$recce->read('Add');
$recce->read( 'Number', 7 );
sub My_Actions::do_add {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 + $t2;
}
sub My_Actions::do_multiply {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 * $t2;
}
sub My_Actions::first_arg { shift; return shift; }
my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';
About this document¶
This document contains a top-level overview of, and tutorial for, the named
argument inteface (NAIF) for the Marpa parse engine. If you are a new to
Marpa, you want to start with the tutorial for the the Scanless interface
(SLIF) instead.
The NAIF is a middle level interface. It is more low level than the Scanless
interface (SLIF), which uses a domain-specific language. But it is higher
level, and provides more features, than the thin interface, which provides
direct access to the underlying Libmarpa C library.
The two examples in this document show the typical flows of NAIF Marpa method
calls. This document will use these examples to describe the basic features of
Marpa in semi-tutorial fashion. More advanced features, and full reference
details of all features, can be found in the other Marpa API documents.
The three phases¶
A parser needs to:
- •
- Accept a grammar.
- •
- Read input.
- •
- Return values from the parses, according to a semantics.
In Marpa these three tasks are, for the most part, distinct phases. Grammars are
"Marpa::R2::Grammar" objects. The reading of input and the
evaluation of the parse according to the semantics is performed by
"Marpa::R2::Recognizer" objects.
Example 1: a simple calculator¶
The synopsis shows the code for a very simple calculator. It handles only
addition and multiplication of integers. This section explains, line by line,
how it works.
Marpa::R2::Grammar::new¶
my $grammar = Marpa::R2::Grammar->new(
{ start => 'Expression',
actions => 'My_Actions',
default_action => 'first_arg',
rules => [
{ lhs => 'Expression', rhs => [qw/Term/] },
{ lhs => 'Term', rhs => [qw/Factor/] },
{ lhs => 'Factor', rhs => [qw/Number/] },
{ lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
{ lhs => 'Factor',
rhs => [qw/Factor Multiply Factor/],
action => 'do_multiply'
},
],
}
);
Marpa grammars are "Marpa::R2::Grammar" objects. They are created with
the Marpa::R2::Grammar::new constructor. The arguments to
Marpa::R2::Grammar::new are references to hashes of named arguments. In the
key/value pairs of these hashes, the hash key is the name of the argument, and
the hash value is the value of the named argument.
The start named argument
start => 'Expression',
The "start" named argument is required. Its value is a string
containing the name of the grammar's start symbol.
Named arguments for the semantics
actions => 'My_Actions',
default_action => 'first_arg',
The "actions" and "default_action" named arguments specify
semantics. Their argument values are strings, which acquire their semantics
during evaluation.
Evaluation will be described later. Peeking ahead, "actions" provides
the name of a Perl package where Marpa will look for its
actions. The
"default_action" named argument will be interpreted as an
action
name in that package. This action name will resolve to an action -- a Perl
closure that implements semantics. The action specified by
"default_action" is used as the action for rules with no action of
their own.
The rules named argument
rules => [
{ lhs => 'Expression', rhs => [qw/Term/] },
{ lhs => 'Term', rhs => [qw/Factor/] },
{ lhs => 'Factor', rhs => [qw/Number/] },
{ lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
{ lhs => 'Factor',
rhs => [qw/Factor Multiply Factor/],
action => 'do_multiply'
},
],
The value of the "rules" named argument is a reference to an array of
rule descriptors. In this example, all the rule descriptors are in the
"long" form -- they are references to hashes of
rule
properties. In each key/value pair of a rule descriptor hash, the key is
the name of a rule property, and the hash value is the value of that rule
property.
The lhs property
The value of the "lhs" rule property must be a string containing the
name of the rule's left hand side symbol. Every Marpa rule must have a left
hand side symbol.
The rhs property
The value of the "rhs" property is a reference to an array of strings
containing names of the rule's right hand symbols, in order. This array may be
zero length, in which case this is an
empty rule -- a rule with no
symbols on the right hand side. There are no empty rules in this example.
The action property
The value of the "action" rule property is a string. Peeking ahead,
each "action" property string will be interpreted as an action name.
This action name will be resolved to a Perl closure that implements the rule's
semantics.
Marpa::R2::Grammar::precompute¶
$grammar->precompute();
Before a Marpa grammar object can be used by a Marpa recognizer, it must be
precomputed. Precomputation compiles data structures that the
recognizer will need.
Marpa::R2::Recognizer::new¶
my $recce = Marpa::R2::Recognizer->new( { grammar => $grammar } );
"Marpa::R2::NAIF::Recognizer::new" creates a new recognizer. Its
arguments are references to hashes of named arguments. In this example the
only named argument is the required argument: ""grammar"".
The value of the "grammar" named argument must be a precomputed
Marpa grammar.
Marpa::R2::Recognizer::read¶
$recce->read( 'Number', 42 );
$recce->read('Multiply');
$recce->read( 'Number', 1 );
$recce->read('Add');
$recce->read( 'Number', 7 );
The "Marpa::R2::NAIF::Recognizer::read" method takes two arguments, a
token name and a
token value. The token name must be the name of
a valid terminal symbol in the grammar. By default symbols are valid as
terminal symbols, if and only if they do NOT occur on the LHS of any rule.
The
token value must be a Perl scalar, but otherwise its form and
semantics are entirely up to the application. If the token value is never
used, it can be omitted. In the calculator example, the values of the
""Add"" and ""Multiply"" tokens are
never used, and are allowed to default to an undefined value.
Marpa::R2::Recognizer::value¶
my $value_ref = $recce->value;
my $value = $value_ref ? ${$value_ref} : 'No Parse';
The "Marpa::R2::NAIF::Recognizer::value" method returns a reference to
the parse result's value, if there was a parse result. If there was no parse
result, "Marpa::R2::NAIF::Recognizer::value" returns
"undef".
Resolving the semantics¶
The first thing "Marpa::R2::NAIF::Recognizer::value" needs to do is to
resolve the semantics.
Resolving the semantics means mapping the action
names into actions.
Actions are Perl closures which directly implement
semantics. In this example, the "actions" named argument is
specified. "actions" is a Perl package name. Marpa will look for
actions in that package.
actions => 'My_Actions',
{ lhs => 'Factor', rhs => [qw/Factor Multiply Factor/], action => 'do_multiply' },
For example, the "action" property for the above rule is
""do_multiply"" and the "actions" named argument
to the grammar was ""My_Actions"". So Marpa looks for a
closure whose fully qualified name is "My_Actions::do_multiply",
which it finds:
sub My_Actions::do_multiply {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 * $t2;
}
Rules do not always have "action" properties. That is the case with
these rules in this example:
{ lhs => 'Expression', rhs => [qw/Term/] },
{ lhs => 'Term', rhs => [qw/Factor/] },
{ lhs => 'Factor', rhs => [qw/Number/] },
The rules in the above display have no action names. When a rule has no action
name, Marpa will fall back to trying to use the default action, as described
next.
default_action => 'first_arg',
The "default_action" named argument is resolved in the same way as are
the "action" properties of the rules. In this example,
default_action is specified as ""first_arg"" and resolves
to "My_Actions::first_arg".
Actions¶
sub My_Actions::first_arg { shift; return shift; }
sub My_Actions::do_add {
my ( undef, $t1, undef, $t2 ) = @_;
return $t1 + $t2;
}
Value actions are Perl closures used as callbacks. Value actions are called when
nodes in a parse tree are evaluated. A value action receives one or more
arguments. The first argument to a value action is always a per-parse-tree
object, which the callbacks can use as a scratchpad. In these examples, the
per-parse-tree object is not used.
For a non-empty rule, the second and any subsequent arguments to the callback
are the values, in lexical order, of the symbols on the right hand side of the
rule. If the action is for an empty rule, the per-parse-tree object will be
its only argument.
Every value action is expected to return a value. With one exception, this value
is passed up to a parent node as an argument. The exception is the value for
the start rule. The return value for the start rule becomes the parse result.
Rules with no action specified for them take their semantics from the
"default_action" named argument. If there is no default action for a
grammar, rules with no action specified for them return a Perl
"undef".
Example 2: an ambiguous parse¶
This is the same calculator as before, rewritten to be ambiguous. Rather than
give multiplication precedence over addition, the rewritten calculator allows
any order of operations. In this example, the actions
("My_Actions::do_add", etc.) and the @tokens array remain the same
as before.
Eliminating precedence makes the grammar shorter, but it also means there can be
multiple parse trees, and that the different parse trees can have different
parse results. In this application we decide, for each input, to return every
one of the parse results.
use Marpa::R2;
my $ambiguous_grammar = Marpa::R2::Grammar->new(
{ start => 'E',
actions => 'My_Actions',
rules => [
[ 'E', [qw/E Add E/], 'do_add' ],
[ 'E', [qw/E Multiply E/], 'do_multiply' ],
[ 'E', [qw/Number/], ],
],
default_action => 'first_arg',
}
);
$ambiguous_grammar->precompute();
my $ambiguous_recce =
Marpa::R2::Recognizer->new( { grammar => $ambiguous_grammar } );
$ambiguous_recce->read( 'Number', 42 );
$ambiguous_recce->read('Multiply');
$ambiguous_recce->read( 'Number', 1 );
$ambiguous_recce->read('Add');
$ambiguous_recce->read( 'Number', 7 );
my @values = ();
while ( defined( my $ambiguous_value_ref = $ambiguous_recce->value() ) ) {
push @values, ${$ambiguous_value_ref};
}
rules => [
[ 'E', [qw/E Add E/], 'do_add' ],
[ 'E', [qw/E Multiply E/], 'do_multiply' ],
[ 'E', [qw/Number/], ],
],
The rule descriptors in the ambiguous example demonstrate the "short"
or array form of rule descriptors. Array form rule descriptors are references
to arrays. Here the elements are, in order, the "lhs" property, the
"rhs" property, and the "action" property.
Marpa::R2::Recognizer::value¶
my @values = ();
while ( defined( my $ambiguous_value_ref = $ambiguous_recce->value() ) ) {
push @values, ${$ambiguous_value_ref};
}
When called more than once, the "Marpa::R2::NAIF::Recognizer::value"
method iterates through the parse results. For each call, it returns a
reference to the parse result. At the end of the iteration, after all parse
results have been returned, "Marpa::R2::NAIF::Recognizer::value"
returns "undef". If there were no parse results,
"Marpa::R2::NAIF::Recognizer::value" returns "undef" the
first time that it is called.
Errors and exceptions¶
As a general rule, methods in the Marpa NAIF API do not return errors. When
there are errors, Marpa NAIF API methods throw an exception.
Inheritance¶
Classes in the Marpa API are not designed to be inherited.
The Marpa:: namespace¶
The "Marpa::" top-level namespace is reserved. For extensions to
Marpa, one appropriate place is the "MarpaX::" namespace. This
practice helps avoid namespace collisions, and follows a CPAN standard, as
exemplified by the "DBIx::" "LWPx::" and
"MooseX::" which are for extensions of, respectively, DBI, LWP and
Moose.
Other documents¶
This document gives a semi-tutorial overview of the entire Marpa NAIF API. For
full details on Marpa's grammar objects and their methods, see the
Marpa::R2::NAIF::Grammar document. For full details on Marpa's recognizer
objects and their methods, see the Marpa::R2::NAIF::Recognizer document.
Marpa::R2::Vocabulary is intended as a quick refresher in parsing terminology,
emphasizing how the standard terms are used in the Marpa context. the NAIF's
standard semantics are fully described in the Marpa::R2::NAIF::Semantics
document. Techniques for tracing and for debugging your Marpa grammars are
described in the Marpa::R2::NAIF::Tracing document and the
Marpa::R2::NAIF::Progress document. For those with a theoretical bent, my
sources, and other useful references, are described in
Marpa::R2::Advanced::Bibliography.
Support¶
Marpa::R2 comes without warranty. Support is provided on a volunteer basis
through the standard mechanisms for CPAN modules. The Support document has
details.
Copyright and License¶
Copyright 2014 Jeffrey Kegler
This file is part of Marpa::R2. Marpa::R2 is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.
Marpa::R2 is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser
General Public License along with Marpa::R2. If not, see
http://www.gnu.org/licenses/.