NAME¶
bisonc++ - Generate a C++ parser class and parsing function
SYNOPSIS¶
bisonc++ [OPTIONS]
grammar-file
SECTIONS¶
This manual page contains the following sections:
- 1. DESCRIPTION
- overview and short history of of bisonc++;
- 2. GENERATED FILES
- files bisonc++ may generate;
- 3. OPTIONS
- Bisonc++’s command-line options;
- 4. DIRECTIVES
- Bisonc++’s grammar-specification
directives;
- 5. POLYMORPHIC SEMANTIC VALUES
- How to use polymorphic semantic values in parsers generated
by bisonc++;
- 6. PUBLIC MEMBERS AND -TYPES
- Members and types that can be used by calling
software;
- 7. PRIVATE ENUMS AND -TYPES
- Enumerations and types only available to the Parser
class;
- 8. PRIVATE MEMBER FUNCTIONS
- Member functions that are only available to the
Parser class;
- 9. PRIVATE DATA MEMBERS
- Data members that are only available to the Parser
class;
- 10. TYPES AND VARIABLES IN THE ANONYMOUS NAMESPACE
- An overview of the types and variables that are used to
define and store the grammar-tables generated by bisonc++;
- 11. RESTRICTIONS ON TOKEN NAMES
- Name restrictions for user-defined symbols;
- 12. OBSOLETE SYMBOLS
- Symbols available to bison(1), but not to
bisonc++;
- 13. EXAMPLE
- Guess what this is?
- 14. USING PARSER-CLASS SYMBOLS IN LEXICAL SCANNERS
- How to refer to Parser tokens from within a lexical
scanner;
- 15. FILES
- (Skeleton) files used by bisonc++;
- 16. SEE ALSO
- References to other programs and documentation;
- 17. BUGS
- Some additional stuff that should not qualify as bugs.
- 18. ABOUT bisonc++
- More history;
- AUTHOR
- At the end of this man-page.
Looking for a specific section? Search for its number + a dot.
1. DESCRIPTION¶
Bisonc++ derives from previous work on
bison by Alain Coetmeur
(coetmeur@icdc.fr), who created in the early ’90s a
C++ class
encapsulating the
yyparse function as generated by the GNU-
bison
parser generator.
Initial versions of
bisonc++ (up to version 0.92) wrapped Alain’s
program in a program offering a more modern user-interface, removing all
old-style (
C)
%define directives from
bison++’s
input specification file (see below for an in-depth discussion of the
differences between
bison++ and
bisonc++). Starting with version
0.98,
bisonc++ represents a complete rebuilt of the parser generator,
closely following descriptions given in Aho, Sethi and Ullman’s
Dragon Book. Since version 0.98
bisonc++ is a
C++
program, rather than a
C program generating
C++ code.
Bisonc++ expands the concepts initially implemented in
bison and
bison++, offering a cleaner setup of the generated parser class. The
parser class is derived from a base-class, mainly containing the
parser’s token- and type-definitions as well as several member functions
which should not be modified by the programmer.
Most of these base-class members might also be defined directly in the parser
class, but were defined in the parser’s base-class. This design results
in a very lean parser class, declaring only members that are actually defined
by the programmer or that have to be defined by
bisonc++ itself (e.g.,
the member function
parse as well as some support functions requiring
access to facilities that are only available in the parser class itself,
rather than in the parser’s base class).
This design does not require the use of virtual members: the members which are
not involved in the actual parsing process may always be (re)implemented
directly by the programmer. Thus there is no need to apply or define virtual
member functions.
In fact, there are only two public members in the parser class generated by
bisonc++:
setDebug (see below) and
parse. Remaining
members are private, and those that can be redefined by the programmer using
bisonc++ usually receive initial, very simple default in-line
implementations. The (partial) exception to this rule is the member function
lex, producing the next lexical token. For
lex either a
standardized interface or a mere declaration is offerered (requiring the
programmer to provide his/her own
lex implementation).
To enforce a primitive namespace,
bison used a well-known
naming-convention: all its public symbols started with
yy or
YY.
Bison++ followed
bison in this respect, even though a class by
itself offers enough protection of its identifiers. Consequently, these
yy and
YY conventions are now outdated, and
bisonc++ does
not generate or use symbols defined in either the parser (base) class or in
its member functions starting with
yy or
YY. Instead, following
a suggestion by Lakos (2001), all data members have names starting with
d_, and all static data members have names starting with
s_.
This convention was not introduced to enforce identifier protection, but to
clarify the storage type of variables. Other (local) symbols lack specific
prefixes. Furthermore,
bisonc++ allows its users to define the parser
class in a particular namespace of their own choice.
Bisonc++ should be used as follows:
- o
- As usual, a grammar must be defined. Using bisonc++
this is not different, and the reader is referred to
bisonc++’s manual and other sources (like Aho, Sethi and
Ullman’s book) for details about how to specify and decorate
grammars.
- o
- The number and function of the various %define
declarations as used by bison++, however, is greatly modified.
Actually, all of bison’s %define declarations were
replaced by their (former) first arguments. Furthermore,
`macro-style’ declarations are no longer supported or required.
Finally, all directives use lower-case characters only and do not contain
underscore characters (but sometimes hyphens). E.g., %define DEBUG
is now declared as %debug; %define LSP_NEEDED is now
declared as %lsp-needed (note the hyphen).
- o
- As noted, no `macro style’ %define
declarations are required anymore. Instead, the normal practice of
defining class members in source files and declaring them in a class
header files can be adhered to using bisonc++. Basically,
bisonc++ concentrates on its main tasks: the definition of an
parser class and the implementation of its parsing function int
parse, leaving all other parts of the parser class’ definition
to the programmer.
- o
- Having specified the grammar and (usually) some directives
bisonc++ is able to generate files defining the parser class and
the implementation of the member function parse and its support
functions. See the next section for details about the various files that
may be generated by bisonc++.
- o
- All members (except for the member parse) and its
support functions must be implemented by the programmer. Additional member
functions should be declared in the parser class’ header. At the
very least the member int lex() must be implemented
(although a standardized implementation can also be generated by
bisonc++). The member lex is called by parse to
obtain the next available token. The member function void error(char
const *msg) may also be re-implemented by the programmer, but a
basic in-line implementation is provided by default. The member function
error is called when parse detects (syntactic) errors.
- o
- The parser can now be used in a program. A very simple
example would be:
int main()
{
Parser parser;
return parser.parse();
}
2. GENERATED FILES¶
Bisonc++ may create the following files:
- o
- A file containing the implementation of the member function
parse and its support functions. The member parse is a
public member that can be called to parse a token-sequence according to a
specified LALR1 type of grammar. The implementations of these members is
by default written on the file parse.cc. The programmer should not
modify the contents of this file; by default it is rewritten every time
bisonc++ is called.
- o
- A file containing an initial setup of the parser class,
containing the declaration of the public member parse and of its
(private) support members. New members may safely be declared in the
parser class, as it is only created by bisonc++ if not yet
existing, using the filename <parser-class>.h (where
<parser-class> is the the name of the defined parser
class).
- o
- A file containing the parser class’ base
class. This base class should not be modified by the programmer. It
contains types defined by bisonc++, as well as several (protected)
data members and member functions, which should not be redefined by the
programmer. All symbolic parser terminal tokens are defined in this class,
thereby escalating these definitions in a separate class (cf. Lakos,
(2001)), which in turn prevents circular dependencies between the lexical
scanner and the parser (circular dependencies may easily occur here, as
the parser needs access to the lexical scanner class when defining the
lexical scanner as one of its data members, whereas the lexical scanner
needs access to the parser class to know about the grammar’s
symbolic terminal tokens; escalation is a way out of such circular
dependencies). By default this file is (re)written any time
bisonc++ is called, using the filename
<parser-class>base.h.
- o
- A file containing an implementation header. The
implementation header rather than the parser’s class header file
should be included by the parser’s source files implementing member
functions declared by the programmer. The implementation header first
includes the parser class’s header file, and then provides default
in-line implementations for its members error and print
(which may be altered by the programmer). The member lex may also
receive a standard in-line implementation. Alternatively, its
implementation can be provided by the programmer (see below). Any
directives and/or namespace directives required for the proper compilation
of the parser’s additional member functions should be declared next.
The implementation header is included by the file defining parse.
By default the implementation header is created if not yet existing,
receiving the filename <parser-class>.ih.
- o
- A verbose description of the generated parser. This file is
comparable to the verbose ouput file originally generated by
bison++. It is generated when the option --verbose or
-V is provided. If so, bisonc++ writes the file
<grammar>.output, where <grammar> is the name of
the file containing the grammar definition.
3. OPTIONS¶
If available, single letter options are listed between parentheses following
their associated long-option variants. Single letter options require arguments
if their associated long options require arguments as well. Options affecting
the class header or implementation header file are ignored if these files
already exist.
- o
- --analyze-only (-A)
Only analyze the grammar. No files are (re)written. This option can be used
to test the grammatic correctness of modification `in situ’, without
overwriting previously generated files. If the grammar contains syntactic
errors only syntax analysis is performed.
- o
- --baseclass-header=header (-b)
Use header as the pathname of the file to contain the parser’s
base class. This class defines, e.g., the parser’s symbolic tokens.
Defaults to the name of the parser class plus the suffix base.h. It
is generated, unless otherwise indicated (see --no-baseclass-header
and --dont-rewrite-baseclass-header below).
- o
- --baseclass-preinclude=header (-H)
Use header as the pathname to the file preincluded in the
parser’s base-class header. This option is needed in situations
where the base class header file refers to types which might not yet be
known. E.g., with %polymorphic a std::string value type
might be used. Since the string header file is not by default
included in parserbase.h we need a way to inform the compiler about
this and possibly other headers. The suggested procedure is to use a
pre-include header file declaring the required types. By default `
header’ is surrounded by double quotes: #include
"header" is used when the option -H header is
specified. When the argument is surrounded by pointed brackets #include
<header> is included. In the latter case, quotes might be
required to escape interpretation by the shell (e.g., using -H
’<header>’).
- o
- --baseclass-skeleton=skeleton (-B)
Use skeleton as the pathname of the file containing the skeleton of
the parser’s base class. Its filename defaults to
bisonc++base.h.
- o
- --class-header=header (-c)
Use header as the pathname of the file to contain the parser class.
Defaults to the name of the parser class plus the suffix .h
- o
- --class-name className
Defines the name of the C++ class that is generated. If neither this
option, nor the %class-name directory is specified, then the
default class name ( Parser) is used.
- o
- --class-skeleton=skeleton (-C)
Use skeleton as the pathname of the file containing the skeleton of
the parser class. Its filename defaults to bisonc++.h.
- o
- --construction
Details about the construction of the parsing tables are written to the same
file as written by the --verbose option (i.e.,
<grammar>.output, where <grammar> is the input
file read by bisonc++. This information is primarily useful for
developers. It augments the information written to the verbose grammar
output file, generated by the --verbose option.
- o
- --debug
Provide parse and its support functions with debugging code, showing
the actual parsing process on the standard output stream. When included,
the debugging output is active by default, but its activity may be
controlled using the setDebug(bool on-off) member. An
#ifdef DEBUG macro is not supported by bisonc++. Rerun
bisonc++ without the --debug option to remove the debugging
code.
- o
- --error-verbose
When a syntactic error is reported, the generated parse function will dump
the parser’s state stack to the standard output stream. The stack
dump shows on separate lines a stack index followed by the state stored at
the indicated stack element. The first stack element is the stack’s
top element.
- o
- --filenames=filename (-f)
Specify a filename to use for all files produced by bisonc++.
Specific options overriding particular filenames are also available (which
then, in turn, overide the name specified by this option).
- o
- --flex
Bisonc++ generates code calling d_scanner.yylex() to obtain
the next lexical token, and calling d_scanner.YYText() for the
matched text, unless overruled by options or directives explicitly
defining these functions. By default, the interface defined by
flexc++(1) is used. This option is only interpreted if the
--scanner option or %scanner directive is also used.
- o
- --force-class-header
By default the generated class header is not overwritten once it has been
created. This option can be used to force the (re)writing of the file
containing the parser’s class.
- o
- --force-implementation-header
By default the generated implementation header is not overwritten once it
has been created. This option can be used to force the (re)writing of the
implementation header file.
- o
- --help (-h)
Write basic usage information to the standard output stream and
terminate.
- o
- --implementation-header=header (-i)
Use header as the pathname of the file to contain the implementation
header. Defaults to the name of the generated parser class plus the suffix
.ih. The implementation header should contain all directives and
declarations only used by the implementations of the parser’s
member functions. It is the only header file that is included by the
source file containing parse’s implementation . User defined
implementation of other class members may use the same convention, thus
concentrating all directives and declarations that are required for the
compilation of other source files belonging to the parser class in one
header file.
- o
- --implementation-skeleton=skeleton
(-I)
Use skeleton as the pathname of the file containing the skeleton of
the implementation header. Its filename defaults to
bisonc++.ih.
- o
- --insert-stype
This option is only effective if the debug option (or %debug
directive) has also been specified. When insert-stype has been
specified the parsing function’s debug output will also show
selected semantic values. It should only be used if objects or variables
of the semantic value type STYPE__ can be inserted into
ostreams.
- o
- --max-inclusion-depth=value
Set the maximum number of nested grammar files. Defaults to 10.
- o
- --namespace=namespace (-n)
Define the parser base class, the paser class and the parser implentations
in the namespace namespace. By default no namespace is defined. If
this options is used the implementation header will contain a commented
out using namespace declaration for the requested
namespace.
- o
- --no-baseclass-header
Do not write the file containing the parser class’ base class, even if
that file doesn’t yet exist. By default the file containing the
parser’s base class is (re)written each time bisonc++ is
called. Note that this option should normally be avoided, as the base
class defines the symbolic terminal tokens that are returned by the
lexical scanner. By suppressing the construction of this file any
modification in these terminal tokens will not be communicated to the
lexical scanner.
- o
- --no-lines
Do not put #line preprocessor directives in the file containing the
parser’s parse function. By default #line preprocessor
directives are placed in the file containing the parser’s
parse function. This option allows the compiler and debuggers to
associate errors with lines in your grammar specification file, rather
than with the source file containing the parse function
itself.
- o
- --no-parse-member
Do not write the file containing the parser’s predefined parser member
functions, even if that file doesn’t yet exist. By default the file
containing the parser’s parse member function is (re)written
each time bisonc++ is called. Note that this option should normally
be avoided, as this file contains parsing tables which are altered
whenever the grammar definition is modified.
- o
- --own-debug
Extensively displays the actions performed by bisonc++’s parser
when it processes the grammar specification s. This implies the
--verbose option.
- o
- --own-tokens (-T)
The tokens returned as well as the text matched when bisonc++ reads
its input files(s) are shown when this option is used.
- This option does not result in the generated parsing
function displaying returned tokens and matched text. If that is what you
want, use the --print-tokens option.
- o
- --parsefun-skeleton=skeleton (-P)
Use skeleton as the pathname of the file containing the parsing
member function’s skeleton. Its filename defaults to
bisonc++.cc.
- o
- --parsefun-source=source (-p)
Define source as the name of the source file to contain the parser
member function parse. Defaults to parse.cc.
- o
- --polymorphic-skeleton=skeleton (-M)
Use skeleton as the pathname of the file containing the skeleton of
the polymorphic template classes. Its filename defaults to
bisonc++polymorphic.
- o
- --polymorphic-inline-skeleton=skeleton
(-m)
Use skeleton as the pathname of the file containing the skeleton of
the inline implementations of the members of the polymorphic template
classes. Its filename defaults to bisonc++polymorphic.
- o
- --print-tokens (-t)
The generated parsing function implements a function print__
displaying (on the standard output stream) the tokens returned by the
parser’s scanner as well as the corresponding matched text. This
implementation is suppressed when the parsing function is generated
without using this option. The member print__) is called from
Parser::print, which is defined in-line in the the parser’s
class header. Calling Parser::print__, therefore, can also easily
be controlled by an option controlled by the program using the parser
generated by bisonc++.
- This option does not show the tokens returned and
text matched by bisonc++ itself when reading its input s. If
that is what you want, use the --own-tokens option.
- o
- --required-tokens=number
Following a syntactic error, require at least number successfully
processed tokens before another syntactic error can be reported. By
default number is zero.
- o
- --scanner=header (-s)
Use header as the pathname to the file defining a class
Scanner. When this option is used the parser’s member
int lex() is predefined as
int lex()
{
return d_scanner.lex();
}
and an object Scanner d_scanner is composed into the parser. The
example shows the function that’s called by default. By specifying
the --flex option (or %flex directive) the function
d_scanner.yylex() is called. Any other function to call can be
specified using the --scanner-token-function option (or
%scanner-token-function directive).
- By default header is surrounded by double quotes
(using, e.g., #include "header"). When the argument is
surrounded by pointed brackets #include <header> is
included.
- o
- --scanner-debug
Show de scanner’s matched rules and returned tokens. This offers an
extensive display of the rules and tokens matched and returned by
bisonc++’s scanner, not of just the tokens and matched text
received by bisonc++. If that is what you want use the
--own-tokens option.
- o
- --scanner-matched-text-function=function-call
The scanner function returning the text that was matched at the last call of
the scanner’s token function. A complete function call expression
should be provided (including a scanner object, if used). This option
overrules the d_scanner.matched() call used by default when the
%scanner directive is specified, and it overrules the
d_scanner.YYText() call used when the %flex directive is
provided. Example:
--scanner-matched-text-function "myScanner.matchedText()"
- o
- --scanner-token-function=function-call
The scanner function returning the next token, called from the
parser’s lex function. A complete function call expression
should be provided (including a scanner object, if used). This option
overrules the d_scanner.lex() call used by default when the
%scanner directive is specified, and it overrules the
d_scanner.yylex() call used when the %flex directive is
provided. Example:
--scanner-token-function "myScanner.nextToken()"
- o
- --show-filenames
Writes the names of the generated files to the standard error stream.
- o
- --skeleton-directory=directory (-S)
Specifies the directory containing the skeleton files to use. This option
can be overridden by the specific skeleton-specifying options ( -B -C,
-H, -I, -M and -m).
- o
- --target-directory=directory
Specifies the directory where generated files should be written. By default
this is the directory of bisonc++’s input file. The
--target-directory option does not affect files that were
explicitly named (either as option or as directive).
- o
- --thread-safe
No static data are modified, making bisonc++ thread-safe.
- o
- --usage
Write basic usage information to the standard output stream and
terminate.
- o
- --verbose (-V)
Write a file containing verbose descriptions of the parser states and what
is done for each type of look-ahead token in that state. This file also
describes all conflicts detected in the grammar, both those resolved by
operator precedence and those that remain unresolved. It is not created by
default, but if requested the information is written on
<grammar>.output, where <grammar> is the grammar
specification file passed to bisonc++.
- o
- --version (-v)
Display bisonc++’s version number and terminate.
4. DIRECTIVES¶
The following directives can be specified in the initial section of the grammar
specification file. When command-line options for directives exist, they
overrule the corresponding directives given in the grammar specification file.
Directives affecting the class header or implementation header file are
ignored if these files already exist.
- o
- %baseclass-header header
Defines the pathname of the file containing the parser’s base class.
This directive is overridden by the --baseclass-header or -b
command-line options.
- o
- %baseclass-preinclude header
Use header as the pathname to the file pre-included in the
parser’s base-class header. See the description of the
--baseclass-preinclude option for details about this directive.
Like the convention adopted for this argument, header will (by
default) be surrounded by double quotes. However, when the argument is
surrounded by pointed brackets #include <header> is
included.
- o
- %class-header header
Defines the pathname of the file containing the parser class. This directive
is overridden by the --class-header or -c command-line
options.
- o
- %class-name parser-class-name
Declares the name of the parser class. It defines the name of the C++
class that is generated. If no %class-name is specified the default
class name Parser is used.
- o
- %debug
Provide parse and its support functions with debugging code, showing
the actual parsing process on the standard output stream. When included,
the debugging output is active by default, but its activity may be
controlled using the setDebug(bool on-off) member. No
#ifdef DEBUG macros are used anymore. By rerunning bisonc++
without the --debug option or %debug declaration an
equivalent parser is generated not containing the debugging code.
- o
- %error-verbose
This directive can be specified to dump the parser’s state stack to
the standard output stream when the parser encounters a syntactic error.
The stack dump shows on separate lines a stack index followed by the state
stored at the indicated stack element. The first stack element is the
stack’s top element.
- o
- %expect number
This directive specifies the exact number of shift/reduce and reduce/reduce
conflicts for which no warnings are to be generated. Details of the
conflicts are reported in the verbose output file (e.g.,
grammar.output). If the number of encountered conflicts deviates
from ` number’, then this directive is ignored.
- o
- %filenames header
Defines the generic name of all generated files, unless overridden by
specific names. This directive is overridden by the --filenames or
-f command-line options.
- o
- %flex
When provided, the scanner matched text function is called as
d_scanner.YYText(), and the scanner token function is called as
d_scanner.yylex(). This directive is only interpreted if the
%scanner directive is also provided.
- o
- %implementation-header header
Defines the pathname of the file containing the implementation header. This
directive is overridden by the --implementation-header or -i
command-line options.
- o
- %include path
This directive is used to switch to path while processing the grammar
description. Unless path is an absolute file-path, path is
searched relative to the location of bisonc++’s grammar
specification file. This directive can be used to split long grammar
specification files in shorter, meaningful units. After processing
path processing continues beyond the %include path
directive.
- o
- %left terminal ...
Defines the names of symbolic terminal tokens that must be treated as
left-associative. I.e., in case of a shift/reduce conflict, a reduction is
preferred over a shift. Sequences of %left, %nonassoc, %right and
%token directives may be used to define the precedence of
operators. In expressions, the first used directive will have the lowest
precedence, the last used the highest. See also %token below.
- o
- %locationstruct struct-definition
Defines the organization of the location-struct data type LTYPE__.
This struct should be specified analogously to the way the parser’s
stacktype is defined using %union (see below). The location struct
is named LTYPE__. If neither locationstruct nor
LTYPE__ is specified, the aforementioned default struct is
used.
- o
- %lsp-needed
Defining this causes bisonc++ to include code into the generated
parser using the standard location stack. The token-location type defaults
to the following struct, defined in the parser’s base class when
this directive is specified:
struct LTYPE__
{
int timestamp;
int first_line;
int first_column;
int last_line;
int last_column;
char *text;
};
Bisonc++ does not provide the elements of the LTYPE__
struct with values. Action blocks of production rules may refer to the
location stack element associated with a production element using @
variables, like @1.timestamp, @3.text, @5. The rule’s
location struct itself may be referred to as either d_loc__ or
@@.
- o
- %ltype typename
Specifies a user-defined token location type. If %ltype is used,
typename should be the name of an alternate (predefined) type
(e.g., size_t). It should not be used if a %locationstruct
specification is defined (see below). Within the parser class, this type
is available as the type ` LTYPE__’. All text on the line
following %ltype is used for the typename specification. It
should therefore not contain comment or any other characters that are not
part of the actual type definition.
- o
- %namespace namespace
Define all of the code generated by bisonc++ in the name space
namespace. By default no name space is defined. If this options is
used the implementation header is provided with a commented out using
namespace declaration for the specified name space. This directive is
overridden by the --namespace command-line option.
- o
- %negative-dollar-indices
Do not generate warnings when zero- or negative dollar-indices are used in
the grammar’s action blocks. Zero or negative dollar-indices are
commonly used to implement inherited attributes, and should normally be
avoided. When used, they can be specified like $-1 or
$<type>-1, where type is empty, an STYPE__ tag
or a field-name.
- o
- %no-lines
By default #line preprocessor directives are inserted just before
action statements in the file containing the parser’s parse
function. These directives are suppressed by the %no-lines
directive.
- o
- %nonassoc terminal ...
Defines the names of symbolic terminal tokens that should be treated as
non-associative. I.e., in case of a shift/reduce conflict, a reduction is
preferred over a shift. Sequences of %left, %nonassoc, %right and
%token directives may be used to define the precedence of
operators. In expressions, the first used directive has the lowest
precedence, the last one used the highest. See also %token
below.
- o
- %parsefun-source source
Defines the pathname of the file containing the parser member parse.
This directive is overridden by the --parse-source or -p
command-line options.
- o
- %polymorphic polymorphic-specification(s)
Bison’s traditional way of handling multiple semantic values is to use
a %union specification (see below). Although %union is also
supported by bisonc++, a polymorphic semantic value class is
preferred due to its improved type safety.
- The %polymorphic directive defines a polymorphic
semantic value class and can be used instead of a %union
specification. Refer to section POLYMORPHIC SEMANTIC VALUES below
or to bisonc++’s user manual for a detailed description of
the specification, characteristics, and use of polymorphic semantic
values.
- o
- %prec token
Overrules the defined precendence of an operator for a particular
grammatical rule. A well known application of %prec is:
expression:
’-’ expression %prec UMINUS
{
...
}
Here, the default priority and precedence of the ` -’ token as
the subtraction operator is overruled by the precedence and priority of
the UMINUS token, which is commonly defined as
%right UMINUS
(see below) following, e.g., the ’*’ and
’/’ operators.
- o
- %print-tokens
The print directive provides an implementation of the Parser
class’s print__ function displaying the current token value
and the text matched by the lexical scanner as received by the generated
parse function.
- o
- %required-tokens number
Following a syntactic error, require at least number successfully
processed tokens before another syntactic error can be reported. By
default number is zero.
- o
- %right terminal ...
Defines the names of symbolic terminal tokens that should be treated as
right-associative. I.e., in case of a shift/reduce conflict, a shift is
preferred over a reduction. Sequences of %left, %nonassoc, %right
and %token directives may be used to define the precedence of
operators. In expressions, the first used directive will have the lowest
precedence, the last used the highest. See also %token below.
- o
- %scanner header
Use header as the pathname to the file pre-included in the
parser’s class header. See the description of the --scanner
option for details about this option. Similar to the convention adopted
for this argument, header by default is surrounded by double
quotes. However, when the argument is surrounded by pointed brackets
#include <header> is included. This directive results in the
definition of a composed Scanner d_scanner data member into the
generated parser, and in the definition of a int lex() member,
returning d_scanner.lex().
- By specifying the %flex directive the function
d_scanner.yylex() is called. Any other function to call can be
specified using the --scanner-token-function option (or
%scanner-token-function directive).
- o
- %scanner-matched-text-function function-call
The scanner function returning the text that was matched by the lexical
scanner after its token function (see below) has returned. A complete
function call expression should be provided (including a scanner object,
if used). Example:
%scanner-matched-text-function myScanner.matchedText()
By specifying the %flex directive the function
d_scanner.YYText() is called.
- If the function call contains white space
scanner-token-function should be surrounded by double quotes.
- o
- %scanner-token-function function-call
The scanner function returning the next token, called from the generated
parser’s lex function. A complete function call expression
should be provided (including a scanner object, if used). Example:
%scanner-token-function d_scanner.lex()
If the function call contains white space scanner-token-function
should be surrounded by double quotes.
- o
- %start non-terminal
The non-terminal non-terminal should be used as the grammar’s
start-symbol. If omitted, the first grammatical rule is used as the
grammar’s starting rule. All syntactically correct sentences must be
derivable from this starting rule.
- o
- %stype typename
The type of the semantic value of non-terminal tokens. By default it is
int. %stype, %union, and %polymorphic are mutually
exclusive directives.
- Within the parser class, the semantic value type is
available by the type name ` STYPE__’. All text on the line
following %stype is used for the typename specification. It
should therefore not contain comment or any other characters that are not
part of the actual type definition.
- o
- %target-directory directory
Specifies the directory where generated files should be written. By default
this is the directory of bisonc++’s input file. The
%target-directory directive does not affect files that were
explicitly named (either as option or as directive).
- o
- %token terminal ...
Defines the names of symbolic terminal tokens. Sequences of %left,
%nonassoc, %right and %token directives may be used to define
the precedence of operators. In expressions, the first used directive has
the lowest precedence, the last used the highest.
NOTE: Symbolic tokens are defined as enum-values in the
parser’s base class. The names of symbolic tokens may not be equal
to the names of the members and types defined by bisonc++ itself
(see the next sections). This requirement is not currently enforced
by bisonc++, but compilation errors may result if this requirement
is violated.
- o
- %type <type> non-terminal ...
In combination with %polymorphic or %union: associate the
semantic value of a non-terminal symbol with a polymorphic semantic value
tag or union field defined by these directives.
- o
- %union union-definition
Acts identically to the identically named bison and bison++
declaration. Bisonc++ generates a union, named STYPE__, as
its semantic type.
- o
- %weak-tags
This directive is ignored unless the %polymorphic directive was
specified. It results in the declaration of enum Tag__
rather than enum class Tag__. When in doubt, this directive should
not be used.
5. POLYMORPHIC SEMANTIC VALUES¶
The
%polymorphic directive results in
bisonc++ generating a parser
using polymorphic semantic values. The various semantic values are specified
by pairs, consisting of
tags, which are
C++ identifiers, and
C++ type names. Tags and type names are separated from each other by
colons. Multiple tag and type name combinations are separated from each other
by semicolons, and an optional semicolon ends the final tag/type
specification.
Here is an example, defining three semantic values: an
int, a
std::string and a
std::vector<double>:
%polymorphic INT: int; STRING: std::string;
VECT: std::vector<double>
The identifier to the left of the colon is called the
tag-identifier (or
simply
tag), and the type name to the right of the colon is called the
type-name. The type-names must be built-in types or must offer default
constructors.
If type-names refer to types declared in header files that were not already
included by the parser’s base class header, then these header
s
must be inserted using the
%baseclass-preinclude directive.
The
%type directive is used to associate (non-)terminals with semantic
value types.
Semantic values may also be associated with terminal tokens. In that case it is
the lexical scanner’s responsibility to assign a properly typed value to
the parser’s
STYPE__ d_val__ data member.
Non-terminals may be automatically be associated with polymorphic semantic
values using
%type directives. E.g., following
%polymorphic INT: int; TEXT: std::string
%type <INT> expr
the
expr non-terminal returns
int semantic values. In this case, a
rule like
expr:
expr ’+’ expr
{
$$ = $1 + $3;
}
automatically associates $$, $1 and $3 with
int values. $$ is an lvalue
(representing the semantic value associated with the
expr: rule), while
$1 and $3 represent the
int semantic value associated with the
expr non-terminal in the production rule
’-’ expr
(rvalues).
When negative dollar indices (like $-1) are used, any pre-defined associations
between non-terminals and semantic types are ignored. However, semantic value
types can explicitly be specified (or provided
%type associations can
be overruled) using the common `$<type>$’ or
`$<type>1’ syntax (in this and following examples index number 1
represents any valid positive index (i.e., not exceeding the number of
production rule elements seen so far); -1 represents any valid negative
index).
The type-overruling syntax does not allow blanks to be used (so $<INT>$ is
OK, $< INT >$ isn’t).
Various combinations of type-associations and type specifications may be
encountered:
- o
- $-1: %type associations are ignored, and the
semantic value type STYPE__ is used instead. A warning is issued
unless the %negative-dollar-indices directive was specified.
- o
- $<tag>-1: error: <tag>
specifications are not allowed for negative dollar indices.
%type<TAG> and $$ or $1 specifications:
$$ or $1 specifications |
|
|
|
|
%type<TAG> |
|
|
|
|
|
absent |
|
no <tag> |
|
STYPE__ is used |
|
|
|
|
|
|
|
|
|
$<id> |
|
|
|
|
|
|
|
|
|
$<> |
|
|
|
|
|
|
|
|
|
$<STYPE__> |
|
STYPE__ |
|
no <tag> |
|
STYPE__ is used |
|
|
|
|
|
|
|
|
|
$<id> |
|
|
|
|
|
|
|
|
|
$<> |
|
|
|
|
|
|
|
|
|
$<STYPE__> |
|
(existing) tag |
|
no <tag> |
|
auto-tag |
|
|
|
|
|
|
|
|
|
$<id> |
|
|
|
|
|
|
|
|
|
$<> |
|
|
|
|
|
|
|
|
|
$<STYPE__> |
|
(undefined) tag |
|
no <tag> |
|
tag-error |
|
|
|
|
|
|
|
|
|
$<id> |
|
|
|
|
|
|
|
|
|
$<> |
|
|
|
|
|
|
|
|
|
$<STYPE__> |
|
|
|
|
|
|
auto-tag: $$ and $1 represent, respectively,
$$.get<tag>() and
$1.get<tag>();
tag-error:
error: tag undefined;
tag-override: if
id is a defined tag, then $<tag>$ and
$<tag>1 represent the tag’s type. Otherwise:
error (using
undefined tag
id).
Member calls (`$$.’, `$1.’): When using `$$.’ or
`$1.’ default tags are ignored. Instead, a warning is issued that the
default tag is ignored. This syntax allows members of the semantic value type
(
STYPE__) to be called explicitly. The default tag is only ignored if
there are no additional characters (e.g., blanks, closing parentheses) between
the dollar-expressions and the member selector operator. The opposite,
ignoring default tag associations, is accomplished using tconstructions like
$<STYPE__>$ and $<STYPE__>1.
The
%polymorphic directive adds the following definitions and
declarations to the generated base class header and parser source file (if the
%namespace directive was used then all declared/defined elements are
placed within the name space specified by that directive):
- o
- Three additional headers are included by the parser’s
base class header:
#include <memory>
#include <stdexcept>
#include <type_traits>
- o
- All semantic value type identifiers are collected in a
strongy typed enumeration ` Tag__’. E.g.,
enum class Tag__
{
INT,
STRING,
VECT
};
- o
- The namespace Meta__ contains almost all of the code
implementing polymorphic values.
The name space
Meta__ contains the following elements:
- o
- A polymorphic base class Base. This class is
normally not explicitly referred to by user-defined code. Refer to by
bisonc++’s user manual for a detailed description of this
class.
- o
- For each of the tag-identifiers specified with the
%polymorphic directive a class template
Semantic<Tag__> is defined, containing a data element of the
type-name matching the Tag__ for which Semantic<Tag__>
was derived. The Semantic<Tag__> classes are normally not
explicitly referred to by user-defined code. Refer to by
bisonc++’s user manual for a detailed description of these
classes.
- o
- A class SType, derived from
std::shared_ptr<Base>. This class becomes the parser’s
semantic value type. It offers the following members:
Constructors: default, copy and move constructors;
Assignment operators: copy and move assignment operators declaring
SType or any of the %polymorphic type-names as their
right-hand side operands;
Tag__ tag() const, returning Semantic<Tag__>’s
Tag__ value;
ReturnType get<Tag__>() const. ReturnType refers to the
semantic value stored inside Semantic<Tag__>. If the
type-name is a built-in type a copy of the value is returned, otherwise a
reference to a constant object is returned;
This member checks for 0-pointers and for Tag__ mismatches between
the requested and actual Tag__, throwing a std::logic_error
if found. DataType &get<Tag__>() returns a reference to
the (modifiable) semantic value stored inside
Semantic<Tag__>.
This member checks for 0-pointers and for Tag__ mismatches between
the requested and actual Tag__, in that case replacing the current
Semantic object pointed to by a new Semantic<Tag__>
object of the requested Tag__. ReturnType data<Tag__>()
const. ReturnType refers to the semantic value stored inside
Semantic<Tag__>. If the type-name is a built-in type a copy
of the value is returned, otherwise a reference to a constant object is
returned;
This is a (partially) unchecking variant of the corresponing
get member, resulting in a Segfault if used when the
shared_ptr holds a 0-pointer, and throwing a std::bad_cast
in case of a mismatch between the requested and actual Tag__.
DataType &data<Tag__>() returns a reference to the
(modifiable) semantic value stored inside Semantic<Tag__>.
This is a (partially) unchecking variant of the corresponing
get member, resulting in a Segfault if used when the
shared_ptr holds a 0-pointer, and throwing a std::bad_cast
in case of a mismatch between the requested and actual Tag__. Since
bisonc++ declares typedef Meta__::SType STYPE__, polymorphic
semantic values can be used without referring to the name space
Meta__.
6. PUBLIC MEMBERS AND -TYPES¶
The following public members and types are available to by users of the parser
classes generated by
bisonc++ (parser class-name prefixes (e.g.,
Parser::) prefixes are silently implied):
- o
- LTYPE__:
The parser’s location type (user-definable). Available only when
either %lsp-needed, %ltype or %locationstruct has been
declared.
- o
- STYPE__:
The parser’s stack-type (user-definable), defaults to int.
- o
- Tokens__:
The enumeration type of all the symbolic tokens defined in the grammar file
(i.e., bisonc++’s input file). The scanner should be prepared
to return these symbolic tokens Note that, since the symbolic tokens are
defined in the parser’s class and not in the scanner’s class,
the lexical scanner must prefix the parser’s class name to the
symbolic token names when they are returned. E.g., return
Parser::IDENT should be used rather than return IDENT.
- o
- int parse():
The parser’s parsing member function. It returns 0 when parsing has
completed successfully, 1 if errors were encountered while parsing the
input.
- o
- void setDebug(bool mode):
This member can be used to activate or deactivate the debug-code compiled
into the parsing function. It is always available but is only operational
if the %debug directive or --debug option was specified.When
debugging code has been compiled into the parsing function, it is
not active by default. To activate the debugging code, use
setDebug(true). This member can be used to activate or deactivate
the debug-code compiled into the parsing function. It is available but has
no effect if no debug code has been compiled into the parsing function.
When debugging code has been compiled into the parsing function, it is
active by default, but debug-code is suppressed by calling
setDebug(false).
When the
%polymorphic directive is used:
- o
- Meta__:
Templates and classes that are required for implementing the polymorphic
semantic values are all declared in the Meta__ namespace. The
Meta__ namespace itself is nested under the namespace that may have
been declared by the %namespace directive.
- o
- Tag__:
The (strongly typed) enum class Tag__ contains all the
tag-identifiers specified by the %polymorphic directive. It is
declared outside of the Parser’s class, but within the namespace
that may have been declared by the %namespace directive.
7. PRIVATE ENUMS AND -TYPES¶
The following enumerations and types can be used by members of parser classes
generated by
bisonc++. They are actually protected members inherited
from the parser’s base class.
- o
- Base::ErrorRecovery__:
This enumeration defines two values:
DEFAULT_RECOVERY_MODE__,
UNEXPECTED_TOKEN__
The DEFAULT_RECOVERY_MODE__ terminates the parsing process. The
non-default recovery procedure is available once an error token is
used in a production rule. When the parsing process throws
UNEXPECTED_TOKEN__ the recovery procedure is started (i.e., it is
started whenever a syntactic error is encountered or ERROR()
is called).
- The recovery procedure consists of (1) looking for the
first state on the state-stack having an error-production, followed by (2)
handling all state transitions that are possible without retrieving a
terminal token. Then, in the state requiring a terminal token and starting
with the initial unexpected token (3) all subsequent terminal tokens are
ignored until a token is retrieved which is a continuation token in that
state.
- If the error recovery procedure fails (i.e., if no
acceptable token is ever encountered) error recovery falls back to the
default recovery mode (i.e., the parsing process is terminated).
- o
- Base::Return__:
This enumeration defines two values:
PARSE_ACCEPT = 0,
PARSE_ABORT = 1
(which are of course the parse function’s return values).
8. PRIVATE MEMBER FUNCTIONS¶
The following members can be used by members of parser classes generated by
bisonc++. When prefixed by
Base:: they are actually protected
members inherited from the parser’s base class. Members for which the
phrase ``Used internally’’ is used should not be called by
user-defined code.
- o
- Base::ParserBase():
The default base-class constructor. Used internally.
- o
- void Base::ABORT() const throw(Return__):
This member can be called from any member function (called from any of the
parser’s action blocks) to indicate a failure while parsing thus
terminating the parsing function with an error value 1. Note that this
offers a marked extension and improvement of the macro YYABORT
defined by bison++ in that YYABORT could not be called from
outside of the parsing member function.
- o
- void Base::ACCEPT() const throw(Return__):
This member can be called from any member function (called from any of the
parser’s action blocks) to indicate successful parsing and thus
terminating the parsing function. Note that this offers a marked extension
and improvement of the macro YYACCEPT defined by bison++ in
that YYACCEPT could not be called from outside of the parsing
member function.
- o
- void Base::clearin():
This member replaces bison(++)’s macro yyclearin and
causes bisonc++ to request another token from its lex()
member, even if the current token has not yet been processed. It is a
useful member when the parser should be reset to its initial state, e.g.,
between successive calls of parse. In this situation the scanner
will probably be reloaded with new information too (in the context of a
flex-generated scanner by, e.g., calling the scanner’s
yyrestart member.
- o
- bool Base::debug() const:
This member returns the current value of the debug variable.
- o
- void Base::ERROR() const throw(ErrorRecovery__):
This member can be called from any member function (called from any of the
parser’s action blocks) to generate an error, and thus initiate the
parser’s error recovery code. Note that this offers a marked
extension and improvement of the macro YYERROR defined by
bison++ in that YYERROR could not be called from outside of
the parsing member function.
- o
- void error(char const *msg):
By default implemented inline in the parser.ih internal header file,
it writes a simple message to the standard error stream. It is called when
a syntactic error is encountered, and its default implementation may
safely be altered.
- o
- void errorRecovery__():
Used internally.
- o
- void Base::errorVerbose__():
Used internally.
- o
- void executeAction(int):
Used internally.
- o
- int lex():
By default implemented inline in the parser.ih internal header file,
it can be pre-implemented by bisonc++ using the scanner
option or directive (see above); alternatively it must be
implemented by the programmer. It interfaces to the lexical scanner, and
should return the next token produced by the lexical scanner, either as a
plain character or as one of the symbolic tokens defined in the
Parser::Tokens__ enumeration. Zero or negative token values are
interpreted as `end of input’.
- o
- int lookup(bool):
Used internally.
- o
- void nextToken():
Used internally.
- o
- void Base::pop__():
Used internally.
- o
- void Base::popToken__():
Used internally.
- o
- void print__()():
Used internally.
- o
- void print()):
By default implemented inline in the parser.ih internal header file,
this member calls print__ to display the last received token and
corrseponding matched text. The print__ member is only implemented
if the --print-tokens option or %print-tokens directive was
used when the parsing function was generated. Calling print__ from
print is unconditional, but can easily be controlled by the using
program, by defining, e.g., a command-line option.
- o
- void Base::push__():
Used internally.
- o
- void Base::pushToken__():
Used internally.
- o
- void Base::reduce__():
Used internally.
- o
- void Base::symbol__():
Used internally.
- o
- void Base::top__():
Used internally.
9. PRIVATE DATA MEMBERS¶
The following data members can be used by members of parser classes generated by
bisonc++. All data members are actually protected members inherited
from the parser’s base class.
- o
- size_t d_acceptedTokens__:
Counts the number of accepted tokens since the start of the parse()
function or since the last detected syntactic error. It is initialized to
d_requiredTokens__ to allow an early error to be detected as
well.
- o
- bool d_debug__:
When the debug option has been specified, this variable (true
by default) determines whether debug information is actually
displayed.
- o
- LTYPE__ d_loc__:
The location type value associated with a terminal token. It can be used by,
e.g., lexical scanners to pass location information of a matched token to
the parser in parallel with a returned token. It is available only when
%lsp-needed, %ltype or %locationstruct has been defined.
Lexical scanners may be offered the facility to assign a value to this
variable in parallel with a returned token. In order to allow a scanner
access to d_loc__, d_loc__’s address should be passed
to the scanner. This can be realized, for example, by defining a member
void setLoc(STYPE__ *) in the lexical scanner, which is then called
from the parser’s constructor as follows:
d_scanner.setSLoc(&d_loc__);
Subsequently, the lexical scanner may assign a value to the parser’s
d_loc__ variable through the pointer to d_loc__ stored
inside the lexical scanner.
- o
- LTYPE__ d_lsp__:
The location stack pointer. Do not modify.
- o
- size_t d_nErrors__:
The number of errors counted by parse. It is initialized by the
parser’s base class initializer, and is updated while parse
executes. When parse has returned it contains the total number of
errors counted by parse. Errors are not counted if suppressed
(i.e., if d_acceptedTokens__ is less than
d_requiredTokens__).
- o
- size_t d_nextToken__:
A pending token. Do not modify.
- o
- size_t d_requiredTokens__:
Defines the minimum number of accepted tokens that the parse function
must have processed before a syntactic error can be generated.
- o
- int d_state__:
The current parsing state. Do not modify.
- o
- int d_token__:
The current token. Do not modify.
- o
- STYPE__ d_val__:
The semantic value of a returned token or non-terminal symbol. With
non-terminal tokens it is assigned a value through the action rule’s
symbol $$. Lexical scanners may be offered the facility to assign a
semantic value to this variable in parallel with a returned token. In
order to allow a scanner access to d_val__, d_val__’s
address should be passed to the scanner. This can be realized, for
example, by passing d_val__’s address to the lexical
scanner’s constructor. Subsequently, the lexical scanner may assign
a value to the parser’s d_val__ variable through the pointer
to d_val__ stored in a data member of the lexical scanner. Note
that in some cases this approach must be used to make available the
correct semantic value to the parser. In particular, when a grammar state
defines multiple reductions, depending on the next token, the
reduction’s action only takes place following the retrieval of the
next token, thus losing the initially matched token text.
- o
- LTYPE__ d_vsp__:
The semantic value stack pointer. Do not modify.
10. TYPES AND VARIABLES IN THE ANONYMOUS NAMESPACE¶
In the file defining the
parse function the following types and variables
are defined in the anonymous namespace. These are mentioned here for the sake
of completeness, and are not normally accessible to other parts of the parser.
- o
- char const author[]:
Defining the name and e-mail address of Bisonc++’s author.
- o
- ReservedTokens:
This enumeration defines some token values used internally by the parsing
functions. They are:
PARSE_ACCEPT = 0,
_UNDETERMINED_ = -2,
_EOF_ = -1,
_error_ = 256,
These tokens are used by the parser to determine whether another token
should be requested from the lexical scanner, and to handle
error-conditions.
- o
- StateType:
This enumeration defines several moe token values used internally by the
parsing functions. They are:
NORMAL,
ERR_ITEM,
REQ_TOKEN,
ERR_REQ, // ERR_ITEM | REQ_TOKEN
DEF_RED, // state having default reduction
ERR_DEF, // ERR_ITEM | DEF_RED
REQ_DEF, // REQ_TOKEN | DEF_RED
ERR_REQ_DEF // ERR_ITEM | REQ_TOKEN | DEF_RED
These tokens are used by the parser to define the types of the various
states of the analyzed grammar.
- o
- PI__ (Production Info):
The type defines a struct containing information about the production rules
that were used by a grammar.
Its first field contains the identification number of a production’s
defining non-terminal;
Its second field defines the number of elements of a production
- o
- SR__ (Shift-Reduce Info):
This struct provides the shift/reduce information for the various
grammatic states. SR__ values are collected in arrays, one array
per grammatic state. These array, named s_<nr>, where
tt<nr> is a state number are defined in the anonymous namespace as
well. The SR__ elements consist of two unions, defining fields that
are applicable to, respectively, the first, intermediate and the last
array elements.
The first element of each array consists of (1st field) a StateType
and (2nd field) the index of the last array element; intermediate elements
consist of (1st field) a symbol value and (2nd field) (if negative) the
production rule number reducing to the indicated symbol value or (if
positive) the next state when the symbol given in the 1st field is the
current token; the last element of each array consists of (1st field) a
placeholder for the current token and (2nd field) the (negative) rule
number to reduce to by default or the (positive) number of an error-state
to go to when an erroneous token has been retrieved. If the 2nd field is
zero, no error or default action has been defined for the state, and
error-recovery is attepted.
- o
- STACK_EXPANSION:
An enumeration value specifying the number of additional elements that are
added to the state- and semantic value stacks when full.
- o
- PI__ (Production Info):
This struct provides information about production rules. It has two
fields: d_nonTerm is the identification number of the
production’s non-terminal, d_size represents the number of
elements of the productin rule.
- o
- static PI__ s_productionInfo:
Used internally by the parsing function.
- o
- static SR__ s_<nr>[]:
Here, <nr> is a numerical value representing a state number.
Used internally by the parsing function.
- o
- static SR__ *s_state[]:
Used internally by the parsing function.
11. RESTRICTIONS ON TOKEN NAMES¶
To avoid collisions with names defined by the parser’s (base) class, the
following identifiers should not be used as token names:
- o
- Identifiers ending in two underscores;
- o
- Any of the following identifiers: ABORT, ACCEPT, ERROR,
clearin, debug, or setDebug.
12. OBSOLETE SYMBOLS¶
All
DECLARATIONS and
DEFINE symbols not listed above but defined
in
bison++ are obsolete with
bisonc++. In particular, there is
no
%header{ ... %} section anymore. Also, all
DEFINE
symbols related to member functions are now obsolete. There is no need for
these symbols anymore as they can simply be declared in the class header file
and defined elsewhere.
13. EXAMPLE¶
Using a fairly worn-out example, we’ll construct a simple calculator
below. The basic operators as well as parentheses can be used to specify
expressions, and each expression should be terminated by a newline. The
program terminates when a
q is entered. Empty lines result in a mere
prompt.
First an associated grammar is constructed. When a syntactic error is
encountered all tokens are skipped until then next newline and a simple
message is printed using the default
error function. It is assumed that
no semantic errors occur (in particular, no divisions by zero). The grammar is
decorated with actions performed when the corresponding grammatical production
rule is recognized. The grammar itself is rather standard and straightforward,
but note the first part of the specification file, containing various other
directives, among which the
%scanner directive, resulting in a composed
d_scanner object as well as an implementation of the member function
int lex. In this example, a common
Scanner class construction
strategy was used: the class
Scanner was derived from the class
yyFlexLexer generated by
flex++(1). The actual process of
constructing a class using
flex++(1) is beyond the scope of this
man-page, but
flex++(1)’s specification file is mentioned below,
to further complete the example. Here is
bisonc++’s input file:
%filenames parser
%scanner ../scanner/scanner.h
// lowest precedence
%token NUMBER // integral numbers
EOLN // newline
%left ’+’ ’-’
%left ’*’ ’/’
%right UNARY
// highest precedence
%%
expressions:
expressions
evaluate
|
prompt
;
evaluate:
alternative
prompt
;
prompt:
{
prompt();
}
;
alternative:
expression
EOLN
{
cout << $1 << endl;
}
|
’q’
done
|
EOLN
|
error
EOLN
;
done:
{
cout << "Done.\n";
ACCEPT();
}
;
expression:
expression
’+’
expression
{
$$ = $1 + $3;
}
|
expression
’-’
expression
{
$$ = $1 - $3;
}
|
expression
’*’
expression
{
$$ = $1 * $3;
}
|
expression
’/’
expression
{
$$ = $1 / $3;
}
|
’-’
expression %prec UNARY
{
$$ = -$2;
}
|
’+’
expression %prec UNARY
{
$$ = $2;
}
|
’(’
expression
’)’
{
$$ = $2;
}
|
NUMBER
{
$$ = atoi(d_scanner.YYText());
}
;
Next,
bisonc++ processes this file. In the process,
bisonc++
generates the following files from its skeletons:
- o
- The parser’s base class, which is not modified by the
programmer at all:
#ifndef ParserBase_h_included
#define ParserBase_h_included
#include <vector>
#include <iostream>
namespace // anonymous
{
struct PI__;
}
class ParserBase
{
public:
// $insert tokens
// Symbolic tokens:
enum Tokens__
{
NUMBER = 257,
EOLN,
UNARY,
};
// $insert STYPE
typedef int STYPE__;
private:
int d_stackIdx__;
std::vector<size_t> d_stateStack__;
std::vector<STYPE__> d_valueStack__;
protected:
enum Return__
{
PARSE_ACCEPT__ = 0, // values used as parse()’s return values
PARSE_ABORT__ = 1
};
enum ErrorRecovery__
{
DEFAULT_RECOVERY_MODE__,
UNEXPECTED_TOKEN__,
};
bool d_debug__;
size_t d_nErrors__;
size_t d_requiredTokens__;
size_t d_acceptedTokens__;
int d_token__;
int d_nextToken__;
size_t d_state__;
STYPE__ *d_vsp__;
STYPE__ d_val__;
ParserBase();
void ABORT() const;
void ACCEPT() const;
void ERROR() const;
void checkEOF__() const;
void clearin();
bool debug() const;
void pop__(size_t count = 1);
void push__(size_t nextState);
void popToken__();
void pushToken__(int token);
void reduce__(PI__ const &productionInfo);
void errorVerbose__();
size_t top__() const;
public:
void setDebug(bool mode);
};
inline bool ParserBase::debug() const
{
return d_debug__;
}
inline void ParserBase::setDebug(bool mode)
{
d_debug__ = mode;
}
inline void ParserBase::ABORT() const
{
throw PARSE_ABORT__;
}
inline void ParserBase::ACCEPT() const
{
throw PARSE_ACCEPT__;
}
inline void ParserBase::ERROR() const
{
throw UNEXPECTED_TOKEN__;
}
// As a convenience, when including ParserBase.h its symbols are available as
// symbols in the class Parser, too.
#define Parser ParserBase
#endif
- o
- The parser class parser.h itself. In the grammar
specification various member functions are used (e.g., done) and
prompt. These functions are so small that they can very well be
implemented inline. Note that done calls ACCEPT to terminate
further parsing. ACCEPT and related members (e.g., ABORT)
can be called from any member called by parse. As a consequence,
action blocks could contain mere function calls, rather than several
statements, thus minimizing the need to rerun bisonc++ when an
action is modified.
- Once bisonc++ had created parser.h it was
augmented with the required additional members, resulting in the following
final version:
#ifndef Parser_h_included
#define Parser_h_included
// $insert baseclass
#include "parserbase.h"
// $insert scanner.h
#include "../scanner/scanner.h"
#undef Parser
class Parser: public ParserBase
{
// $insert scannerobject
Scanner d_scanner;
public:
int parse();
private:
void error(char const *msg); // called on (syntax) errors
int lex(); // returns the next token from the
// lexical scanner.
void print(); // use, e.g., d_token, d_loc
void prompt();
void done();
// support functions for parse():
void executeAction(int ruleNr);
void errorRecovery();
int lookup(bool recovery);
void nextToken();
};
inline void Parser::error(char const *msg)
{
std::cerr << msg << ’\n’;
}
// $insert lex
inline int Parser::lex()
{
return d_scanner.yylex();
}
inline void Parser::print() // use d_token, d_loc
{}
inline void Parser::prompt()
{
std::cout << "? " << std::flush;
}
inline void Parser::done()
{
std::cout << "Done\n";
ACCEPT();
}
#endif
- o
- To complete the example, the following lexical scanner
specification was used:
%{
#define _SKIP_YYFLEXLEXER_
#include "scanner.ih"
#include "../parser/parser.h"
%}
%option yyclass="Scanner" outfile="yylex.cc"
%option c++ 8bit warn noyywrap yylineno
%%
[ \t]+ // skip white space
\n return Parser::EOLN;
[0-9]+ return Parser::NUMBER;
. return yytext[0];
%%
- o
- Since no member functions other than parse were
defined in separate source files, only parse includes
parser.ih. Since cerr is used in the grammar’s
actions, a using namespace std or comparable statement is
required. This was effectuated from parser.ih Here is the
implementation header declaring the standard namespace:
// include this file in the sources of the class Calculator,
// and add any includes etc. that are only needed for
// the compilation of these sources.
// include the file defining the parser class:
#include "parser.h"
// UN-comment if you don’t want to prefix std::
// for every symbol defined in the std. namespace:
using namespace std;
The implementation of the parsing member function parse is basically
irrelevant, since it should not be modified by the programmer. It was
written on the file parse.cc.
- o
- Finally, here is the program offering our simple
calculator:
#include "parser/parser.h"
int main()
{
Parser calculator;
return calculator.parse();
}
14. USING PARSER-CLASS SYMBOLS IN LEXICAL SCANNERS¶
Note here that although the file
parserbase.h, defining the parser
class’ base-class, rather than the header file
parser.h defining
the parser class is included, the lexical scanner may simply return tokens of
the class
Calculator (e.g.,
Calculator::NUMBER rather than
CalculatorBase::NUMBER). In fact, using a simple
#define -
#undef pair generated by the
bisonc++ respectively at the end of
the base class header the file and just before the definition of the parser
class itself it is the possible to assume in the lexical scanner that all
symbols defined in the the parser’s base class are actually defined in
the parser class itself. It the should be noted that this feature can only be
used to access base class the
enum and types. The actual parser class
is not available by the time the the lexical scanner is defined, thus avoiding
circular class dependencies.
15. FILES¶
- o
- bisonc++base.h: skeleton of the parser’s base
class;
- o
- bisonc++.h: skeleton of the parser class;
- o
- bisonc++.ih: skeleton of the implementation
header;
- o
- bisonc++.cc: skeleton of the member
parse;
- o
- bisonc++polymorphic: skeleton of the declarations
used by %polymorphic;
- o
- bisonc++polymorphic.inline: skeleton of the inline
implementations of the members declared in bisonc++polymorphic.
16. SEE ALSO¶
bison(1),
bison++(1),
bison.info (using texinfo),
flex++(1)
Lakos, J. (2001)
Large Scale C++ Software Design, Addison Wesley.
Aho, A.V., Sethi, R., Ullman, J.D. (1986)
Compilers, Addison Wesley.
17. BUGS¶
Discontinued option:
- o
- --include-only
To avoid collisions with names defined by the parser’s (base) class, the
following identifiers should not be used as token nams:
- o
- Identifiers ending in two underscores;
- o
- Any of the following identifiers: ABORT, ACCEPT, ERROR,
clearin, debug, error, or setDebug.
When re-using files generated by
bisonc++ before version 2.0.0, minor
hand-modification might be necessary. The identifiers in the following list
(defined in the parser’s base class) now have two underscores affixed to
them:
LTYPE, STYPE and
Tokens. When using classes derived from
the generated parser class, the following identifiers are available in such
derived classes:
DEFAULT_RECOVERY_MODE, ErrorRecovery, Return,
UNEXPECTED_TOKEN, d_debug, d_loc, d_lsp, d_nErrors, d_nextToken,
d_state, d_token, d_val, and
d_vsp. When used in derived classes,
they too need two underscores affixed to them.
The member function
void lookup (< 1.00) was replaced by
int
lookup. When regenerating parsers created by early versions of
bisonc++ (versions before version 1.00),
lookup’s
prototype should be corrected by hand, since
bisonc++ will not by
itself rewrite the parser class’s header file.
The
Semantic parser, mentioned in
bison++(1) is not implemented in
bisonc++(1). According to
bison++(1) the semantic parser was not
available in
bison++ either. It is possible that the
Pure parser
is now available through the
--thread-safe option.
18. ABOUT bisonc++¶
Bisonc++ was based on
bison++, originally developed by Alain
Coetmeur (coetmeur@icdc.fr), R&D department (RDT), Informatique-CDC,
France, who based his work on
bison, GNU version 1.21.
Bisonc++ version 0.98 and beyond is a complete rewrite of an LALR-1
parser generator, closely following the construction process as described in
Aho, Sethi and Ullman’s (1986) book
Compilers (i.e., the
Dragon book). It the uses same grammar specification as
bison
and
bison++, and it uses practically the same options and directives as
bisonc++ versions earlier than 0.98. Variables, declarations and macros
that are obsolete were removed.
AUTHOR¶
Frank B. Brokken (f.b.brokken@rug.nl).