bisonc++input(7) | bisonc++ grammar file organization | bisonc++input(7) |
NAME
bisonc++input - Organization of bisonc++’s grammar file(s)
DESCRIPTION
Bisonc++ was derived from bison++(1), which in turn was derived from bison(1). Like these programs, bisonc++ generates a parser for an LALR(1) grammar. Bisonc++ generates C++ code: an expandable C++ class.
Refer to bisonc++(1) for a general overview. This manual page covers the structure and organization of bisonc++’s grammar file(s).
Bisonc++’s grammar file has the following generic outline:
directives (see the next section)
%%
grammar rules
Grammar rules have the following generic form:
nonterminal:
production-rules
;
Production rules consist of zero or more sequences of terminal tokens, nonterminal tokens and/or action blocks. When multiple production rules are used they must be separated from each other by vertical bars. Action blocks are C++ compound statements.
This manual page contains the following sections:
- o
- DESCRIPTION: this section;
- o
- DIRECTIVES: bisonc++’s grammar-specification directives;
- o
- POLYMORPHIC SEMANTIC VALUES: how to use polymorphic semantic values in parsers generated by bisonc++;
- o
- DOLLAR NOTATIONS: available $-shorthand notations with single, union, and polymorphic semantic value types.
- o
- RESTRICTIONS ON TOKEN NAMES: name restrictions for user-defined symbols;
- o
- OBSOLETE SYMBOLS: symbols available to bison(1), but not to bisonc++;
- o
- USING SYMBOLIC TOKENS IN CLASSES OTHER THAN THE PARSER CLASS; how to refer to tokens defined in the grammar;
- o
- EXAMPLE: an example of using bisonc++;
- o
- SEE ALSO: references to other programs and documentation;
- o
- AUTHOR: at the end of this man-page.
UNDERSCORES
Starting with version 6.02.00, bisonc++’s reserved identifiers no longer end in two underscore characters, but in one. This modification was necessary because the C++ standard reserves identifiers containing two consecutive underscore characters for the implementation. In practice this may require some minor modifications of existing source files using bisonc++’s facilities, most likely limited to changing Tokens__ into Tokens_ and Meta__ into Meta_.
The complete list of affected names is:
DIRECTIVES
Quite a few directives can be specified in the initial section of the grammar specification file. If command-line options for directives are available, then their specifications take precedence over the corresponding directives in the grammar file. Once the class header or implementation header files exist, directives affecting those files are ignored.
Directives accepting a `filename’ do not accept path names, i.e., they cannot contain directory separators (/); directives accepting a `pathname’ may contain directory separators. A `pathname’ containing blank characters should be surrounded by double quotes.
Some directives may generate errors. This happens when their specifications conflict with the contents of files bisonc++ cannot modify (e.g., a parser class header file exists but doesn’t define a namespace, while a later run specifies a %namespace directive).
To resolve such errors the offending directive could be omitted, the existing file could be removed, or the existing file could be hand-edited according to the directive’s specification.
- o
- %baseclass-header filename
- Filename defines the name of the file to contain the parser’s base class. This class defines, e.g., the parser’s symbolic tokens. Defaults to the name of the parser class plus the suffix base.h. This directive is overruled by the --baseclass-header (-b) command-line option.
- It is an error if this directive is used and an already existing parser class header file does not contain #include "filename".
- o
- %baseclass-preinclude pathname
- Pathname defines the path to the file preincluded by the parser’s base-class header. See the description of the --baseclass-preinclude option for details about this directive. By default, bisonc++ surrounds pathname by double quotes. However, when pathname itself is surrounded by pointed brackets, #include <pathname> is used.
- o
- %class-header filename
- Filename defines the name of the file to contain the parser class. Defaults to the name of the parser class plus the suffix .h. This directive is overruled by the --class-header (-c) command-line option.
- It is an error if this directive is used and an already existing implementation header file does not contain #include "filename".
- o
- %class-name parser-class-name
- Declares the name of the parser class. It defines the name of the C++ class that is generated. If no %class-name is specified the default class name Parser is used.
- It is an error if this directive is used and an already existing parser-class header file does not define class `className’ and/or if an already existing implementation header file does not define members of the class `className’.
- o
- %debug
- Add debugging code to the generated parse function and its support functions, which can show (on the standard output stream) the steps performed by the parsing function while it parses input streams. When this directive is specified then the parsing steps are shown by default. The setDebug members can be used to suppress outputting these parsing steps. #ifdef DEBUG macros are not used. Existing debugging code can be removed by rerunning bisonc++ without specifying the debug option or directive.
- o
- %default-actions off|quiet|warn|std
- By default, bisonc++ adds a $$ = $1 action block to rules not having final action blocks, but not to empty production rules. This default behavior can also explicitly be configured using the default-actions std option or directive.
- Bisonc++ also supports alternate ways of handling rules not having final action blocks. When off is specified, bisonc++ does not add $$ = $1 action blocks; when polymorphic semantic values are used, then specifying
- - warn adds specialized action blocks, using the semantic types of the first elements of the production rules, while issuing a warning;
- - quiet adds these action blocks without issuing warnings.
- When either warn or quiet is specified the types of $$ and $1 must match. When bisonc++ detects type mismatches it issues errors.
- o
- %error-verbose
- This directive can be specified to dump the parser’s state stack to the standard output stream when the parser encounters a syntactic error. The stack dump shows on separate lines a stack index followed by the state stored at the indicated stack element. The first stack element is the stack’s top element.
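The dump described for %error-verbose can be sketched as follows. This is a minimal illustration, assuming the state stack is a std::vector of state numbers whose last element is the stack's top; the function name stateStackDump and the exact `index: state` line layout are hypothetical, not bisonc++’s generated code.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: one line per state-stack element, showing the
// element's stack index and the state stored there, top element first.
std::string stateStackDump(std::vector<std::size_t> const &states)
{
    std::ostringstream out;

    // Walk from the top (last vector element) down to index 0.
    for (std::size_t idx = states.size(); idx-- != 0; )
        out << idx << ": " << states[idx] << '\n';

    return out.str();
}
```

For a stack holding the states 0, 5 and 12 (12 on top) this sketch produces three lines, starting with `2: 12`.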
- o
- %expect number
- This directive specifies the exact number of shift/reduce and reduce/reduce conflicts for which no warnings are to be generated. Details of the conflicts are reported in the verbose output file (e.g., grammar.output). If the number of actually encountered conflicts deviates from `number’, then this directive is ignored.
- o
- %filenames filename
- Filename is a generic filename that is used for all header files generated by bisonc++. Options defining specific filenames are also available (which then, in turn, overrule the name specified by this directive). This directive is overruled by the --filenames (-f) command-line option.
- o
- %flex
- When provided, the scanner member returning the matched text is called as d_scanner.YYText(), and the scanner member returning the next lexical token is called as d_scanner.yylex(). This directive is only interpreted if the %scanner directive is also provided.
- o
- %implementation-header filename
- Filename defines the name of the file to contain the implementation header. It defaults to the name of the generated parser class plus the suffix .ih.
- The implementation header should contain all directives and declarations that are only used by the parser’s member functions. It is the only header file that is included by the source file containing parse’s implementation. User-defined implementations of other class members may use the same convention, thus concentrating all directives and declarations that are required for compiling the parser class’s other source files in one header file.
- o
- %include pathname
- This directive is used to switch to pathname while processing a grammar specification. Unless pathname defines an absolute file-path, pathname is searched relative to the location of bisonc++’s main grammar specification file (i.e., the grammar file that was specified as bisonc++’s command-line argument). This directive can be used to split long grammar specification files into shorter, meaningful units. Once pathname has been processed, processing continues beyond the %include pathname directive.
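The search rule for %include can be expressed with std::filesystem (C++17). The helper resolveInclude is hypothetical: a sketch of the rule as stated above, not bisonc++’s own implementation.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: absolute include paths are used as-is; relative
// ones are resolved against the directory containing the main grammar file.
fs::path resolveInclude(fs::path const &mainGrammar, fs::path const &included)
{
    if (included.is_absolute())
        return included;

    // parent_path() is empty for a bare file name, in which case the
    // included path is returned unchanged (i.e., relative to the cwd).
    return mainGrammar.parent_path() / included;
}
```

With a main grammar file grammar/main.y, an `%include rules/expr.y` would, under this sketch, refer to grammar/rules/expr.y.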
- o
- %left terminal ...
- Defines the names of symbolic terminal tokens that must be treated as left-associative. I.e., in case of a shift/reduce conflict involving operators of equal precedence, a reduction is preferred over a shift. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precedence, the last used defines the tokens having the highest priority. See also %token below.
- o
- %locationstruct struct-definition
- Defines the organization of the location-struct data type LTYPE_. This struct should be specified analogously to the way the parser’s stack type is defined using %union (see below). By default (if neither %locationstruct nor %ltype is specified) the standard location struct (see the next directive) is used.
- o
- %lsp-needed
- This directive results in bisonc++ generating a parser using the standard location stack. This stack’s default type is:
struct LTYPE_
{
int timestamp;
int first_line;
int first_column;
int last_line;
int last_column;
char *text;
};
Bisonc++ does not provide the elements of the LTYPE_ struct with values. Action blocks of production rules may refer to the location stack element associated with a production element using @ variables, like @1.timestamp, @3.text, @5. The rule’s location struct itself may be referred to as either d_loc_ or @@.
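Since bisonc++ leaves filling LTYPE_ to the user, a scanner typically records the location of each matched token. The following is a minimal sketch assuming the default LTYPE_ layout shown above; the Position struct, the setLocation function, and the choice of 1-based line and column numbers are hypothetical.

```cpp
#include <string>

// Mirrors the default LTYPE_ layout shown above.
struct LTYPE_
{
    int timestamp;
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *text;
};

// Hypothetical scanner-side bookkeeping (not part of bisonc++).
struct Position
{
    int line = 1;       // 1-based line of the next unread character
    int column = 1;     // 1-based column of the next unread character
    int tokens = 0;     // number of tokens matched so far
};

// Fills loc for a token whose matched text is `matched`, then advances pos.
// last_line/last_column are set to the position just past the token.
void setLocation(LTYPE_ &loc, Position &pos, std::string const &matched)
{
    loc.timestamp = ++pos.tokens;
    loc.first_line = pos.line;
    loc.first_column = pos.column;

    for (char ch : matched)
    {
        if (ch == '\n')
        {
            ++pos.line;
            pos.column = 1;
        }
        else
            ++pos.column;
    }

    loc.last_line = pos.line;
    loc.last_column = pos.column;
    loc.text = nullptr;     // this sketch does not manage text's lifetime
}
```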
- o
- %ltype typename
- Specifies a user-defined token location type. If %ltype is used, typename should be the name of an alternate (predefined) type (e.g., size_t). It should not be used if a %locationstruct specification is defined (see above). Within the parser class, this type is available as the type `LTYPE_’. All text on the line following %ltype is used for the typename specification. It should therefore not contain comments or any other characters that are not part of the actual type definition.
- o
- %namespace namespace
- Define all of the code generated by bisonc++ in the namespace namespace. By default no namespace is defined. If this directive is used the implementation header is provided with a commented out using namespace declaration for the specified namespace. In addition, the parser and parser base class header files also use the specified namespace to define their include guard directives.
- It is an error if this directive is used and an already existing parser-class header file and/or implementation header file does not define the namespace specified by this directive.
- o
- %negative-dollar-indices
- Do not generate warnings when zero- or negative dollar-indices are used in the grammar’s action blocks. Zero or negative dollar-indices are commonly used to implement inherited attributes, and should normally be avoided. When used, they can be specified like $-1, or like $<type>-1, where type is empty; an STYPE_ tag; or a field-name. However, note that in combination with the %polymorphic directive (see below) only the $-i format can be used.
- o
- %no-lines
- By default #line preprocessor directives are inserted just before action statements in the file containing the parser’s parse function. These directives are suppressed by the %no-lines directive.
- o
- %nonassoc terminal ...
- Defines the names of symbolic terminal tokens that should be treated as non-associative. I.e., in case of a shift/reduce conflict involving operators of equal precedence, neither a shift nor a reduction is performed: the input is considered syntactically incorrect (e.g., a < b < c is rejected when < is declared %nonassoc). Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precedence, the last used defines the tokens having the highest priority. See also %token below.
- o
- %parsefun-source filename
- Filename defines the name of the file to contain the parser member function parse. Defaults to parse.cc. This directive is overruled by the --parsefun-source (-p) command-line option.
- o
- %polymorphic polymorphic-specification(s)
- Bison’s traditional way of handling multiple semantic values is to use a %union specification (see below). Although %union is supported by bisonc++, a polymorphic semantic value class is preferred due to its improved type safety.
- The %polymorphic directive defines a polymorphic semantic value class and can be used instead of a %union specification. Refer to section POLYMORPHIC SEMANTIC VALUES below or to bisonc++’s user manual for a detailed description of the specification, characteristics, and use of polymorphic semantic values.
- o
- %prec token
- Defines the precedence of a production rule. By default, production rules have priorities that are equal to the priorities of their first terminal tokens, or they receive the maximum possible priority if they don’t contain terminal tokens. To change a production rule’s default priority the %prec directive is used, which assigns the directive’s token’s priority to the production rule’s priority. A well known application of %prec is:
expression:
'-' expression %prec UMINUS
{
...
}
Here, the default precedence of the `-’ token as the subtraction operator is overruled by the precedence of the UMINUS token, which is commonly defined as
%right UMINUS
following the definitions of, e.g., the ’*’ and ’/’ operators (see %right below), thus giving unary minus a higher precedence than multiplication and division.
- Refer to bisonc++’s user manual for a more elaborate coverage of the %prec directive.
- o
- %print-tokens
- The %print-tokens directive provides an implementation of the Parser class’s print_ function, displaying the current token value and the text matched by the lexical scanner, as received by the generated parse function.
- o
- %prompt
- When adding debugging code (using the debug option or directive) the debug information is displayed continuously while the parser processes its input. When using the prompt directive the generated parser displays a prompt (a question mark) at each step of the parsing process. Caveat: when using this option the parser’s input cannot be provided at the parser’s standard input stream.
- o
- %required-tokens number
- Following a syntactic error, require at least number successfully processed tokens before another syntactic error can be reported. By default number is zero.
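The counting rule of %required-tokens can be sketched as a small gate. This assumes that every syntactic error, whether reported or not, restarts the count; the ErrorGate class is hypothetical and only illustrates the logic, not bisonc++’s generated error recovery.

```cpp
#include <cstddef>

// Hypothetical sketch: decides whether a newly detected syntactic error
// should be reported, given the number of tokens required in between.
class ErrorGate
{
    std::size_t d_required;
    std::size_t d_processed;    // tokens processed since the most recent error

public:
    explicit ErrorGate(std::size_t required)
    :
        d_required(required),
        d_processed(required)   // so that the first error is always reported
    {}

    void tokenProcessed()
    {
        ++d_processed;
    }

    // Returns true if the error detected now should be reported.
    bool reportError()
    {
        bool report = d_processed >= d_required;
        d_processed = 0;        // assumption: every error restarts the count
        return report;
    }
};
```

With the default number (zero) every error is reported.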
- o
- %right terminal ...
- Defines the names of symbolic terminal tokens that should be treated as right-associative. I.e., in case of a shift/reduce conflict involving operators of equal precedence, a shift is preferred over a reduction. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precedence, the last used defines the tokens having the highest priority. See also %token below.
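How %left, %right and %nonassoc settle a shift/reduce conflict can be sketched as a small decision function. It follows the conventional yacc/bison rules (the higher precedence wins; equal precedence falls back on associativity; without precedence information the conflict defaults to a shift), which are assumed here to match bisonc++’s behavior. The names Assoc, Action, Prec and resolve are hypothetical.

```cpp
#include <optional>

enum class Assoc { LEFT, RIGHT, NONASSOC };
enum class Action { SHIFT, REDUCE, ERROR };

// A precedence level: later %left/%right/%nonassoc lines get higher levels.
struct Prec
{
    int level;
    Assoc assoc;
};

// Hypothetical resolver: `rule` is the precedence of the rule that could be
// reduced, `lookahead` that of the token that could be shifted.
Action resolve(std::optional<Prec> rule, std::optional<Prec> lookahead)
{
    if (!rule || !lookahead)            // no precedence: default to shifting
        return Action::SHIFT;

    if (lookahead->level > rule->level) // lookahead binds tighter
        return Action::SHIFT;

    if (lookahead->level < rule->level) // rule binds tighter
        return Action::REDUCE;

    switch (lookahead->assoc)           // equal precedence: associativity
    {
        case Assoc::LEFT:
            return Action::REDUCE;
        case Assoc::RIGHT:
            return Action::SHIFT;
        default:                        // NONASSOC: the input is an error
            return Action::ERROR;
    }
}
```

For instance, with `%left '+'` preceding `%left '*'`, after `a + b` a lookahead `*` is shifted while a lookahead `+` causes a reduction.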
- o
- %scanner pathname
- Use pathname as the path name to the file pre-included in the parser’s class header. See the description of the --scanner option for details about this directive. Similar to the convention adopted for this argument, pathname by default is surrounded by double quotes. However, when the argument is surrounded by pointed brackets, #include <pathname> is used. This directive results in the definition of a composed Scanner d_scanner data member in the generated parser, and in the definition of an int lex() member, returning d_scanner.lex().
- By specifying the %flex directive the function d_scanner.yylex() is called. Any other function to call can be specified using the --scanner-token-function option (or %scanner-token-function directive).
- It is an error if this directive is used and an already existing parser class header file does not include `pathname’.
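The composition described for %scanner can be pictured with the following sketch. The Scanner class, its word-splitting behavior, and the Token values are hypothetical stand-ins; only the d_scanner data member and the forwarding int lex() member mirror the description above (lex is public here only so the sketch can be exercised).

```cpp
#include <sstream>
#include <string>

// Hypothetical token values; a real parser defines its tokens in the grammar.
enum Token
{
    END_OF_INPUT = 0,
    WORD = 1
};

// Hypothetical stand-in for the class declared in the %scanner header:
// it returns WORD for each white-space separated word of its input.
class Scanner
{
    std::istringstream d_in;
    std::string d_matched;

public:
    explicit Scanner(std::string const &input)
    :
        d_in(input)
    {}

    int lex()
    {
        if (d_in >> d_matched)
            return WORD;
        return END_OF_INPUT;
    }

    std::string const &matched() const
    {
        return d_matched;
    }
};

// The parser side: a composed d_scanner member and a forwarding lex().
class Parser
{
    Scanner d_scanner;

public:
    explicit Parser(std::string const &input)
    :
        d_scanner(input)
    {}

    int lex()
    {
        return d_scanner.lex();
    }

    std::string const &matched() const      // hypothetical helper
    {
        return d_scanner.matched();
    }
};
```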
- o
- %scanner-class-name scannerClassName
- Defines the name of the scanner class, declared by the pathname header file that is specified at the scanner option or directive. By default the class name Scanner is used.
- It is an error if this directive is used and either the scanner directive was not provided, or the parser class interface in an already existing parser class header file does not declare a scanner class d_scanner object.
- o
- %scanner-matched-text-function function-call
- The scanner function returning the text that was matched by the lexical scanner after its token function (see below) has returned. A complete function call expression should be provided (including a scanner object, if used). Example:
%scanner-matched-text-function myScanner.matchedText()
By specifying the %flex directive the function d_scanner.YYText() is called.
- If the function call contains white space, function-call should be surrounded by double quotes.
- o
- %scanner-token-function function-call
- The scanner function returning the next token, called from the generated parser’s lex function. A complete function call expression should be provided (including a scanner object, if used). Example:
%scanner-token-function d_scanner.lex()
If the function call contains white space, function-call should be surrounded by double quotes.
- It is an error if this directive is used and the scanner token function is not called from the code in an already existing implementation header.
- o
- %stack-expansion size
- Defines the number of elements to be added to the generated parser’s semantic value stack when it must be enlarged. By default 10 elements are added to the stack. This option/directive is interpreted only once, and only if size is at least equal to the default stack expansion size of 10.
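Growing a stack by a fixed number of elements, as %stack-expansion configures, can be sketched as follows. The SemanticStack template is hypothetical (it is not bisonc++’s generated stack), it requires default-constructible values, and it treats sizes below 10 as the default, following the description above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stack that grows in fixed increments instead of doubling.
template <typename Value>
class SemanticStack
{
    std::vector<Value> d_data;      // allocated elements
    std::size_t d_size = 0;         // elements actually in use
    std::size_t d_expansion;

public:
    explicit SemanticStack(std::size_t expansion = 10)
    :
        d_expansion(expansion < 10 ? 10 : expansion)  // smaller sizes ignored
    {}

    void push(Value const &value)
    {
        if (d_size == d_data.size())    // full: add d_expansion elements
            d_data.resize(d_data.size() + d_expansion);
        d_data[d_size++] = value;
    }

    void pop()
    {
        --d_size;
    }

    Value &top()
    {
        return d_data[d_size - 1];
    }

    std::size_t size() const
    {
        return d_size;
    }

    std::size_t capacity() const
    {
        return d_data.size();
    }
};
```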
- o
- %start nonterminal
- The nonterminal nonterminal should be used as the grammar’s start-symbol. If omitted, the first grammatical rule is used as the grammar’s starting rule. All syntactically correct sentences must be derivable from this starting rule.
- o
- %stype typename
- The type of the semantic value of nonterminal tokens. By default it is int. %stype, %union, and %polymorphic are mutually exclusive directives.
- Within the parser class, the semantic value type is available as the type `STYPE_’. All text on the line following %stype is used for the typename specification. It should therefore not contain comments or any other characters that are not part of the actual type definition.
- o
- %tag-mismatches on|off
- This directive is only interpreted when polymorphic semantic values are used. When on is specified (which is used by default) the parse member of the generated parser dynamically checks that the tag that is used when calling a semantic value’s get member matches the actual tag of the semantic value.
- If a mismatch is observed, then the parsing function aborts after displaying a fatal error message. If this happens, and if the option/directive debug was specified when bisonc++ created the parser’s parsing function, then the program can be rerun, specifying parser.setDebug(Parser::ACTIONCASES) before calling the parsing function. As a result the case-entry numbers of the switch, defined in the parser’s executeAction member, are inserted into the standard output stream. The action case number reported just before the program displays the fatal error message tells you in which of the grammar’s action blocks the error was encountered.
- o
- %target-directory pathname
- Pathname defines the directory where generated files should be written. By default this is the directory where bisonc++ is called. This directive is overruled by the --target-directory command-line option.
- o
- %thread-safe
- Only used with polymorphic semantic values, and then only required when the parser is used in multiple threads: it ensures that each thread’s polymorphic code only accesses its own parser’s error counting variable.
- o
- %token terminal ...
- Defines the names of symbolic terminal tokens. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precedence, the last used defines the tokens having the highest priority.
- NOTE: Symbolic tokens are defined as enum-values in the parser’s base class. The names of symbolic tokens may not be equal to the names of the members and types defined by bisonc++ itself (see the next sections). This requirement is not enforced by bisonc++, but compilation errors may result if this requirement is violated.
- o
- %token-class classname
- Classname defines the name of the Tokens class that is defined when the %token-path directive or option (see below) is specified. If token-path isn’t specified then this directive is ignored. By default the class name Tokens is used.
- o
- %token-namespace namespace
- If token-path is specified (see below) then namespace defines the namespace of the Tokens class. By default no namespace is used.
- o
- %token-path pathname
- Pathname defines the path name of the file to contain the struct Tokens defining the enumeration Tokens_ containing the symbolic tokens of the generated grammar. If this directive is specified the ParserBase class is derived from it, thus making the tokens available to the generated parser class. The name of the struct Tokens can be altered using the token-class directive or option. By default (if token-path is not specified) the tokens are defined as the enum Tokens_ in the ParserBase class. If pathname doesn’t exist it is created by bisonc++. If the file pathname already exists it is rewritten at each new run of bisonc++.
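The class arrangement resulting from %token-path can be sketched as follows: the Tokens struct lives in its own file, ParserBase derives from it, and other code (such as a scanner) can use the tokens without including the parser’s headers. The token names, their values, and the scanNumber function are made up for illustration.

```cpp
// In the file named by %token-path (hypothetical contents):
struct Tokens
{
    enum Tokens_
    {
        NUMBER = 257,       // values made up for this sketch
        IDENTIFIER
    };
};

// The parser's base class derives from Tokens, so the parser inherits them.
class ParserBase: public Tokens
{};

class Parser: public ParserBase
{};

// Hypothetical scanner function: it only needs the Tokens declaration.
int scanNumber()
{
    return Tokens::NUMBER;
}
```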
- o
- %type <type> nonterminal ...
- In combination with %polymorphic or %union: associate the semantic value of a nonterminal symbol with a polymorphic semantic value tag or union field defined by these directives.
- o
- %union union-definition
- Acts identically to the identically named bison and bison++ declaration. Bisonc++ generates a union, named STYPE_, as its semantic type.
- o
- %weak-tags
- This directive is ignored unless the %polymorphic directive was specified. It results in the declaration of enum Tag_ rather than enum class Tag_. When in doubt, don’t use this directive.
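What %weak-tags changes follows from ordinary C++ enum rules: the enumerators of an unscoped enum are visible in the enclosing scope and convert implicitly to int, whereas enum class values must be qualified and explicitly cast. A minimal illustration (the Weak namespace and both functions are hypothetical, and only exist so both declarations fit in one sketch):

```cpp
// Default with %polymorphic: a scoped enumeration.
enum class Tag_
{
    INT,
    STRING,
    VECT
};

// With %weak-tags: an unscoped enumeration (wrapped in a namespace here
// only so that it can coexist with the scoped version above).
namespace Weak
{
    enum Tag_
    {
        INT,
        STRING,
        VECT
    };
}

int fromWeak()
{
    using namespace Weak;
    return STRING;                          // unqualified, implicitly an int
}

int fromScoped()
{
    return static_cast<int>(Tag_::STRING);  // qualification and cast required
}
```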
POLYMORPHIC SEMANTIC VALUES
Like bison(1), bisonc++ by default uses int semantic values, and also supports the %stype and %union directives for using single-type or traditional C-type unions as semantic values. These types of semantic values are covered in bisonc++’s manual.
In addition, the %polymorphic directive can be specified to generate a parser using `polymorphic’ semantic values. In this case semantic values are specified as pairs, consisting of tags (which are C++ identifiers), and C++ (pointer or value) type names. Tags and type names are separated by colons. Multiple tag and type name combinations are separated by semicolons, and an optional semicolon ends the final tag/type pair.
Here is an example, defining three semantic values: an int, a std::string and a std::vector<double>:
%polymorphic INT: int; STRING: std::string;
VECT: std::vector<double>
The identifier to the left of the colon is called the tag-identifier (or simply tag), and the type name to the right of the colon is called the type-name. Starting with bisonc++ version 4.12.00 the types no longer have to provide default constructors.
When polymorphic type-names refer to types that have not yet been declared by the parser’s base class header, then these types must be (directly or indirectly) declared in a header file whose location is specified using the %baseclass-preinclude directive.
%type directives are used to associate (non-)terminals with semantic value types. E.g., after:
%polymorphic INT: int; TEXT: std::string
%type <INT> expr
the expr nonterminal returns int semantic values. In a rule like:
expr:
expr '+' expr
{
// Action block: C++ statements here.
}
symbols $$, $1, and $3 represent int values, and can be used that way in the C++ action block.
Definitions and declarations
The %polymorphic directive adds the following definitions and declarations to the generated base class header and parser source file (if the %namespace directive was used then all declared/defined elements are placed inside the namespace that is specified by the %namespace directive):
- o
- All semantic value type identifiers are collected in a strongly typed
`Tag_’ enumeration. E.g.,
enum class Tag_
{
INT,
STRING,
VECT
};
- o
- An anonymous enum defining the symbolic constant sizeofTag_ equal to the number of tags in the Tag_ enumeration.
- o
- The namespace Meta_ contains almost all of the code implementing polymorphic values.
The namespace Meta_ contains, among other classes, the class SType. The parser’s semantic value type STYPE_ is equal to Meta_::SType.
Meta_::SType provides the standard user interface for using polymorphic semantic data types. It declares the following public interface:
- o
- Constructors: Default, copy and move constructors. No data can be retrieved from SType objects that were constructed by SType’s default constructor, but they can accept values of defined polymorphic types, which may then be retrieved from those objects.
- o
- Operators: The standard overloaded assignment operators (copy and move assignment operators) are available.
- In addition the members
SType &operator=(Type const &value) and
SType &operator=(Type &&tmp)
are defined for each of the polymorphic semantic value types. Up to version 6.03.00 these members were defined as member templates, which sometimes caused awkward compilation errors: with member templates Type must exactly match one of the defined polymorphic semantic types, since Type is used to determine the appropriate Meta_::Tag_ value. As a consequence, if, e.g., the polymorphic type %polymorphic INT: int is defined, then an assignment like $$ = true fails, since the inferred type is bool and no matching polymorphic type is available. Now that the assignment operators are defined as plain member functions this problem no longer occurs, because the compiler may apply standard type conversions. Note that ambiguities may still be encountered. If, e.g., polymorphic types are defined for int and char and an expression like $$ = 30U is used, then the compiler cannot tell whether $$ refers to the int or to the char semantic value. A standard (static) cast, or explicitly calling the assign member (see the next item), resolves this kind of ambiguity.
- When operator=(Type const &value) is used, the left-hand side SType object receives a copy of value; when operator=(Type &&tmp) is used, tmp is move-assigned to the left-hand side SType object;
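The overload resolution behind these remarks can be reproduced with plain C++. The class below is a hypothetical stand-in for Meta_::SType with int and char as polymorphic types; to stay short it only declares the `Type const &` overloads, and the assignments function exists only to show which statements compile.

```cpp
// Hypothetical stand-in for Meta_::SType with polymorphic types int and char.
class SType
{
    int d_int = 0;
    char d_char = 0;
    bool d_holdsInt = false;

public:
    // One plain (non-template) assignment operator per polymorphic type.
    SType &operator=(int const &value)
    {
        d_int = value;
        d_holdsInt = true;
        return *this;
    }

    SType &operator=(char const &value)
    {
        d_char = value;
        d_holdsInt = false;
        return *this;
    }

    bool holdsInt() const { return d_holdsInt; }
    int intValue() const { return d_int; }
    char charValue() const { return d_char; }
};

void assignments(SType &value)
{
    value = true;                   // OK: bool is promoted to int
    // value = 30U;                 // error: unsigned -> int and unsigned -> char rank equally
    value = static_cast<int>(30U);  // OK: the cast selects the int overload
}
```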
- o
- void assign<tag>(Args &&...args) The tag template argument must be a Tag_ value. This member function constructs a semantic value of the type matching tag from the arguments that are passed to this member (zero arguments are OK if the type associated with tag supports default construction). The constructed value (not a copy of this value) is then stored in the STYPE_ object for which assign has been called.
- As a Meta_::Tag_ value must be specified when using assign the compiler can use the explicit tag to convert assign’s arguments to an SType object of the type matching the specified tag.
- The member assign can be used to store a specific polymorphic semantic value in an STYPE_ object. It differs from the set of operator=(Type) members in that assign accepts multiple arguments to construct the requested SType value from, whereas the operator= members only accept single arguments of defined polymorphic types.
- To initialize an STYPE_ object with a default STYPE_ value,
direct assignment can be used (e.g., d_lval_ = STYPE_{}). To assign
a semantic value to a production rule using assign the _$$
notation must be used, as $$ is interpreted as the polymorphic
value type that is associated with the production rule:
_$$.assign<Tag_::CHAR>(30U);
- o
- DataType &get<tag>(), and DataType const &get<tag>() const These members return references to the object’s semantic values. The tag must be a Tag_ value: its specification tells the compiler which semantic value type it must use.
- When the option/directive tag-mismatches on was specified then get, when called from the generated parse function, performs a run-time check to confirm that the specified tag corresponds to the object’s actual Tag_ value. If a mismatch is observed, then the parsing function aborts with a fatal error message. When shorthand notations (like $$ and $1) are used in production rules’ action blocks, then bisonc++ can determine the correct tag, preventing the run-time check from failing.
- But once a fatal error is encountered, it can be difficult to determine which action block generated the error. If this happens, consider regenerating the parser specifying the --debug option, and calling
parser.setDebug(Parser::ACTIONCASES)
before calling the parser’s parse function.
- Following this, the case-entry numbers of the switch defined in the parser’s executeAction member are inserted into the standard output stream just before the matching statements are executed. The action case number that is reported just before the program reports the fatal error tells you in which of the grammar’s action blocks the error was encountered.
- o
- Tag_ tag() const The tag matching the semantic value’s polymorphic type is returned. The returned value is a valid Tag_ value when the SType object’s valid member returns true;
- By default, or after assigning a plain (default) STYPE_ object to an STYPE_ object (e.g., using a statement like $$ = STYPE_{}), valid returns false, and the tag member returns Meta_::sizeofTag_.
- o
- bool valid() const
- The value true is returned if the object contains a semantic value. Otherwise false is returned. Note that default STYPE_ values can be assigned to STYPE_ objects, but they do not represent valid semantic values. See also the previous description of the tag member.
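To make the interface above concrete, here is a minimal model built on std::variant (C++17). It is hypothetical and NOT bisonc++’s generated Meta_::SType: it uses the tags of the earlier %polymorphic example (INT, STRING, VECT), throws std::logic_error where the generated parser would abort on a tag mismatch, and omits the const get overload and the assignment operators.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

enum class Tag_ { INT, STRING, VECT };
enum { sizeofTag_ = 3 };

class SType
{
    // Alternative 0 means "no value"; alternative i + 1 holds Tag_ value i.
    std::variant<std::monostate, int, std::string, std::vector<double>> d_value;

    template <Tag_ tag>
    static constexpr std::size_t slotOf()
    {
        return static_cast<std::size_t>(tag) + 1;
    }

public:
    // Constructs the value of the type matching tag in place from args.
    template <Tag_ tag, typename ...Args>
    void assign(Args &&...args)
    {
        d_value.emplace<slotOf<tag>()>(std::forward<Args>(args)...);
    }

    // Returns the stored value, checking that tag matches the actual tag.
    template <Tag_ tag>
    auto &get()
    {
        if (d_value.index() != slotOf<tag>())
            throw std::logic_error("tag mismatch");
        return std::get<slotOf<tag>()>(d_value);
    }

    bool valid() const
    {
        return d_value.index() != 0;
    }

    // sizeofTag_ (as a Tag_) when no valid value is stored.
    Tag_ tag() const
    {
        return valid() ? static_cast<Tag_>(d_value.index() - 1)
                       : static_cast<Tag_>(sizeofTag_);
    }
};
```

In this sketch `value.assign<Tag_::STRING>(3, 'a')` constructs the string "aaa" directly inside the object, analogous to the description of assign above.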
DOLLAR NOTATIONS
Inside action blocks dollar-notations can be used to retrieve and assign values from/to the elements of production rules. %type directives are used to associate dollar-notations with semantic types.
When %stype is specified (and with the default int semantic value type) the following dollar-notations are available:
- o
- $$ =
- A value is assigned to the rule’s nonterminal’s semantic value. The right-hand side (rhs) of the assignment expression must be an expression of a type that can be assigned to the STYPE_ type.
- o
- $$(expr)
- Same as the previous dollar-notation: expr’s value is assigned to the rule’s nonterminal’s semantic value.
- o
- _$$
- This refers to the semantic value of the rule’s nonterminal.
- o
- $$
- Same as the previous item: this refers to the semantic value of the rule’s nonterminal.
- o
- $$.
- If STYPE_ is a class-type then this dollar-notation is shorthand for the member selector operator, applied to the rule’s nonterminal’s semantic value.
- o
- $$->
- If STYPE_ is a pointer type, or a class type overloading operator->, then this dollar-notation is shorthand for the arrow operator (->), applied to the rule’s nonterminal’s semantic value.
- o
- _$1
- This refers to the current production rule’s first component’s semantic value.
- o
- $1
- Same as the previous dollar-notation: this refers to the current production rule’s first component’s semantic value.
- o
- $1.
- If STYPE_ is a class-type then this dollar-notation is shorthand for the member selector operator, applied to the current production rule’s first component’s semantic value.
- o
- $1->
- If STYPE_ is a pointer type, or a class type overloading operator->, then this dollar-notation is shorthand for the arrow operator (->), applied to the current production rule’s first component’s semantic value.
- o
- _$-1
- This refers to the semantic value of a component in a production rule, listed immediately before the current rule’s nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
- o
- $-1
- Same as the previous item: this refers to the semantic value of a component in a production rule, listed immediately before the current rule’s nonterminal.
- o
- $-1.
- If STYPE_ is a class-type then this dollar-notation is shorthand for the member selector operator, applied to the semantic value of some production rule element, 1 element before the current rule’s nonterminal.
- o
- $-1->
- If STYPE_ is a class-type then this dollar-notation is shorthand for the pointer to member operator, applied to the semantic value of some production rule element, 1 element before the current rule’s nonterminal.
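With %stype, the notations above can be read as plain C++ expressions on the semantic values of the rule's components. The following sketch models this under assumptions and is not code generated by bisonc++. It assumes that %stype std::string was specified. The hypothetical function listAction stands in for the action block of a rule list: list ',' WORD, and the components' values are passed in a vector (rhs[0] for $1, rhs[2] for $3). Only $$ =, $$. and $1. are shown. $$-> and $1-> would additionally require a class-type STYPE_ that defines operator->.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: the grammar file specified `%stype std::string`,
// so every semantic value has type STYPE_ == std::string.
using STYPE_ = std::string;

// Hypothetical stand-in for the action block of the rule
//     list: list ',' WORD  { $$ = $1; $$.append("+"); $$.append($3); }
// rhs[0], rhs[1] and rhs[2] play the roles of $1, $2 and $3;
// the returned value plays the role of $$.
STYPE_ listAction(std::vector<STYPE_> const &rhs)
{
    assert(rhs.size() == 3);        // three components: list ',' WORD

    STYPE_ dollarDollar;            // $$
    dollarDollar = rhs[0];          // $$ = $1;
    dollarDollar.append("+");       // $$.append("+");  ('$$.' selects a member)
    dollarDollar.append(rhs[2]);    // $$.append($3);
    return dollarDollar;
}

// '$1.' also applies the member selector: $1.size() becomes rhs[0].size().
std::size_t firstLength(std::vector<STYPE_> const &rhs)
{
    return rhs.at(0).size();        // $1.size()
}
```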
When %union is specified these dollar-notations are available:
- o
- $$ =
- A value is assigned to the rule’s nonterminal’s semantic value. If the rule’s nonterminal was associated with one of the union’s field types, then the matching union field receives the value of the assignment expression’s right-hand side. If no association was defined then the variable representing the nonterminal’s semantic value is a plain union (i.e., STYPE_) variable.
- o
- $$(expr)
- Expr’s value is assigned to the rule’s nonterminal’s plain union (i.e., STYPE_) value. Any association that may have been defined between the nonterminal and a union field is ignored.
- o
- _$$
- This refers to the rule’s nonterminal’s plain union (i.e., STYPE_) value. Any association that may have been defined between the nonterminal and a union field is ignored.
- o
- $$
- This refers to the rule’s nonterminal’s semantic value. If it was associated with one of the union’s types, then $$ refers to the associated union field. If no association was defined then $$ represents a plain union (i.e., STYPE_) type of variable.
- o
- $$.
- If the rule’s nonterminal’s semantic value was associated with one of the union’s types, then $$. is shorthand for the member selector operator, applied to the associated union field type. If no association was defined then $$. is shorthand for the field selector operator, applied to the nonterminal’s semantic value’s plain union (i.e., STYPE_) type.
- o
- $$->
- If the rule’s nonterminal’s semantic value was associated with one of the union’s types, then $$-> is shorthand for the pointer to member operator, applied to the associated union field type. If no association was defined then an error message is issued, as the pointer to member operator is not defined for plain union types.
- o
- _$1
- This refers to the current production rule’s first component’s plain union (STYPE_) value.
- o
- $1
- This shorthand refers to the semantic value of the production rule’s first element. If it was associated with one of the union’s types, then $1 refers to the associated union field. If no association was defined then $1 represents a plain union (i.e., STYPE_) type of variable.
- o
- $1.
- If the production rule’s first component’s semantic value was associated with one of the union’s types, then $1. is shorthand for the member selector operator, applied to the associated union field type. If no association was defined then $1. is shorthand for the field selector operator, applied to the first component’s semantic value’s plain union (i.e., STYPE_) type.
- o
- $1->
- If the production rule’s first component’s semantic value was associated with one of the union’s types, then $1-> is shorthand for the pointer to member operator, applied to the associated union field type. If no association was defined then an error message is issued, as the pointer to member operator is not defined for plain union types.
- o
- _$-1
- This refers to the plain union (STYPE_) value of a component in a production rule, listed immediately before the current rule’s nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
- o
- $-1
- Same: this refers to the plain union (STYPE_) value of a component in a production rule, listed immediately before the current rule’s nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
- o
- $-1.
- This is shorthand for the field selector operator applied to the plain union (STYPE_) value of some production rule element, 1 element before the current rule’s nonterminal.
- o
- $-1->
- This shorthand refers to the pointer to member operator applied to the plain union (STYPE_) value of some production rule element, 1 element before the current rule’s nonterminal. Its use results in an error message, as the pointer to member operator is not defined for plain union types.
- o
- $<field>-1
- This refers to the union field named field of a component in a production rule, listed immediately before the current rule’s nonterminal. Note that bisonc++ cannot verify that the specified field is valid for that particular component.
- o
- $<field>-1.
- This applies the member selector operator to the union field named field of a component in a production rule, listed immediately before the current rule’s nonterminal. Note that bisonc++ cannot verify that the specified field is valid for that particular component.
- o
- $<field>-1->
- This applies the pointer to member operator to the union field named field of a component in a production rule, listed immediately before the current rule’s nonterminal. Note that bisonc++ cannot verify that the specified field is valid for that particular component.
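With %union, the notations select union fields. The following is a sketch under assumptions and is not code generated by bisonc++. Suppose the grammar contained declarations like %union { int i; double d; }, %token <i> NUMBER and %type <d> expr. The hypothetical function halveAction then models the action { $$ = $1 * 0.5; } of a rule expr: NUMBER. Here $1 reads field i, and $$ = writes field d.

```cpp
#include <cassert>

// Assumption: the grammar declared
//     %union { int i; double d; }
//     %token <i> NUMBER
//     %type  <d> expr
// so STYPE_ is a plain union with the fields i and d.
union STYPE_
{
    int i;
    double d;
};

// Hypothetical stand-in for the action of `expr: NUMBER { $$ = $1 * 0.5; }`.
// 'first' plays the role of $1 (associated with field i); the returned
// union plays the role of $$ (associated with field d).
STYPE_ halveAction(STYPE_ const &first)
{
    STYPE_ dollarDollar;                 // $$ as a plain union
    dollarDollar.d = first.i * 0.5;      // `$$ =` writes d, `$1` reads i
    return dollarDollar;
}
```

Because bisonc++ cannot check the field named in $<field>-1, naming a field other than the one that was last written is a programming error. In C++, reading an inactive union member is generally undefined behaviour.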
When %polymorphic is specified these dollar-notations can be used:
- o
- $$ =
- A semantic value is assigned to the rule’s nonterminal’s semantic value. The right-hand side (rhs) of the assignment expression must be an expression of the type that is associated with $$. This assignment operation assumes that the type of the rhs-expression equals $$’s semantic value type. If the types don’t match the compiler issues a compilation error when compiling parse.cc. Casting the rhs to the correct value type is possible, but in that case the function call operator (see the next item) is preferred, as it does not require casting. If no semantic value type was associated with $$ then the assignment $$ = STYPE_{} can be used.
- o
- $$(expr)
- A value is assigned to the rule’s nonterminal’s semantic value. Expr must be of a type that can be statically cast to $$’s semantic value type. The required static_cast is generated by bisonc++ and doesn’t have to be specified for expr.
- o
- _$$
- This refers to the rule’s nonterminal’s semantic value, disregarding any polymorphic type that might have been associated with the rule’s nonterminal.
- o
- $$
- If no polymorphic type was associated with the rule’s nonterminal then this is shorthand for a reference to the rule’s plain STYPE_ value. If a polymorphic value type was associated with the rule’s nonterminal then this shorthand represents a reference to a value of that particular type.
- o
- $$.
- If no polymorphic type was associated with the rule’s nonterminal then this is shorthand for the member selector operator, applied to a reference to the rule’s nonterminal’s STYPE_ value. If a polymorphic value type was associated with the rule’s nonterminal then this shorthand represents the member selector operator, applied to a reference of that particular type.
- o
- $$->
- If no polymorphic type was associated with the rule’s nonterminal then this is shorthand for the pointer to member operator, applied to a reference to the rule’s nonterminal’s STYPE_ value. If a polymorphic value type was associated with the rule’s nonterminal then this shorthand represents the pointer to member operator, applied to a reference of that particular type.
- o
- _$1
- This refers to the current production rule’s first component’s generic STYPE_ value.
- o
- $1
- This shorthand refers to the semantic value of the production rule’s first element. If it was associated with a polymorphic type, then $1 refers to a value of that particular type. If no association was defined then $1 represents a generic STYPE_ value.
- o
- $1.
- If the production rule’s first component’s semantic value was associated with a polymorphic type, then $1. is shorthand for the member selector operator, applied to the value of the associated polymorphic type. If no association was defined then $1. is shorthand for the member selector operator, applied to the first component’s generic STYPE_ value.
- o
- $1->
- If the production rule’s first component’s semantic value was associated with a polymorphic type, then $1-> is shorthand for the pointer to member operator, applied to the value of the associated polymorphic type. If no association was defined then $1-> is shorthand for the pointer to member operator, applied to the first component’s generic STYPE_ value.
- o
- _$-1
- This refers to the generic (STYPE_) value of a component in a production rule, listed immediately before the current rule’s nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
- o
- $-1
- Same: this refers to the generic (STYPE_) value of a component in a production rule, listed immediately before the current rule’s nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
- o
- $-1.
- This is shorthand for the member selector operator applied to the generic STYPE_ value of some production rule element, 1 element before the current rule’s nonterminal.
- o
- $-1->
- This is shorthand for the pointer to member operator applied to the generic STYPE_ value of some production rule element, 1 element before the current rule’s nonterminal.
- o
- $<tag>-1
- This shorthand represents a reference to the semantic value of the polymorphic type associated with tag of some production rule element, 1 element before the current rule’s nonterminal.
- If, when using the generated parser class’s parse function, the polymorphic type of that element turns out not to match the type that is associated with tag, then a run-time fatal error results.
- If that happens, and the debug option/directive had been specified when bisonc++ was run, then rerun the program after specifying parser.setDebug(Parser::ACTIONCASES) to locate the parse function’s action block where the fatal error was encountered.
- o
- $<tag>-1.
- This shorthand represents the member selector operator, applied to the semantic value of the polymorphic type associated with tag of some production rule element, 1 element before the current rule’s nonterminal.
- If, when using the generated parser class’s parse function, the polymorphic type of that element turns out not to match the type that is associated with tag, then a run-time fatal error results. The procedure suggested at the previous ($<tag>-1) item for solving such errors can be applied here as well.
- o
- $<tag>-1->
- This shorthand represents the pointer to member operator, applied to the semantic value of the polymorphic type associated with tag of some production rule element, 1 element before the current rule’s nonterminal.
- If, when using the generated parser class’s parse function, the polymorphic type of that element turns out not to match the type that is associated with tag, then a run-time fatal error results. The procedure suggested at the previous ($<tag>-1) item for solving such errors can be applied here as well.
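The run-time check described for $<tag>-1 can be modelled with C++17's std::variant. This is only an analogy. bisonc++ generates its own polymorphic STYPE_ class and reports a mismatch as a fatal error, whereas this model throws an exception. The sketch assumes two hypothetical tags, INT (int) and TEXT (std::string), as if declared by something like %polymorphic INT: int; TEXT: std::string;. The function asText models $<TEXT>-1, and assignInt models $$(expr) for a nonterminal associated with INT.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

// Model only: std::variant stands in for bisonc++'s generated polymorphic
// STYPE_. std::monostate models a default STYPE_ holding no semantic value;
// int and std::string model the hypothetical tags INT and TEXT.
using STYPE_ = std::variant<std::monostate, int, std::string>;

// Models $<TEXT>-1: a reference to the value as the type associated with
// TEXT. A mismatching type is detected at run time; this model throws
// where bisonc++ would report a fatal error.
std::string &asText(STYPE_ &value)
{
    if (auto *text = std::get_if<std::string>(&value))
        return *text;
    throw std::runtime_error("semantic value does not hold a TEXT value");
}

// Models $$(expr) for a nonterminal associated with INT: the expression is
// converted by static_cast before it is stored, as described above.
void assignInt(STYPE_ &dollarDollar, double expr)
{
    dollarDollar = static_cast<int>(expr);
}
```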
RESTRICTIONS ON TOKEN NAMES¶
To avoid collisions with names defined by the parser’s (base) class, the following identifiers should not be used as token names:
OBSOLETE SYMBOLS¶
All DECLARATIONS and DEFINE symbols not listed above but defined in bison++ are obsolete with bisonc++. In particular, there is no %header{ ... %} section anymore. Also, all DEFINE symbols related to member functions are now obsolete: such members can simply be declared in the class header file and defined elsewhere.
USING SYMBOLIC TOKENS IN CLASSES OTHER THAN THE PARSER CLASS¶
The tokens defined in the grammar files processed by bisonc++ must usually also be available to the lexical scanner, which returns those tokens when certain regular expressions are matched. E.g., a NUMBER token may be used in the grammar, and the lexical scanner may be expected to return that token when the input matches the [0-9]+ regular expression. To avoid circular dependencies among classes the tokens can be written to a separate file using the token-path directive or option. The location and name of this file are specified by the token-path specification, and the file is generated from scratch at every run of bisonc++. By default the grammar’s symbolic tokens are made available in the class Tokens, and classes may refer to its tokens using the Tokens class scope (e.g., Tokens::NUMBER).
Before bisonc++ version 6.04.00 tokens were made available by including the file parserbase.h, using a simple #define suggesting that the tokens were in fact defined by the parser class itself. Using this scheme lexical scanner specifications returned, e.g., Parser::NUMBER when [0-9]+ was matched. Unless the token-path directive or option is used this approach is still available, but its use is deprecated.
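The arrangement described above can be sketched as follows, under assumptions. The struct Tokens mirrors the file written because of the token-path directive, with values copied from the generated tokens.h shown in the EXAMPLE section. Classifier is a hypothetical class, other than the parser, that needs only Tokens. The empty ParserBase merely mimics the generated base class deriving from Tokens. That derivation is why Parser::NUMBER and Tokens::NUMBER name the same value.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Mirrors the file written because of the token-path directive.
struct Tokens
{
    enum Tokens_
    {
        NUMBER = 257,
        EOLN,
        UNARY,
    };
};

// Hypothetical class other than the parser: it only needs Tokens, so it
// does not have to include any of the parser's headers.
class Classifier
{
    public:
        // Returns the token a scanner would return for 'text'.
        static int classify(std::string const &text)
        {
            if (text == "\n")
                return Tokens::EOLN;
            if (isNumber(text))
                return Tokens::NUMBER;
                                    // any other character is its own token
            return text.empty() ? 0 : static_cast<unsigned char>(text[0]);
        }

    private:
        static bool isNumber(std::string const &text)
        {
            if (text.empty())
                return false;
            for (char ch : text)
                if (!std::isdigit(static_cast<unsigned char>(ch)))
                    return false;
            return true;
        }
};

// Mimics the generated `class ParserBase: public Tokens`: the deprecated
// Parser::NUMBER form works because the enumerators are inherited.
class ParserBase: public Tokens
{};

static_assert(ParserBase::NUMBER == Tokens::NUMBER,
              "inherited enumerators name the same tokens");
```

Symbolic token values start at 257, so they cannot collide with single-character tokens, which are returned as their character values.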
EXAMPLE¶
Using a fairly traditional example, we construct a simple calculator below. The basic operators as well as parentheses can be used to specify expressions, and each expression should be terminated by a newline. The program terminates when a q is entered. Empty lines result in a mere prompt.
First an associated grammar is constructed. When a syntactic error is encountered all tokens are skipped until the next newline, and a simple message is printed using the default error function. It is assumed that no semantic errors occur (in particular, no divisions by zero). The grammar is decorated with actions performed when the corresponding grammatical production rule is recognized. The grammar itself is rather standard and straightforward. Note, however, the first part of the specification file, which contains other directives. The %scanner directive results in a composed d_scanner object as well as an implementation of the member function int lex(). The %token-path directive defines the class Tokens in the file ../tokens/tokens.h. In this example, the Scanner class is generated by flexc++(1). The details of constructing a class using flexc++ are beyond the scope of this man-page, but flexc++’s specification file is shown below.
Here is bisonc++’s input file:
%filenames parser
%scanner ../scanner/scanner.h
%token-path ../tokens/tokens.h

                            // lowest precedence
%token  NUMBER              // integral numbers
        EOLN                // newline

%left   '+' '-'
%left   '*' '/'
%right  UNARY
                            // highest precedence

%%

expressions:
    expressions evaluate
|
    prompt
;

evaluate:
    alternative prompt
;

prompt:
    {
        prompt();
    }
;

alternative:
    expression EOLN
    {
        cout << $1 << endl;
    }
|
    'q' done
|
    EOLN
|
    error EOLN
;

done:
    {
        cout << "Done.\n";
        ACCEPT();
    }
;

expression:
    expression '+' expression
    {
        $$ = $1 + $3;
    }
|
    expression '-' expression
    {
        $$ = $1 - $3;
    }
|
    expression '*' expression
    {
        $$ = $1 * $3;
    }
|
    expression '/' expression
    {
        $$ = $1 / $3;
    }
|
    '-' expression %prec UNARY
    {
        $$ = -$2;
    }
|
    '+' expression %prec UNARY
    {
        $$ = $2;
    }
|
    '(' expression ')'
    {
        $$ = $2;
    }
|
    NUMBER
    {
        $$ = stoul(d_scanner.matched());
    }
;
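bisonc++ turns this grammar into an LALR(1) parser. To make the effect of the %left and %right UNARY declarations concrete, here is a hand-written recursive-descent evaluator. This is a different technique from what bisonc++ generates, and the class Evaluator is hypothetical. Like the example, it uses int values and assumes well-formed input without divisions by zero. It groups 8 - 3 - 2 as (8 - 3) - 2, 2 + 3 * 4 as 2 + (3 * 4), and -2 * 3 as (-2) * 3, as the precedence declarations require.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Recursive-descent evaluator (NOT bisonc++ output) mirroring the grammar:
//   %left  '+' '-'   lowest precedence, left-associative
//   %left  '*' '/'   higher precedence, left-associative
//   %right UNARY     highest precedence (unary '+' and '-')
class Evaluator
{
    std::string d_text;
    std::size_t d_pos = 0;

    public:
        explicit Evaluator(std::string text)
        :
            d_text(std::move(text))
        {}

        int evaluate()
        {
            return expression();
        }

    private:
        char peek()                     // skips blanks, returns next char
        {
            while (d_pos < d_text.size()
                   && (d_text[d_pos] == ' ' || d_text[d_pos] == '\t'))
                ++d_pos;
            return d_pos < d_text.size() ? d_text[d_pos] : '\0';
        }

        int expression()                // '+' and '-', left-associative
        {
            int value = term();
            for (char op = peek(); op == '+' || op == '-'; op = peek())
            {
                ++d_pos;
                int rhs = term();
                value = op == '+' ? value + rhs : value - rhs;
            }
            return value;
        }

        int term()                      // '*' and '/', left-associative
        {
            int value = unary();
            for (char op = peek(); op == '*' || op == '/'; op = peek())
            {
                ++d_pos;
                int rhs = unary();
                value = op == '*' ? value * rhs : value / rhs;
            }
            return value;
        }

        int unary()                     // unary '-' and '+' bind tightest
        {
            char ch = peek();
            if (ch == '-' || ch == '+')
            {
                ++d_pos;
                int operand = unary();
                return ch == '-' ? -operand : operand;
            }
            return primary();
        }

        int primary()                   // '(' expression ')' | NUMBER
        {
            if (peek() == '(')
            {
                ++d_pos;
                int value = expression();
                peek();                 // skip blanks before ')'
                ++d_pos;                // consume ')'
                return value;
            }
            int value = 0;
            while (d_pos < d_text.size()
                   && std::isdigit(static_cast<unsigned char>(d_text[d_pos])))
                value = value * 10 + (d_text[d_pos++] - '0');
            return value;
        }
};
```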
Bisonc++ processes this file, generating the following files:
- o
- The parser’s base class, which should not be modified by the programmer:
-
// hdr/includes
#ifndef ParserBase_h_included
#define ParserBase_h_included

#include <exception>
#include <vector>
#include <iostream>
// $insert preincludes
#include "../tokens/tokens.h"

// hdr/baseclass

namespace // anonymous
{
    struct PI_;
}

// $insert parserbase
class ParserBase: public Tokens
{
    public:
        enum DebugMode_
        {
            OFF           = 0,
            ON            = 1 << 0,
            ACTIONCASES   = 1 << 1
        };

// $insert tokens

// $insert STYPE
        using STYPE_ = int;

    private:
                        // state  semval
        using StatePair = std::pair<size_t, STYPE_>;
                        // token  semval
        using TokenPair = std::pair<int, STYPE_>;

        int d_stackIdx = -1;
        std::vector<StatePair> d_stateStack;
        StatePair  *d_vsp = 0;      // points to the topmost value stack
        size_t      d_state = 0;

        TokenPair   d_next;
        int         d_token;

        bool        d_terminalToken = false;
        bool        d_recovery = false;

    protected:
        enum Return_
        {
            PARSE_ACCEPT_ = 0,      // values used as parse()'s return values
            PARSE_ABORT_  = 1
        };
        enum ErrorRecovery_
        {
            UNEXPECTED_TOKEN_,
        };

        bool        d_actionCases_ = false;     // set by options/directives
        bool        d_debug_ = true;
        size_t      d_requiredTokens_;
        size_t      d_nErrors_;                 // initialized by clearin()
        size_t      d_acceptedTokens_;
        STYPE_      d_val_;

        ParserBase();

        void ABORT() const;
        void ACCEPT() const;
        void ERROR() const;

        STYPE_ &vs_(int idx);                   // value stack element idx
        int  lookup_() const;
        int  savedToken_() const;
        int  token_() const;
        size_t stackSize_() const;
        size_t state_() const;
        size_t top_() const;
        void clearin_();
        void errorVerbose_();
        void lex_(int token);
        void popToken_();
        void pop_(size_t count = 1);
        void pushToken_(int token);
        void push_(size_t nextState);
        void redoToken_();
        bool recovery_() const;
        void reduce_(int rule);
        void shift_(int state);
        void startRecovery_();

    public:
        void setDebug(bool mode);
        void setDebug(DebugMode_ mode);
};

// hdr/abort
inline void ParserBase::ABORT() const
{
    throw PARSE_ABORT_;
}

// hdr/accept
inline void ParserBase::ACCEPT() const
{
    throw PARSE_ACCEPT_;
}

// hdr/error
inline void ParserBase::ERROR() const
{
    throw UNEXPECTED_TOKEN_;
}

// hdr/savedtoken
inline int ParserBase::savedToken_() const
{
    return d_next.first;
}

// hdr/opbitand
inline ParserBase::DebugMode_ operator&(ParserBase::DebugMode_ lhs,
                                        ParserBase::DebugMode_ rhs)
{
    return static_cast<ParserBase::DebugMode_>(
            static_cast<int>(lhs) & rhs);
}

// hdr/opbitor
inline ParserBase::DebugMode_ operator|(ParserBase::DebugMode_ lhs,
                                        ParserBase::DebugMode_ rhs)
{
    return static_cast<ParserBase::DebugMode_>(static_cast<int>(lhs) | rhs);
};

// hdr/recovery
inline bool ParserBase::recovery_() const
{
    return d_recovery;
}

// hdr/stacksize
inline size_t ParserBase::stackSize_() const
{
    return d_stackIdx + 1;
}

// hdr/state
inline size_t ParserBase::state_() const
{
    return d_state;
}

// hdr/token
inline int ParserBase::token_() const
{
    return d_token;
}

// hdr/vs
inline ParserBase::STYPE_ &ParserBase::vs_(int idx)
{
    return (d_vsp + idx)->second;
}

#endif
- o
- The parser class parser.h itself. The grammar specification uses various member functions (e.g., done and prompt). These functions are so small that they can very well be implemented inline. Note that done calls ACCEPT to terminate further parsing. ACCEPT and related members (e.g., ABORT) can be called from any member called by parse. As a consequence, action blocks could contain mere function calls rather than several statements, minimizing the need to rerun bisonc++ when an action is modified.
- Once bisonc++ has created parser.h additionally required members can be added to it (bisonc++ itself won’t modify parser.h anymore once it is created), resulting in the following final version:
-
// Generated by Bisonc++ V5.00.00 on Sun, 03 Apr 2016 17:49:17 +0200

#ifndef Parser_h_included
#define Parser_h_included

// $insert baseclass
#include "parserbase.h"
// $insert scanner.h
#include "../scanner/scanner.h"

#undef Parser
class Parser: public ParserBase
{
    // $insert scannerobject
    Scanner d_scanner;

    public:
        int parse();

    private:
        void error();                   // called on (syntax) errors
        int lex();                      // returns the next token from the
                                        // lexical scanner.
        void print();                   // use, e.g., d_token, d_loc
        void prompt();
        void done();

    // support functions for parse():
        void executeAction_(int ruleNr);
        void errorRecovery_();
        void nextCycle_();
        void nextToken_();
        void print_();
        void exceptionHandler(std::exception const &exc);
};

inline void Parser::prompt()
{
    std::cout << "? " << std::flush;
}

inline void Parser::done()
{
    std::cout << "Done\n";
    ACCEPT();
}

#endif
- o
- The file ../tokens/tokens.h is generated because of the %token-path directive. To avoid circular dependencies the tokens are made available in a separate file, allowing classes used by the parser to use the grammar’s tokens as well. Here is the file specifying the grammar’s tokens:
-
#ifndef INCLUDED_TOKENS_
#define INCLUDED_TOKENS_

struct Tokens
{
    // Symbolic tokens:
    enum Tokens_
    {
        NUMBER = 257,
        EOLN,
        UNARY,
    };
};

#endif
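The generated parserbase.h shows that ACCEPT and ABORT simply throw a Return_ value. That is why any member reached from parse, such as done, can end parsing. The following simplified model illustrates the mechanism and is not the generated parse.cc. A hypothetical MiniParser's parse function catches the thrown value and returns it. Two choices belong only to the sketch: the driver loop that treats each input character as one action block, and the return value used when the input runs out without ACCEPT.

```cpp
#include <cassert>

// Simplified model, not the generated parse.cc: ACCEPT() and ABORT() throw
// a Return_ value, as in the generated parserbase.h, and parse() catches it.
class MiniParser
{
    int d_count = 0;                    // number of ordinary input characters

    protected:
        enum Return_
        {
            PARSE_ACCEPT_ = 0,
            PARSE_ABORT_  = 1
        };

        void ACCEPT() const
        {
            throw PARSE_ACCEPT_;
        }

        void ABORT() const
        {
            throw PARSE_ABORT_;
        }

    public:
        // Hypothetical driver: each character stands for one action block.
        int parse(char const *input)
        {
            try
            {
                for (; *input; ++input)
                    action(*input);
                return PARSE_ABORT_;    // sketch's choice: no ACCEPT() seen
            }
            catch (Return_ ret)         // thrown by ACCEPT() or ABORT()
            {
                return ret;
            }
        }

        int count() const
        {
            return d_count;
        }

    private:
        void action(char ch)
        {
            if (ch == 'q')
                done();                 // like the grammar's done rule
            else if (ch == '!')
                ABORT();
            else
                ++d_count;
        }

        void done()
        {
            ACCEPT();                   // unwinds out of action() and parse()
        }
};
```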
For the program no additional members had to be defined in the class Parser. The member function parse is defined by bisonc++ in the source file parse.cc, and it includes parser.ih.
As cout and endl are used without std:: in the grammar’s actions, a using namespace std directive (or comparable using-declarations) is required. The directive is specified in parser.ih. Here is the implementation header declaring the standard namespace:
// Generated by Bisonc++ V5.00.00 on Sun, 03 Apr 2016 17:51:26 +0200

    // Include this file in the sources of the class Parser.

// $insert class.h
#include "parser.h"

inline void Parser::error()
{
    std::cerr << "Syntax error\n";
}

// $insert lex
inline int Parser::lex()
{
    return d_scanner.lex();
}

inline void Parser::print()
{
    print_();           // displays tokens if --print was specified
}

inline void Parser::exceptionHandler(std::exception const &exc)
{
    throw;              // re-implement to handle exceptions thrown by actions
}

    // Add here includes that are only required for the compilation
    // of Parser's sources.

    // UN-comment the next using-declaration if you want to use
    // in Parser's sources symbols from the namespace std without
    // specifying std::

using namespace std;
In the current context the member function parse’s implementation is not very relevant (it should not be modified by the programmer anyway). It is not shown here, but is available as calculator/parser/parse.cc in the distribution’s demos/ directory after building the calculator using the build script provided there.
The lexical scanner is generated by flexc++(1) from the following specification file, using the command flexc++ lexer:
// see also regression/calculator/scanner

%interactive
%filenames scanner

%%

[ \t]+                  // skip white space

\n                      return Tokens::EOLN;

[0-9]+                  return Tokens::NUMBER;

.                       return matched()[0];

%%
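Readers without flexc++ can compare the specification with the following hand-written scanner. It is a hypothetical stand-in, not flexc++ output. It follows the same four rules and offers lex and matched members like those used in parser.ih and the grammar. As an assumption of this sketch, it returns 0 at end of input.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

struct Tokens                       // as in the generated ../tokens/tokens.h
{
    enum Tokens_
    {
        NUMBER = 257,
        EOLN,
        UNARY,
    };
};

// Hypothetical hand-written scanner following the specification's rules:
//   [ \t]+   skipped
//   \n       Tokens::EOLN
//   [0-9]+   Tokens::NUMBER (its text is available through matched())
//   .        the character itself
class HandScanner
{
    std::istream &d_in;
    std::string d_matched;

    public:
        explicit HandScanner(std::istream &in)
        :
            d_in(in)
        {}

        std::string const &matched() const
        {
            return d_matched;
        }

        int lex()
        {
            d_matched.clear();

            int ch;
            while ((ch = d_in.get()) == ' ' || ch == '\t')
                ;                                   // skip white space

            if (ch == std::char_traits<char>::eof())
                return 0;                           // sketch: 0 at end of input

            d_matched += static_cast<char>(ch);
            if (ch == '\n')
                return Tokens::EOLN;

            if (std::isdigit(ch))
            {
                while (std::isdigit(d_in.peek()))   // collect [0-9]+
                    d_matched += static_cast<char>(d_in.get());
                return Tokens::NUMBER;
            }
            return ch;                              // any other character
        }
};
```

For example, feeding it a std::istringstream holding "12 + 3\n" yields NUMBER (matched() == "12"), '+', NUMBER (matched() == "3"), EOLN, and finally 0.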
Finally, here is the program’s main function:
#include "parser/parser.h"

int main()
{
    Parser calculator;
    return calculator.parse();
}
SEE ALSO¶
bison(1), bison++(1), bisonc++(1), bisonc++api(3), bison.info (using texinfo), flexc++(1), https://fbb-git.gitlab.io/bisoncpp/
Lakos, J. (2001) Large Scale C++ Software Design, Addison Wesley.
Aho, A.V., Sethi, R., Ullman, J.D. (1986) Compilers, Addison Wesley.
AUTHOR¶
Frank B. Brokken (f.b.brokken@rug.nl).
2005-2023 | bisonc++.6.05.00 |