NAME¶
flexc++ - Generate a C++ scanner class and parsing function
SYNOPSIS¶
flexc++ [options]
rules-file
DESCRIPTION¶
Flexc++(1) was designed after
flex(1) and
flex++(1). Like
these latter two programs
flexc++ generates code performing
pattern-matching on text, possibly executing actions when certain
regular
expressions are recognized.
Flexc++, contrary to
flex and
flex++, generates code that
is explicitly intended for use by
C++ programs. The well-known
flex(1) program generates
C source-code and
flex++(1)
merely offers a
C++-like shell around the
yylex function
generated by
flex(1) and hardly supports present-day ideas about
C++ software development.
Contrary to this,
flexc++ creates a
C++ class offering a
predefined member function
lex matching input against regular
expressions and possibly executing
C++ code once regular expressions
were matched. The code generated by
flexc++ is pure
C++,
allowing its users to apply all of the features offered by that language.
Below, the following sections may be consulted for specific details:
- o
- 1. QUICK START: a quick start overview about how to
use flexc++.
- o
- 2. QUICK START: FLEXC++ and BISONC++: a quick start
overview about how to use flexc++ in combination with
bisonc++(1)
- o
- 3. GENERATED FILES: files generated by
flexc++ and their purposes
- o
- 4. OPTIONS: options available for
flexc++
- o
- 5. INTERACTIVE SCANNERS: how to create an
interactive scanner
- o
- 6. SPECIFICATION FILE(S): the format and contents of
flexc++ input files, specifying the Scanner’s
characteristics
- o
- 6.1. FILE SWITCHING: how to switch to another input
specification file
- o
- 6.2. DIRECTIVES: directives that can be used in
input specification files
- o
- 6.3. MINI SCANNERS: how to declare
mini-scanners
- o
- 6.4. DEFINITIONS: how to define symbolic names for
regular expressions
- o
- 6.5. %% SEPARATOR: the separator between the input
specification sections
- o
- 6.6. REGULAR EXPRESSIONS: regular expressions
supported by flexc++
- o
- 6.7. SPECIFICATION EXAMPLE: an example of a
specification file
- o
- 7. THE CLASS INTERFACE: SCANNER.H: Constructors and
members of the scanner class generated by flexc++
- o
- 7.1. NAMING CONVENTION: symbols defined by
flexc++ in the scanner class.
- o
- 7.2 CONSTRUCTORS: constructors defined in the
scanner class.
- o
- 7.3 PUBLIC MEMBER FUNCTION: public member declared
in the scanner class.
- o
- 7.4. PRIVATE MEMBER FUNCTIONS: private members
declared in the scanner class.
- o
- 7.5. SCANNER CLASS HEADER EXAMPLE: an example of a
generated scanner class header
- o
- 8.1. THE SCANNER BASE CLASS: the scanner class is
derived from a base class. The base class is described in this
section
- o
- 8.2. PUBLIC ENUMS AND -TYPES: enums and types
declared by the base class
- o
- 8.3. PROTECTED ENUMS AND -TYPES: enumerations and
types used by the scanner and scanner base classes
- o
- 8.4. NO PUBLIC CONSTRUCTORS: the scanner base class
does not offer public constructors.
- o
- 8.5. PUBLIC MEMBER FUNCTIONS: several members
defined by the scanner base class have public access rights.
- o
- 8.6. PROTECTED CONSTRUCTORS: the base class can be
constructed by a derived class. Usually this is the scanner class
generated by flexc++.
- o
- 8.7. PROTECTED MEMBER FUNCTIONS: this section covers
the base class member functions that can only be used by scanner class or
scanner base class members
- o
- 8.8. PROTECTED DATA MEMBERS: this section covers the
base class data members that can only be used by scanner class or scanner
base class members
- o
- 8.9. FLEX++ TO FLEXC++ MEMBERS: a short overview of
frequently used flex(1) members that received different names in
flexc++.
- o
- 9.1 THE CLASS INPUT: the scanner’s job is
completely decoupled from the actual input stream. The class Input,
nested within the scanner base class handles the communication with the
input streams. The class Input, is described in this section.
- o
- 9.2. CONSTRUCTORS: the class Input can easily
be replaced by another class. The constructor-requirements are described
in this section.
- o
- 9.3. REQUIRED PUBLIC MEMBER FUNCTIONS: this section
covers the required public members of a self-made Input class
1. QUICK START¶
A bare-bones, no-frills scanner is generated as follows:
- o
- Create a file lexer defining the regular expressions
to recognize, and the tokens to return. Use token values exceeding 0xff if
plain ascii character values can also be used as token values. Example
(assume capitalized words are token-symbols defined in an enum defined by
the scanner class):
%%
[ \t\n]+ // skip white space chars.
[0-9]+ return NUMBER;
[[:alpha:]_][[:alpha:][:digit:]_]* return IDENTIFIER;
. return matched()[0];
- o
- Execute:
flexc++ lexer
This generates four files: scanner.h, scanner.ih, scannerbase.h, and
lex.cc
- o
- Edit scanner.h, add the enum defining the
token-symbols in (usually) the public section of the class Scanner.
E.g.,
class Scanner: public ScannerBase
{
public:
enum Tokens
{
IDENTIFIER = 0x100,
NUMBER
};
// ... (etc, as generated by flexc++)
- o
- Create a file defining int main, e.g.:
#include <iostream>
#include "scanner.h"
using namespace std;
int main()
{
Scanner scanner; // define a Scanner object
while (int token = scanner.lex()) // get all tokens
{
string const &text = scanner.matched();
switch (token)
{
case IDENTIFIER:
cout << "identifier: " << text << ’\n’;
break;
case NUMBER:
cout << "number: " << text << ’\n’;
break;
default:
cout << "char. token: `" << text << "’\n";
break;
}
}
}
- o
- Compile all .cc files:
g++ --std=c++0x *.cc
- o
- To `tokenize’ main.cc, execute:
a.out < main.cc
)
QUICK START: FLEXC++ and BISONC++¶
To interface
flexc++ to the
bisonc++(1) parser generator proceed
as follows:
- o
- Specify a grammar that can be processed by
bisonc++(1). Assuming that the scanner and parser are developed in,
respectively, the sub-directories scanner and parser, then a
simple grammar specification that can be used with the scanner developed
in the previous section is, e.g., write the file parser/grammar:
%scanner ../scanner/scanner.h
%scanner-token-function d_scanner.lex()
%token IDENTIFIER NUMBER CHAR
%%
startrule:
startrule tokenshow
|
tokenshow
;
tokenshow:
token
{
std::cout << "matched: " << d_scanner.matched() << ’\n’;
}
;
token:
IDENTIFIER
|
NUMBER
|
CHAR
;
- o
- Write a scanner specification file. E.g.,
%%
[ \t\n]+ // skip white space chars.
[0-9]+ return Parser::NUMBER;
[[:alpha:]_][[:alpha:][:digit:]_]* return Parser::IDENTIFIER;
. return Parser::CHAR;
This causes the scanner to return Parser tokens to the generated
parser.
- o
- Add the line
#include "../parser/Parserbase.h"
to the file scanner/scanner.ih
- o
- Write a simple main function in the file
main.cc. E.g.,
#include "parser/Parser.h"
int main(int argc, char **argv)
{
Parser parser;
parser.parse();
}
- o
- Generate a scanner in the scanner subdirectory:
flexc++ lexer
- o
- Generate a parser in the parser subdirectory:
bisonc++ grammar
- o
- Compile all sources:
g++ --std=c++0x *.cc */*.cc
- o
- Execute the program, providing it some source file to be
processed:
a.out < main.cc
3. GENERATED FILES¶
Flexc++ generates four files from a well-formed input file:
- o
- A file containing the implementation of the lex
member function and its support functions. By default this file is named
lex.cc.
- o
- A file containing the scanner’s class interface. By
default this file is named scanner.h. The scanner class itself is
generated once and is thereafter `owned’ by the programmer, who may
change it ad-lib. Newly added members (data members, function
members) will survive future flexc++ runs as flexc++ will
never rewrite an existing scanner class interface file, unless explicitly
ordered to do so. (see also scanner.h(3flexc++)).
- o
- A file containing the interface of the scanner
class’s base class. The scanner class is publicly
derived from this base class. It is used to minimize the size of the
scanner interface itself. The scanner base class is `owned’ by
flexc++ and should never be hand-modified. By default the
scanner’s base class is provided in the file scannerbase.h.
At each new flexc++ run this file is rewritten unless
flexc++ is explicitly ordered not to do so (see also
scannerbase.h(3flexc++)).
- o
- A file containing the implementation header. This
file should contain includes and declarations that are only required when
compiling the members of the scanner class. By default this file is named
scanner.ih. This file, like the file containing the scanner
class’s interface is never rewritten by flexc++ unless
flexc++ is explicitly ordered to do so (see also
implementationheader(3flexc++)).
4. OPTIONS¶
If available, single letter options are listed between parentheses following
their associated long-option variants. Single letter options require arguments
if their associated long options require arguments as well.
- o
- --baseclass-header=header (-b)
Use header as the pathname of the file containing the scanner
class’s base class. Defaults to the name of the scanner class plus
base.h
- o
- --baseclass-skeleton=skeleton (-C)
Use skeleton as the pathname of the file containing the skeleton of
the scanner class’s base class. Its filename defaults to
flexc++base.h.
- o
- --case-insensitive
Use this option to generate a scanner case insensitively matching
regular expressions. All regular expressions specified in
flexc++’s input file are interpreted case insensitively and
the resulting scanner object will case insensitively interpret its
input.
- When this option is specified the resulting scanner does
not distinguish between the following rules:
First // initial F is transformed to f
first
FIRST // all capitals are transformed to lower case chars
With a case-insensitive scanner only the first rule can be matched, and
flexc++ will issue warnings for the second and third rule about
rules that cannot be matched.
- Input processed by a case-insensitive scanner is also
handled case insensitively. The above mentioned First rule is
matched for all of the following input words: first First FIRST
firST.
- Although the matching process proceeds case insensitively,
the matched text (as returned by the scanner’s matched()
member) always contains the original, unmodified text. So, with the above
input matched() returns, respectively first, First, FIRST
and firST, while matching the rule First.
- o
- --class-header=header (-c)
Use header as the pathname of the file containing the scanner class.
Defaults to the name of the scanner class plus the suffix .h
- o
- --class-name=class
Use class (rather than Scanner) as the name of the scanner
class. Unless overridden by other options generated files will be given
the (transformed to lower case) class* name instead of
scanner*.
- o
- --class-skeleton=skeleton (-C)
Use skeleton as the pathname of the file containing the skeleton of
the scanner class. Its filename defaults to flexc++.h.
- o
- --construction (-K)
Write details about the lexical scanner to the file
`rules-file’.output. Details cover the used character ranges,
information about the regexes, the raw NFA states, and the final
DFAs.
- o
- --debug (-d)
Provide lex and its support functions with debugging code, showing
the actual parsing process on the standard output stream. When included,
the debugging output is active by default, but its activity may be
controlled using the setDebug(bool on-off) member. Note that
#ifdef DEBUG macros are not used anymore. By rerunning
flexc++ without the --debug option an equivalent scanner is
generated not containing the debugging code.
- o
- --filenames=genericName (-f)
Generic name of generated files (header files, not the lex-function
source file, see the --lex-source option for that). By default the
header file names will be equal to the name of the generated class.
- o
- --force-class-header
By default the generated class header is not overwritten once it has been
created. This option can be used to force the (re)writing of the file
containing the scanner’s class.
- o
- --force-implementation-header
By default the generated implementation header is not overwritten once it
has been created. This option can be used to force the (re)writing of the
implementation header file.
- o
- --help (-h)
Write basic usage information to the standard output stream and
terminate.
- o
- --implementation-header=header (-i)
Use header as the pathname of the file containing the implementation
header. Defaults to the name of the generated scanner class plus the
suffix .ih. The implementation header should contain all directives
and declarations only used by the implementations of the
scanner’s member functions. It is the only header file that is
included by the source file containing lex()’s implementation
. User defined implementation of other class members may use the same
convention, thus concentrating all directives and declarations that are
required for the compilation of other source files belonging to the
scanner class in one header file.
- o
- --implementation-skeleton=skeleton
(-I)
Use skeleton as the pathname of the file containing the skeleton of
the implementation header. Its filename defaults to
flexc++.ih.
- o
- --interactive
Generate an interactive scanner. An interactive scanner reads lines from the
input stream, and then returns the tokens encountered on that line. The
interactive scanner implemented by flexc++ only predefines the
Scanner(std::istream &in, std::ostream &out) constructor,
by default assuming that input is read from std::cin.
- o
- --lex-skeleton=skeleton (-L)
Use skeleton as the pathname of the file containing the lex()
member function’s skeleton. Its filename defaults to
flexc++.cc.
- o
- --lex-function-name=funname
Use funname rather than lex as the name of the member function
performing the lexical scanning.
- o
- --lex-source=source (-l)
Define source as the name of the source file containing the scanner
member function lex. Defaults to lex.cc.
- o
- --matched-rules (-’R’)
The generated scanner will write the numbers of matched rules to the
standard output. It is implied by the --debug option. Displaying
the matched rules can be suppressed by calling the generated
scanner’s member setDebug(false) (or, of course, by
re-generating the scanner without using specifying
--matched-rules).
- o
- --max-depth=depth (-m)
Set the maximum inclusion depth of the lexical scanner’s specification
files to depth. By default the maximum depth is set to 10. When
more than depth specification files are used the scanner throws a
Max stream stack size exceeded std::length_error
exception.
- o
- --namespace=namespace (-n)
Define the scanner base class, the paser class and the scanner implentations
in the namespace namespace. By default no namespace is defined. If
this options is used the implementation header will contain a commented
out using namespace declaration for the requested
namespace.
- o
- --no-baseclass-header
Do not write the file containing the scanner’s base class interface
even if it doesn’t yet exist. By default the file containing the
scanner’s base class interface is (re)written each time
flexc++ is called.
- o
- --no-lines
Do not put #line preprocessor directives in the file containing the
scanner’s lex function. By default #line directives
are entered at the beginning of the action statements in the generated
lex.cc file, allowing the compiler and debuggers to associate
errors with lines in your grammar specification file, rather than with the
source file containing the lex function itself.
- o
- --no-lex-source
Do not write the file containing the scanner’s predefined scanner
member functions, even if that file doesn’t yet exist. By default
the file containing the scanner’s lex member function is
(re)written each time flexc++ is called. This option should
normally be avoided, as this file contains parsing tables which are
altered whenever the grammar definition is modified.
- o
- --own-tokens (-T)
The tokens returned as well as the text matched when flexc++ reads
its input files(s) are shown when this option is used.
- This option does not result in the generated program
displaying returned tokens and matched text. If that is what you want, use
the --print-tokens option.
- o
- --print-tokens (-t)
The tokens returned as well as the text matched by the generated lex
function are displayed on the standard output stream, just before
returning the token to lex’s caller. Displaying tokens and
matched text is suppressed again when the lex.cc file is generated
without using this option. The function showing the tokens (
ScannerBase::print__) is called from Scanner::printTokens,
which is defined in-line in scanner.h. Calling
ScannerBase::print__, therefore, can also easily be controlled by
an option controlled by the program using the scanner object.
- This option does not show the tokens returned and
text matched by flexc++ itself when reading its input s. If
that is what you want, use the --own-tokens option.
- o
- --show-filenames (-F)
Write the names of the files that are generated to the standard error
stream.
- o
- --skeleton-directory=directory (-S)
Specifies the directory containing the skeleton files to use. This option
can be overridden by the specific skeleton-specifying options ( -B -C,
-H, and -I).
- o
- --target-directory=directory
Specifies the directory where generated files should be written. By default
this is the directory of flexc++’s input file. The
--target-directory option does not affect files that were
explicitly named (either as option or as directive).
- o
- --usage (-h)
Write basic usage information to the standard output stream and
terminate.
- o
- --verbose(-V)
The verbose option generates on the standard output stream various pieces of
additional information, not covered by the --construction and
--show-filenames options.
- o
- --version (-v)
Display flexc++’s version number and terminate.
5. INTERACTIVE SCANNERS¶
An interactive scanner is characterized by the fact that scanning is postponed
until an end-of-line character has been received, followed by reading all
information on the line, read so far.
Flexc++ supports the
--interactive option (or the equivalent
%interactive directive),
generating an interactive scanner. Here it is assumed that
Scanner is
the name of the scanner class generated by
flexc++.
The interactive scanner generated by
flexc++ has the following
characteristics:
- o
- The Scanner class is derived privately from
std::istringstream and (as usual) publicly from
ScannerBase.
- o
- The istringstream base class is constructed by its
default constructor.
- o
- The function lex’s default implementation is
removed from scanner.h and is implemented in the generated
lex.cc source file. It performs the following tasks:
- - If the token returned by the scanner is not equal to 0 it
is returned as then next token;
- - Otherwise the next line is retrieved from the input
stream passed to the Scanner’s constructor (by default
std::cin). If this fails, 0 is returned.
- - A ’\n’ character is appended to the
just read line, and the scanner’s std::istringstream base
class object is re-initialized with that line;
- - The member lex__ returns the next token. This
implementation allows code calling Scanner::lex() to conclude, as
usual, that the input is exhausted when lex returns 0.
Here is an example of how such a scanner could be used:
// scanner generated with: ’flexc++ --interactive lexer’ or with
// ’flexc++ lexer’ if lexer contains the %interactive directive
int main()
{
Scanner scanner; // by default: read from std::cin
while (true)
{
cout << "? "; // prompt at each line
while (true) // process all the line’s tokens
{
int token = scanner.lex();
if (token == ’\n’) // end of line: new prompt
break;
if (token == 0) // end of input: done
return 0;
// process other tokens
cout << scanner.matched() << ’\n’;
if (scanner.matched()[0] == ’q’)
return 0;
}
}
}
6. SPECIFICATION FILE(S)¶
Flexc++ expects an input file containing directives and the regular
expressions that should be recognized by objects of the scanner class
generated by
flexc++. In this man page the elements and organization of
flexc++’s input file is described.
Flexc++’s input file consists of two sections, separated from each
other by a line merely containing two consecutive percent characters:
%%
The section before this separator contains directives; the section following
this separator contains regular expressions and possibly actions to perform
when these regular expressions are matched by the object of the scanner class
generated by
flexc++.
White space is usually ignored, as is comment, which may be of the traditional
C form (i.e.,
/*, followed by (possibly multi-line) comment
text, followed by
*/, and it may be
C++ end-of-line comment: two
consecutive slashes (
//) start the comment, which continues up to the
next newline character.
6.1. FILE SWITCHING¶
Flexc++’s input file may be split into multiple files. This allows
for the definition of logically separate elements of the specifications in
different files. Include directives must be specified on a line of their own.
To switch to another specification file the following stanza is used:
//include file-location
The
//include directive starts in the line’s first column. File
locations can be absolute or relative to the location of the file containing
the
//include directive. White space characters following
//include and before the end of the line are ignored. The file
specification may be surrounded by double quotes, but these double quotes are
not required and are ignored (removed) if present. All remaining characters
are expected to define the name of the file where
flexc++’s rules
specifications continue. Once end of file of a sub-file has been reached,
processing continues at the line beyond the
//include directive of the
previously scanned file. The end-of-file of the file that was initially
specified when
flexc++ was called indicates the end of
flexc++’s rules specification.
6.2. DIRECTIVES¶
The first section of
flexc++’s input file consists of directives.
In addition it may associate regular expressions with symbolic names, allowing
you to use these identifiers in the rules section. Each directive is defined
on a line of its own. When available, directives are overridden by
flexc++ command line options.
Some directives require arguments, which are usually provided following
separating (but optional)
= characters. Arguments of directives, are
text, surrounded by double quotes (strings). If a string must itself contain a
double quote or a backslash, then precede these characters by a backslash. The
exceptions are the
%s and
%x directives, which are immediately
followed by name lists, consisting of identifiers separated by blanks. Here is
an example of the definition of a directive:
%class-name = "MyScanner"
The following directives are available:
- o
- %baseclass-header = "header"
Defines the pathname of the file containing the scanner class’s base
class interface. Corresponding command-line option:
--baseclass-header.
- o
- %case-insensitive
Generates a scanner case insensitively matching regular expressions.
All regular expressions specified in flexc++’s input file are
interpreted case insensitively and the resulting scanner object will case
insensitively interpret its input.
- Corresponding command-line option:
--cases-insensitive.
- When this directive is specified the resulting scanner does
not distinguish between the following rules:
First // initial F is transformed to f
first
FIRST // all capitals are transformed to lower case chars
With a case-insensitive scanner only the first rule can be matched, and
flexc++ will issue warnings for the second and third rule about
rules that cannot be matched.
- Input processed by a case-insensitive scanner is also
handled case insensitively. The above mentioned First rule is
matched for all of the following input words: first First FIRST
firST.
- Although the matching process proceeds case insensitively,
the matched text (as returned by the scanner’s matched()
member) always contains the original, unmodified text. So, with the above
input matched() returns, respectively first, First, FIRST
and firST, while matching the rule First.
- o
- %class-header = "header"
Defines the pathname of the file containing the scanner class’s
interface. Corresponding command-line option: --class-header.
- o
- %class-name = "class-name"
Declares the name of the scanner class generated by flexc++. This
directive corresponds to the %name directive used by
flex++(1). Contrary to flex++’s %name
declaration, class-name may appear anywhere in the first section of
the grammar specification file. It may be defined only once. If no
class-name is specified the default class name ( Scanner) is
used. Corresponding command-line option: --class-name.
- o
- %debug
Provide lex and its support functions with debugging code, showing
the actual parsing process on the standard output stream. When included,
the debugging output is active by default, but its activity may be
controlled using the setDebug(bool on-off) member. Note that
no #ifdef DEBUG macros are used in the generated code.
- o
- %implementation-header = "header"
Defines the pathname of the file to contain the implementation header.
Corresponding command-line option: --implementation-header.
- o
- %input-implementation =
"sourcefile"
Defines the pathname of the file containing the implementation of a
user-defined Input class.
- o
- %input-interface = "interface"
Defines the pathname of the file containing the interface of a user-defined
Input class. See input(3flexc++) for additional information
about user-defined Input classes.
- o
- %interactive
Generate an interactive scanner. An interactive scanner reads lines from the
input stream, and then returns the tokens encountered on that line. The
interactive scanner implemented by flexc++ only predefines the
Scanner(std::istream &in, std::ostream &out) constructor,
by default assuming that input is read from std::cin. See also the
INTERACTIVE SCANNER section in flexc++(1).
- o
- %lex-function-name = "funname"
Defines the name of the scanner class’s member to perform the lexical
scanning. If this directive is omitted the default name ( lex) is
used. Corresponding command-line option: --lex-function-name.
- o
- %lex-source = "source"
Defines the pathname of the file to contain the scanner member lex.
Corresponding command-line option: --lex-source.
- o
- %no-lines
Do not put #line preprocessor directives in the file containing the
scanner’s lex function. If omitted #line directives
are added to this file, unless overridden by the command line options
--lines and --no-lines.
- o
- %namespace = "namespace"
Define the scanner class in the namespace namespace. By default no
namespace is used. If this options is used the implementation header is
provided with a commented out using namespace declaration
for the requested namespace. This directive is overridden by the
--namespace command-line option.
- o
- %print-tokens
This option results in the tokens as well as the matched text to be
displayed on the standard output stream, just before returning the token
to lex’s caller. Displaying is suppressed again when the
lex.cc file is generated without using this directive. The function
showing the tokens ( ScannerBase::print__) is called from
Scanner::print(), which is defined in-line in scanner.h.
Calling ScannerBase::print__, therefore, can also easily be
controlled by an option controlled by the program using the scanner
object. This option does not show the tokens returned and text
matched by flexc++ itself when reading its input s. If that
is what you want, use the --own-tokens option.
- o
- %s namelist
The %s directive is followed by a list of one or more identifiers,
separated by blanks. Each identifier is the name of an inclusive mini
scanner.
- o
- %skeleton-directory = "path"
Use path rather than the default (e.g., /usr/share/flexc++)
path when looking for flexc++’s skeleton files. Corresponding
command-line option: --skeleton-directory.
- o
- %target-directory = "path"
Generate files in path rather than in flexc++’s input
file’s directory. The %target-directory option does not
affect files that were explicitly named (either as option or as
directive).
- o
- %x namelist
The %x directive is followed by a list of one or more identifiers,
separated by blanks. Each identifier is the name of an exclusive mini
scanner.
6.3. MINI SCANNERS¶
Mini scanners come in two flavors: inclusive mini scanners and exclusive mini
scanners. The rules that apply to an inclusive mini scanner are the mini
scanner’s own rules as well as the rules which apply to no mini scanners
in particular (i.e., the rules that apply to the default (or
INITIAL)
mini scanner). Exclusive mini scanners only use the rules that were defined
for them.
To define an inclusive mini scanner use
%s, followed by one or more
identifiers specifying the name(s) of the mini-scanner(s). To define an
exclusive mini scanner use
%x, followed by or more identifiers
specifying the name(s) of the mini-scanner(s). The following example defines
the names of two mini scanners:
string and
comment:
%x string comment
Following this, rules defined in the context of the
string mini scanner
(see below) will only be used when that mini scanner is active.
A
flexc++ input file may contain multiple
%s and
%x
specifications.
6.4. DEFINITIONS¶
Definitions are of the form
identifier regular-expression
Each definition must be entered on a line of its own. Definitions associate
identifiers with regular expressions, allowing the use of
${identifier}
as synonym for its regular expression in the rules section of the
flexc++ input file. One defined, the identifiers representing regular
expressions can also be used in subsequent definitions.
Example:
FIRST [A-Za-z_]
NAME {FIRST}[-A-Za-z0-9_]*
6.5. %% SEPARATOR¶
Following directives and definitions a line merely containing two consecutive
% characters is expected. Following this line the rules are defined.
Rules consist of regular expressions which should be recognized, possibly
followed by actions to be executed once a rule’s regular expression has
been matched.
6.6. REGULAR EXPRESSIONS¶
The regular expressions defined in
flexc++’s rules files are
matched against the information passed to the scanner’s
lex
function.
Regular expressions begin as the first non-blank character on a line. Comment is
interpreted as comment as long as it isn’t part of the regular
expresssion. To define a regular expression starting with two slashes (at
least) the first slash can be escaped or double quoted. (E.g.,
"//".* defines
C++ comment to end-of-line).
Regular expressions end at the first blank character (to add a blank character,
e.g., a space character, to a regular expression, prefix it by a backslash or
put it in a double-quoted string).
Actions may be associated with regular expressions. At a match the action that
is associated with the regular expression is executed, after which scanning
continues when the lexical scanning function (e.g.,
lex) is called
again. Actions are not required, and regular expressions can be defined
without any actions at all. If such action-less regular expressions are
matched then the match is performed silently, after which processing
continues.
Flexc++ tries to match as many characters of the input file as possible
(i.e., it uses `greedy matching’). Non-greedy matching is accomplished
by a combination of a scanner and parser and/or by using the `lookahead’
operator (
/).
The following regular expression `building blocks’ are available. More
complex regular expressions are created by combining them:
- x
- the character `x’
- .
- any character (byte) except newline
- [xyz]
- a character class; in this case, the pattern matches either
an `x’, a `y’, or a `z’
- [abj-oZ]
- a character class containing a range; matches an `a’,
a `b’, any letter from `j’ through `o’, or a
`Z’
- [^A-Z]
- a negated character class, i.e., any character except for
those in the class. In this example, any non-capital character.
- "[xyz]\"foo"
- text between double quotes matches the literal string:
[xyz]"foo.
- \X
- if X is `a’, `b’, `f’, `n’,
`r’, `t’, or `v’, then the ANSI-C interpretation of
`\x’ is matched. Otherwise, a literal `X’ is matched (this is
used to escape operators such as `*’).
- \0
- a NUL character (ASCII code 0).
- \123
- the character with octal value 123.
- \x2a
- the character with hexadecimal value 2a.
- (r)
- the regular expression `r’; parentheses are used to
override precedence (see below)
- {name}
- the expansion of the `name’ definition.
- r*
- zero or more regular expressions `r’. This also
matches the empty string.
- r+
- one or more regular expressions `r’.
- r?
- zero or one regular expression `r’. This also matches
the empty string.
- rs
- the regular expression `r’ followed by the regular
expression `s’; called concatenation
- r{m, n}
- regular expression `r’ at least m, but at most n
times ( 1 <= m <= n).
- r{m,}
- regular expression `r’ m or more times (1 <=
m).
- r{m}
- regular expression `r’ exactly m times (1 <=
m).
- r|s
- either regular expression `r’ or regular expression
`s’
- r/s
- regular expression `r’ if it is followed by regular
expression `s’. The text matched by `s’ is included when
determining whether this rule results in the longest match, but `s’
is then returned to the input before the rule’s action (if defined)
is executed.
- ^r
- a regular expression `r’ at the beginning of a line
or file.
- r$
- a regular expression `r’, occurring at the end of a
line. This pattern is identical to `r/\n’.
- <s>r
- a regular exprression `r’ in start condition
`s’
- <s1,s2,s3>r
- a regular exprression `r’ in start conditions s1, s2,
or s3.
- <*>r
- a regular exprression `r’ in all start
conditions.
- <<EOF>>
- an end-of-file.
- <s1,s2><<EOF>>
- an end-of-file when in start conditions s1 or s2
Inside a character class all regular expression operators lose their special
meanings, except for the escape character (
\) and the character class
operators
-,
]], and, at the beginning of the class,
^.
To add a closing bracket to a character class use
[]. To add a closing
bracket to a negated character class use
[^]. Once a character class
has started, all subsequent character (ranges) are added to the set, until the
final closing bracket (
]) has been reached.
The regular expressions listed above are grouped according to precedence, from
highest precedence at the top to lowest at the bottom. From lowest to highest
precedence, the operators are:
- o
- |: the or-operator at the end of a line (instead of
an action) indicates that this expression’s action is identical to
the action of the next rule.
- o
- /: the look-ahead operator;
- o
- |: the or-operator withn a regular expression;
- o
- CHAR: individual elements of the regular expression:
characters, strings, quoted characters, escaped characters, character sets
etc. are all considered CHAR elements. Multiple CHAR
elements can be combined by enclosing them in parentheses (e.g.,
(abc)+ indicates sequences of abc characters, like
abcabcabc);
- o
- *, ?, +, {: multipliers:
?: zero or one occurrence of the previous element;
+: one or more repetitions of the previous element;
*: zero or more repetitions of the previous element;
{...}: interval specification: a specified number of repetitions of
the previous element (see above for specific forms of the interval
specification)
- o
- {+}, {-}: set operators ({+} computing the
union of two sets, {-} computing the difference of the left-hand
side set minus the elements in the right-hand side set);
The lex standard defines concatenation as having a higher precedence than the
interval expression. This is different from many other regular expression
engines, and
flexc++ follows these latter engines, giving all
`multiplication operators’ equal priority.
Name expansion has the same precedence as grouping (using parentheses to
influence the precedence of the other operators in the regular expression).
Since the name expansion is treated as a group in
flexc++, it is not
allowed to use the lookahead operator in a name definition (a named pattern,
defined in the definition section).
Character classes can also contain character class expressions. These are
expressions enclosed inside
[: and
:] delimiters (which
themselves must appear between the
[ and
] of the character
class. Other elements may occur inside the character class as well). The
character class expressions are:
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
Character class expressions designate a set of characters equivalent to the
corresponding standard
C isXXX function. For example,
[:alnum:]
designates those characters for which
isalnum returns true - i.e., any
alphabetic or numeric character. For example, the following character classes
are all equivalent:
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:][0-9]]
[a-zA-Z0-9]
A negated character class such as the example
[^A-Z] above will match a
newline unless
\n (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class (e.g.,
[^A-Z\n]). This differs from the way many other regular expression
tools treat negated character classes, but unfortunately the inconsistency is
historically entrenched. Matching newlines means that a pattern like
[^"]* can match the entire input unless there’s another
quote in the input.
Flexc++ allows negation of character class expressions by prepending
^ to the POSIX character class name.
[:^alnum:] [:^alpha:] [:^blank:]
[:^cntrl:] [:^digit:] [:^graph:]
[:^lower:] [:^print:] [:^punct:]
[:^space:] [:^upper:] [:^xdigit:]
The
{-} operator computes the difference of two character classes. For
example,
[a-c]{-}[b-z] represents all the characters in the class
[a-c] that are not in the class
[b-z] (which in this case, is
just the single character
a). The
{-} operator is left
associative, so
[abc]{-}[b]{-}[c] is the same as
[a].
The
{+} operator computes the union of two character classes. For
example,
[a-z]{+}[0-9] is the same as
[a-z0-9]. This operator is
useful when preceded by the result of a difference operation, as in,
[[:alpha:]]{-}[[:lower:]]{+}[q], which is equivalent to
[A-Zq]
in the
C locale.
A rule can have at most one instance of trailing context (the
/ operator
or the
$ operator). The start condition,
^, and
<<EOF>> patterns can only occur at the beginning of a
pattern, and cannot be surrounded by parentheses. The characters
^ and
$ only have their special properties at, respectively, the beginning
and end of regular expressions. In all other cases they are treated as a
normal characters.
6.7. SPECIFICATION EXAMPLE¶
%option debug
%x comment
NAME [[:alpha:]][_[:alnum:]]*
%%
"//".* // ignore
"/*" begin(comment);
<comment>.|\n // ignore
<comment>"*/" begin(INITIAL);
^a return 1;
a return 2;
a$ return 3;
{NAME} return 4;
.|\n // ignore
)
7. THE CLASS INTERFACE: SCANNER.H¶
By default,
flexc++ generates a file
scanner.h containing the
initial interface of the scanner class performing the lexical scan according
to the specifications given in
flexc++’s input file. The name of
the file that is generated can easily be changed using
flexc++’s
--class-header option. In this man-page we’ll stick to using the
default name.
The file
scanner.h is generated only once, unless an explicit request is
made to rewrite it (using
flexc++’s
--force-class-header
option).
The provided interface is very light-weight, primarily offering a link to the
scanner’s base class (see
scannerbase.h(3flexc++).
Many of the facilities offered by the scanner class are inherited from
the ScannerBase base class, and the reader should
consult scannerbase.h(3flexc++) for an
overview of additional facilities offered by the Scanner
class.
7.1. NAMING CONVENTION¶
All symbols that are required by the generated scanner class end in two
consecutive underscore characters (e.g.,
executeAction__). These names
should not be redefined. As they are part of the
Scanner and
ScannerBase class their scope is immediately clear and confusion with
identically named identifiers elsewhere is unlikely.
Some member functions do not use the underscore convention. These are the
scanner class’s constructors, or names that are similar or equal to
names that have historically been used (e.g.,
length). Also, some
functions are offered offering hooks into the implementation (like
preCode). The latter category of function also have names that
don’t end in underscores.
7.2 CONSTRUCTORS¶
- o
- explicit Scanner(std::istream &in = std::cin,
std::ostream &out = std::cout) This constructor by default
reads information from the standard input stream and writes to the
standard output stream. When the Scanner object goes out of scope
the input and output files are closed.
- With interactive scanners input stream switching or
stacking is not available; switching output streams, however, is.
- o
- Scanner(std::string const &infile, std::string const
&outfile) This constructor opens the input and output streams
whose file names were specified. When the Scanner object goes out
of scope the input and output files are closed. If outfile ==
"-" then the standard output stream is used as the
scanner’s output medium; if outfile == "" then the
standard error stream is used as the scanner’s output medium.
- This constructor is not available with interactive
scanners.
7.3. PUBLIC MEMBER FUNCTIONS¶
- o
- int lex() The lex function performs the
lexical scanning of the input file specified at construction time (but
also see scannerbase.h(3flexc++) for information about intermediate
stream-switching facilities). It returns an int representing the
token associated with the matched regular expression. The returned
value 0 indicates end-of-file. Considering its default implementation, it
could be redefined by the user. Lex’s default implementation
merely calls lex__:
inline int Scanner::lex()
{
return lex__();
}
- Caveat: with interactive scanners the lex
function is defined in the generated lex.cc file. Once
flexc++ has generated the scanner class header file this scanner
class header file isn’t automatically rewritten by flexc++.
If, at some later stage, an interactive scanner must be generated, then
the inline lex implementation must be removed `by hand’ from
the scanner class header file. Likewise, a lex member
implementation (like the above) must be provided `by hand’ if a
non-interactive scanner is required after first having generated files
implementing an interactive scanner.
7.4. PRIVATE MEMBER FUNCTIONS¶
- o
- int lex__() This function is used internally by
lex and should not otherwise be used.
- o
- int executeAction__() This function is used
internally by lex and should not otherwise be used.
- o
- void preCode() By default this function has an
empty, inline implementation in scanner.h. It can safely be
replaced by a user-defined implementation. This function is called by
lex__, just before it starts to match input characters against its
rules: preCode is called by lex__ when lex__ is
called and also after having executed the actions of a rule which did not
execute a return statement. The outline of lex__’s
implementation looks like this:
int Scanner::lex__()
{
...
preCode();
while (true)
{
size_t ch = get__(); // fetch next char
...
switch (actionType__(range)) // determine the action
{
... maybe return
}
... no return, continue scanning
preCode();
} // while
}
- o
- void print() When the --print-tokens or
%print-tokens directive is used this function is called to display,
on the standard output stream, the tokens returned and text matched by the
scanner generated by flexc++.
- Displaying is suppressed when the lex.cc file is
(re)generated without using this directive. The function actually showing
the tokens ( ScannerBase::print__) is called from print,
which is defined in-line in scanner.h. Calling
ScannerBase::print__, therefore, can also easily be controlled by
an option controlled by the program using the scanner object.
#ifndef Scanner_H_INCLUDED_
#define Scanner_H_INCLUDED_
// $insert baseclass_h
#include "scannerbase.h"
class Scanner: public ScannerBase
{
public:
explicit Scanner(std::istream &in = std::cin,
std::ostream &out = std::cout);
Scanner(std::string const &infile, std::string const &outfile);
// $insert lexFunctionDecl
int lex();
private:
int lex__();
int executeAction__(size_t ruleNr);
void preCode(); // re-implement this function for code to be
// exec’ed before the pattern matching starts
};
inline void Scanner::preCode()
{
// optionally replace by your own code
}
inline Scanner::Scanner(std::istream &in, std::ostream &out)
:
ScannerBase(in, out)
{}
inline Scanner::Scanner(std::string const &infile,
std::string const &outfile)
:
ScannerBase(infile, outfile)
{}
// $insert inlineLexFunction
inline int Scanner::lex()
{
return lex__();
}
#endif // Scanner_H_INCLUDED_
8.1. THE SCANNER BASE CLASS()¶
By default,
flexc++ generates a file
scannerbase.h containing the
interface of the base class of the scanner class also generated by
flexc++. The name of the file that is generated can easily be changed
using
flexc++’s
--baseclass-header option. In this
man-page we use the default name.
The file
scanner.h is generated at each new
flexc++ run. It
contains no user-serviceable or extensible parts. Rewriting can be prevented
by specifying
flexc++’s
--no-baseclass-header option"
.
8.2. PUBLIC ENUMS AND -TYPES¶
- o
- enum class StartCondition__ This strongly typed
enumeration defines the names of the start conditions (i.e., mini
scanners). It at least contains INITIAL, but when the %s or
%x directives were used it also contains the identifiers of the
mini scanners declared by these directives. Since StartCondition__
is a strongly typed enum its values must be preceded by its enum name.
E.g.,
begin(StartCondition__::INITIAL);
8.3. PROTECTED ENUMS AND -TYPES¶
- o
- enum class ActionType__ This strongly typed
enumeration is for internal use only.
- o
- enum Leave__ This enumeration is for internal use
only.
8.4. NO PUBLIC CONSTRUCTORS¶
There are no public constructors.
ScannerBase is a base class for the
Scanner class generated by
flexc++.
ScannerBase only
offers protected constructors.
8.5. PUBLIC MEMBER FUNCTIONS¶
- o
- bool debug() const returns true if
--debug or %debug was specified, otherwise
false.
- o
- std::string const &filename() const returns the
name of the file currently processed by the scanner object.
- o
- size_t length() const returns the length of the text
that was matched by lex. With flex++ this function was
called leng.
- o
- size_t lineNr() const returns the line number of the
currently scanned line. This function is always available (note:
flex++ only offered a similar function (called lineno) after
using the %lineno option).
- o
- std::string const &matched() const returns the
text matched by lex (note: flex++ offers a similar member
called YYText).
- o
- void setDebug(bool onOff) Switches on/off debugging
output by providing the argument true or false. Switching on
debugging output only has visible effects if the debug option was
specified.
- o
- void switchIstream(std::string const
&infilename) The currently processed input stream is closed, and
processing continues at the stream whose name is specified as the
function’s argument. This is not a stack-operation: after
processing infilename processing does not return to the original
stream.
- This member is not available with interactive
scanners.
- o
- void switchOstream(std::ostream &out) The
currently processed output stream is closed, and new output is written to
out.
- o
- void switchOstream(std::string const
&outfilename)
- The current output stream is closed, and output is written
to outfilename. If this file already exists, it is rewritten.
- o
- void switchStreams(std::istream &in,
std::ostream &out = std::cout) The currently processed input
and output streams are closed, and processing continues at in,
writing output to out. This is not a stack-operation: after
processing in processing does not return to the original
stream.
- This member is not available with interactive
scanners.
- o
- void switchStreams(std::string const
&infilename, std::string const &outfilename) The
currently processed input and output streams are closed, and processing
continues at the stream whose name is specified as the function’s
first argument, writing output to the file whose name is specified as the
function’s second argument. This latter file is rewritten. This is
not a stack-operation: after processing infilename
processing does not return to the original stream. If outfilename ==
"-" then the standard output stream is used as the
scanner’s output medium; if outfilename == "" then
the standard error stream is used as the scanner’s output
medium.
- If outfilename == "-" then the standard
output stream is used as the scanner’s output medium; if
outfilename == "" then the standard error stream is used
as the scanner’s output medium.
- This member is not available with interactive
scanners.
8.6. PROTECTED CONSTRUCTORS¶
- o
- ScannerBase(std::string const &infilename,
std::string const &outfilename) The scanner object opens and
reads infilename and opens (rewrites) and writes
outfilename. It is called from the corresponding Scanner
constructor.
- This member is not available for interactive
scanners.
- o
- ScannerBase(std::istream &in, std::ostream
&out) The in and out parameters are, respectively,
the derived class constructor’s input stream and output streams.
8.7. PROTECTED MEMBER FUNCTIONS¶
All member functions ending in two underscore characters are for internal use
only and should not be called by user-defined members of the
Scanner
class.
The following members, however, can safely be called by members of the generated
Scanner class:
- o
- void accept(size_t nChars = 0) accept(n)
returns all but the first `nChars’ characters of the current token
back to the input stream, where they will be rescanned when the scanner
looks for the next match. So, it matches `nChars’ of the characters
in the input buffer, rescanning the rest. This function effectively sets
length’s return value to nChars (note: with
flex++ this function was called less);
- o
- void begin(StartCondition__ startCondition) activate
the regular expression rules associated with StartCondition__
startCondition. As this enumeration is a strongly typed enum the
StartCondition__ scope must be specified as well. E.g.,
begin(StartCondition__::INITIAL);
- o
- void echo() const The currently matched text (i.e.,
the text returned by the member matched) is inserted into the
scanner object’s output stream;
- o
- void leave(int retValue) actions defined in the
lexical scanner specification file may or may not return. This frequently
results in complicated or overlong compound statements, blurring the
readability of the specification file. By encapsulating the actions in a
member function readability is enhanced. However, frequently a compound
statement is still required, as in:
regex-to-match {
if (int ret = memberFunction())
return ret;
}
The member leave removes the need for constructions like the above.
The member leave can be called from within member functions
encapsulating actions performed when a regular expression has been
matched. It ends lex, returning retValue to its caller. The
above rule can now be written like this:
regex-to-match memberFunction();
and memberFunction could be implemented as follows:
void memberFunction()
{
if (someCondition())
{ // any action, e.g.,
// switch mini-scanner
begin(StartCondition__::INITIAL);
leave(Parser::TOKENVALUE); // lex returns TOKENVALUE
// this point is never reached
}
pushStream(d_matched); // switch to the next stream
// lex continues
}
The member leave should only (indirectly) be called (usually nested)
from actions defined in the scanner’s specification s;
calling leave outside of this context results in undefined
behavior.
- o
- void more() the matched text is kept and will be
prefixed to the text that is matched at the next lexical scan;
- o
- std::ostream &out() returns a reference to the
scanner’s output stream;
- o
- bool popStream() closes the currently processed
input stream and continues to process the most recently stacked input
stream (removing it from the stack of streams). If this switch was
successfully performed true is returned, otherwise (e.g., when the
stream stack is empty) false is returned;
- o
- void push(size_t ch) character ch is pushed
back onto the input stream. I.e., it will be the character that is
retrieved at the next attempt to obtain a character from the input
stream;
- o
- void push(std::string const &txt) the characters
in the string txt are pushed back onto the input stream. I.e., they
will be the characters that are retrieved at the next attempt to obtain
characters from the input stream. The characters in txt are
retrieved from the first character to the last. So if txt ==
"hello" then the ’h’ will be the
character that’s retrieved next, followed by ’e’,
etc, until ’o’;
- o
- void pushStream(std::istream &curStream) this
function pushes curStream on the stream stack;
- This member is not available with interactive
scanners.
- o
- void pushStream(std::string const &curName)
same, but the stream curName is opened first, and the resulting
istream is pushed on the stream stack;
- This member is not available with interactive
scanners.
- o
- void redo(size_t nChars = 0) this member acts like
accept but its argument counts backward from the end of the matched
text. All but these nChars characters are kept and the last
nChar characters are rescanned. This function effectively reduces
length’s return value by nChars;
- o
- void setFilename(std::string const &name) this
function sets the name of the stream returned by filename to
name;
- o
- void setMatched(std::string const &text) this
function stores text in the matched text buffer. Following a call
to this function matched returns text.
- o
- StartCondition__ startCondition() const returns the
currently active start condition (mini scanner);
- o
- std::vector<StreamStruct> const &streamStack()
const returns the vector of currently stacked input streams. The
vector’s size equals 0 unless pushStream has been used. So
flexc++’s input file is not counted here. The
StreamStruct is a struct only having one accessible member:
std::string const &pushedName, which holds the name of the
pushed stream. The vector is used internally as a stack: the stream that
was first pushed is found at index position 0, the most recently pushed
stream is found at streamStack().back().
- This member is not available with interactive
scanners.
8.8. PROTECTED DATA MEMBERS¶
All protected data members are for internal use only, allowing
lex__ to
access them. All of them end in two underscore characters.
8.9. FLEX++ TO FLEXC++ MEMBERS¶
Flex++ (old) |
|
Flexc++ (new) |
lineno() |
|
lineNr() |
|
YYText() |
|
matched() |
less() |
|
accept() |
|
|
|
Flexc++ generates a file
scannerbase.h defining the scanner
class’s base class, by default named
ScannerBase (which is the
name used in this man-page). The base class
ScannerBase contains a
nested class
Input whose interface looks like this:
class Input
{
public:
Input();
Input(std::istream *iStream, size_t lineNr = 1);
size_t get();
size_t lineNr() const;
void reRead(size_t ch);
void reRead(std::string const &str, size_t fmIdx);
void close();
};
The members of this class are all required and offer a level in between the
operations of
ScannerBase and
flexc++’s actual input file
that’s being processed.
By default,
flexc++ provides an implementation for all of
Input’s required members. Therefore, in most situations this
man-page can safely be ignored.
However, users may define and extend their own
Input class and provide
flexc++’s base class with that
Input class. To do so
flexc++’s rules file must contain the following two directives:
%input-implementation = "sourcefile"
%input-interface = "interface"
Here,
interface is the name of a file containing the class
Input’s interface. This interface is then inserted into
ScannerBase’s interface instead of the default class
Input’s interface. This interface must
at least
offer the above-mentioned members and constructors (their functions are
described below). The class may contain additional members if required by the
user-defined implementation. The implementation itself is expected in
sourcefile. The contents of this file are inserted in the generated
lex.cc file instead of
Input’s default implementation. The
file
sourcefile should probably not have a
.cc extension to
prevent its compilation by a program maintenance utility.
When the lexical scanner generated by
flexc++ switches streams using the
//include directive (see the
rules(3flexc++) man-page) the input
stream that’s currently processed is pushed on an
Input stack
maintained by
ScannerBase, and processing continues at the file named
at the
//include directive. Once the latter file has been processed,
the previously pushed stream is popped off the stack, and processing of the
popped stream continues. This implies that
Input objects must be
`stack-able’. The required interface is designed to satisfy this
requirement.
9.2. CONSTRUCTORS¶
- o
- Input() The default constructor is used by
ScannerBase to prepare the stack for Input objects. It must
make sure that a default (empty) Input object is in a valid state
and can be destroyed. It serves no further purpose. Input objects,
however, must support the default (or overloaded) assignment
operator.
- o
- Input(std::istream *iStream, size_t lineNr = 1) This
constructor receives a pointer to a dynamically allocated istream
object. The Input constructor should preserve this pointer when the
Input object is pushed on and popped off the stack. A
shared_ptr probably comes in handy here. The Input object
becomes the owner of the istream object, albeit that its destructor
is not supposed to destroy the istream object. Destruction
remains the responsibility of the ScannerBase object, which calls
the Input::close member (see below) when it’s time to destroy
(close) the stream.
- The new input stream’s line counter is set to
lineNr, by default 1.
9.3. REQUIRED PUBLIC MEMBER FUNCTIONS¶
- o
- size_t get() returns the next character to be
processed by the lexical scanner. Usually it will be the next character
from the istream passed to the Input class at construction
time. It is never called by the ScannerBase object for Input
objects defined using Input’s default constructor. It should
return 0x100 once istream’s end-of-file has been
reached.
- o
- size_t lineNr() const should return the (1-based)
number of the istream object passed to the Input object. At
construction time the istream has just been opened and so at that
point lineNr should return 1.
- o
- void reRead(size_t ch) if provided with a value
smaller than 0x100 ch should be pushed back onto the
istream, where it becomes the character next to be returned.
Physically the character doesn’t have to be pushed back. The default
implementation uses a deque onto which the character is
pushed-front. Only when this deque is exhausted characters are
retrieved from the Input object’s istream.
- o
- void reRead(std::string const &str, size_t
fmIdx) the characters in str from fmIdx until the
string’s final character are pushed back onto the istream
object so that the string’s first character is retrieved first and
the string’s last character is retrieved last.
- o
- void close() the istream object initially
passed to the Input object is deleted by close, thereby not
only freeing the stream’s memory, but also closing the stream if the
stream in fact was an ifstream. Note that the Input’s
destructor should not destroy the Input’s
istream object.
FILES¶
Flexc++’s default skeleton files are in
/usr/share/flexc++.
By default,
flexc++ generates the following files:
- o
- scanner.h: the header file containing the scanner
class’s interface.
- o
- scannerbase.h: the header file containing the
interface of the scanner class’s base class.
- o
- scanner.ih: the internal header file that is meant
to be included by the scanner class’s source files (e.g., it is
included by lex.cc, see the next file), and that should contain all
declarations required for compiling the scanner class’s
sources.
- o
- lex.cc: the source file implementing the scanner
class member function lex (and support functions), performing the
lexical scan.
SEE ALSO¶
bisonc++(1)
BUGS¶
- o
- The priority of interval expressions ({...}) equals
the priority of other multiplicative operators (like *).
- o
- All INITIAL rules apply to inclusive mini scanners,
also those INITIAL rules that were explicitly associated with the
INITIAL mini scanner.
ABOUT flexc++¶
Flexc++ was originally started as a programming project by Jean-Paul van
Oosten and Richard Berendsen in the 2007-2008 academic year. After graduating,
Richard left the project and moved to Amsterdam. Jean-Paul remained in
Groningen, and after on-and-off activities on the project, in close
cooperation with Frank B. Brokken, Frank undertook a rewrite of the
project’s code around 2010. During the development of
flexc++,
the lookahead-operator handling continuously threatened the completion of the
project. By now, the project has evolved to a level that we feel it’s
defensible to publish the program, although we still tend to consider the
program in its experimental stage; it will remain that way until we decide to
move its version from the 0.9x.xx series to the 1.xx.xx series.
COPYRIGHT¶
This is free software, distributed under the terms of the GNU General Public
License (GPL).
AUTHOR¶
Frank B. Brokken (
f.b.brokken@rug.nl),
Jean-Paul van Oosten (
j.p.van.oosten@rug.nl),
Richard Berendsen (
richardberendsen@xs4all.nl) (until 2010).