NAME
flexc++ - Generate a C++ scanner class and parsing function
SYNOPSIS
flexc++ [options]
rules-file
DESCRIPTION
Flexc++(1) was designed after flex(1) and flex++(1). Like these latter two
programs flexc++ generates code performing pattern-matching on text, possibly
executing actions when certain regular expressions are recognized.
Flexc++, contrary to flex and flex++, generates code that is explicitly
intended for use by C++ programs. The well-known flex(1) program generates C
source-code and flex++(1) merely offers a C++-like shell around the yylex
function generated by flex(1) and hardly supports present-day ideas about C++
software development.
Contrary to this, flexc++ creates a C++ class offering a predefined member
function lex matching input against regular expressions and possibly executing
C++ code once regular expressions were matched. The code generated by flexc++
is pure C++, allowing its users to apply all of the features offered by that
language.
Not every aspect of flexc++ is covered by the man-pages. In addition to
what’s summarized by the man-pages the flexc++ manual offers a
chapter covering pre-loading of input lines (allowing you to, e.g., display
lines in which errors are observed even though not all of the line’s
tokens have already been scanned), as well as a chapter covering technical
documentation about the inner workings of flexc++.
From version 0.92.00 until version 1.07.00 flexc++ offered one big manual
page. The advantage was that you never had to look for which manual page
contained which information. At the same time, flexc++’s man-page grew so
large that it was hard to find your way in it. Starting with release 1.08.00
we reverted to using multiple man-pages. The following index relates manual
pages to their specific contents:
This man-page
This man-page offers the following sections:
- o
- 1. QUICK START: a quick start overview about how to use
flexc++.
- o
- 2. QUICK START: FLEXC++ and BISONC++: a quick start overview about
how to use flexc++ in combination with bisonc++(1)
- o
- 3. GENERATED FILES: files generated by flexc++ and their
purposes
- o
- 4. OPTIONS: options available for flexc++
The flexc++api(3) man-page:
This man-page describes the classes generated by flexc++, describing
flexc++’s actions from the programmer’s point of view.
- o
- 1. INTERACTIVE SCANNERS: how to create an interactive scanner
- o
- 2. THE CLASS INTERFACE: SCANNER.H: Constructors and members of the
scanner class generated by flexc++
- o
- 3. NAMING CONVENTION: symbols defined by flexc++ in the
scanner class.
- o
- 4. CONSTRUCTORS: constructors defined in the scanner class.
- o
- 5. PUBLIC MEMBER FUNCTION: public member declared in the scanner
class.
- o
- 6. PRIVATE MEMBER FUNCTIONS: private members declared in the
scanner class.
- o
- 7. SCANNER CLASS HEADER EXAMPLE: an example of a generated scanner
class header
- o
- 8. THE SCANNER BASE CLASS: the scanner class is derived from a base
class. The base class is described in this section
- o
- 9. PUBLIC ENUMS AND -TYPES: enums and types declared by the base
class
- o
- 10. PROTECTED ENUMS AND -TYPES: enumerations and types used by the
scanner and scanner base classes
- o
- 11. NO PUBLIC CONSTRUCTORS: the scanner base class does not offer
public constructors.
- o
- 12. PUBLIC MEMBER FUNCTIONS: several members defined by the scanner
base class have public access rights.
- o
- 13. PROTECTED CONSTRUCTORS: the base class can be constructed by a
derived class. Usually this is the scanner class generated by
flexc++.
- o
- 14. PROTECTED MEMBER FUNCTIONS: this section covers the base class
member functions that can only be used by scanner class or scanner base
class members
- o
- 15. PROTECTED DATA MEMBERS: this section covers the base class data
members that can only be used by scanner class or scanner base class
members
- o
- 16. FLEX++ TO FLEXC++ MEMBERS: a short overview of frequently used
flex(1) members that received different names in
flexc++.
- o
- 17. THE CLASS INPUT: the scanner’s job is completely
decoupled from the actual input stream. The class Input, nested
within the scanner base class, handles the communication with the input
streams. The class Input is described in this section.
- o
- 18. INPUT CONSTRUCTORS: the class Input can easily be
replaced by another class. The constructor-requirements are described in
this section.
- o
- 19. REQUIRED PUBLIC MEMBER FUNCTIONS: this section covers the
required public members of a self-made Input class
The flexc++input(7) man-page:
This man-page describes how flexc++’s input files should be
organized. It contains the following sections:
- o
- 1. SPECIFICATION FILE(S): the format and contents of flexc++
input files, specifying the Scanner’s characteristics
- o
- 2. FILE SWITCHING: how to switch to another input specification
file
- o
- 3. DIRECTIVES: directives that can be used in input specification
files
- o
- 4. MINI SCANNERS: how to declare mini-scanners
- o
- 5. DEFINITIONS: how to define symbolic names for regular
expressions
- o
- 6. %% SEPARATOR: the separator between the input specification
sections
- o
- 7. REGULAR EXPRESSIONS: regular expressions supported by
flexc++
- o
- 8. SPECIFICATION EXAMPLE: an example of a specification file
1. QUICK START
A bare-bones, no-frills scanner is generated as follows:
- o
- Create a file lexer defining the regular expressions to recognize,
and the tokens to return. Use token values exceeding 0xff if plain ASCII
character values can also be used as token values. Example (assume
capitalized words are token-symbols defined in an enum defined by the
scanner class):
%%
[ \t\n]+ // skip white space chars.
[0-9]+ return NUMBER;
[[:alpha:]_][[:alpha:][:digit:]_]* return IDENTIFIER;
. return matched()[0];
- o
- Execute:
flexc++ lexer
This generates four files: Scanner.h, Scanner.ih, Scannerbase.h, and
lex.cc
- o
- Edit Scanner.h, add the enum defining the token-symbols in
(usually) the public section of the class Scanner. E.g.,
    class Scanner: public ScannerBase
    {
        public:
            enum Tokens
            {
                IDENTIFIER = 0x100,
                NUMBER
            };
        // ... (etc, as generated by flexc++)
- o
- Create a file defining int main, e.g.:
    #include <iostream>
    #include <string>
    #include "Scanner.h"

    using namespace std;

    int main()
    {
        Scanner scanner;                        // define a Scanner object

        while (int token = scanner.lex())       // get all tokens
        {
            string const &text = scanner.matched();
            switch (token)
            {
                case Scanner::IDENTIFIER:
                    cout << "identifier: " << text << '\n';
                break;

                case Scanner::NUMBER:
                    cout << "number: " << text << '\n';
                break;

                default:
                    cout << "char. token: `" << text << "'\n";
                break;
            }
        }
    }
- o
- Compile all .cc files:
g++ --std=c++11 *.cc
- o
- To `tokenize’ main.cc, execute:
a.out < main.cc
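The advice to use token values exceeding 0xff can be illustrated with a small
stand-alone sketch. It does not use the generated Scanner class: the enum
mirrors the one added to Scanner.h above, and the function describe is a
hypothetical helper introduced only for this illustration.

```cpp
#include <string>

// Mirrors the enum added to Scanner.h in the example above.
enum Tokens
{
    IDENTIFIER = 0x100,     // first value beyond the plain character range
    NUMBER                  // 0x101
};

// Hypothetical helper: classify a token value as returned by lex().
// Values up to 0xff are plain characters, so they can never be
// mistaken for one of the named tokens.
std::string describe(int token)
{
    switch (token)
    {
        case IDENTIFIER:
        return "identifier";

        case NUMBER:
        return "number";

        default:
        return token <= 0xff ? "char. token" : "unknown";
    }
}
```

Because the named tokens start at 0x100, a rule such as
`. return matched()[0];` can return any single character without clashing
with IDENTIFIER or NUMBER.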
2. QUICK START: FLEXC++ and BISONC++
To interface
flexc++ to the
bisonc++(1) parser generator proceed
as follows:
- o
- Specify a grammar that can be processed by bisonc++(1). Assume that
the scanner and parser are developed in, respectively, the
sub-directories scanner and parser. A simple grammar specification
that can be used with the scanner developed in the previous section
is then written to the file parser/grammar, e.g.:
%scanner ../scanner/Scanner.h
%scanner-token-function d_scanner.lex()
%token IDENTIFIER NUMBER CHAR
%%
startrule:
startrule tokenshow
|
tokenshow
;
tokenshow:
token
{
std::cout << "matched: " << d_scanner.matched() << '\n';
}
;
token:
IDENTIFIER
|
NUMBER
|
CHAR
;
- o
- Write a scanner specification file. E.g.,
%%
[ \t\n]+ // skip white space chars.
[0-9]+ return Parser::NUMBER;
[[:alpha:]_][[:alpha:][:digit:]_]* return Parser::IDENTIFIER;
. return Parser::CHAR;
This causes the scanner to return Parser tokens to the generated
parser.
- o
- Add the line
#include "../parser/Parserbase.h"
to the file scanner/Scanner.ih
- o
- Write a simple main function in the file main.cc. E.g.,
    #include "parser/Parser.h"

    int main(int argc, char **argv)
    {
        Parser parser;
        parser.parse();
    }
- o
- Generate a scanner in the scanner subdirectory:
flexc++ lexer
- o
- Generate a parser in the parser subdirectory:
bisonc++ grammar
- o
- Compile all sources:
g++ --std=c++11 *.cc */*.cc
- o
- Execute the program, providing it some source file to be processed:
a.out < main.cc
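How the two generated classes cooperate can be mimicked without running
either generator. The sketch below is not generated code: MockScanner and
MockParser are hypothetical stand-ins, and only the names lex, matched, parse
and d_scanner are taken from the text above. It shows a parser object owning
a scanner and calling d_scanner.lex() until 0 is returned.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the token enum that bisonc++ would provide.
enum Token { IDENTIFIER = 0x100, NUMBER, CHAR };

// Hypothetical stand-in for the generated Scanner: it replays a fixed
// list of (token, text) pairs instead of matching regular expressions.
class MockScanner
{
    std::vector<std::pair<int, std::string>> d_input;
    std::size_t d_next = 0;
    std::string d_matched;

    public:
        explicit MockScanner(std::vector<std::pair<int, std::string>> input)
        :
            d_input(std::move(input))
        {}

        int lex()                       // returns 0 at end of input
        {
            if (d_next == d_input.size())
                return 0;

            d_matched = d_input[d_next].second;
            return d_input[d_next++].first;
        }

        std::string const &matched() const
        {
            return d_matched;
        }
};

// Hypothetical stand-in for the generated Parser: it owns a scanner
// named d_scanner and obtains its tokens by calling d_scanner.lex().
class MockParser
{
    MockScanner d_scanner;

    public:
        explicit MockParser(MockScanner scanner)
        :
            d_scanner(std::move(scanner))
        {}

        // Returns the matched text of every token that was received.
        std::vector<std::string> parse()
        {
            std::vector<std::string> seen;

            while (d_scanner.lex() != 0)
                seen.push_back(d_scanner.matched());

            return seen;
        }
};
```

In the real setup the %scanner-token-function directive plays the role of
the d_scanner.lex() call inside parse.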
3. GENERATED FILES
Flexc++ generates four files from a well-formed input file:
- o
- A file containing the implementation of the lex member function and
its support functions. By default this file is named lex.cc.
- o
- A file containing the scanner’s class interface. By default this
file is named Scanner.h. The scanner class itself is generated once
and is thereafter `owned’ by the programmer, who may change it
ad-lib. Newly added members (data members, function members) will
survive future flexc++ runs as flexc++ will never rewrite an
existing scanner class interface file, unless explicitly ordered to do
so.
- o
- A file containing the interface of the scanner class’s base
class. The scanner class is publicly derived from this base class.
It is used to minimize the size of the scanner interface itself. The
scanner base class is `owned’ by flexc++ and should never be
hand-modified. By default the scanner’s base class is provided in
the file Scannerbase.h. At each new flexc++ run this file is
rewritten unless flexc++ is explicitly ordered not to do
so.
- o
- A file containing the implementation header. This file should
contain includes and declarations that are only required when compiling
the members of the scanner class. By default this file is named
Scanner.ih. This file, like the file containing the scanner
class’s interface is never rewritten by flexc++ unless
flexc++ is explicitly ordered to do so.
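The division of ownership described above can be pictured with two small
classes. This is a sketch only: the member mockMatch and the hand-added
members noteIdentifier and nIdentifiers are hypothetical, and the real
generated base class offers a much larger interface.

```cpp
#include <cstddef>
#include <string>

// Stand-in for Scannerbase.h: owned by flexc++ and rewritten at each
// run, so nothing should be added here by hand.
class ScannerBase
{
    std::string d_matched;

    public:
        std::string const &matched() const
        {
            return d_matched;
        }

    protected:
        void mockMatch(std::string const &text)    // hypothetical member
        {
            d_matched = text;
        }
};

// Stand-in for Scanner.h: generated once, thereafter owned by the
// programmer. Members added here survive later flexc++ runs.
class Scanner: public ScannerBase
{
    std::size_t d_nIdentifiers = 0;                // hand-added data member

    public:
        void noteIdentifier(std::string const &text)   // hand-added member
        {
            mockMatch(text);
            ++d_nIdentifiers;
        }

        std::size_t nIdentifiers() const
        {
            return d_nIdentifiers;
        }
};
```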
4. OPTIONS
Where available, single letter options are listed between parentheses following
their associated long-option variants. Single letter options require arguments
if their associated long options require arguments as well. Options affecting
the class header or implementation header file are ignored if these files
already exist. Options accepting a `filename’ do not accept path names,
i.e., they cannot contain directory separators (/); options accepting
a `pathname’ may contain directory separators.
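The distinction between `filename’ and `pathname’ arguments amounts to a
one-line check. The function below is a hypothetical illustration of that
rule, not code taken from flexc++:

```cpp
#include <string>

// Hypothetical check mirroring the rule above: a 'filename' argument
// must not contain a directory separator; a 'pathname' argument may.
bool isPlainFilename(std::string const &arg)
{
    return !arg.empty() && arg.find('/') == std::string::npos;
}
```

With this check Scanner.h qualifies as a filename, whereas
scanner/Scanner.h is only acceptable where a pathname is expected.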
Some options may generate errors. This happens when an option conflicts with the
contents of an existing file which flexc++ cannot modify (e.g., a
--namespace option was provided, but an existing scanner class header file
does not define a namespace). To solve the error the offending
option could be omitted, the existing file could be removed, or the existing
file could be hand-edited according to the option’s specification. Note
that flexc++ currently does not handle the opposite error condition: if
a previously used option is omitted, then flexc++ does not detect the
inconsistency. In those cases you may encounter compilation errors.
- o
- --baseclass-header=filename (-b)
Use filename as the name of the file to contain the scanner
class’s base class. Defaults to the name of the scanner class plus
base.h
- It is an error if this option is used and an already existing
scanner-class header file does not include `filename’.
- o
- --baseclass-skeleton=pathname (-C)
Use pathname as the path to the file containing the skeleton of the
scanner class’s base class. Its filename defaults to
flexc++base.h.
- o
- --case-insensitive
Use this option to generate a scanner that matches regular expressions
case insensitively. All regular expressions specified in
flexc++’s input file are interpreted case insensitively and
the resulting scanner object also interprets its input case
insensitively.
- When this option is specified the resulting scanner does not distinguish
between the following rules:
First // initial F is transformed to f
first
FIRST // all capitals are transformed to lower case chars
With a case-insensitive scanner only the first rule can be matched, and
flexc++ will issue warnings for the second and third rule about
rules that cannot be matched.
- Input processed by a case-insensitive scanner is also handled case
insensitively. The above mentioned First rule is matched for all of
the following input words: first First FIRST firST.
- Although the matching process proceeds case insensitively, the matched
text (as returned by the scanner’s matched() member) always
contains the original, unmodified text. So, with the above input
matched() returns, respectively first, First, FIRST and
firST, while matching the rule First.
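The behavior described for --case-insensitive can be imitated with an
ordinary comparison function. The sketch below is not how flexc++ implements
the option (it builds its tables differently); matchesIgnoringCase is a
hypothetical helper that merely reproduces the observable effect: case is
ignored when comparing, while the input text itself is left unmodified.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if 'input' equals 'rule' when letter case
// is ignored. The input is not altered, just as matched() keeps
// returning the original text.
bool matchesIgnoringCase(std::string const &rule, std::string const &input)
{
    if (rule.size() != input.size())
        return false;

    for (std::string::size_type idx = 0; idx != rule.size(); ++idx)
    {
        // std::tolower requires a value representable as unsigned char
        if (std::tolower(static_cast<unsigned char>(rule[idx])) !=
            std::tolower(static_cast<unsigned char>(input[idx])))
            return false;
    }
    return true;
}
```

Called with the rule First, the function accepts each of first, First, FIRST
and firST, which is why the rules first and FIRST can never be reached.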
- o
- --class-header=filename (-c)
Use filename as the name of the file to contain the scanner class.
Defaults to the name of the scanner class plus the suffix .h
- o
- --class-name=className
Use className (rather than Scanner) as the name of the scanner
class. Unless overridden by other options generated files will be given
the (transformed to lower case) className* name instead of
scanner*.
- It is an error if this option is used and an already existing
scanner-class header file does not define class
`className’
- o
- --class-skeleton=pathname (-C)
Use pathname as the path to the file containing the skeleton of the
scanner class. Its filename defaults to flexc++.h.
- o
- --construction (-K)
Write details about the lexical scanner to the file
`rules-file’.output. Details cover the used character
ranges, information about the regexes, the raw NFA states, and the final
DFAs.
- o
- --debug (-d)
Provide lex and its support functions with debugging code, showing
the actual parsing process on the standard output stream. When included,
the debugging output is active by default, but its activity may be
controlled using the setDebug(bool on-off) member. Note that
#ifdef DEBUG macros are not used anymore. By rerunning
flexc++ without the --debug option an equivalent scanner is
generated not containing the debugging code.
- o
- --filenames=genericName (-f)
Generic name of generated files (header files, not the lex-function
source file, see the --lex-source option for that). By default the
header file names will be equal to the name of the generated class.
- o
- --help (-h)
Write basic usage information to the standard output stream and
terminate.
- o
- --implementation-header=filename (-i)
Use filename as the name of the file to contain the implementation
header. Defaults to the name of the generated scanner class plus the
suffix .ih. The implementation header should contain all directives
and declarations only used by the implementations of the
scanner’s member functions. It is the only header file that is
included by the source file containing lex()’s
implementation. User defined implementation of other class members may use
the same convention, thus concentrating all directives and declarations
that are required for the compilation of other source files belonging to
the scanner class in one header file.
- It is an error if this option is used and an already existing
`filename’ file does not include the scanner class
header file.
- o
- --implementation-skeleton=pathname (-I)
Use pathname as the path to the file containing the skeleton of the
implementation header. Its filename defaults to flexc++.ih.
- o
- --lex-skeleton=pathname (-L)
Use pathname as the path to the file containing the lex()
member function’s skeleton. Its filename defaults to
flexc++.cc.
- o
- --lex-function-name=funname
Use funname rather than lex as the name of the member function
performing the lexical scanning.
- o
- --lex-source=filename (-l)
Define filename as the name of the source file to contain the scanner
member function lex. Defaults to lex.cc.
- o
- --matched-rules (-R)
The generated scanner will write the numbers of matched rules to the
standard output. It is implied by the --debug option. Displaying
the matched rules can be suppressed by calling the generated
scanner’s member setDebug(false) (or, of course, by
re-generating the scanner without specifying
--matched-rules).
- o
- --max-depth=depth (-m)
Set the maximum inclusion depth of the lexical scanner’s
specification files to depth. By default the maximum depth is set
to 10. When more than depth specification files are used the
scanner throws a Max stream stack size exceeded
std::length_error exception.
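The depth limit can be modelled as a bounded stack. The class FileStack
below is a hypothetical model written for this illustration; only the
default of 10, the exception type and the exception text are taken from
the description above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a stack of nested specification files.
class FileStack
{
    std::size_t d_maxDepth;
    std::vector<std::string> d_files;

    public:
        explicit FileStack(std::size_t maxDepth = 10)  // default as documented
        :
            d_maxDepth(maxDepth)
        {}

        // Pushing one file more than the maximum depth allows throws.
        void push(std::string const &name)
        {
            if (d_files.size() == d_maxDepth)
                throw std::length_error("Max stream stack size exceeded");

            d_files.push_back(name);
        }

        void pop()                      // precondition: depth() > 0
        {
            d_files.pop_back();
        }

        std::size_t depth() const
        {
            return d_files.size();
        }
};
```

A program that expects deeply nested input can catch std::length_error
around its scanning loop and report the nesting problem itself.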
- o
- --namespace=identifier
Define the scanner class in the namespace identifier. By default no
namespace is used. If this option is used the implementation header is
provided with a commented out using namespace declaration
for the requested namespace. In addition, the scanner and scanner base
class header files also use the specified namespace to define their
include guard directives.
- It is an error if this option is used and an already existing scanner-class header
file does not define namespace identifier.
- o
- --no-baseclass-header
Do not write the file containing the scanner’s base class interface
even if it doesn’t yet exist. By default the file containing the
scanner’s base class interface is (re)written each time
flexc++ is called.
- o
- --no-lines
Do not put #line preprocessor directives in the file containing the
scanner’s lex function. By default #line directives
are entered at the beginning of the action statements in the generated
lex.cc file, allowing the compiler and debuggers to associate
errors with lines in your scanner specification file, rather than with the
source file containing the lex function itself.
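The effect of a #line directive can be seen in a few lines of ordinary C++.
The file name lexer and the line number 12 used below are made up for this
illustration; the functions reportedLine and reportedFile are hypothetical
as well.

```cpp
#include <string>

// From the line following the directive on, the compiler acts as if it
// is reading line 12 of a file called "lexer" (both values are made up).
#line 12 "lexer"
int reportedLine() { return __LINE__; }
std::string reportedFile() { return __FILE__; }
```

Diagnostics for code following the directive are reported against lexer
instead of the file that actually contains the code, which is what the
--no-lines option switches off.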
- o
- --no-lex-source
Do not write the file containing the scanner’s predefined scanner
member functions, even if that file doesn’t yet exist. By default
the file containing the scanner’s lex member function is
(re)written each time flexc++ is called. This option should
normally be avoided, as this file contains the scanner’s tables, which are
altered whenever the scanner specification is modified.
- o
- --own-tokens (-T)
The tokens returned as well as the text matched when flexc++ reads
its input file(s) are shown when this option is used.
- This option does not result in the generated program displaying
returned tokens and matched text. If that is what you want, use the
--print-tokens option.
- o
- --print-tokens (-t)
The tokens returned as well as the text matched by the generated lex
function are displayed on the standard output stream, just before
returning the token to lex’s caller. Displaying tokens and
matched text is suppressed again when the lex.cc file is generated
without using this option. The function showing the tokens (
ScannerBase::print__) is called from Scanner::printTokens,
which is defined in-line in Scanner.h. Calling
ScannerBase::print__, therefore, can also easily be controlled by
an option controlled by the program using the scanner object.
- This option does not show the tokens returned and text matched by
flexc++ itself when reading its input files. If that is what you
want, use the --own-tokens option.
- o
- --regex-calls
Show the function call order when parsing regular expressions (this option
is normally not required. Its main purpose is to help developers
understand what happens when regular expressions are parsed).
- o
- --show-filenames (-F)
Write the names of the files that are generated to the standard error
stream.
- o
- --skeleton-directory=pathname (-S)
Defines the directory containing the skeleton files. This option can be
overridden by the specific skeleton-specifying options (-B, -C, -H,
and -I).
- o
- --target-directory=pathname
Specifies the directory where generated files should be written. By default
this is the directory where flexc++ is called.
- o
- --usage (-h)
Write basic usage information to the standard output stream and
terminate.
- o
- --verbose (-V)
The verbose option writes various pieces of additional information to the
standard output stream, not covered by the --construction and
--show-filenames options.
- o
- --version (-v)
Display flexc++’s version number and terminate.
FILES
Flexc++’s default skeleton files are in
/usr/share/flexc++.
By default,
flexc++ generates the following files:
- o
- Scanner.h: the header file containing the scanner class’s
interface.
- o
- Scannerbase.h: the header file containing the interface of the
scanner class’s base class.
- o
- Scanner.ih: the internal header file that is meant to be included
by the scanner class’s source files (e.g., it is included by
lex.cc, see the next item’s file), and that should contain
all declarations required for compiling the scanner class’s
sources.
- o
- lex.cc: the source file implementing the scanner class member
function lex (and support functions), performing the lexical scan.
SEE ALSO
bisonc++(1),
flexc++api(3),
flexc++input(7)
BUGS
None reported
ABOUT flexc++
Flexc++ was originally started as a programming project by Jean-Paul van
Oosten and Richard Berendsen in the 2007-2008 academic year. After graduating,
Richard left the project and moved to Amsterdam. Jean-Paul remained in
Groningen, and after on-and-off activities on the project, in close
cooperation with Frank B. Brokken, Frank undertook a rewrite of the
project’s code around 2010. During the development of
flexc++,
the lookahead-operator handling continuously threatened the completion of the
project. By now, the project has evolved to a level that we feel it’s
defensible to publish the program, although we still tend to consider the
program in its experimental stage; it will remain that way until we decide to
move its version from the 0.9x.xx series to the 1.xx.xx series.
COPYRIGHT
This is free software, distributed under the terms of the GNU General Public
License (GPL).
AUTHOR
Frank B. Brokken (
f.b.brokken@rug.nl),
Jean-Paul van Oosten (
j.p.van.oosten@rug.nl),
Richard Berendsen (
richardberendsen@xs4all.nl) (until 2010).