.TH "flexc++" "1" "2008\-2012" "flexc++\&.0\&.98\&.00\&.tar\&.gz" "flexc++ scanner generator" .PP .SH "NAME" flexc++ \- Generate a C++ scanner class and parsing function .PP .SH "SYNOPSIS" \fBflexc++\fP [options] \fIrules\-file\fP .PP .SH "DESCRIPTION" .PP \fBFlexc++\fP(1) was designed after \fBflex\fP(1) and \fBflex++\fP(1)\&. Like these latter two programs \fBflexc++\fP generates code performing pattern\-matching on text, possibly executing actions when certain \fIregular expressions\fP are recognized\&. .PP \fBFlexc++\fP, contrary to \fBflex\fP and \fBflex++\fP, generates code that is explicitly intended for use by \fBC++\fP programs\&. The well\-known \fBflex\fP(1) program generates \fBC\fP source\-code and \fBflex++\fP(1) merely offers a \fBC++\fP\-like shell around the \fIyylex\fP function generated by \fBflex\fP(1) and hardly supports present\-day ideas about \fBC++\fP software development\&. .PP Contrary to this, \fBflexc++\fP creates a \fBC++\fP class offering a predefined member function \fBlex\fP matching input against regular expressions and possibly executing \fBC++\fP code once regular expressions were matched\&. The code generated by \fBflexc++\fP is pure \fBC++\fP, allowing its users to apply all of the features offered by that language\&. .PP Below, the following sections may be consulted for specific details: .IP o \fB1\&. QUICK START\fP: a quick start overview about how to use \fBflexc++\fP\&. .IP o \fB2\&. QUICK START: FLEXC++ and BISONC++\fP: a quick start overview about how to use \fBflexc++\fP in combination with \fBbisonc++\fP(1) .IP o \fB3\&. GENERATED FILES\fP: files generated by \fBflexc++\fP and their purposes .IP o \fB4\&. OPTIONS\fP: options available for \fBflexc++\fP .IP o \fB5\&. INTERACTIVE SCANNERS\fP: how to create an interactive scanner .IP .IP o \fB6\&. SPECIFICATION FILE(S)\fP: the format and contents of \fBflexc++\fP input files, specifying the Scanner\(cq\&s characteristics .IP o \fB6\&.1\&. 
FILE SWITCHING\fP: how to switch to another input specification file .IP o \fB6\&.2\&. DIRECTIVES\fP: directives that can be used in input specification files .IP o \fB6\&.3\&. MINI SCANNERS\fP: how to declare mini\-scanners .IP o \fB6\&.4\&. DEFINITIONS\fP: how to define symbolic names for regular expressions .IP o \fB6\&.5\&. %% SEPARATOR\fP: the separator between the input specification sections .IP o \fB6\&.6\&. REGULAR EXPRESSIONS\fP: regular expressions supported by \fBflexc++\fP .IP o \fB6\&.7\&. SPECIFICATION EXAMPLE\fP: an example of a specification file .IP .IP o \fB7\&. THE CLASS INTERFACE: SCANNER\&.H\fP: Constructors and members of the scanner class generated by \fBflexc++\fP .IP o \fB7\&.1\&. NAMING CONVENTION\fP: symbols defined by \fBflexc++\fP in the scanner class\&. .IP o \fB7\&.2 CONSTRUCTORS\fP: constructors defined in the scanner class\&. .IP o \fB7\&.3 PUBLIC MEMBER FUNCTION\fP: public member declared in the scanner class\&. .IP o \fB7\&.4\&. PRIVATE MEMBER FUNCTIONS\fP: private members declared in the scanner class\&. .IP o \fB7\&.5\&. SCANNER CLASS HEADER EXAMPLE\fP: an example of a generated scanner class header .IP .IP o \fB8\&.1\&. THE SCANNER BASE CLASS\fP: the scanner class is derived from a base class\&. The base class is described in this section .IP o \fB8\&.2\&. PUBLIC ENUMS AND \-TYPES\fP: enums and types declared by the base class .IP o \fB8\&.3\&. PROTECTED ENUMS AND \-TYPES\fP: enumerations and types used by the scanner and scanner base classes .IP o \fB8\&.4\&. NO PUBLIC CONSTRUCTORS\fP: the scanner base class does not offer public constructors\&. .IP o \fB8\&.5\&. PUBLIC MEMBER FUNCTIONS\fP: several members defined by the scanner base class have public access rights\&. .IP o \fB8\&.6\&. PROTECTED CONSTRUCTORS\fP: the base class can be constructed by a derived class\&. Usually this is the scanner class generated by \fBflexc++\fP\&. .IP o \fB8\&.7\&. 
PROTECTED MEMBER FUNCTIONS\fP: this section covers the base class member functions that can only be used by scanner class or scanner base class members .IP o \fB8\&.8\&. PROTECTED DATA MEMBERS\fP: this section covers the base class data members that can only be used by scanner class or scanner base class members .IP o \fB8\&.9\&. FLEX++ TO FLEXC++ MEMBERS\fP: a short overview of frequently used \fBflex\fP(1) members that received different names in \fBflexc++\fP\&. .IP .IP o \fB9\&.1 THE CLASS INPUT\fP: the scanner\(cq\&s job is completely decoupled from the actual input stream\&. The class \fIInput\fP, nested within the scanner base class handles the communication with the input streams\&. The class \fIInput\fP, is described in this section\&. .IP o \fB9\&.2\&. CONSTRUCTORS\fP: the class \fIInput\fP can easily be replaced by another class\&. The constructor\-requirements are described in this section\&. .IP o \fB9\&.3\&. REQUIRED PUBLIC MEMBER FUNCTIONS\fP: this section covers the required public members of a self\-made \fIInput\fP class .PP .SH "1\&. QUICK START" .PP A bare\-bones, no\-frills scanner is generated as follows: .PP .IP o Create a file \fIlexer\fP defining the regular expressions to recognize, and the tokens to return\&. Use token values exceeding 0xff if plain ascii character values can also be used as token values\&. Example (assume capitalized words are token\-symbols defined in an enum defined by the scanner class): .nf %% [ \et\en]+ // skip white space chars\&. [0\-9]+ return NUMBER; [[:alpha:]_][[:alpha:][:digit:]_]* return IDENTIFIER; \&. return matched()[0]; .fi .IP .IP o Execute: .nf flexc++ lexer .fi This generates four files:\fIscanner\&.h, scanner\&.ih, scannerbase\&.h\fP, and \fIlex\&.cc\fP .PP .IP o Edit \fIscanner\&.h\fP, add the enum defining the token\-symbols in (usually) the public section of the class \fIScanner\fP\&. E\&.g\&., .nf class Scanner: public ScannerBase { public: enum Tokens { IDENTIFIER = 0x100, NUMBER }; // \&.\&.\&. 
// (etc, as generated by flexc++)
.fi
.PP
.IP o
Create a file defining \fIint main\fP, e\&.g\&.:
.nf
#include <iostream>
#include \(dq\&scanner\&.h\(dq\&

using namespace std;

int main()
{
    Scanner scanner;                        // define a Scanner object

    while (int token = scanner\&.lex())      // get all tokens
    {
        string const &text = scanner\&.matched();
        switch (token)
        {
            case Scanner::IDENTIFIER:
                cout << \(dq\&identifier: \(dq\& << text << \(cq\&\en\(cq\&;
            break;

            case Scanner::NUMBER:
                cout << \(dq\&number: \(dq\& << text << \(cq\&\en\(cq\&;
            break;

            default:
                cout << \(dq\&char\&. token: `\(dq\& << text << \(dq\&\(cq\&\en\(dq\&;
            break;
        }
    }
}
.fi
.IP o
Compile all \fI\&.cc\fP files:
.nf
g++ \-\-std=c++0x *\&.cc
.fi
.PP
.IP o
To `tokenize\(cq\& \fImain\&.cc\fP, execute:
.nf
a\&.out < main\&.cc
.fi
.PP
.SH "2\&. QUICK START: FLEXC++ and BISONC++"
.PP
To interface \fBflexc++\fP to the \fBbisonc++\fP(1) parser generator proceed as follows:
.IP o
Specify a grammar that can be processed by \fBbisonc++\fP(1)\&. Assuming that the scanner and parser are developed in, respectively, the sub\-directories \fIscanner\fP and \fIparser\fP, a simple grammar specification that can be used with the scanner developed in the previous section can be written to the file \fIparser/grammar\fP, e\&.g\&.:
.nf
%scanner \&.\&./scanner/scanner\&.h
%scanner\-token\-function d_scanner\&.lex()
%token IDENTIFIER NUMBER CHAR
%%
startrule:
    startrule tokenshow
|
    tokenshow
;
tokenshow:
    token
    {
        std::cout << \(dq\&matched: \(dq\& << d_scanner\&.matched() << \(cq\&\en\(cq\&;
    }
;
token:
    IDENTIFIER
|
    NUMBER
|
    CHAR
;
.fi
.IP o
Write a scanner specification file\&. E\&.g\&.,
.nf
%%
[ \et\en]+                          // skip white space chars\&.
[0\-9]+                             return Parser::NUMBER;
[[:alpha:]_][[:alpha:][:digit:]_]* return Parser::IDENTIFIER;
\&.                                 return Parser::CHAR;
.fi
This causes the scanner to return \fIParser\fP tokens to the generated parser\&.
.IP .IP o Add the line .nf #include \(dq\&\&.\&./parser/Parserbase\&.h\(dq\& .fi to the file \fIscanner/scanner\&.ih\fP .IP .IP o Write a simple \fImain\fP function in the file \fImain\&.cc\fP\&. E\&.g\&., .nf #include \(dq\&parser/Parser\&.h\(dq\& int main(int argc, char **argv) { Parser parser; parser\&.parse(); } .fi .IP .IP o Generate a scanner in the \fIscanner\fP subdirectory: .nf flexc++ lexer .fi .IP .IP o Generate a parser in the \fIparser\fP subdirectory: .nf bisonc++ grammar .fi .IP .IP o Compile all sources: .nf g++ \-\-std=c++0x *\&.cc */*\&.cc .fi .IP .IP o Execute the program, providing it some source file to be processed: .nf a\&.out < main\&.cc .fi .PP .SH "3\&. GENERATED FILES" .PP \fBFlexc++\fP generates four files from a well\-formed input file: .IP o A file containing the implementation of the \fIlex\fP member function and its support functions\&. By default this file is named \fIlex\&.cc\fP\&. .IP o A file containing the scanner\(cq\&s class interface\&. By default this file is named \fIscanner\&.h\fP\&. The scanner class itself is generated once and is thereafter `owned\(cq\& by the programmer, who may change it \fIad\-lib\fP\&. Newly added members (data members, function members) will survive future \fBflexc++\fP runs as \fBflexc++\fP will never rewrite an existing scanner class interface file, unless explicitly ordered to do so\&. (see also \fBscanner\&.h\fP(3flexc++))\&. .IP o A file containing the interface of the scanner class\(cq\&s \fIbase class\fP\&. The scanner class is publicly derived from this base class\&. It is used to minimize the size of the scanner interface itself\&. The scanner base class is `owned\(cq\& by \fBflexc++\fP and should never be hand\-modified\&. By default the scanner\(cq\&s base class is provided in the file \fIscannerbase\&.h\fP\&. At each new \fBflexc++\fP run this file is rewritten unless \fBflexc++\fP is explicitly ordered \fInot\fP to do so (see also \fBscannerbase\&.h\fP(3flexc++))\&. 
.IP o A file containing the \fIimplementation header\fP\&. This file should contain includes and declarations that are only required when compiling the members of the scanner class\&. By default this file is named \fIscanner\&.ih\fP\&. This file, like the file containing the scanner class\(cq\&s interface is never rewritten by \fBflexc++\fP unless \fBflexc++\fP is explicitly ordered to do so (see also \fBimplementationheader\fP(3flexc++))\&. .PP .SH "4\&. OPTIONS" .PP If available, single letter options are listed between parentheses following their associated long\-option variants\&. Single letter options require arguments if their associated long options require arguments as well\&. .IP o \fB\-\-baseclass\-header\fP=\fIheader\fP (\fB\-b\fP) .br Use \fIheader\fP as the pathname of the file containing the scanner class\(cq\&s base class\&. Defaults to the name of the scanner class plus \fIbase\&.h\fP .IP o \fB\-\-baseclass\-skeleton\fP=\fIskeleton\fP (\fB\-C\fP) .br Use \fIskeleton\fP as the pathname of the file containing the skeleton of the scanner class\(cq\&s base class\&. Its filename defaults to \fIflexc++base\&.h\fP\&. .IP o \fB\-\-case\-insensitive\fP .br Use this option to generate a scanner \fIcase insensitively\fP matching regular expressions\&. All regular expressions specified in \fBflexc++\fP\(cq\&s input file are interpreted case insensitively and the resulting scanner object will case insensitively interpret its input\&. .IP When this option is specified the resulting scanner does not distinguish between the following rules: .nf First // initial F is transformed to f first FIRST // all capitals are transformed to lower case chars .fi With a case\-insensitive scanner only the first rule can be matched, and \fBflexc++\fP will issue warnings for the second and third rule about rules that cannot be matched\&. .IP Input processed by a case\-insensitive scanner is also handled case insensitively\&. 
The above mentioned \fIFirst\fP rule is matched for all of the following input words: \fIfirst First FIRST firST\fP\&. .IP Although the matching process proceeds case insensitively, the matched text (as returned by the scanner\(cq\&s \fImatched()\fP member) always contains the original, unmodified text\&. So, with the above input \fImatched()\fP returns, respectively \fIfirst, First, FIRST\fP and \fIfirST\fP, while matching the rule \fIFirst\fP\&. .IP o \fB\-\-class\-header\fP=\fIheader\fP (\fB\-c\fP) .br Use \fIheader\fP as the pathname of the file containing the scanner class\&. Defaults to the name of the scanner class plus the suffix \fI\&.h\fP .IP o \fB\-\-class\-name\fP=\fIclass\fP .br Use \fIclass\fP (rather than \fIScanner\fP) as the name of the scanner class\&. Unless overridden by other options generated files will be given the (transformed to lower case) \fIclass*\fP name instead of \fIscanner\fP*\&. .IP o \fB\-\-class\-skeleton\fP=\fIskeleton\fP (\fB\-C\fP) .br Use \fIskeleton\fP as the pathname of the file containing the skeleton of the scanner class\&. Its filename defaults to \fIflexc++\&.h\fP\&. .IP o \fB\-\-construction\fP (\fB\-K\fP) .br Write details about the lexical scanner to the file \fI`rules\-file\(cq\&\&.output\fP\&. Details cover the used character ranges, information about the regexes, the raw NFA states, and the final DFAs\&. .IP o \fB\-\-debug\fP (\fB\-d\fP) .br Provide \fIlex\fP and its support functions with debugging code, showing the actual parsing process on the standard output stream\&. When included, the debugging output is active by default, but its activity may be controlled using the \fIsetDebug(bool on\-off)\fP member\&. Note that \fI#ifdef DEBUG\fP macros are not used anymore\&. By rerunning \fBflexc++\fP without the \fB\-\-debug\fP option an equivalent scanner is generated not containing the debugging code\&. 
.IP o \fB\-\-filenames\fP=\fIgenericName\fP (\fB\-f\fP) .br Generic name of generated files (header files, not the \fIlex\fP\-function source file, see the \fI\-\-lex\-source\fP option for that)\&. By default the header file names will be equal to the name of the generated class\&. .IP o \fB\-\-force\-class\-header\fP .br By default the generated class header is not overwritten once it has been created\&. This option can be used to force the (re)writing of the file containing the scanner\(cq\&s class\&. .IP o \fB\-\-force\-implementation\-header\fP .br By default the generated implementation header is not overwritten once it has been created\&. This option can be used to force the (re)writing of the implementation header file\&. .IP o \fB\-\-help\fP (\fB\-h\fP) .br Write basic usage information to the standard output stream and terminate\&. .IP o \fB\-\-implementation\-header\fP=\fIheader\fP (\fB\-i\fP) .br Use \fIheader\fP as the pathname of the file containing the implementation header\&. Defaults to the name of the generated scanner class plus the suffix \fI\&.ih\fP\&. The implementation header should contain all directives and declarations \fIonly\fP used by the implementations of the scanner\(cq\&s member functions\&. It is the only header file that is included by the source file containing \fBlex()\fP\(cq\&s implementation \&. User defined implementation of other class members may use the same convention, thus concentrating all directives and declarations that are required for the compilation of other source files belonging to the scanner class in one header file\&. .IP o \fB\-\-implementation\-skeleton\fP=\fIskeleton\fP (\fB\-I\fP) .br Use \fIskeleton\fP as the pathname of the file containing the skeleton of the implementation header\&. Its filename defaults to \fIflexc++\&.ih\fP\&. .IP o \fB\-\-interactive\fP .br Generate an interactive scanner\&. 
An interactive scanner reads lines from the input stream, and then returns the tokens encountered on that line\&. The interactive scanner implemented by \fBflexc++\fP only predefines the \fIScanner(std::istream &in, std::ostream &out)\fP constructor, by default assuming that input is read from \fIstd::cin\fP\&.
.IP o
\fB\-\-lex\-skeleton\fP=\fIskeleton\fP (\fB\-L\fP)
.br
Use \fIskeleton\fP as the pathname of the file containing the \fIlex()\fP member function\(cq\&s skeleton\&. Its filename defaults to \fIflexc++\&.cc\fP\&.
.IP o
\fB\-\-lex\-function\-name\fP=\fIfunname\fP
.br
Use \fIfunname\fP rather than \fIlex\fP as the name of the member function performing the lexical scanning\&.
.IP o
\fB\-\-lex\-source\fP=\fIsource\fP (\fB\-l\fP)
.br
Define \fIsource\fP as the name of the source file containing the scanner member function \fIlex\fP\&. Defaults to \fIlex\&.cc\fP\&.
.IP o
\fB\-\-matched\-rules\fP (\fB\-R\fP)
.br
The generated scanner writes the numbers of matched rules to the standard output\&. This option is implied by the \fI\-\-debug\fP option\&. Displaying the matched rules can be suppressed by calling the generated scanner\(cq\&s member \fIsetDebug(false)\fP (or, of course, by re\-generating the scanner without specifying \fI\-\-matched\-rules\fP)\&.
.IP o
\fB\-\-max\-depth\fP=\fIdepth\fP (\fB\-m\fP)
.br
Set the maximum inclusion depth of the lexical scanner\(cq\&s specification files to \fIdepth\fP\&. By default the maximum depth is set to 10\&. When more than \fIdepth\fP specification files are used the scanner throws a \fIMax stream stack size exceeded\fP \fIstd::length_error\fP exception\&.
.IP o
\fB\-\-namespace\fP=\fInamespace\fP (\fB\-n\fP)
.br
Define the scanner base class, the scanner class and the scanner implementations in the namespace \fInamespace\fP\&. By default no namespace is defined\&. If this option is used the implementation header will contain a commented out \fIusing namespace\fP declaration for the requested namespace\&.
.IP o \fB\-\-no\-baseclass\-header\fP .br Do not write the file containing the scanner\(cq\&s base class interface even if it doesn\(cq\&t yet exist\&. By default the file containing the scanner\(cq\&s base class interface is (re)written each time \fBflexc++\fP is called\&. .IP o \fB\-\-no\-lines\fP .br Do not put \fB#line\fP preprocessor directives in the file containing the scanner\(cq\&s \fIlex\fP function\&. By default \fI#line\fP directives are entered at the beginning of the action statements in the generated \fIlex\&.cc\fP file, allowing the compiler and debuggers to associate errors with lines in your grammar specification file, rather than with the source file containing the \fIlex\fP function itself\&. .IP o \fB\-\-no\-lex\-source\fP .br Do not write the file containing the scanner\(cq\&s predefined scanner member functions, even if that file doesn\(cq\&t yet exist\&. By default the file containing the scanner\(cq\&s \fIlex\fP member function is (re)written each time \fBflexc++\fP is called\&. This option should normally be avoided, as this file contains parsing tables which are altered whenever the grammar definition is modified\&. .IP o \fB\-\-own\-tokens\fP (\fB\-T\fP) .br The tokens returned as well as the text matched when \fBflexc++\fP reads its input files(s) are shown when this option is used\&. .IP This option does \fInot\fP result in the generated program displaying returned tokens and matched text\&. If that is what you want, use the \fI\-\-print\-tokens\fP option\&. .IP o \fB\-\-print\-tokens\fP (\fB\-t\fP) .br The tokens returned as well as the text matched by the generated \fIlex\fP function are displayed on the standard output stream, just before returning the token to \fIlex\fP\(cq\&s caller\&. Displaying tokens and matched text is suppressed again when the \fIlex\&.cc\fP file is generated without using this option\&. 
The function showing the tokens (\fIScannerBase::print__\fP) is called from \fIScanner::printTokens\fP, which is defined in\-line in \fIscanner\&.h\fP\&. Calling \fIScannerBase::print__\fP can therefore easily be controlled by an option of the program using the scanner object\&.
.IP
This option does \fInot\fP show the tokens returned and text matched by \fBflexc++\fP itself when reading its input file(s)\&. If that is what you want, use the \fI\-\-own\-tokens\fP option\&.
.IP o
\fB\-\-show\-filenames\fP (\fB\-F\fP)
.br
Write the names of the files that are generated to the standard error stream\&.
.IP o
\fB\-\-skeleton\-directory\fP=\fIdirectory\fP (\fB\-S\fP)
.br
Specifies the directory containing the skeleton files to use\&. This option can be overridden by the specific skeleton\-specifying options (\fI\-B, \-C, \-H,\fP and \fI\-I\fP)\&.
.IP o
\fB\-\-target\-directory\fP=\fIdirectory\fP
.br
Specifies the directory where generated files should be written\&. By default this is the directory of \fBflexc++\fP\(cq\&s input file\&. The \fI\-\-target\-directory\fP option does not affect files that were explicitly named (either as option or as directive)\&.
.IP o
\fB\-\-usage\fP (\fB\-h\fP)
.br
Write basic usage information to the standard output stream and terminate\&.
.IP o
\fB\-\-verbose\fP (\fB\-V\fP)
.br
The verbose option writes various pieces of additional information to the standard output stream, not covered by the \fI\-\-construction\fP and \fI\-\-show\-filenames\fP options\&.
.IP o
\fB\-\-version\fP (\fB\-v\fP)
.br
Display \fBflexc++\fP\(cq\&s version number and terminate\&.
.PP
.SH "5\&. INTERACTIVE SCANNERS"
.PP
An interactive scanner is characterized by the fact that scanning is postponed until an end\-of\-line character has been received, after which all information read so far on that line is scanned\&.
\fBFlexc++\fP supports the \fI\-\-interactive\fP option (or the equivalent \fI%interactive\fP directive), generating an interactive scanner\&. Here it is assumed that \fIScanner\fP is the name of the scanner class generated by \fBflexc++\fP\&.
.PP
The interactive scanner generated by \fBflexc++\fP has the following characteristics:
.IP o
The \fIScanner\fP class is derived privately from \fIstd::istringstream\fP and (as usual) publicly from \fIScannerBase\fP\&.
.IP o
The \fIistringstream\fP base class is constructed by its default constructor\&.
.IP o
The function \fIlex\fP\(cq\&s default implementation is removed from \fIscanner\&.h\fP and is implemented in the generated \fIlex\&.cc\fP source file\&. It performs the following tasks:
.IP \-
If the token returned by the scanner is not equal to 0 it is returned as the next token;
.IP \-
Otherwise the next line is retrieved from the input stream passed to the \fIScanner\fP\(cq\&s constructor (by default \fIstd::cin\fP)\&. If this fails, 0 is returned\&.
.IP \-
A \fI\(cq\&\en\(cq\&\fP character is appended to the just read line, and the scanner\(cq\&s \fIstd::istringstream\fP base class object is re\-initialized with that line;
.IP \-
The member \fIlex__\fP returns the next token\&. This implementation allows code calling \fIScanner::lex()\fP to conclude, as usual, that the input is exhausted when \fIlex\fP returns 0\&.
.PP
Here is an example of how such a scanner could be used:
.nf
// scanner generated with: \(cq\&flexc++ \-\-interactive lexer\(cq\& or with
// \(cq\&flexc++ lexer\(cq\& if lexer contains the %interactive directive
#include <iostream>
#include \(dq\&scanner\&.h\(dq\&

using namespace std;

int main()
{
    Scanner scanner;                // by default: read from std::cin

    while (true)
    {
        cout << \(dq\&? \(dq\&;               // prompt at each line

        while (true)                // process all the line\(cq\&s tokens
        {
            int token = scanner\&.lex();

            if (token == \(cq\&\en\(cq\&)      // end of line: new prompt
                break;

            if (token == 0)         // end of input: done
                return 0;

                                    // process other tokens
            cout << scanner\&.matched() << \(cq\&\en\(cq\&;
            if (scanner\&.matched()[0] == \(cq\&q\(cq\&)
                return 0;
        }
    }
}
.fi
.PP
.SH "6\&. SPECIFICATION FILE(S)"
.PP
\fBFlexc++\fP expects an input file containing directives and the regular expressions that should be recognized by objects of the scanner class generated by \fBflexc++\fP\&. In this man page the elements and organization of \fBflexc++\fP\(cq\&s input file are described\&.
.PP
\fBFlexc++\fP\(cq\&s input file consists of two sections, separated from each other by a line merely containing two consecutive percent characters:
.nf
%%
.fi
The section before this separator contains directives; the section following this separator contains regular expressions and possibly actions to perform when these regular expressions are matched by the object of the scanner class generated by \fBflexc++\fP\&.
.PP
White space is usually ignored, as are comments\&. Comments may be of the traditional \fBC\fP form (i\&.e\&., \fI/*\fP, followed by (possibly multi\-line) comment text, followed by \fI*/\fP) or of the \fBC++\fP end\-of\-line form: two consecutive slashes (\fI//\fP) start the comment, which continues up to the next newline character\&.
.PP
.SH "6\&.1\&. FILE SWITCHING"
.PP
\fBFlexc++\fP\(cq\&s input file may be split into multiple files\&. This allows for the definition of logically separate elements of the specifications in different files\&. Include directives must be specified on a line of their own\&. To switch to another specification file the following stanza is used:
.nf
//include file\-location
.fi
The \fI//include\fP directive starts in the line\(cq\&s first column\&.
File locations can be absolute or relative to the location of the file containing the \fI//include\fP directive\&. White space characters following \fI//include\fP and before the end of the line are ignored\&. The file specification may be surrounded by double quotes, but these double quotes are not required and are ignored (removed) if present\&. All remaining characters are expected to define the name of the file where \fBflexc++\fP\(cq\&s rules specifications continue\&. Once end of file of a sub\-file has been reached, processing continues at the line beyond the \fI//include\fP directive of the previously scanned file\&. The end\-of\-file of the file that was initially specified when \fBflexc++\fP was called indicates the end of \fBflexc++\fP\(cq\&s rules specification\&. .PP .SH "6\&.2\&. DIRECTIVES" .PP The first section of \fBflexc++\fP\(cq\&s input file consists of directives\&. In addition it may associate regular expressions with symbolic names, allowing you to use these identifiers in the rules section\&. Each directive is defined on a line of its own\&. When available, directives are overridden by \fBflexc++\fP command line options\&. .PP Some directives require arguments, which are usually provided following separating (but optional) \fI=\fP characters\&. Arguments of directives, are text, surrounded by double quotes (strings)\&. If a string must itself contain a double quote or a backslash, then precede these characters by a backslash\&. The exceptions are the \fI%s\fP and \fI%x\fP directives, which are immediately followed by name lists, consisting of identifiers separated by blanks\&. Here is an example of the definition of a directive: .nf %class\-name = \(dq\&MyScanner\(dq\& .fi .PP The following directives are available: .PP .IP o \fB%baseclass\-header\fP \fI= \(dq\&header\(dq\&\fP .br Defines the pathname of the file containing the scanner class\(cq\&s base class interface\&. Corresponding command\-line option: \fI\-\-baseclass\-header\fP\&. 
.IP o
\fB%case\-insensitive\fP
.br
Generates a scanner \fIcase insensitively\fP matching regular expressions\&. All regular expressions specified in \fBflexc++\fP\(cq\&s input file are interpreted case insensitively and the resulting scanner object will case insensitively interpret its input\&.
.IP
Corresponding command\-line option: \fI\-\-case\-insensitive\fP\&.
.IP
When this directive is specified the resulting scanner does not distinguish between the following rules:
.nf
First       // initial F is transformed to f
first
FIRST       // all capitals are transformed to lower case chars
.fi
With a case\-insensitive scanner only the first rule can be matched, and \fBflexc++\fP will issue warnings for the second and third rule about rules that cannot be matched\&.
.IP
Input processed by a case\-insensitive scanner is also handled case insensitively\&. The above mentioned \fIFirst\fP rule is matched for all of the following input words: \fIfirst First FIRST firST\fP\&.
.IP
Although the matching process proceeds case insensitively, the matched text (as returned by the scanner\(cq\&s \fImatched()\fP member) always contains the original, unmodified text\&. So, with the above input \fImatched()\fP returns, respectively \fIfirst, First, FIRST\fP and \fIfirST\fP, while matching the rule \fIFirst\fP\&.
.IP
.IP o
\fB%class\-header\fP \fI= \(dq\&header\(dq\&\fP
.br
Defines the pathname of the file containing the scanner class\(cq\&s interface\&. Corresponding command\-line option: \fI\-\-class\-header\fP\&.
.IP o
\fB%class\-name\fP \fI= \(dq\&class\-name\(dq\&\fP
.br
Declares the name of the scanner class generated by \fBflexc++\fP\&. This directive corresponds to the \fI%name\fP directive used by \fBflex++\fP(1)\&. Contrary to \fBflex++\fP\(cq\&s \fI%name\fP declaration, \fI%class\-name\fP may appear anywhere in the first section of the specification file\&. It may be defined only once\&. If no \fIclass\-name\fP is specified the default class name (\fIScanner\fP) is used\&.
Corresponding command\-line option: \fI\-\-class\-name\fP\&. .IP o \fB%debug\fP .br Provide \fIlex\fP and its support functions with debugging code, showing the actual parsing process on the standard output stream\&. When included, the debugging output is active by default, but its activity may be controlled using the \fIsetDebug(bool on\-off)\fP member\&. Note that no \fI#ifdef DEBUG\fP macros are used in the generated code\&. .IP .IP o \fB%implementation\-header\fP \fI= \(dq\&header\(dq\&\fP .br Defines the pathname of the file to contain the implementation header\&. Corresponding command\-line option: \fI\-\-implementation\-header\fP\&. .IP o \fB%input\-implementation\fP \fI= \(dq\&sourcefile\(dq\&\fP .br Defines the pathname of the file containing the implementation of a user\-defined \fIInput\fP class\&. .IP o \fB%input\-interface\fP \fI= \(dq\&interface\(dq\&\fP .br Defines the pathname of the file containing the interface of a user\-defined \fIInput\fP class\&. See \fBinput\fP(3flexc++) for additional information about user\-defined \fIInput\fP classes\&. .IP o \fB%interactive\fP .br Generate an interactive scanner\&. An interactive scanner reads lines from the input stream, and then returns the tokens encountered on that line\&. The interactive scanner implemented by \fBflexc++\fP only predefines the \fIScanner(std::istream &in, std::ostream &out)\fP constructor, by default assuming that input is read from \fIstd::cin\fP\&. See also the \fIINTERACTIVE SCANNER\fP section in \fBflexc++\fP(1)\&. .IP o \fB%lex\-function\-name\fP \fI= \(dq\&funname\(dq\&\fP .br Defines the name of the scanner class\(cq\&s member to perform the lexical scanning\&. If this directive is omitted the default name (\fIlex\fP) is used\&. Corresponding command\-line option: \fI\-\-lex\-function\-name\fP\&. .IP o \fB%lex\-source\fP \fI= \(dq\&source\(dq\&\fP .br Defines the pathname of the file to contain the scanner member \fIlex\fP\&. 
Corresponding command\-line option: \fI\-\-lex\-source\fP\&.
.IP o
\fB%no\-lines\fP
.br
Do not put \fI#line\fP preprocessor directives in the file containing the scanner\(cq\&s \fIlex\fP function\&. If omitted \fI#line\fP directives are added to this file, unless overridden by the command line options \fI\-\-lines\fP and \fI\-\-no\-lines\fP\&.
.IP o
\fB%namespace\fP \fI= \(dq\&namespace\(dq\&\fP
.br
Define the scanner class in the namespace \fInamespace\fP\&. By default no namespace is used\&. If this directive is used the implementation header is provided with a commented out \fIusing namespace\fP declaration for the requested namespace\&. This directive is overridden by the \fI\-\-namespace\fP command\-line option\&.
.IP o
\fB%print\-tokens\fP
.br
This directive causes the tokens as well as the matched text to be displayed on the standard output stream, just before returning the token to \fIlex\fP\(cq\&s caller\&. Displaying is suppressed again when the \fIlex\&.cc\fP file is generated without using this directive\&. The function showing the tokens (\fIScannerBase::print__\fP) is called from \fIScanner::print()\fP, which is defined in\-line in \fIscanner\&.h\fP\&. Calling \fIScannerBase::print__\fP can therefore easily be controlled by an option of the program using the scanner object\&. This directive does \fInot\fP show the tokens returned and text matched by \fBflexc++\fP itself when reading its input file(s)\&. If that is what you want, use the \fI\-\-own\-tokens\fP option\&.
.IP o
\fB%s\fP \fInamelist\fP
.br
The \fI%s\fP directive is followed by a list of one or more identifiers, separated by blanks\&. Each identifier is the name of an \fIinclusive mini scanner\fP\&.
.IP o
\fB%skeleton\-directory\fP \fI= \(dq\&path\(dq\&\fP
.br
Use \fIpath\fP rather than the default (e\&.g\&., \fI/usr/share/flexc++\fP) path when looking for \fBflexc++\fP\(cq\&s skeleton files\&. Corresponding command\-line option: \fI\-\-skeleton\-directory\fP\&.
.IP o
\fB%target\-directory\fP \fI= \(dq\&path\(dq\&\fP
.br
Generate files in \fIpath\fP rather than in \fBflexc++\fP\(cq\&s input file\(cq\&s directory\&. The \fI%target\-directory\fP option does not affect files that were explicitly named (either as option or as directive)\&.
.IP o
\fB%x\fP \fInamelist\fP
.br
The \fI%x\fP directive is followed by a list of one or more identifiers, separated by blanks\&. Each identifier is the name of an \fIexclusive mini scanner\fP\&.
.PP
.SH "6\&.3\&. MINI SCANNERS"
.PP
Mini scanners come in two flavors: inclusive mini scanners and exclusive mini scanners\&. The rules that apply to an inclusive mini scanner are the mini scanner\(cq\&s own rules as well as the rules which apply to no mini scanners in particular (i\&.e\&., the rules that apply to the default (or \fIINITIAL\fP) mini scanner)\&. Exclusive mini scanners only use the rules that were defined for them\&.
.PP
To define an inclusive mini scanner use \fI%s\fP, followed by one or more identifiers specifying the name(s) of the mini\-scanner(s)\&. To define an exclusive mini scanner use \fI%x\fP, followed by one or more identifiers specifying the name(s) of the mini\-scanner(s)\&. The following example defines the names of two mini scanners: \fIstring\fP and \fIcomment\fP:
.nf
%x string comment
.fi
Following this, rules defined in the context of the \fIstring\fP mini scanner (see below) will only be used when that mini scanner is active\&.
.PP
A \fBflexc++\fP input file may contain multiple \fI%s\fP and \fI%x\fP specifications\&.
.PP
.SH "6\&.4\&. DEFINITIONS"
.PP
Definitions are of the form
.nf
identifier  regular\-expression
.fi
Each definition must be entered on a line of its own\&. Definitions associate identifiers with regular expressions, allowing the use of \fI{identifier}\fP as synonym for its regular expression in the rules section of the \fBflexc++\fP input file\&.
Once defined, the identifiers representing regular expressions can also be used in subsequent definitions\&.
.PP
Example:
.nf
FIRST   [A\-Za\-z_]
NAME    {FIRST}[\-A\-Za\-z0\-9_]*
.fi
.PP
.SH "6\&.5\&. %% SEPARATOR"
.PP
Following directives and definitions a line merely containing two consecutive \fI%\fP characters is expected\&. Following this line the rules are defined\&. Rules consist of regular expressions which should be recognized, possibly followed by actions to be executed once a rule\(cq\&s regular expression has been matched\&.
.PP
.SH "6\&.6\&. REGULAR EXPRESSIONS"
.PP
The regular expressions defined in \fBflexc++\fP\(cq\&s rules files are matched against the information passed to the scanner\(cq\&s \fIlex\fP function\&.
.PP
Regular expressions begin as the first non\-blank character on a line\&. Comment is interpreted as comment as long as it isn\(cq\&t part of the regular expression\&. To define a regular expression starting with two slashes (at least) the first slash can be escaped or double quoted\&. (E\&.g\&., \fI\(dq\&//\(dq\&\&.*\fP defines \fBC++\fP comment to end\-of\-line)\&.
.PP
Regular expressions end at the first blank character (to add a blank character, e\&.g\&., a space character, to a regular expression, prefix it by a backslash or put it in a double\-quoted string)\&.
.PP
Actions may be associated with regular expressions\&. At a match the action that is associated with the regular expression is executed, after which scanning continues when the lexical scanning function (e\&.g\&., \fIlex\fP) is called again\&. Actions are not required, and regular expressions can be defined without any actions at all\&. If such action\-less regular expressions are matched then the match is performed silently, after which processing continues\&.
.PP
\fBFlexc++\fP tries to match as many characters of the input file as possible (i\&.e\&., it uses `greedy matching\(cq\&)\&.
Non\-greedy matching is accomplished by a combination of a scanner and parser and/or by using the `lookahead\(cq\& operator (\fI/\fP)\&. .PP The following regular expression `building blocks\(cq\& are available\&. More complex regular expressions are created by combining them: .PP .IP "\fIx\fP" the character `x\(cq\& .IP "\fI\&.\fP" any character (byte) except newline .IP "\fI[xyz]\fP" a character class; in this case, the pattern matches either an `x\(cq\&, a `y\(cq\&, or a `z\(cq\& .IP "\fI[abj\-oZ]\fP" a character class containing a range; matches an `a\(cq\&, a `b\(cq\&, any letter from `j\(cq\& through `o\(cq\&, or a `Z\(cq\& .IP "\fI[^A\-Z]\fP" a negated character class, i\&.e\&., any character except for those in the class\&. In this example, any non\-capital character\&. .IP "\fI\(dq\&[xyz]\e\(dq\&foo\(dq\&\fP" text between double quotes matches the literal string: \fI[xyz]\(dq\&foo\fP\&. .IP "\fI\eX\fP" if X is `a\(cq\&, `b\(cq\&, `f\(cq\&, `n\(cq\&, `r\(cq\&, `t\(cq\&, or `v\(cq\&, then the ANSI\-C interpretation of `\ex\(cq\& is matched\&. Otherwise, a literal `X\(cq\& is matched (this is used to escape operators such as `*\(cq\&)\&. .IP "\fI\e0\fP" a NUL character (ASCII code 0)\&. .IP "\fI\e123\fP" the character with octal value 123\&. .IP "\fI\ex2a\fP" the character with hexadecimal value 2a\&. .IP "\fI(r)\fP" the regular expression `r\(cq\&; parentheses are used to override precedence (see below) .IP "\fI{name}\fP" the expansion of the `name\(cq\& definition\&. .IP "\fIr*\fP" zero or more regular expressions `r\(cq\&\&. This also matches the empty string\&. .IP "\fIr+\fP" one or more regular expressions `r\(cq\&\&. .IP "\fIr?\fP" zero or one regular expression `r\(cq\&\&. This also matches the empty string\&. .IP "\fIrs\fP" the regular expression `r\(cq\& followed by the regular expression `s\(cq\&; called concatenation .IP "\fIr{m, n}\fP" regular expression `r\(cq\& at least m, but at most n times (\fI1 <= m <= n\fP)\&. 
.IP "\fIr{m,}\fP"
regular expression `r\(cq\& m or more times (\fI1 <= m\fP)\&.
.IP "\fIr{m}\fP"
regular expression `r\(cq\& exactly m times (\fI1 <= m\fP)\&.
.IP "\fIr|s\fP"
either regular expression `r\(cq\& or regular expression `s\(cq\&
.IP "\fIr/s\fP"
regular expression `r\(cq\& if it is followed by regular expression `s\(cq\&\&. The text matched by `s\(cq\& is included when determining whether this rule results in the longest match, but `s\(cq\& is then returned to the input before the rule\(cq\&s action (if defined) is executed\&.
.IP "\fI^r\fP"
a regular expression `r\(cq\& at the beginning of a line or file\&.
.IP "\fIr$\fP"
a regular expression `r\(cq\&, occurring at the end of a line\&. This pattern is identical to `r/\en\(cq\&\&.
.IP "\fI<s>r\fP"
a regular expression `r\(cq\& in start condition `s\(cq\&
.IP "\fI<s1,s2,s3>r\fP"
a regular expression `r\(cq\& in start conditions s1, s2, or s3\&.
.IP "\fI<*>r\fP"
a regular expression `r\(cq\& in all start conditions\&.
.IP "\fI<<EOF>>\fP"
an end\-of\-file\&.
.IP "\fI<s1,s2><<EOF>>\fP"
an end\-of\-file when in start conditions s1 or s2
.PP
Inside a character class all regular expression operators lose their special meanings, except for the escape character (\fI\e\fP) and the character class operators \fI\-\fP, \fI]]\fP, and, at the beginning of the class, \fI^\fP\&. To add a closing bracket to a character class use \fI[]\fP\&. To add a closing bracket to a negated character class use \fI[^]\fP\&. Once a character class has started, all subsequent character (ranges) are added to the set, until the final closing bracket (\fI]\fP) has been reached\&.
.PP
The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom\&. From lowest to highest precedence, the operators are:
.IP o
\fI|\fP: the or\-operator at the end of a line (instead of an action) indicates that this expression\(cq\&s action is identical to the action of the next rule\&.
.IP o
\fI/\fP: the look\-ahead operator;
.IP o
\fI|\fP: the or\-operator within a regular expression;
.IP o
\fICHAR\fP: individual elements of the regular expression: characters, strings, quoted characters, escaped characters, character sets etc\&. are all considered \fICHAR\fP elements\&. Multiple \fICHAR\fP elements can be combined by enclosing them in parentheses (e\&.g\&., \fI(abc)+\fP indicates sequences of \fIabc\fP characters, like \fIabcabcabc\fP);
.IP o
\fI*, ?, +, {\fP: multipliers:
.br
\fI?\fP: zero or one occurrence of the previous element;
.br
\fI+\fP: one or more repetitions of the previous element;
.br
\fI*\fP: zero or more repetitions of the previous element;
.br
\fI{\&.\&.\&.}\fP: interval specification: a specified number of repetitions of the previous element (see above for specific forms of the interval specification)
.IP o
\fI{+}, {\-}\fP: set operators (\fI{+}\fP computing the union of two sets, \fI{\-}\fP computing the difference of the left\-hand side set minus the elements in the right\-hand side set);
.PP
The lex standard defines concatenation as having a higher precedence than the interval expression\&. This is different from many other regular expression engines, and \fBflexc++\fP follows these latter engines, giving all `multiplication operators\(cq\& equal priority\&.
.PP
Name expansion has the same precedence as grouping (using parentheses to influence the precedence of the other operators in the regular expression)\&. Since the name expansion is treated as a group in \fBflexc++\fP, it is not allowed to use the lookahead operator in a name definition (a named pattern, defined in the definition section)\&.
.PP
Character classes can also contain character class expressions\&. These are expressions enclosed inside \fI[:\fP and \fI:]\fP delimiters (which themselves must appear between the \fI[\fP and \fI]\fP of the character class\&. Other elements may occur inside the character class as well)\&.
The character class expressions are: .nf [:alnum:] [:alpha:] [:blank:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:] .fi .PP Character class expressions designate a set of characters equivalent to the corresponding standard \fBC\fP isXXX function\&. For example, \fI[:alnum:]\fP designates those characters for which \fIisalnum\fP returns true \- i\&.e\&., any alphabetic or numeric character\&. For example, the following character classes are all equivalent: .nf [[:alnum:]] [[:alpha:][:digit:]] [[:alpha:][0\-9]] [a\-zA\-Z0\-9] .fi .PP A negated character class such as the example \fI[^A\-Z]\fP above will match a newline unless \fI\en\fP (or an equivalent escape sequence) is one of the characters explicitly present in the negated character class (e\&.g\&., \fI[^A\-Z\en]\fP)\&. This differs from the way many other regular expression tools treat negated character classes, but unfortunately the inconsistency is historically entrenched\&. Matching newlines means that a pattern like \fI[^\(dq\&]*\fP can match the entire input unless there\(cq\&s another quote in the input\&. .PP \fBFlexc++\fP allows negation of character class expressions by prepending \fI^\fP to the POSIX character class name\&. .nf [:^alnum:] [:^alpha:] [:^blank:] [:^cntrl:] [:^digit:] [:^graph:] [:^lower:] [:^print:] [:^punct:] [:^space:] [:^upper:] [:^xdigit:] .fi .PP The \fI{\-}\fP operator computes the difference of two character classes\&. For example, \fI[a\-c]{\-}[b\-z]\fP represents all the characters in the class \fI[a\-c]\fP that are not in the class \fI[b\-z]\fP (which in this case, is just the single character \fIa\fP)\&. The \fI{\-}\fP operator is left associative, so \fI[abc]{\-}[b]{\-}[c]\fP is the same as \fI[a]\fP\&. .PP The \fI{+}\fP operator computes the union of two character classes\&. For example, \fI[a\-z]{+}[0\-9]\fP is the same as \fI[a\-z0\-9]\fP\&. 
This operator is useful when preceded by the result of a difference operation, as in, \fI[[:alpha:]]{\-}[[:lower:]]{+}[q]\fP, which is equivalent to \fI[A\-Zq]\fP in the \fBC\fP locale\&.
.PP
A rule can have at most one instance of trailing context (the \fI/\fP operator or the \fI$\fP operator)\&. The start condition, \fI^\fP, and \fI<<EOF>>\fP patterns can only occur at the beginning of a pattern, and cannot be surrounded by parentheses\&. The characters \fI^\fP and \fI$\fP only have their special properties at, respectively, the beginning and end of regular expressions\&. In all other cases they are treated as normal characters\&.
.PP
.SH "6\&.7\&. SPECIFICATION EXAMPLE"
.PP
.nf
%option debug
%x comment
NAME [[:alpha:]][_[:alnum:]]*
%%
\(dq\&//\(dq\&\&.*              // ignore
\(dq\&/*\(dq\&                begin(comment);
<comment>\&.|\en      // ignore
<comment>\(dq\&*/\(dq\&       begin(INITIAL);
^a                  return 1;
a                   return 2;
a$                  return 3;
{NAME}              return 4;
\&.|\en               // ignore
.fi
.PP
.SH "7\&. THE CLASS INTERFACE: SCANNER\&.H"
.PP
By default, \fBflexc++\fP generates a file \fIscanner\&.h\fP containing the initial interface of the scanner class performing the lexical scan according to the specifications given in \fBflexc++\fP\(cq\&s input file\&. The name of the file that is generated can easily be changed using \fBflexc++\fP\(cq\&s \fI\-\-class\-header\fP option\&. In this man\-page we\(cq\&ll stick to using the default name\&.
.PP
The file \fIscanner\&.h\fP is generated only once, unless an explicit request is made to rewrite it (using \fBflexc++\fP\(cq\&s \fI\-\-force\-class\-header\fP option)\&.
.PP
The provided interface is very light\-weight, primarily offering a link to the scanner\(cq\&s base class (see \fBscannerbase\&.h\fP(3flexc++))\&.
.PP
\fBMany of the facilities offered by the scanner class are inherited from the \fIScannerBase\fP base class, and the reader should consult\fP \fBscannerbase\&.h\fP(3flexc++) \fBfor an overview of additional facilities offered by the \fIScanner\fP class\&.\fP
.PP
.SH "7\&.1\&.
NAMING CONVENTION"
.PP
All symbols that are required by the generated scanner class end in two consecutive underscore characters (e\&.g\&., \fIexecuteAction__\fP)\&. These names should not be redefined\&. As they are part of the \fIScanner\fP and \fIScannerBase\fP classes their scope is immediately clear and confusion with identically named identifiers elsewhere is unlikely\&.
.PP
Some member functions do not use the underscore convention\&. These are the scanner class\(cq\&s constructors, or names that are similar or equal to names that have historically been used (e\&.g\&., \fIlength\fP)\&. Also, some functions offer hooks into the implementation (like \fIpreCode\fP)\&. The names of the functions in this latter category do not end in underscores either\&.
.PP
.SH "7\&.2 CONSTRUCTORS"
.PP
.IP o
\fBexplicit Scanner(std::istream &in = std::cin, std::ostream &out = std::cout)\fP
This constructor by default reads information from the standard input stream and writes to the standard output stream\&. When the \fIScanner\fP object goes out of scope the input and output files are closed\&.
.IP
With interactive scanners input stream switching or stacking is not available; switching output streams, however, is\&.
.IP
.IP o
\fBScanner(std::string const &infile, std::string const &outfile)\fP
This constructor opens the input and output streams whose file names were specified\&. When the \fIScanner\fP object goes out of scope the input and output files are closed\&. If \fIoutfile == \(dq\&\-\(dq\&\fP then the standard output stream is used as the scanner\(cq\&s output medium; if \fIoutfile == \(dq\&\(dq\&\fP then the standard error stream is used as the scanner\(cq\&s output medium\&.
.IP
\fBThis constructor is not available with interactive scanners\&.\fP
.PP
.SH "7\&.3\&.
PUBLIC MEMBER FUNCTIONS" .PP .IP o \fBint lex()\fP The \fIlex\fP function performs the lexical scanning of the input file specified at construction time (but also see \fBscannerbase\&.h\fP(3flexc++) for information about intermediate stream\-switching facilities)\&. It returns an \fIint\fP representing the \fItoken\fP associated with the matched regular expression\&. The returned value 0 indicates end\-of\-file\&. Considering its default implementation, it could be redefined by the user\&. \fILex\fP\(cq\&s default implementation merely calls \fIlex__\fP: .nf inline int Scanner::lex() { return lex__(); } .fi .IP \fBCaveat\fP: with interactive scanners the \fIlex\fP function is defined in the generated \fIlex\&.cc\fP file\&. Once \fBflexc++\fP has generated the scanner class header file this scanner class header file isn\(cq\&t automatically rewritten by \fBflexc++\fP\&. If, at some later stage, an interactive scanner must be generated, then the inline \fIlex\fP implementation must be removed `by hand\(cq\& from the scanner class header file\&. Likewise, a \fIlex\fP member implementation (like the above) must be provided `by hand\(cq\& if a non\-interactive scanner is required after first having generated files implementing an interactive scanner\&. .PP .SH "7\&.4\&. PRIVATE MEMBER FUNCTIONS" .PP .IP o \fBint lex__()\fP This function is used internally by \fIlex\fP and should not otherwise be used\&. .IP o \fBint executeAction__()\fP This function is used internally by \fIlex\fP and should not otherwise be used\&. .IP o \fBvoid preCode()\fP By default this function has an empty, inline implementation in \fIscanner\&.h\fP\&. It can safely be replaced by a user\-defined implementation\&. This function is called by \fIlex__\fP, just before it starts to match input characters against its rules: \fIpreCode\fP is called by \fIlex__\fP when \fIlex__\fP is called and also after having executed the actions of a rule which did not execute a \fIreturn\fP statement\&. 
The outline of \fIlex__\fP\(cq\&s implementation looks like this: .nf int Scanner::lex__() { \&.\&.\&. preCode(); while (true) { size_t ch = get__(); // fetch next char \&.\&.\&. switch (actionType__(range)) // determine the action { \&.\&.\&. maybe return } \&.\&.\&. no return, continue scanning preCode(); } // while } .fi .IP o \fBvoid print()\fP When the \fI\-\-print\-tokens\fP or \fI%print\-tokens\fP directive is used this function is called to display, on the standard output stream, the tokens returned and text matched by the scanner generated by \fBflexc++\fP\&. .IP Displaying is suppressed when the \fIlex\&.cc\fP file is (re)generated without using this directive\&. The function actually showing the tokens (\fIScannerBase::print__\fP) is called from \fIprint\fP, which is defined in\-line in \fIscanner\&.h\fP\&. Calling \fIScannerBase::print__\fP, therefore, can also easily be controlled by an option controlled by the program using the scanner object\&. .PP .SH "7\&.5\&. SCANNER CLASS HEADER EXAMPLE" .PP .nf #ifndef Scanner_H_INCLUDED_ #define Scanner_H_INCLUDED_ // $insert baseclass_h #include \(dq\&scannerbase\&.h\(dq\& class Scanner: public ScannerBase { public: explicit Scanner(std::istream &in = std::cin, std::ostream &out = std::cout); Scanner(std::string const &infile, std::string const &outfile); // $insert lexFunctionDecl int lex(); private: int lex__(); int executeAction__(size_t ruleNr); void preCode(); // re\-implement this function for code to be // exec\(cq\&ed before the pattern matching starts }; inline void Scanner::preCode() { // optionally replace by your own code } inline Scanner::Scanner(std::istream &in, std::ostream &out) : ScannerBase(in, out) {} inline Scanner::Scanner(std::string const &infile, std::string const &outfile) : ScannerBase(infile, outfile) {} // $insert inlineLexFunction inline int Scanner::lex() { return lex__(); } #endif // Scanner_H_INCLUDED_ .fi .PP .SH "8\&.1\&. 
THE SCANNER BASE CLASS"
.PP
By default, \fBflexc++\fP generates a file \fIscannerbase\&.h\fP containing the interface of the base class of the scanner class also generated by \fBflexc++\fP\&. The name of the file that is generated can easily be changed using \fBflexc++\fP\(cq\&s \fI\-\-baseclass\-header\fP option\&. In this man\-page we use the default name\&.
.PP
The file \fIscannerbase\&.h\fP is generated at each new \fBflexc++\fP run\&. It contains no user\-serviceable or extensible parts\&. Rewriting can be prevented by specifying \fBflexc++\fP\(cq\&s \fI\-\-no\-baseclass\-header\fP option\&.
.PP
.SH "8\&.2\&. PUBLIC ENUMS AND \-TYPES"
.PP
.IP o
\fBenum class StartCondition__\fP
This strongly typed enumeration defines the names of the start conditions (i\&.e\&., mini scanners)\&. It at least contains \fIINITIAL\fP, but when the \fI%s\fP or \fI%x\fP directives were used it also contains the identifiers of the mini scanners declared by these directives\&. Since \fIStartCondition__\fP is a strongly typed enum its values must be preceded by its enum name\&. E\&.g\&.,
.nf
begin(StartCondition__::INITIAL);
.fi
.PP
.SH "8\&.3\&. PROTECTED ENUMS AND \-TYPES"
.PP
.IP o
\fBenum class ActionType__\fP
This strongly typed enumeration is for internal use only\&.
.IP o
\fBenum Leave__\fP
This enumeration is for internal use only\&.
.PP
.SH "8\&.4\&. NO PUBLIC CONSTRUCTORS"
.PP
There are no public constructors\&. \fIScannerBase\fP is a base class for the \fIScanner\fP class generated by \fBflexc++\fP\&. \fIScannerBase\fP only offers protected constructors\&.
.PP
.SH "8\&.5\&. PUBLIC MEMBER FUNCTIONS"
.PP
.IP o
\fBbool debug() const\fP
returns \fItrue\fP if \fI\-\-debug\fP or \fI%debug\fP was specified, otherwise \fIfalse\fP\&.
.IP o
\fBstd::string const &filename() const\fP
returns the name of the file currently processed by the scanner object\&.
.IP o
\fBsize_t length() const\fP
returns the length of the text that was matched by \fIlex\fP\&.
With \fBflex++\fP this function was called \fIleng\fP\&. .IP o \fBsize_t lineNr() const\fP returns the line number of the currently scanned line\&. This function is always available (note: \fBflex++\fP only offered a similar function (called \fIlineno\fP) after using the \fI%lineno\fP option)\&. .IP o \fBstd::string const &matched() const\fP returns the text matched by \fIlex\fP (note: \fBflex++\fP offers a similar member called \fIYYText\fP)\&. .IP o \fBvoid setDebug(bool onOff)\fP Switches on/off debugging output by providing the argument \fItrue\fP or \fIfalse\fP\&. Switching on debugging output only has visible effects if the \fIdebug\fP option was specified\&. .IP .IP o \fBvoid switchIstream(std::string const &infilename)\fP The currently processed input stream is closed, and processing continues at the stream whose name is specified as the function\(cq\&s argument\&. This is \fInot\fP a stack\-operation: after processing \fIinfilename\fP processing does not return to the original stream\&. .IP \fBThis member is not available with interactive scanners\&.\fP .IP .IP o \fBvoid switchOstream(std::ostream &out)\fP The currently processed output stream is closed, and new output is written to \fIout\fP\&. .IP .IP o \fBvoid switchOstream(std::string const &outfilename)\fP .IP The current output stream is closed, and output is written to \fIoutfilename\fP\&. If this file already exists, it is rewritten\&. .IP .IP o \fBvoid switchStreams(std::istream &in, std::ostream &out = std::cout)\fP The currently processed input and output streams are closed, and processing continues at \fIin\fP, writing output to \fIout\fP\&. This is \fInot\fP a stack\-operation: after processing \fIin\fP processing does not return to the original stream\&. 
.IP
\fBThis member is not available with interactive scanners\&.\fP
.IP
.IP o
\fBvoid switchStreams(std::string const &infilename, std::string const &outfilename)\fP
The currently processed input and output streams are closed, and processing continues at the stream whose name is specified as the function\(cq\&s first argument, writing output to the file whose name is specified as the function\(cq\&s second argument\&. This latter file is rewritten\&. This is \fInot\fP a stack\-operation: after processing \fIinfilename\fP processing does not return to the original stream\&. If \fIoutfilename == \(dq\&\-\(dq\&\fP then the standard output stream is used as the scanner\(cq\&s output medium; if \fIoutfilename == \(dq\&\(dq\&\fP then the standard error stream is used as the scanner\(cq\&s output medium\&.
.IP
\fBThis member is not available with interactive scanners\&.\fP
.IP
.SH "8\&.6\&. PROTECTED CONSTRUCTORS"
.PP
.IP o
\fBScannerBase(std::string const &infilename, std::string const &outfilename)\fP
The scanner object opens and reads \fIinfilename\fP and opens (rewrites) and writes \fIoutfilename\fP\&. It is called from the corresponding \fIScanner\fP constructor\&.
.IP
\fBThis member is not available for interactive scanners\&.\fP
.IP
.IP o
\fBScannerBase(std::istream &in, std::ostream &out)\fP
The \fIin\fP and \fIout\fP parameters are, respectively, the derived class constructor\(cq\&s input and output streams\&.
.PP
.SH "8\&.7\&. PROTECTED MEMBER FUNCTIONS"
.PP
All member functions ending in two underscore characters are for internal use only and should not be called by user\-defined members of the \fIScanner\fP class\&.
.PP The following members, however, can safely be called by members of the generated \fIScanner\fP class: .PP .IP o \fBvoid accept(size_t nChars = 0)\fP \fIaccept(n)\fP returns all but the first `nChars\(cq\& characters of the current token back to the input stream, where they will be rescanned when the scanner looks for the next match\&. So, it matches `nChars\(cq\& of the characters in the input buffer, rescanning the rest\&. This function effectively sets \fIlength\fP\(cq\&s return value to \fInChars\fP (note: with \fBflex++\fP this function was called \fIless\fP); .IP .IP o \fBvoid begin(StartCondition__ startCondition)\fP activate the regular expression rules associated with \fIStartCondition__ startCondition\fP\&. As this enumeration is a strongly typed enum the \fIStartCondition__\fP scope must be specified as well\&. E\&.g\&., .nf begin(StartCondition__::INITIAL); .fi .IP .IP o \fBvoid echo() const\fP The currently matched text (i\&.e\&., the text returned by the member \fImatched\fP) is inserted into the scanner object\(cq\&s output stream; .IP .IP o \fBvoid leave(int retValue)\fP actions defined in the lexical scanner specification file may or may not return\&. This frequently results in complicated or overlong compound statements, blurring the readability of the specification file\&. By encapsulating the actions in a member function readability is enhanced\&. However, frequently a compound statement is still required, as in: .nf regex\-to\-match { if (int ret = memberFunction()) return ret; } .fi The member \fIleave\fP removes the need for constructions like the above\&. The member \fIleave\fP can be called from within member functions encapsulating actions performed when a regular expression has been matched\&. It ends \fIlex\fP, returning \fIretValue\fP to its caller\&. 
The above rule can now be written like this:
.nf
regex\-to\-match      memberFunction();
.fi
and \fImemberFunction\fP could be implemented as follows:
.nf
void memberFunction()
{
    if (someCondition())
    {
        // any action, e\&.g\&.,
        // switch mini\-scanner
        begin(StartCondition__::INITIAL);
        leave(Parser::TOKENVALUE);
        // lex returns TOKENVALUE
        // this point is never reached
    }
    pushStream(d_matched);  // switch to the next stream
                            // lex continues
}
.fi
The member \fIleave\fP should only (indirectly) be called (usually nested) from actions defined in the scanner\(cq\&s specification file; calling \fIleave\fP outside of this context results in undefined behavior\&.
.IP
.IP o
\fBvoid more()\fP
the matched text is kept and will be prefixed to the text that is matched at the next lexical scan;
.IP
.IP o
\fBstd::ostream &out()\fP
returns a reference to the scanner\(cq\&s output stream;
.IP
.IP o
\fBbool popStream()\fP
closes the currently processed input stream and continues to process the most recently stacked input stream (removing it from the stack of streams)\&. If this switch was successfully performed \fItrue\fP is returned, otherwise (e\&.g\&., when the stream stack is empty) \fIfalse\fP is returned;
.IP
.IP o
\fBvoid push(size_t ch)\fP
character \fIch\fP is pushed back onto the input stream\&. I\&.e\&., it will be the character that is retrieved at the next attempt to obtain a character from the input stream;
.IP
.IP o
\fBvoid push(std::string const &txt)\fP
the characters in the string \fItxt\fP are pushed back onto the input stream\&. I\&.e\&., they will be the characters that are retrieved at the next attempt to obtain characters from the input stream\&. The characters in \fItxt\fP are retrieved from the first character to the last\&.
So if \fItxt == \(dq\&hello\(dq\&\fP then the \fI\(cq\&h\(cq\&\fP will be the character that\(cq\&s retrieved next, followed by \fI\(cq\&e\(cq\&\fP, etc, until \fI\(cq\&o\(cq\&\fP;
.IP
.IP o
\fBvoid pushStream(std::istream &curStream)\fP
this function pushes \fIcurStream\fP on the stream stack;
.IP
\fBThis member is not available with interactive scanners\&.\fP
.IP
.IP o
\fBvoid pushStream(std::string const &curName)\fP
same, but the stream \fIcurName\fP is opened first, and the resulting \fIistream\fP is pushed on the stream stack;
.IP
\fBThis member is not available with interactive scanners\&.\fP
.IP
.IP o
\fBvoid redo(size_t nChars = 0)\fP
this member acts like \fIaccept\fP but its argument counts backward from the end of the matched text\&. All but these \fInChars\fP characters are kept and the last \fInChars\fP characters are rescanned\&. This function effectively reduces \fIlength\fP\(cq\&s return value by \fInChars\fP;
.IP
.IP o
\fBvoid setFilename(std::string const &name)\fP
this function sets the name of the stream returned by \fIfilename\fP to \fIname\fP;
.IP
.IP o
\fBvoid setMatched(std::string const &text)\fP
this function stores \fItext\fP in the matched text buffer\&. Following a call to this function \fImatched\fP returns \fItext\fP\&.
.IP
.IP o
\fBStartCondition__ startCondition() const\fP
returns the currently active start condition (mini scanner);
.IP
.IP o
\fBstd::vector<StreamStruct> const &streamStack() const\fP
returns the vector of currently stacked input streams\&. The vector\(cq\&s size equals 0 unless \fIpushStream\fP has been used\&. So \fBflexc++\fP\(cq\&s input file is not counted here\&. The \fIStreamStruct\fP is a \fIstruct\fP only having one accessible member: \fIstd::string const &pushedName\fP, which holds the name of the pushed stream\&. The vector is used internally as a stack: the stream that was first pushed is found at index position 0, the most recently pushed stream is found at \fIstreamStack()\&.back()\fP\&.
.IP
\fBThis member is not available with interactive scanners\&.\fP
.IP
.SH "8\&.8\&. PROTECTED DATA MEMBERS"
.PP
All protected data members are for internal use only, allowing \fIlex__\fP to access them\&. All of them end in two underscore characters\&.
.PP
.SH "8\&.9\&. FLEX++ TO FLEXC++ MEMBERS"
.PP
.TS
tab(~);
l l.
_
Flex++ (old)~Flexc++ (new)
_
\fIlineno()\fP~\fIlineNr()\fP
\fIYYText()\fP~\fImatched()\fP
\fIless()\fP~\fIaccept()\fP
_
.TE
.PP
.SH "9\&.1 THE CLASS INPUT"
.PP
\fBFlexc++\fP generates a file \fIscannerbase\&.h\fP defining the scanner class\(cq\&s base class, by default named \fIScannerBase\fP (which is the name used in this man\-page)\&. The base class \fIScannerBase\fP contains a nested class \fIInput\fP whose interface looks like this:
.nf
class Input
{
    public:
        Input();
        Input(std::istream *iStream, size_t lineNr = 1);
        size_t get();
        size_t lineNr() const;
        void reRead(size_t ch);
        void reRead(std::string const &str, size_t fmIdx);
        void close();
};
.fi
The members of this class are all required and offer a level in between the operations of \fIScannerBase\fP and \fBflexc++\fP\(cq\&s actual input file that\(cq\&s being processed\&.
.PP
By default, \fBflexc++\fP provides an implementation for all of \fIInput\fP\(cq\&s required members\&. Therefore, in most situations this section can safely be ignored\&.
.PP
However, users may define and extend their own \fIInput\fP class and provide \fBflexc++\fP\(cq\&s base class with that \fIInput\fP class\&. To do so \fBflexc++\fP\(cq\&s rules file must contain the following two directives:
.nf
%input\-implementation = \(dq\&sourcefile\(dq\&
%input\-interface = \(dq\&interface\(dq\&
.fi
Here, \fIinterface\fP is the name of a file containing the class \fIInput\fP\(cq\&s interface\&. This interface is then inserted into \fIScannerBase\fP\(cq\&s interface instead of the default class \fIInput\fP\(cq\&s interface\&.
This interface must \fIat least\fP offer the above\-mentioned members and constructors (their functions are described below)\&. The class may contain additional members if required by the user\-defined implementation\&. The implementation itself is expected in \fIsourcefile\fP\&. The contents of this file are inserted in the generated \fIlex\&.cc\fP file instead of \fIInput\fP\(cq\&s default implementation\&. The file \fIsourcefile\fP should probably not have a \fI\&.cc\fP extension to prevent its compilation by a program maintenance utility\&. .PP When the lexical scanner generated by \fBflexc++\fP switches streams using the \fI//include\fP directive (see the \fBrules\fP(3flexc++) man\-page) the input stream that\(cq\&s currently processed is pushed on an \fIInput\fP stack maintained by \fIScannerBase\fP, and processing continues at the file named at the \fI//include\fP directive\&. Once the latter file has been processed, the previously pushed stream is popped off the stack, and processing of the popped stream continues\&. This implies that \fIInput\fP objects must be `stack\-able\(cq\&\&. The required interface is designed to satisfy this requirement\&. .PP .SH "9\&.2\&. CONSTRUCTORS" .PP .IP o \fBInput()\fP The default constructor is used by \fBScannerBase\fP to prepare the stack for \fIInput\fP objects\&. It must make sure that a default (empty) \fIInput\fP object is in a valid state and can be destroyed\&. It serves no further purpose\&. \fIInput\fP objects, however, must support the default (or overloaded) assignment operator\&. .IP o \fBInput(std::istream *iStream, size_t lineNr = 1)\fP This constructor receives a pointer to a dynamically allocated \fIistream\fP object\&. The \fIInput\fP constructor should preserve this pointer when the \fIInput\fP object is pushed on and popped off the stack\&. A \fIshared_ptr\fP probably comes in handy here\&. 
The \fIInput\fP object becomes the owner of the \fIistream\fP object, although its destructor is \fInot\fP supposed to destroy the \fIistream\fP object\&. Destruction remains the responsibility of the \fIScannerBase\fP object, which calls the \fIInput::close\fP member (see below) when it\(cq\&s time to destroy (close) the stream\&. .IP The new input stream\(cq\&s line counter is set to \fIlineNr\fP, by default 1\&. .PP .SH "9\&.3\&. REQUIRED PUBLIC MEMBER FUNCTIONS" .PP .IP o \fBsize_t get()\fP returns the next character to be processed by the lexical scanner\&. Usually it will be the next character from the \fIistream\fP passed to the \fIInput\fP class at construction time\&. It is never called by the \fIScannerBase\fP object for \fIInput\fP objects defined using \fIInput\fP\(cq\&s default constructor\&. It should return 0x100 once \fIistream\fP\(cq\&s end\-of\-file has been reached\&. .IP o \fBsize_t lineNr() const\fP should return the (1\-based) current line number of the \fIistream\fP object passed to the \fIInput\fP object\&. At construction time the \fIistream\fP has just been opened and so at that point \fIlineNr\fP should return the constructor\(cq\&s \fIlineNr\fP argument (by default 1)\&. .IP o \fBvoid reRead(size_t ch)\fP if provided with a value smaller than 0x100 \fIch\fP should be pushed back onto the \fIistream\fP, where it becomes the character next to be returned\&. Physically the character doesn\(cq\&t have to be pushed back\&. The default implementation uses a \fIdeque\fP onto which the character is pushed\-front\&. Only when this \fIdeque\fP is exhausted are characters retrieved from the \fIInput\fP object\(cq\&s \fIistream\fP\&. .IP o \fBvoid reRead(std::string const &str, size_t fmIdx)\fP the characters in \fIstr\fP from \fIfmIdx\fP until the string\(cq\&s final character are pushed back onto the \fIistream\fP object so that the string\(cq\&s first character is retrieved first and the string\(cq\&s last character is retrieved last\&.
.IP o \fBvoid close()\fP the \fIistream\fP object initially passed to the \fIInput\fP object is deleted by \fIclose\fP, thereby not only freeing the stream\(cq\&s memory, but also closing the stream if the stream in fact was an \fIifstream\fP\&. Note that the \fIInput\fP\(cq\&s destructor should \fInot\fP destroy the \fIInput\fP\(cq\&s \fIistream\fP object\&. .PP .SH "FILES" .PP \fBFlexc++\fP\(cq\&s default skeleton files are in \fI/usr/share/flexc++\fP\&. .br By default, \fBflexc++\fP generates the following files: .IP o \fIscanner\&.h\fP: the header file containing the scanner class\(cq\&s interface\&. .IP o \fIscannerbase\&.h\fP: the header file containing the interface of the scanner class\(cq\&s base class\&. .IP o \fIscanner\&.ih\fP: the internal header file that is meant to be included by the scanner class\(cq\&s source files (e\&.g\&., it is included by \fIlex\&.cc\fP, see the next file), and that should contain all declarations required for compiling the scanner class\(cq\&s sources\&. .IP o \fIlex\&.cc\fP: the source file implementing the scanner class member function \fIlex\fP (and support functions), performing the lexical scan\&. .PP .SH "SEE ALSO" .PP \fBbisonc++\fP(1) .PP .SH "BUGS" .PP .IP o The priority of interval expressions (\fI{\&.\&.\&.}\fP) equals the priority of other multiplicative operators (like \fI*\fP)\&. .IP o All \fIINITIAL\fP rules apply to inclusive mini scanners, also those \fIINITIAL\fP rules that were explicitly associated with the \fIINITIAL\fP mini scanner\&. .PP .SH "ABOUT flexc++" .PP \fBFlexc++\fP was originally started as a programming project by Jean\-Paul van Oosten and Richard Berendsen in the 2007\-2008 academic year\&. After graduating, Richard left the project and moved to Amsterdam\&. Jean\-Paul remained in Groningen, and after on\-and\-off activities on the project, in close cooperation with Frank B\&. Brokken, Frank undertook a rewrite of the project\(cq\&s code around 2010\&. 
During the development of \fBflexc++\fP, the lookahead\-operator handling continuously threatened the completion of the project\&. By now, the project has evolved to a level that we feel it\(cq\&s defensible to publish the program, although we still tend to consider the program in its experimental stage; it will remain that way until we decide to move its version from the 0\&.9x\&.xx series to the 1\&.xx\&.xx series\&. .PP .SH "COPYRIGHT" This is free software, distributed under the terms of the GNU General Public License (GPL)\&. .PP .SH "AUTHOR" Frank B\&. Brokken (\fBf\&.b\&.brokken@rug\&.nl\fP), .br Jean\-Paul van Oosten (\fBj\&.p\&.van\&.oosten@rug\&.nl\fP), .br Richard Berendsen (\fBrichardberendsen@xs4all\&.nl\fP) (until 2010)\&. .br .PP
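.SH "EXAMPLE"
.PP
The \fIInput\fP interface described in sections 9\&.1 through 9\&.3 can be sketched as follows\&. This is a minimal illustration under stated assumptions, not \fBflexc++\fP\(cq\&s actual default implementation: the data member names (\fId_deque\fP, \fId_in\fP, \fId_lineNr\fP), the constants \fIAT_EOF\fP and \fINEWLINE\fP, the helper function \fIdrain\fP, and the line\-counting policy (incrementing when \fIget\fP returns a newline, decrementing when a newline is pushed back) are all hypothetical choices made for this example\&. Only the seven public members, the 0x100 end\-of\-file value, the use of a \fIdeque\fP, and the rule that \fIclose\fP rather than the destructor deletes the stream are taken from the description above\&.
.PP
.nf
```cpp
#include <cstddef>
#include <deque>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a user-defined Input class (a sketch only).
class Input
{
    static constexpr std::size_t AT_EOF = 0x100;   // value get() returns at EOF
    static constexpr std::size_t NEWLINE = 10;     // ASCII line feed

    std::deque<unsigned char> d_deque;  // characters waiting to be re-read
    std::istream *d_in;                 // deleted by close(), never by a destructor
    std::size_t d_lineNr;               // current 1-based line number

    public:
        // Default objects only have to be valid, copyable and destructible.
        Input() : d_in(nullptr), d_lineNr(1) {}

        Input(std::istream *iStream, std::size_t lineNr = 1)
        : d_in(iStream), d_lineNr(lineNr)
        {}

        // Pushed-back characters are served first, then the stream itself.
        std::size_t get()
        {
            std::size_t ch;
            if (!d_deque.empty())
            {
                ch = d_deque.front();
                d_deque.pop_front();
            }
            else
            {
                if (d_in == nullptr)
                    return AT_EOF;
                int next = d_in->get();
                if (next == std::istream::traits_type::eof())
                    return AT_EOF;
                ch = static_cast<unsigned char>(next);
            }
            if (ch == NEWLINE)              // assumed line-counting policy
                ++d_lineNr;
            return ch;
        }

        std::size_t lineNr() const { return d_lineNr; }

        // Values of 0x100 and beyond (e.g., the EOF value) are ignored.
        void reRead(std::size_t ch)
        {
            if (ch >= AT_EOF)
                return;
            if (ch == NEWLINE)              // undo the increment made by get()
                --d_lineNr;
            d_deque.push_front(static_cast<unsigned char>(ch));
        }

        // Push back str[fmIdx] .. str.back(), last character first, so that
        // str[fmIdx] becomes the next character returned by get().
        void reRead(std::string const &str, std::size_t fmIdx)
        {
            for (std::size_t idx = str.size(); idx-- > fmIdx; )
                reRead(static_cast<unsigned char>(str[idx]));
        }

        // Called by the owner of the Input object when the stream is done.
        void close()
        {
            delete d_in;
            d_in = nullptr;
        }
};

// Usage sketch: read all characters of an in-memory stream through Input.
std::string drain(std::string const &text)
{
    Input in(new std::istringstream(text));
    std::string ret;
    for (std::size_t ch; (ch = in.get()) < 0x100; )
        ret += static_cast<char>(ch);
    in.close();                             // deletes the istringstream
    return ret;
}
```
.fi
.PP
Because the destructor leaves the stream alone, copies of an \fIInput\fP object can be pushed on and popped off a stack freely, and the stream is deleted exactly once, by \fIclose\fP\&. When such a class is supplied to \fBflexc++\fP, its interface and its member implementations would go into the two separate files named by the \fI%input\-interface\fP and \fI%input\-implementation\fP directives\&.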