.TH "flexc++input" "7" "2008\-2020" "flexc++\&.2\&.08\&.01\&.tar\&.gz" "flexc++ input file organization"
.PP
.SH "NAME"
flexc++input \- Organization of flexc++\(cq\&s input files
.PP
.SH "DESCRIPTION"
.PP
\fBFlexc++\fP(1) was designed after \fBflex\fP(1) and \fBflex++\fP(1)\&. Like these two programs, \fBflexc++\fP generates code performing pattern\-matching on text, possibly executing actions when certain \fIregular expressions\fP are recognized\&.
.PP
Refer to \fBflexc++\fP(1) for a general overview\&. This manual page describes how \fBflexc++\fP\(cq\&s input files should be organized\&. It contains the following sections:
.PP
.IP o
\fB1\&. SPECIFICATION FILE(S)\fP: the format and contents of \fBflexc++\fP input files, specifying the Scanner\(cq\&s characteristics
.IP o
\fB2\&. FILE SWITCHING\fP: how to switch to another input specification file
.IP o
\fB3\&. DIRECTIVES\fP: directives that can be used in input specification files
.IP o
\fB4\&. MINI SCANNERS\fP: how to declare mini\-scanners
.IP o
\fB5\&. DEFINITIONS\fP: how to define symbolic names for regular expressions
.IP o
\fB6\&. %% SEPARATOR\fP: the separator between the input specification sections
.IP o
\fB7\&. REGULAR EXPRESSIONS\fP: regular expressions supported by \fBflexc++\fP
.IP o
\fB8\&. SPECIFICATION EXAMPLE\fP: an example of a specification file
.PP
.SH "UNDERSCORES"
Starting with version 2\&.07\&.00 \fBflexc++\fP reserved identifiers no longer end in two underscore characters, but in one\&. This modification was necessary because according to the \fBC++\fP standard identifiers having two or more consecutive underscore characters are reserved by the language\&. In practice this could require some minor modifications of existing source files using \fBflexc++\fP\(cq\&s facilities, most likely limited to changing \fIStartCondition__\fP into \fIStartCondition_\fP and changing \fIPostEnum__\fP into \fIPostEnum_\fP\&.
.PP
The complete list of affected names is:
.IP "Enums:"
.RS
ActionType_, Leave_, StartCondition_, PostEnum_;
.RE
.IP "Member functions:"
.RS
actionType_, continue_, echoCh_, echoFirst_, executeAction_, getRange_, get_, istreamName_, lex_, lop1_, lop2_, lop3_, lop4_, lopf_, matched_, noReturn_, print_, pushFront_, reset_, return_;
.RE
.IP "Protected data members:"
.RS
d_in_, d_token_, s_finIdx_, s_interactive_, s_maxSizeofStreamStack_, s_nRules_, s_rangeOfEOF_, s_ranges_, s_rf_\&.
.RE
.PP
.SH "1\&. SPECIFICATION FILE(S)"
.PP
\fBFlexc++\fP expects an input file containing directives and the regular expressions that should be recognized by objects of the scanner class generated by \fBflexc++\fP\&. This man page describes the elements and organization of \fBflexc++\fP\(cq\&s input file\&.
.PP
\fBFlexc++\fP\(cq\&s input file consists of two sections, separated from each other by a line merely containing two consecutive percent characters:
.nf

    %%
.fi
The section before this separator contains directives; the section following this separator contains regular expressions and possibly actions to perform when these regular expressions are matched by the object of the scanner class generated by \fBflexc++\fP\&. If a second line is encountered immediately beginning with two consecutive percent characters then this ends \fBflexc++\fP\(cq\&s input file processing\&. See also section 6 (%% SEPARATOR) below\&.
.PP
White space is usually ignored, as is comment\&. Comment may be of the traditional \fBC\fP form (i\&.e\&., \fI/*\fP, followed by (possibly multi\-line) comment text, followed by \fI*/\fP), or it may be \fBC++\fP end\-of\-line comment: two consecutive slashes (\fI//\fP) start the comment, which continues up to the next newline character\&.
.PP
.SH "2\&. FILE SWITCHING"
.PP
\fBFlexc++\fP\(cq\&s input file may be split into multiple files\&. This allows for the definition of logically separate elements of the specifications in different files\&.
Include directives must be specified on separate lines and may not contain any other information than the (path)name of the file to switch to\&. File names may be surrounded by double quotes, but these double quotes are optional and are ignored (removed) when encountered\&. All remaining characters define the name of the file that is subsequently processed by \fBflexc++\fP\&. White space characters following \fI//include\fP and preceding the end of the line are ignored\&. To switch files the following stanza is used:
.nf

    //include file\-location
.fi
The \fI//include\fP directive must start in the line\(cq\&s first column\&. File locations can be absolute or relative to the location of the file containing the \fI//include\fP directive\&. Once \fBflexc++\fP has switched to another file that file\(cq\&s directory becomes \fBflexc++\fP\(cq\&s working directory\&.
.PP
Once the end of an included file has been reached, processing continues at the line beyond the \fI//include\fP directive of the previously scanned file, and \fBflexc++\fP\(cq\&s working directory is reset to the working directory of the file to which \fBflexc++\fP returns\&. The end\-of\-file of the file that was initially specified when \fBflexc++\fP started indicates the end of \fBflexc++\fP\(cq\&s rules specification\&.
.PP
.SH "3\&. DIRECTIVES"
.PP
The first section of \fBflexc++\fP\(cq\&s input file consists of directives\&. In addition it may associate regular expressions with symbolic names, allowing you to use these identifiers in the rules section\&. Each directive is defined on a line of its own\&. When available, directives are overridden by \fBflexc++\fP command line options\&.
.PP
Some directives require arguments, which are usually provided following separating (but optional) \fI=\fP characters\&. Arguments of directives are text, surrounded by double quotes (strings), or embedded in raw string literals (rawstrings)\&.
Double quotes or backslashes inside strings must themselves be preceded by backslashes; these backslashes are not required when rawstrings are used\&.
.PP
The \fI%s\fP and \fI%x\fP directives are immediately followed by name lists, consisting of identifiers separated by blanks\&. Here is an example of the definition of a directive:
.nf

    %class\-name = \(dq\&MyScanner\(dq\&
.fi
.PP
Directives accepting a `filename\(cq\& do not accept path names, i\&.e\&., they cannot contain directory separators (\fI/\fP); directives accepting a `pathname\(cq\& may contain directory separators\&. A `pathname\(cq\& containing blank characters should be surrounded by double quotes\&.
.PP
Some directives may generate errors\&. This happens when a directive conflicts with the contents of an existing file which \fBflexc++\fP cannot modify (e\&.g\&., a scanner class header file exists that doesn\(cq\&t define a namespace, while a \fI%namespace\fP directive was provided)\&. To solve the error the offending directive could be omitted, the existing file could be removed, or the existing file could be hand\-edited according to the directive\(cq\&s specification\&. Note that \fBflexc++\fP currently does not handle the opposite error condition: if a previously used directive is omitted, then \fBflexc++\fP does not detect the inconsistency\&. In those cases you may encounter compilation errors\&.
.PP
.IP o
\fB%baseclass\-header\fP \fI= \(dq\&filename\(dq\&\fP
.br
Defines the name of the file to contain the scanner class\(cq\&s base class interface\&. Corresponding command\-line option: \fI\-\-baseclass\-header\fP\&.
.IP
It is an error if this directive is used and an already existing scanner\-class header file does not include \fI`filename\(cq\&\fP\&.
.IP
.IP o
\fB%case\-insensitive\fP
.br
Generates a scanner which \fIcase insensitively\fP matches regular expressions\&.
All regular expressions specified in \fBflexc++\fP\(cq\&s input file are interpreted case insensitively and the resulting scanner object will case insensitively interpret its input\&.
.IP
Corresponding command\-line option: \fI\-\-case\-insensitive\fP\&.
.IP
When this directive is specified the resulting scanner does not distinguish between the following rules:
.nf

    First       // initial F is transformed to f
    first
    FIRST       // all capitals are transformed to lower case chars
.fi
With a case\-insensitive scanner only the first rule can be matched, and \fBflexc++\fP will issue warnings for the second and third rule about rules that cannot be matched\&.
.IP
Input processed by a case\-insensitive scanner is also handled case insensitively\&. The above mentioned \fIFirst\fP rule is matched for all of the following input words: \fIfirst First FIRST firST\fP\&.
.IP
Although the matching process proceeds case insensitively, the matched text (as returned by the scanner\(cq\&s \fImatched()\fP member) always contains the original, unmodified text\&. So, with the above input \fImatched()\fP returns, respectively, \fIfirst, First, FIRST\fP and \fIfirST\fP, while matching the rule \fIFirst\fP\&.
.IP
.IP o
\fB%class\-header\fP \fI= \(dq\&filename\(dq\&\fP
.br
Defines the name of the file to contain the scanner class\(cq\&s interface\&. Corresponding command\-line option: \fI\-\-class\-header\fP\&.
.IP
.IP o
\fB%class\-name\fP \fI = \(dq\&className\(dq\&\fP
.br
Declares the name of the scanner class generated by \fBflexc++\fP\&. This directive corresponds to the \fI%name\fP directive used by \fBflex++\fP(1)\&. Contrary to \fBflex++\fP\(cq\&s \fI%name\fP declaration, \fIclass\-name\fP may appear anywhere in the first section of the grammar specification file\&. It may be defined only once\&. If no \fIclass\-name\fP is specified the default class name (\fIScanner\fP) is used\&. Corresponding command\-line option: \fI\-\-class\-name\fP\&.
.IP
It is an error if this directive is used and an already existing scanner\-class header file does not define \fIclass `className\(cq\&\fP\&.
.IP
.IP o
\fB%debug\fP
.br
Provides \fIlex\fP and its support functions with debugging code, showing the actual parsing process on the standard output stream\&. When included, the debugging output is active by default, but its activity may be controlled using the \fIsetDebug(bool on\-off)\fP member\&. Note that no \fI#ifdef DEBUG\fP macros are used in the generated code\&.
.IP
.IP o
\fB%filenames\fP \fI= \(dq\&basename\(dq\&\fP
.br
Defines the basename of the \fIScanner\&.h, Scanner\&.ih,\fP and \fIScannerbase\&.h\fP files\&. E\&.g\&., when using the directive
.nf

    %filenames = \(dq\&scanner\(dq\&
.fi
the names of the generated files are, respectively, \fIscanner\&.h, scanner\&.ih,\fP and \fIscannerbase\&.h\fP\&. Corresponding command\-line option: \fI\-\-filenames\fP\&. The name of the source file (by default \fIlex\&.cc\fP) is controlled by the \fI%lex\-source\fP directive\&.
.IP
.IP o
\fB%implementation\-header\fP \fI= \(dq\&filename\(dq\&\fP
.br
Defines the name of the file to contain the implementation header\&. Corresponding command\-line option: \fI\-\-implementation\-header\fP\&.
.IP
It is an error if this directive is used and an already existing \fI`filename\(cq\&\fP file does not include the scanner class header file\&.
.IP
.IP o
\fB%input\-implementation\fP \fI= \(dq\&sourcefile\(dq\&\fP
.br
Defines the pathname of the file containing the implementation of a user\-defined \fIInput\fP class\&.
.IP
.IP o
\fB%input\-interface\fP \fI= \(dq\&interface\(dq\&\fP
.br
Defines the pathname of the file containing the interface of a user\-defined \fIInput\fP class\&. See section \fB17\&. THE CLASS INPUT\fP in the \fBflexc++api\fP(3) manual page for additional information about user\-defined \fIInput\fP classes\&.
.IP
.IP o
\fB%interactive\fP
.br
Generate an interactive scanner\&.
An interactive scanner reads lines from the input stream, and then returns the tokens encountered on that line\&. The interactive scanner implemented by \fBflexc++\fP only predefines the \fIScanner(std::istream &in, std::ostream &out)\fP constructor, by default assuming that input is read from \fIstd::cin\fP\&. See also section \fI1\&. INTERACTIVE SCANNER\fP in the \fBflexc++api\fP(3) manual page\&.
.IP
.IP o
\fB%lex\-function\-name\fP \fI= \(dq\&funname\(dq\&\fP
.br
Defines the name of the scanner class\(cq\&s member to perform the lexical scanning\&. If this directive is omitted the default name (\fIlex\fP) is used\&. Corresponding command\-line option: \fI\-\-lex\-function\-name\fP\&.
.IP
.IP o
\fB%lex\-source\fP \fI= \(dq\&filename\(dq\&\fP
.br
Defines the name of the file to contain the scanner member \fIlex\fP\&. Corresponding command\-line option: \fI\-\-lex\-source\fP\&.
.IP
.IP o
\fB%no\-lines\fP
.br
Do not put \fI#line\fP preprocessor directives in the file containing the scanner\(cq\&s \fIlex\fP function\&. If omitted \fI#line\fP directives are added to this file, unless overridden by the command line options \fI\-\-lines\fP and \fI\-\-no\-lines\fP\&.
.IP
.IP o
\fB%namespace\fP \fI= \(dq\&identifier\(dq\&\fP
.br
Define the scanner class in the namespace \fIidentifier\fP\&. By default no namespace is used\&. If this directive is used the implementation header is provided with a commented out \fIusing namespace\fP declaration for the requested namespace\&. In addition, the scanner and scanner base class header files also use the specified namespace to define their include guard directives\&.
.IP
It is an error if this directive is used and an already existing scanner\-class header file does not define \fInamespace identifier\fP\&.
.IP
.IP o
\fB%print\-tokens\fP
.br
This directive causes the tokens as well as the matched text to be displayed on the standard output stream, just before returning the token to \fIlex\fP\(cq\&s caller\&.
Displaying is suppressed again when the \fIlex\&.cc\fP file is generated without using this directive\&. The function showing the tokens (\fIScannerBase::print_\fP) is called from \fIScanner::print()\fP, which is defined in\-line in \fIScanner\&.h\fP\&. Calling \fIScannerBase::print_\fP, therefore, can also easily be controlled by an option of the program using the scanner object\&. This directive does \fInot\fP show the tokens returned and text matched by \fBflexc++\fP itself when reading its input files\&. If that is what you want, use the \fI\-\-own\-tokens\fP option\&.
.IP
.IP o
\fB%s\fP \fInamelist\fP
.br
The \fI%s\fP directive is followed by a list of one or more identifiers, separated by blanks\&. Each identifier is the name of an \fIinclusive start condition\fP\&.
.IP
.IP o
\fB%skeleton\-directory\fP \fI= \(dq\&pathname\(dq\&\fP
.br
Use \fIpathname\fP rather than the default (e\&.g\&., \fI/usr/share/flexc++\fP) path when looking for \fBflexc++\fP\(cq\&s skeleton files\&. Corresponding command\-line option: \fI\-\-skeleton\-directory\fP\&.
.IP
.IP o
\fB%startcondition\-name\fP \fI = \(dq\&startconditionName\(dq\&\fP
.br
By default, \fBflexc++\fP defines the enum \fIStartCondition_\fP defining the names of start\-conditions\&. The \fI%startcondition\-name\fP directive can be used to configure another name for the enum containing the names of the start\-conditions\&. It may be defined only once\&.
.IP
The name of the startcondition\-enum may be modified, and the directive can also be omitted again after it has been specified before\&. When changing the name of the startcondition\-enum or when reverting to the default name, newly generated \fIlex\&.cc\fP and \fIScannerbase\&.h\fP files will use the currently defined startcondition\-enum name\&. Be advised, though, that the startcondition\-enum name may also be used in user\-defined members of the scanner\-class, or in the scanner\(cq\&s header and internal header files\&.
If so, the user is responsible for updating those files to the currently defined name of the startcondition\-enum\&.
.IP
.IP o
\fB%target\-directory\fP \fI= \(dq\&pathname\(dq\&\fP
.br
\fIPathname\fP defines the directory where generated files should be written\&. By default this is the directory where \fBflexc++\fP is called\&. This directive is overruled by the \fI\-\-target\-directory\fP command\-line option\&.
.IP
.IP o
\fB%x\fP \fInamelist\fP
.br
The \fI%x\fP directive is followed by a list of one or more identifiers, separated by blanks\&. Each identifier is the name of an \fIexclusive start condition\fP\&.
.PP
.SH "4\&. MINI SCANNERS"
.PP
Mini scanners come in two flavors: inclusive mini scanners and exclusive mini scanners\&. The rules that apply to an inclusive mini scanner are the mini scanner\(cq\&s own rules as well as the rules which apply to no mini scanners in particular (i\&.e\&., the rules that apply to the default (or \fIINITIAL\fP) mini scanner)\&. Exclusive mini scanners only use the rules that were defined for them\&.
.PP
To define an inclusive mini scanner use \fI%s\fP, followed by one or more identifiers specifying the name(s) of the mini\-scanner(s)\&. To define an exclusive mini scanner use \fI%x\fP, followed by one or more identifiers specifying the name(s) of the mini\-scanner(s)\&. The following example defines two exclusive mini scanners, \fIstring\fP and \fIcomment\fP:
.nf

    %x string comment
.fi
Following this, rules defined in the context of the \fIstring\fP mini scanner (see below) will only be used when that mini scanner is active\&.
.PP
A \fBflexc++\fP input file may contain multiple \fI%s\fP and \fI%x\fP specifications\&.
.PP
.SH "5\&. DEFINITIONS"
.PP
Definitions are of the form
.nf

    identifier  regular\-expression
.fi
Each definition must be entered on a line of its own\&.
Definitions associate identifiers with regular expressions, allowing the use of \fI{identifier}\fP as synonym for its regular expression in the rules section of \fBflexc++\fP\(cq\&s input file\&. Once defined, the identifiers representing regular expressions can also be used in subsequent definitions\&.
.PP
Example:
.nf

    FIRST   [A\-Za\-z_]
    NAME    {FIRST}[\-A\-Za\-z0\-9_]*
.fi
.PP
.SH "6\&. %% SEPARATOR"
.PP
Following directives and definitions a line merely containing two consecutive \fI%\fP characters is expected\&. Following this line the rules are defined\&. Rules consist of regular expressions which should be recognized, possibly followed by actions to be executed once a rule\(cq\&s regular expression has been matched\&.
.PP
If the rule section contains a line starting with two consecutive \fI%\fP characters, then any remaining input is ignored\&. Note that this second \fI%%\fP separator does not have to be specified\&. It is purely optional\&. To specify a regular expression starting with \fI%%\fP surround the \fI%%\fP with double quotes (\fI\(dq\&%%\(dq\&\fP) or prefix the \fI%%\fP with a blank space: the \fI%%\fP\-characters are only considered a separator if they are encountered at the very beginning of a line\&.
.PP
.SH "7\&. REGULAR EXPRESSIONS"
.PP
The regular expressions defined in \fBflexc++\fP\(cq\&s rules files are matched against the information passed to the scanner\(cq\&s \fIlex\fP function\&.
.PP
Regular expressions begin at the first non\-blank character on a line\&. Comment is recognized as comment as long as it isn\(cq\&t part of the regular expression\&. To define a regular expression starting with two slashes, (at least) the first slash can be escaped or double quoted (e\&.g\&., \fI\(dq\&//\(dq\&\&.*\fP defines \fBC++\fP comment to end\-of\-line)\&.
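.PP
As a minimal sketch of the quoting rules described so far (the return value is arbitrary and only serves as illustration), the following two rules match a literal \fI%%\fP and a \fBC++\fP end\-of\-line comment, without being mistaken for a separator or for comment in the specification itself:
.nf

    \(dq\&%%\(dq\&        return 1;
    \(dq\&//\(dq\&\&.*      // ignore C++ comment
.fi
In both rules the double quotes only protect the first characters of the pattern; the \fI\&.*\fP following the quoted slashes keeps its usual meaning\&.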
.PP
Regular expressions end at the first blank character (to add a blank character, e\&.g\&., a space character, to a regular expression, prefix it by a backslash or put it in a double\-quoted string)\&.
.PP
Actions may be associated with regular expressions\&. At a match the action that is associated with the regular expression is executed, after which scanning continues when the lexical scanning function (e\&.g\&., \fIlex\fP) is called again\&. Actions are not required, and regular expressions can be defined without any actions at all\&. If such action\-less regular expressions are matched then the match is performed silently, after which processing continues\&.
.PP
\fBFlexc++\fP tries to match as many characters of the input file as possible (i\&.e\&., it uses `greedy matching\(cq\&)\&. Non\-greedy matching is accomplished by a combination of a scanner and parser and/or by using the `lookahead\(cq\& operator (\fI/\fP)\&.
.PP
The following regular expression `building blocks\(cq\& are available\&. More complex regular expressions are created by combining them:
.PP
.IP "\fIx\fP"
the character `x\(cq\&;
.IP "\fI\&.\fP"
any character (byte) except newline;
.IP "\fI[xyz]\fP"
a character class; in this case, the pattern matches either an `x\(cq\&, a `y\(cq\&, or a `z\(cq\&\&. See also the paragraph about character classes below;
.IP "\fI[abj\-oZ]\fP"
a character class containing a range; matches an `a\(cq\&, a `b\(cq\&, any letter from `j\(cq\& through `o\(cq\&, or a `Z\(cq\&\&. See also the paragraph about character classes below;
.IP "\fI[^A\-Z]\fP"
a negated character class, i\&.e\&., any character except for those in the class\&. In this example, any non\-capital character\&.
See also the paragraph about character classes below;
.IP "\fI\(dq\&[xyz]\e\(dq\&foo\(dq\&\fP"
text between double quotes matches the literal string: \fI[xyz]\(dq\&foo\fP;
.IP "R\(dq\&([xyz]\e\(dq\&foo)\(dq\&"
the literal string `\fI[xyz]\e\(dq\&foo\fP\(cq\& (using a raw string literal);
.IP "\fI\eX\fP"
if X is `a\(cq\&, `b\(cq\&, `f\(cq\&, `n\(cq\&, `r\(cq\&, `t\(cq\&, or `v\(cq\&, then the ANSI\-C interpretation of `\eX\(cq\& is matched\&. Otherwise, a literal `X\(cq\& is matched (this is used to escape operators such as `*\(cq\&);
.IP "\fI\e0\fP"
a NUL character (ASCII code 0);
.IP "\fI\e123\fP"
the character with octal value 123;
.IP "\fI\ex2a\fP"
the character with hexadecimal value 2a;
.IP "\fI(r)\fP"
the regular expression `r\(cq\&; parentheses are used to override precedence (see below);
.IP "\fI{name}\fP"
the expansion of the `name\(cq\& definition;
.IP "\fIr*\fP"
zero or more regular expressions `r\(cq\&\&. This also matches the empty string;
.IP "\fIr+\fP"
one or more regular expressions `r\(cq\&;
.IP "\fIr?\fP"
zero or one regular expression `r\(cq\&\&. This also matches the empty string;
.IP "\fIrs\fP"
the regular expression `r\(cq\& followed by the regular expression `s\(cq\&; called concatenation;
.IP "\fIr{m, n}\fP"
regular expression `r\(cq\& at least m, but at most n times (\fI0 <= m <= n\fP)\&. A regular expression to which \fI{0, 0}\fP is appended is ignored, and a warning message is shown;
.IP "\fIr{m,}\fP"
regular expression `r\(cq\& m or more times (\fI0 <= m\fP);
.IP "\fIr{m}\fP"
regular expression `r\(cq\& exactly m times (\fI0 <= m\fP)\&. A regular expression to which \fI{0}\fP is appended is ignored, and a warning message is shown;
.IP "\fIr|s\fP"
either regular expression `r\(cq\& or regular expression `s\(cq\&;
.IP "\fIr/s\fP"
regular expression `r\(cq\& if it is followed by regular expression `s\(cq\&\&.
The text matched by `s\(cq\& is included when determining whether this rule results in the longest match, but `s\(cq\& is then returned to the input before the rule\(cq\&s action (if defined) is executed\&.
.IP
If \fBflexc++\fP detects patterns potentially not matching any text it generates warnings like this:
.nf

    [Warning] input, line 7: null\-matching regular expression
.fi
By placing the comment
.nf

    //%nowarn
.fi
on the line just before a regular expression that potentially does not match any text, the warning for that regular expression is suppressed;
.IP
.IP "\fI^r\fP"
a regular expression `r\(cq\& at the beginning of a line or file;
.IP "\fIr$\fP"
a regular expression `r\(cq\&, occurring at the end of a line\&. This pattern is identical to `r/\en\(cq\&;
.IP "\fI<s>r\fP"
a regular expression `r\(cq\& in start condition `s\(cq\&;
.IP "\fI<s1,s2,s3>r\fP"
a regular expression `r\(cq\& in start conditions s1, s2, or s3;
.IP "\fI<*>r\fP"
a regular expression `r\(cq\& in all start conditions;
.IP "\fI<<EOF>>\fP"
an end\-of\-file;
.IP "\fI<s1,s2><<EOF>>\fP"
an end\-of\-file when in start conditions s1 or s2\&.
.PP
\fBCharacter classes\fP
.PP
Inside a character class all regular expression operators lose their special meanings, except for the escape character (\fI\e\fP), the character range operator \fI\-\fP, the end of character class operator \fI]\fP, and, at the beginning of the class, \fI^\fP\&. All ordinary escape sequences are supported; all other escaped characters are interpreted as literal characters (e\&.g\&., \fI\ec\fP is a literal \fIc\fP)\&.
.PP
To add a closing bracket to a character class use \fI[]\fP or \fI\e]\fP\&. To add a closing bracket to a negated character class use \fI[^]\fP (or use \fI[^\fP followed by \fI\e]\fP somewhere within the character class)\&.
Minus characters are used to define character ranges (e\&.g\&., \fI[a\-d]\fP, defining \fI[abcd]\fP) except in the following cases, where \fBflexc++\fP recognizes a literal minus character: \fI[\-\fP, or \fI[^\-\fP (a minus at the very beginning of a character class); \fI\-]\fP (a minus at the very end of a character class); or \fI\e\-\fP (an escaped minus character)\&. Once a character class has started, all subsequent characters (or character ranges) are added to the set, until the final closing bracket (\fI]\fP) has been reached\&.
.PP
\fBOperator precedence\fP
.PP
The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom\&. From lowest to highest precedence, the operators are:
.IP o
\fI|\fP: the or\-operator at the end of a line (instead of an action) indicates that this expression\(cq\&s action is identical to the action of the next rule;
.IP o
\fI/\fP: the look\-ahead operator;
.IP o
\fI|\fP: the or\-operator within a regular expression;
.IP o
\fICHAR\fP: individual elements of the regular expression: characters, strings, quoted characters, escaped characters, character sets etc\&. are all considered \fICHAR\fP elements\&.
Multiple \fICHAR\fP elements can be combined by enclosing them in parentheses (e\&.g\&., \fI(abc)+\fP indicates sequences of \fIabc\fP characters, like \fIabcabcabc\fP);
.IP o
\fI*, ?, +, {\fP: multipliers:
.br
\fI?\fP: zero or one occurrence of the previous element;
.br
\fI+\fP: one or more repetitions of the previous element;
.br
\fI*\fP: zero or more repetitions of the previous element;
.br
\fI{\&.\&.\&.}\fP: interval specification: a specified number of repetitions of the previous element (see above for specific forms of the interval specification);
.IP o
\fI{+}, {\-}\fP: set operators (\fI{+}\fP computing the union of two sets, \fI{\-}\fP computing the difference of the left\-hand side set minus the elements in the right\-hand side set)\&.
.PP
The lex standard defines concatenation as having a higher precedence than the interval expression\&. This is different from many other regular expression engines, and \fBflexc++\fP follows these latter engines, giving all `multiplication operators\(cq\& equal priority\&.
.PP
Name expansion has the same precedence as grouping (using parentheses to influence the precedence of the other operators in the regular expression)\&. Since the name expansion is treated as a group in \fBflexc++\fP, it is not allowed to use the lookahead operator in a name definition (a named pattern, defined in the definition section)\&.
.PP
\fBPredefined sets of characters\fP
.PP
Character classes can also contain character class expressions\&. These are expressions enclosed inside \fI[:\fP and \fI:]\fP delimiters (which themselves must appear between the \fI[\fP and \fI]\fP of the character class; other elements may occur inside the character class as well)\&.
The character class expressions are:
.nf

    [:alnum:] [:alpha:] [:blank:]
    [:cntrl:] [:digit:] [:graph:]
    [:lower:] [:print:] [:punct:]
    [:space:] [:upper:] [:xdigit:]
.fi
.PP
Character class expressions designate a set of characters equivalent to the corresponding standard \fBC\fP isXXX function\&. For example, \fI[:alnum:]\fP designates those characters for which \fIisalnum\fP returns true \- i\&.e\&., any alphabetic or numeric character\&. For example, the following character classes are all equivalent:
.nf

    [[:alnum:]]
    [[:alpha:][:digit:]]
    [[:alpha:][0\-9]]
    [a\-zA\-Z0\-9]
.fi
.PP
A negated character class such as the example \fI[^A\-Z]\fP above will match a newline unless \fI\en\fP (or an equivalent escape sequence) is one of the characters explicitly present in the negated character class (e\&.g\&., \fI[^A\-Z\en]\fP)\&. This differs from the way many other regular expression tools treat negated character classes, but unfortunately the inconsistency is historically entrenched\&. Matching newlines means that a pattern like \fI[^\(dq\&]*\fP can match the entire input unless there\(cq\&s another quote in the input\&.
.PP
\fBFlexc++\fP allows negation of character class expressions by prepending \fI^\fP to the POSIX character class name\&.
.nf

    [:^alnum:] [:^alpha:] [:^blank:]
    [:^cntrl:] [:^digit:] [:^graph:]
    [:^lower:] [:^print:] [:^punct:]
    [:^space:] [:^upper:] [:^xdigit:]
.fi
.PP
\fBCombining character sets\fP
.PP
The \fI{\-}\fP operator computes the difference of two character classes\&. For example, \fI[a\-c]{\-}[b\-z]\fP represents all the characters in the class \fI[a\-c]\fP that are not in the class \fI[b\-z]\fP (which in this case, is just the single character \fIa\fP)\&. The \fI{\-}\fP operator is left associative, so \fI[abc]{\-}[b]{\-}[c]\fP is the same as \fI[a]\fP\&.
.PP
The \fI{+}\fP operator computes the union of two character classes\&. For example, \fI[a\-z]{+}[0\-9]\fP is the same as \fI[a\-z0\-9]\fP\&.
This operator is useful when preceded by the result of a difference operation, as in \fI[[:alpha:]]{\-}[[:lower:]]{+}[q]\fP, which is equivalent to \fI[A\-Zq]\fP in the \fBC\fP locale\&.
.PP
\fBTrailing context\fP
.PP
A rule can have at most one instance of trailing context (the \fI/\fP operator or the \fI$\fP operator)\&. The start condition, \fI^\fP, and \fI<<EOF>>\fP patterns can only occur at the beginning of a pattern, and cannot be surrounded by parentheses\&. The characters \fI^\fP and \fI$\fP only have their special properties at, respectively, the beginning and end of regular expressions\&. In all other cases they are treated as normal characters\&.
.PP
.SH "8\&. SPECIFICATION EXAMPLE"
.PP
.nf

    %option debug
    %x comment
    NAME [[:alpha:]][_[:alnum:]]*
    %%
    \(dq\&//\(dq\&\&.*          // ignore
    \(dq\&/*\(dq\&            begin(StartCondition_::comment);
    <comment>\&.|\en   // ignore
    <comment>\(dq\&*/\(dq\&   begin(StartCondition_::INITIAL);
    ^a              return 1;
    a               return 2;
    a$              return 3;
    {NAME}          return 4;
    \&.|\en            // ignore
.fi
.PP
.SH "FILES"
.PP
\fBFlexc++\fP\(cq\&s default skeleton files are in \fI/usr/share/flexc++\fP\&.
.br
By default, \fBflexc++\fP generates the following files:
.IP o
\fIScanner\&.h\fP: the header file containing the scanner class\(cq\&s interface\&.
.IP o
\fIScannerbase\&.h\fP: the header file containing the interface of the scanner class\(cq\&s base class\&.
.IP o
\fIScanner\&.ih\fP: the internal header file that is meant to be included by the scanner class\(cq\&s source files (e\&.g\&., it is included by \fIlex\&.cc\fP, see the next item), and that should contain all declarations required for compiling the scanner class\(cq\&s sources\&.
.IP o
\fIlex\&.cc\fP: the source file implementing the scanner class member function \fIlex\fP (and support functions), performing the lexical scan\&.
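.PP
As a sketch of how the directives of section 3 change these default names (the source file name \fIscanlex\&.cc\fP is a hypothetical choice, used only for illustration), a directive section containing
.nf

    %filenames = \(dq\&scanner\(dq\&
    %lex\-source = \(dq\&scanlex\&.cc\(dq\&
.fi
would, following the descriptions of \fI%filenames\fP and \fI%lex\-source\fP, be expected to result in the files \fIscanner\&.h\fP, \fIscannerbase\&.h\fP, \fIscanner\&.ih\fP, and \fIscanlex\&.cc\fP\&.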
.PP
.SH "SEE ALSO"
.PP
\fBflexc++\fP(1), \fBflexc++api\fP(3)
.PP
.SH "BUGS"
.PP
.IP o
The priority of interval expressions (\fI{\&.\&.\&.}\fP) equals the priority of other multiplicative operators (like \fI*\fP)\&.
.PP
.SH "COPYRIGHT"
This is free software, distributed under the terms of the GNU General Public License (GPL)\&.
.PP
.SH "AUTHOR"
Frank B\&. Brokken (\fBf\&.b\&.brokken@rug\&.nl\fP),
.br
Jean\-Paul van Oosten (\fBj\&.p\&.van\&.oosten@rug\&.nl\fP),
.br
Richard Berendsen (\fBrichardberendsen@xs4all\&.nl\fP) (until 2010)\&.
.br
.PP