.TH "flexc++input" "7" "2008\-2020" "flexc++\&.2\&.08\&.01\&.tar\&.gz" "flexc++ input file organization"

.PP 
.SH "NAME"
flexc++input \- Organization of flexc++\(cq\&s input files
.PP 
.SH "DESCRIPTION"

.PP 
\fBFlexc++\fP(1) was designed after \fBflex\fP(1) and \fBflex++\fP(1)\&. Like these
two programs \fBflexc++\fP generates code performing pattern\-matching on text,
possibly executing actions when certain \fIregular expressions\fP are
recognized\&.
.PP 
Refer to \fBflexc++\fP(1) for a general overview\&. This manual page describes
how \fBflexc++\fP\(cq\&s input files should be organized\&. It contains the following
sections:
.PP 
.IP o 
\fB1\&. SPECIFICATION FILE(S)\fP: the format and contents of \fBflexc++\fP input
files, specifying the Scanner\(cq\&s characteristics
.IP o 
\fB2\&. FILE SWITCHING\fP: how to switch to another input specification
file
.IP o 
\fB3\&. DIRECTIVES\fP: directives that can be used in input
specification files
.IP o 
\fB4\&. MINI SCANNERS\fP: how to declare mini\-scanners
.IP o 
\fB5\&. DEFINITIONS\fP: how to define symbolic names for regular
expressions
.IP o 
\fB6\&. %% SEPARATOR\fP: the separator between the input specification
sections 
.IP o 
\fB7\&. REGULAR EXPRESSIONS\fP: regular expressions supported by \fBflexc++\fP
.IP o 
\fB8\&. SPECIFICATION EXAMPLE\fP: an example of a specification file

.PP 
.SH "UNDERSCORES"
Starting with version 2\&.07\&.00 \fBflexc++\fP reserved identifiers no longer end in two
underscore characters, but in one\&. This modification was necessary because
according to the \fBC++\fP standard identifiers having two or more consecutive
underscore characters are reserved by the language\&. In practice this could
require some minor modifications of existing source files  using \fBflexc++\fP\(cq\&s
facilities, most likely limited to changing \fIStartCondition__\fP into
\fIStartCondition_\fP and changing \fIPostEnum__\fP into \fIPostEnum_\fP\&. 
.PP 
The complete list of affected names is:
.IP "Enums:"
.RS 
ActionType_, Leave_, StartCondition_, PostEnum_;
.RE
.IP "Member functions:"
.RS 
actionType_, continue_, echoCh_, echoFirst_,
executeAction_, getRange_, get_, istreamName_, lex_, lop1_, 
lop2_, lop3_, lop4_, lopf_, matched_, noReturn_, print_, 
pushFront_, reset_, return_;
.RE
.IP "Protected data members:"
.RS 
d_in_, d_token_, s_finIdx_, s_interactive_, 
s_maxSizeofStreamStack_, s_nRules_, s_rangeOfEOF_, 
s_ranges_, s_rf_\&.
.RE

.PP 
.SH "1\&. SPECIFICATION FILE(S)"

.PP 
\fBFlexc++\fP expects an input file containing directives and the regular
expressions that should be recognized by objects of the scanner class
generated by \fBflexc++\fP\&. This man page describes the elements and
organization of \fBflexc++\fP\(cq\&s input file\&.
.PP 
\fBFlexc++\fP\(cq\&s input file consists of two sections, separated from each other by
a line merely containing two consecutive percent characters:
.nf 

%%
    
.fi 
The section before this separator contains directives; the section
following this separator contains regular expressions and possibly actions to
perform when these regular expressions are matched by the object of the
scanner class generated by \fBflexc++\fP\&. If a second line starting with two
consecutive percent characters is encountered, it ends \fBflexc++\fP\(cq\&s
input file processing\&. See also section 6 (%% SEPARATOR) below\&.
.PP 
White space is usually ignored, as are comments\&. Comments may be of the
traditional \fBC\fP form (i\&.e\&., \fI/*\fP, followed by (possibly multi\-line)
comment text, followed by \fI*/\fP), or of the \fBC++\fP end\-of\-line form:
two consecutive slashes (\fI//\fP) start the comment, which continues up to
the next newline character\&.
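.PP 
As a minimal sketch of this layout (the class name \fIMyScanner\fP, the rule and
its return value are merely illustrative and are not prescribed by \fBflexc++\fP):
.nf 

/* first section: directives (and definitions) */
%class\-name = \(dq\&MyScanner\(dq\&

%%

// second section: regular expressions, optionally followed by actions
[0\-9]+         return 1;
        
.fi 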
.PP 
.SH "2\&. FILE SWITCHING"

.PP 
\fBFlexc++\fP\(cq\&s input file may be split into multiple files\&. This allows for the
definition of logically separate elements of the specifications in different
files\&. Include directives must be specified on separate lines and may not
contain any other information than the (path)name of the file to switch
to\&. File names may be surrounded by double quotes, but these double quotes are
optional and are ignored (removed) when encountered\&. All remaining characters
define the name of the file that \fBflexc++\fP processes next\&.  White space
characters following \fI//include\fP and preceding the end of the line are
ignored\&. To switch files the following stanza is used:
.nf 

//include file\-location
        
.fi 
The \fI//include\fP directive must start in the line\(cq\&s first column\&. File
locations can be absolute or relative to the location of the file containing
the \fI//include\fP directive\&. Once \fBflexc++\fP has switched to another file that
file\(cq\&s directory becomes \fBflexc++\fP\(cq\&s working directory\&.
.PP 
Once the end of an included file has been reached, processing continues at
the line beyond the \fI//include\fP directive of the previously scanned file,
and \fBflexc++\fP\(cq\&s working directory is reset to the working directory of the file to
which \fBflexc++\fP returns\&. The end\-of\-file of the file that was initially specified
when \fBflexc++\fP started indicates the end of \fBflexc++\fP\(cq\&s rules specification\&.
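.PP 
A specification split over two files could be sketched as follows\&. The file
name \fIspec/directives\fP is hypothetical; it is interpreted relative to the
directory of the file containing the \fI//include\fP line, and it is assumed
to contain only directives (e\&.g\&., a \fI%class\-name\fP directive):
.nf 

//include spec/directives

%%

[0\-9]+         return 1;
        
.fi 
Once the end of \fIspec/directives\fP has been reached, processing continues at
the line following the \fI//include\fP line\&.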
.PP 
.SH "3\&. DIRECTIVES"

.PP 
The first section of \fBflexc++\fP\(cq\&s input file consists of directives\&. In
addition it may associate regular expressions with symbolic names, allowing
you to use these identifiers in the rules section\&. Each directive is defined
on a line of its own\&. Directives are overridden by the corresponding \fBflexc++\fP
command line options, where such options exist\&.
.PP 
Some directives require arguments, which are usually provided following
separating (but optional) \fI=\fP characters\&. Arguments of directives are text,
surrounded by double quotes (strings), or embedded in raw string literals
(rawstrings)\&.  Double quotes or backslashes inside strings must themselves be
preceded by backslashes; these backslashes are not required when rawstrings
are used\&. 
.PP 
The \fI%s\fP and \fI%x\fP directives are immediately followed by name lists,
consisting of identifiers separated by blanks\&.  Here is an example of the
definition of a directive:
.nf 

    %class\-name = \(dq\&MyScanner\(dq\&
        
.fi 
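.PP 
The following lines sketch some of the variants described above (all values
are illustrative): a directive whose optional \fI=\fP is omitted, a pathname
containing a blank (and therefore surrounded by double quotes), and a name
list:
.nf 

    %lex\-source \(dq\&scan\&.cc\(dq\&
    %target\-directory = \(dq\&generated files\(dq\&
    %s quoted
        
.fi 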

.PP 
Directives accepting a `filename\(cq\& do not accept path names, i\&.e\&., they
cannot contain directory separators (\fI/\fP); directives accepting a `pathname\(cq\&
may contain directory separators\&. A `pathname\(cq\& using blank characters should
be surrounded by double quotes\&.
.PP 
Some directives may generate errors\&. This happens when a directive conflicts
with the contents of an existing file which \fBflexc++\fP cannot modify (e\&.g\&., a
scanner class header file exists and doesn\(cq\&t define a namespace, while a
\fI%namespace\fP directive was provided)\&. To solve the error the offending
directive could be omitted, the existing file could be removed, or the
existing file could be hand\-edited according to the directive\(cq\&s specification\&.
Note that \fBflexc++\fP currently does not handle the opposite error condition: if a
previously used directive is omitted, then \fBflexc++\fP does not detect the
inconsistency\&. In those cases you may encounter compilation errors\&.
.PP 
.IP o 
\fB%baseclass\-header\fP \fI= \(dq\&filename\(dq\&\fP 
.br 
Defines the name of the file to contain the scanner class\(cq\&s base
class interface\&. Corresponding command\-line option:
\fI\-\-baseclass\-header\fP\&.
.IP 
It is an error if this directive is used and an already
existing scanner\-class header file does not include
\fI`filename\(cq\&\fP\&. 
.IP 
.IP o 
\fB%case\-insensitive\fP
.br 
Generates a scanner which \fIcase insensitively\fP matches regular
expressions\&. All regular expressions specified in \fBflexc++\fP\(cq\&s input
file are interpreted case insensitively and the resulting scanner
object will case insensitively interpret its input\&.
.IP 
Corresponding command\-line option: \fI\-\-case\-insensitive\fP\&.
.IP 
When this directive is specified the resulting scanner does not
distinguish between the following rules:
.nf 

        First       // initial F is transformed to f
        first
        FIRST       // all capitals are transformed to lower case chars
                
.fi 
With a case\-insensitive scanner only the first rule can be matched,
and \fBflexc++\fP will issue warnings for the second and third rule about
rules that cannot be matched\&.
.IP 
Input processed by a case\-insensitive scanner is also handled case
insensitively\&. The above mentioned \fIFirst\fP rule is matched for
all of the following input words: \fIfirst First FIRST firST\fP\&. 
.IP 
Although the matching process proceeds case insensitively, the
matched text (as returned by the scanner\(cq\&s \fImatched()\fP member)
always contains the original, unmodified text\&. So, with the above
input \fImatched()\fP returns, respectively, \fIfirst, First, FIRST\fP
and \fIfirST\fP, while matching the rule \fIFirst\fP\&.
.IP 
.IP o 
\fB%class\-header\fP \fI= \(dq\&filename\(dq\&\fP 
.br 
Defines the name of the file to contain the scanner class\(cq\&s
interface\&. Corresponding command\-line option: \fI\-\-class\-header\fP\&.
.IP 
.IP o 
\fB%class\-name\fP \fI = \(dq\&className\(dq\&\fP 
.br 
Declares the name of the scanner class generated by \fBflexc++\fP\&. This
directive corresponds to the \fI%name\fP directive used by
\fBflex++\fP(1)\&. Contrary to \fBflex++\fP\(cq\&s \fI%name\fP declaration,
\fIclass\-name\fP may appear anywhere in the first section of the
grammar specification file\&. It may be defined only once\&. If no
\fIclass\-name\fP is specified the default class name (\fIScanner\fP)
is used\&. Corresponding command\-line option:
\fI\-\-class\-name\fP\&.
.IP 
It is an error if this directive is used and an already
existing scanner\-class header file does not define \fIclass
`className\(cq\&\fP\&.
.IP 
.IP o 
\fB%debug\fP 
.br 
Provide \fIlex\fP and its support functions with debugging code,
showing the actual parsing process on the standard output
stream\&. When included, the debugging output is active by default,
but its activity may be controlled using the \fIsetDebug(bool
on\-off)\fP member\&. Note that no \fI#ifdef DEBUG\fP macros are used in
the generated code\&. 
.IP 
.IP o 
\fB%filenames\fP \fI= \(dq\&basename\(dq\&\fP 
.br 
Defines the basename of the \fIScanner\&.h, Scanner\&.ih,\fP and
\fIScannerbase\&.h\fP files\&. E\&.g\&., when using the directive
.nf 

    %filenames = \(dq\&scanner\(dq\&
                
.fi 
the names of the generated files are, respectively, \fIscanner\&.h,
scanner\&.ih,\fP and \fIscannerbase\&.h\fP\&.  Corresponding command\-line
option: \fI\-\-filenames\fP\&. The name of the source file (by default
\fIlex\&.cc\fP) is controlled by the \fI%lex\-source\fP directive\&.
.IP 
.IP o 
\fB%implementation\-header\fP \fI= \(dq\&filename\(dq\&\fP 
.br 
Defines the name of the file to contain the implementation header\&.
Corresponding command\-line option: \fI\-\-implementation\-header\fP\&.
.IP 
It is an error if this directive is used and an already existing
\fI`filename\(cq\&\fP file does not include the scanner class header
file\&.
.IP 
.IP o 
\fB%input\-implementation\fP \fI= \(dq\&sourcefile\(dq\&\fP 
.br 
Defines the pathname of the file containing the implementation of a
user\-defined \fIInput\fP class\&. 
.IP 
.IP o 
\fB%input\-interface\fP \fI= \(dq\&interface\(dq\&\fP 
.br 
Defines the pathname of the file containing the interface of a
user\-defined \fIInput\fP class\&. See section \fB17\&. THE CLASS INPUT\fP
in the \fBflexc++api\fP(3) manual page for additional information
about user\-defined \fIInput\fP classes\&.
.IP 
.IP o 
\fB%interactive\fP
.br 
Generate an interactive scanner\&. An interactive scanner reads lines
from the input stream, and then returns the tokens encountered on
that line\&. The interactive scanner implemented by \fBflexc++\fP only
predefines the \fIScanner(std::istream &in, std::ostream &out)\fP
constructor, by default assuming that input is read from
\fIstd::cin\fP\&. See also the \fI1\&. INTERACTIVE SCANNER\fP section
in the \fBflexc++api\fP(3) manual page\&.
.IP 
.IP o 
\fB%lex\-function\-name\fP \fI= \(dq\&funname\(dq\&\fP 
.br 
Defines the name of the scanner class\(cq\&s member to perform the
lexical scanning\&. If this directive is omitted the default name
(\fIlex\fP) is used\&. Corresponding command\-line option:
\fI\-\-lex\-function\-name\fP\&.
.IP 
.IP o 
\fB%lex\-source\fP \fI= \(dq\&filename\(dq\&\fP 
.br 
Defines the name of the file to contain the scanner member
\fIlex\fP\&. Corresponding command\-line option: \fI\-\-lex\-source\fP\&.
.IP 
.IP o 
\fB%no\-lines\fP 
.br 
Do not put \fI#line\fP preprocessor directives in the file containing
the scanner\(cq\&s \fIlex\fP function\&. If omitted \fI#line\fP directives
are added to this file, unless overridden by the command line
options \fI\-\-lines\fP and \fI\-\-no\-lines\fP\&.
.IP 
.IP o 
\fB%namespace\fP \fI= \(dq\&identifier\(dq\&\fP 
.br 
Define the scanner class in the namespace \fIidentifier\fP\&. By
default no namespace is used\&. If this directive is used the
implementation header is provided with a commented out \fIusing
namespace\fP declaration for the requested namespace\&.  In addition,
the scanner and scanner base class header files also use the
specified namespace to define their include guard directives\&.
.IP 
It is an error if this directive is used and an already existing
scanner\-class header file does not define \fInamespace
identifier\fP\&.
.IP 
.IP o 
\fB%print\-tokens\fP 
.br 
This directive results in the tokens as well as the matched text
being displayed on the standard output stream, just before returning
the token to \fIlex\fP\(cq\&s caller\&. Displaying is suppressed again when
the \fIlex\&.cc\fP file is generated without using this directive\&. The
function showing the tokens (\fIScannerBase::print_\fP) is called
from \fIScanner::print()\fP, which is defined in\-line in
\fIScanner\&.h\fP\&. Calling \fIScannerBase::print_\fP, therefore, can
also easily be controlled by an option of the program
using the scanner object\&.
.IP 
This directive does \fInot\fP show the tokens returned and text
matched by \fBflexc++\fP itself when reading its input files\&. If that is
what you want, use the \fI\-\-own\-tokens\fP option\&.
.IP 
.IP o 
\fB%s\fP \fInamelist\fP 
.br 
The \fI%s\fP directive is followed by a list of one or more
identifiers, separated by blanks\&. Each identifier is the name of
an \fIinclusive start condition\fP\&.
.IP 
.IP o 
\fB%skeleton\-directory\fP \fI= \(dq\&pathname\(dq\&\fP 
.br 
Use \fIpathname\fP rather than the default (e\&.g\&.,
\fI/usr/share/flexc++\fP) path when looking for \fBflexc++\fP\(cq\&s skeleton
files\&. Corresponding command\-line option:
\fI\-\-skeleton\-directory\fP\&.
.IP 
.IP o 
\fB%startcondition\-name\fP \fI = \(dq\&startconditionName\(dq\&\fP 
.br 
By default, \fBflexc++\fP defines the enum \fIStartCondition_\fP defining
the names of start\-conditions\&. The \fI%startcondition\-name\fP
directive can be used to configure another name for the enum
containing the names of the start\-conditions\&.  It may be defined
only once\&. 
.IP 
The name of the startcondition\-enum may be modified, and the
directive can also be omitted again after it has been specified
before\&. When changing the name of the startcondition\-enum or when
reverting to the default name newly generated \fIlex\&.cc\fP and
\fIScannerbase\&.h\fP files will use the currently defined
startcondition\-enum name\&. Be advised, though, that the
startcondition\-enum name may also be used in user\-defined members
of the scanner\-class, or in the scanner\(cq\&s header and internal
header files\&. If so, the user is responsible for updating those
files to the currently defined name of the startcondition\-enum\&.
.IP 
.IP o 
\fB%target\-directory\fP \fI= \(dq\&pathname\(dq\&\fP 
.br 
\fIPathname\fP defines the directory where generated files should be
written\&.  By default this is the directory where \fBflexc++\fP is
called\&. This directive is overruled by the \fI\-\-target\-directory\fP
command\-line option\&.
.IP 
.IP o 
\fB%x\fP \fInamelist\fP 
.br 
The \fI%x\fP directive is followed by a list of one or more
identifiers, separated by blanks\&. Each identifier is the name of
an \fIexclusive start condition\fP\&.
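.PP 
Taken together, the first section of a specification file might look like
this\&. It is only a sketch: all names and values are hypothetical, and each
directive has the meaning described in the list above:
.nf 

    %class\-name = \(dq\&Tokenizer\(dq\&
    %filenames = \(dq\&tokenizer\(dq\&
    %lex\-source = \(dq\&tokenizer\&.cc\(dq\&
    %namespace = \(dq\&Demo\(dq\&
    %target\-directory = \(dq\&generated\(dq\&
    %print\-tokens
    %s quoted
    %x comment
        
.fi 
Following the descriptions of \fI%filenames\fP and \fI%lex\-source\fP, these
directives would be expected to result in the files \fItokenizer\&.h,
tokenizer\&.ih, tokenizerbase\&.h\fP and \fItokenizer\&.cc\fP, written to the
directory \fIgenerated\fP\&.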

.PP 
.SH "4\&. MINI SCANNERS"

.PP 
Mini scanners come in two flavors: inclusive mini scanners and exclusive
mini scanners\&. The rules that apply to an inclusive mini scanner are the mini
scanner\(cq\&s own rules as well as the rules which apply to no mini scanners in
particular (i\&.e\&., the rules that apply to the default (or \fIINITIAL\fP) mini
scanner)\&. Exclusive mini scanners only use the rules that were defined for
them\&. 
.PP 
To define an inclusive mini scanner use \fI%s\fP, followed by one
or more identifiers specifying the name(s) of the mini\-scanner(s)\&. To define
an exclusive mini scanner use \fI%x\fP, followed by one or more identifiers
specifying the name(s) of the mini\-scanner(s)\&. The following example defines
the names of two mini scanners: \fIstring\fP and \fIcomment\fP: 
.nf 

    %x string comment 
        
.fi 
Following this, rules defined in the context of the \fIstring\fP mini
scanner (see below) will only be used when that mini scanner is active\&.
.PP 
A \fBflexc++\fP input file may contain multiple \fI%s\fP and \fI%x\fP
specifications\&.
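.PP 
The difference between the two flavors can be sketched as follows\&. The names
\fIverbose\fP and \fIcomment\fP and the returned values are illustrative; the
\fI<name>\fP prefix and the \fIbegin\fP calls are used in the same way as in
section 8:
.nf 

    %s verbose
    %x comment

    %%

    \(dq\&/*\(dq\&                begin(StartCondition_::comment);
    <comment>\(dq\&*/\(dq\&       begin(StartCondition_::INITIAL);
    <comment>\&.|\en       // ignore
    <verbose>#\&.*        return 1;
    [a\-z]+             return 2;
        
.fi 
While the inclusive mini scanner \fIverbose\fP is active, its own rule as well
as the two rules without a prefix can be matched\&. While the exclusive mini
scanner \fIcomment\fP is active only the two \fI<comment>\fP rules are used\&.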
.PP 
.SH "5\&. DEFINITIONS"

.PP 
Definitions are of the form
.nf 

identifier  regular\-expression
        
.fi 
Each definition must be entered on a line of its own\&. Definitions
associate identifiers with regular expressions, allowing the use of
\fI{identifier}\fP as synonym for its regular expression in the rules section
of \fBflexc++\fP\(cq\&s input file\&. Once defined, the identifiers representing regular
expressions can also be used in subsequent definitions\&.
.PP 
Example:
.nf 

FIRST                   [A\-Za\-z_]
NAME                    {FIRST}[\-A\-Za\-z0\-9_]*
        
.fi 
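.PP 
In the rules section a definition is used by enclosing its identifier in
braces\&. A short sketch (the names and the return value are illustrative):
.nf 

DIGIT                   [0\-9]
NUMBER                  {DIGIT}+

%%

{NUMBER}                return 1;
        
.fi 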

.PP 
.SH "6\&. %% SEPARATOR"

.PP 
Following directives and definitions a line merely containing two consecutive
\fI%\fP characters is expected\&. Following this line the rules are defined\&. Rules
consist of regular expressions which should be recognized, possibly followed
by actions to be executed once a rule\(cq\&s regular expression has been matched\&.
.PP 
If the rule section contains a line starting with two consecutive \fI%\fP
characters, then any remaining input is ignored\&. Note that this second \fI%%\fP
separator does not have to be specified\&. It is purely optional\&. To specify a
regular expression starting with \fI%%\fP surround the \fI%%\fP with double quotes
(\fI\(dq\&%%\(dq\&\fP) or prefix the \fI%%\fP with a blank space: the \fI%%\fP\-characters are
only considered a separator if they are encountered at the very beginning of a
line\&. 
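.PP 
For example (a sketch; the rule and its return value are illustrative):
.nf 

%%

\(dq\&%%\(dq\&[a\-z]+      return 1;

%%
this line and all following lines are ignored
        
.fi 
The quoted \fI%%\fP in the rule is part of a regular expression; only the two
lines merely containing \fI%%\fP act as separators\&.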
.PP 
.SH "7\&. REGULAR EXPRESSIONS"

.PP 
The regular expressions defined in \fBflexc++\fP\(cq\&s rules files are matched against 
the information passed to the scanner\(cq\&s \fIlex\fP function\&. 
.PP 
Regular expressions begin at the first non\-blank character on a line\&. Comment
is interpreted as comment as long as it isn\(cq\&t part of the regular
expression\&. To define a regular expression starting with two (or more)
slashes, the first slash can be escaped or double quoted (e\&.g\&., \fI\(dq\&//\(dq\&\&.*\fP
matches \fBC++\fP comment up to the end of the line)\&.
.PP 
Regular expressions end at the first blank character (to add a blank character,
e\&.g\&., a space character, to a regular expression, prefix it by a backslash or
put it in a double\-quoted string)\&.
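.PP 
The following rules sketch these conventions (the return values are
illustrative):
.nf 

    one\e two           return 1;
    \(dq\&three four\(dq\&      return 2;
    \e//\&.*              // ignore
        
.fi 
The first rule uses an escaped blank, the second rule uses a double\-quoted
string containing a blank, and the third rule escapes the first of two slashes
so that the regular expression isn\(cq\&t interpreted as comment\&.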
.PP 
Actions may be associated with regular expressions\&.  At a match the action
that is associated with the regular expression is executed, after which
scanning continues when the lexical scanning function (e\&.g\&., \fIlex\fP) is
called again\&. Actions are not required, and regular expressions can be defined
without any actions at all\&. If such action\-less regular expressions are
matched then the match is performed silently, after which processing
continues\&.
.PP 
\fBFlexc++\fP tries to match as many characters of the input file as possible (i\&.e\&.,
it uses `greedy matching\(cq\&)\&. Non\-greedy matching is accomplished by a
combination of a scanner and parser and/or by using the `lookahead\(cq\& operator
(\fI/\fP)\&.
.PP 
The following regular expression `building blocks\(cq\& are available\&. More complex
regular expressions are created by combining them:
.PP 
.IP "\fIx\fP"
the character `x\(cq\&;
.IP "\fI\&.\fP"
any character (byte) except newline;
.IP "\fI[xyz]\fP"
a character class; in this case, the pattern matches either an `x\(cq\&,
a `y\(cq\&, or a `z\(cq\&\&. See also the paragraph about character classes below;
.IP "\fI[abj\-oZ]\fP"
a character class containing a range; matches an `a\(cq\&, a `b\(cq\&, any
letter from `j\(cq\& through `o\(cq\&, or a `Z\(cq\&\&. See also the paragraph about
character classes below;
.IP "\fI[^A\-Z]\fP"
a negated character class, i\&.e\&., any character except
for those in the class\&.  In this example, any non\-capital character\&. See
also the paragraph about character classes below; 
.IP "\fI\(dq\&[xyz]\e\(dq\&foo\(dq\&\fP"
text between double quotes matches the literal string: \fI[xyz]\(dq\&foo\fP;
.IP "R\(dq\&([xyz]\e\(dq\&foo)\(dq\&"
the literal string  `\fI[xyz]\e\(dq\&foo\fP\(cq\& (using a raw string literal);
.IP "\fI\eX\fP"
if X is `a\(cq\&, `b\(cq\&, `f\(cq\&, `n\(cq\&, `r\(cq\&, `t\(cq\&, or `v\(cq\&, then the ANSI\-C
interpretation of `\eX\(cq\& is matched\&. Otherwise, a literal `X\(cq\& is matched
(this is used to escape operators such as `*\(cq\&);
.IP "\fI\e0\fP"
a NUL character (ASCII code 0);
.IP "\fI\e123\fP"
the character with octal value 123;
.IP "\fI\ex2a\fP"
the character with hexadecimal value 2a;
.IP "\fI(r)\fP"
the regular expression `r\(cq\&; parentheses are used to override
precedence (see below);
.IP "\fI{name}\fP"
the expansion of the `name\(cq\& definition;
.IP "\fIr*\fP"
zero or more regular expressions `r\(cq\&\&. This also matches the empty
string;
.IP "\fIr+\fP"
one or more regular expressions `r\(cq\&;
.IP "\fIr?\fP"
zero or one regular expression `r\(cq\&\&.  This also matches the empty
string;
.IP "\fIrs\fP"
the regular expression `r\(cq\& followed by the regular expression `s\(cq\&;
called concatenation;
.IP "\fIr{m, n}\fP"
regular expression `r\(cq\& at least m, but at most n times (\fI0 <= m
<= n\fP)\&.  A regular expression to which \fI{0, 0}\fP is appended
is ignored, and a warning message is shown\&.
.IP "\fIr{m,}\fP"
regular expression `r\(cq\& m or more times (\fI0 <= m\fP);
.IP "\fIr{m}\fP"
regular expression `r\(cq\& exactly m times (\fI0 <= m\fP)\&.  A regular expression
to which \fI{0}\fP is appended is ignored, and a warning message is shown; 
.IP "\fIr|s\fP"
either regular expression `r\(cq\& or regular expression `s\(cq\&;
.IP "\fIr/s\fP"
regular expression `r\(cq\& if it is followed by regular expression
`s\(cq\&\&. The text matched by `s\(cq\& is included when determining whether this
rule results in the longest match, but `s\(cq\& is then returned to the input
before the rule\(cq\&s action (if defined) is executed\&.
.IP 
If \fBflexc++\fP detects patterns potentially not matching any text it generates 
warnings like this:
.nf 

    [Warning] input, line 7: null\-matching regular expression
        
.fi 
By placing the comment
.nf 

    //%nowarn
        
.fi 
on the line just before a regular expression that potentially does not
match any text, the warning for that regular expression is suppressed;
.IP 
.IP "\fI^r\fP"
a regular expression `r\(cq\& at the beginning of a line or file;
.IP "\fIr$\fP"
a regular expression `r\(cq\&, occurring  at the end of a line\&. This
pattern is identical to `r/\en\(cq\&;
.IP "\fI<s>r\fP"
a regular expression `r\(cq\& in start condition `s\(cq\&;
.IP "\fI<s1,s2,s3>r\fP"
a regular expression `r\(cq\& in start conditions s1, s2, or s3;
.IP "\fI<*>r\fP"
a regular expression `r\(cq\& in all start conditions;
.IP "\fI<\fP\fI<EOF>\fP\fI>\fP"
an end\-of\-file;
.IP "\fI<s1,s2><\fP\fI<EOF>\fP\fI>\fP"
an end\-of\-file when in start conditions s1 or s2\&.
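.PP 
Some of the building blocks listed above are combined in the following sketch\&.
The rules and return values are illustrative and are not meant to form a
meaningful scanner:
.nf 

    // two, three or four digits
    [0\-9]{2,4}         return 1;

    // `if\(cq\&, but only when followed by an opening parenthesis
    if/\e(              return 2;

    // a `#\(cq\& at the beginning of a line
    ^#                 return 3;

    // may match the empty string: suppress the warning
    //%nowarn
    [a\-z]*             return 4;
        
.fi 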

.PP 
\fBCharacter classes\fP
.PP 
Inside a character class all regular expression operators lose their special
meanings, except for the escape character (\fI\e\fP), the character range
operator \fI\-\fP, the end of character class operator \fI]\fP, and, at the
beginning of the class, \fI^\fP\&. All ordinary escape sequences are supported,
all other escaped characters are interpreted as literal characters (e\&.g\&.,
\fI\ec\fP is a literal \fIc\fP)\&.
.PP 
To add a closing bracket to a character class use \fI[]\fP or \fI\e]\fP\&. To add a
closing bracket to a negated character class use \fI[^]\fP (or use \fI[^\fP
followed by \fI\e]\fP somewhere within the character class)\&. Minus characters are
used to define character ranges (e\&.g\&., \fI[a\-d]\fP, defining \fI[abcd]\fP) except
in the following cases, where \fBflexc++\fP recognizes a literal minus character:
\fI[\-\fP, or \fI[^\-\fP (a minus at the very beginning of a character class); 
\fI\-]\fP (a minus at the very end of a character class); 
or \fI\e\-\fP (an escaped minus character)\&.
Once a character class has started, all
subsequent characters (and ranges) are added to the set, until the final closing
bracket (\fI]\fP) has been reached\&.
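.PP 
The forms of a literal minus character can be sketched as follows\&. Each line
shows a character class, followed by the characters it contains; the last line
shows an ordinary range for comparison:
.nf 

    [\-ab]       \-, a, b
    [ab\-]       a, b, \-
    [a\e\-c]      a, \-, c
    [a\-c]       a, b, c
        
.fi 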
.PP 
\fBOperator precedence\fP
.PP 
The regular expressions listed above are grouped according to precedence, from
highest precedence at the top to lowest at the bottom\&. From lowest to highest
precedence, the operators are:
.IP o 
\fI|\fP: the or\-operator at the end of a line (instead of an action)
indicates that this expression\(cq\&s action is identical to the action of the next
rule\&. 
.IP o 
\fI/\fP: the look\-ahead operator;
.IP o 
\fI|\fP: the or\-operator within a regular expression;
.IP o 
\fICHAR\fP: individual elements of the regular expression: characters,
strings, quoted characters, escaped characters, character sets etc\&. are all
considered \fICHAR\fP elements\&. Multiple \fICHAR\fP elements can be combined by
enclosing them in parentheses (e\&.g\&., \fI(abc)+\fP indicates sequences of \fIabc\fP
characters, like \fIabcabcabc\fP);
.IP o 
\fI*, ?, +, {\fP: multipliers:
.br 
\fI?\fP: zero or one occurrence  of the previous element;
.br 
\fI+\fP: one or more repetitions of the previous element;
.br 
\fI*\fP: zero or more repetitions of the previous element;
.br 
\fI{\&.\&.\&.}\fP: interval specification: a specified number of
repetitions of the previous element (see above for specific
forms of the interval specification)
.IP o 
\fI{+}, {\-}\fP: set operators (\fI{+}\fP computing the union of two sets,
\fI{\-}\fP computing the difference of the left\-hand side set
minus the elements in the right\-hand side set);

.PP 
The lex standard defines concatenation as having a higher precedence than the
interval expression\&. This is different from many other regular expression
engines, and \fBflexc++\fP follows these latter engines, giving all `multiplication
operators\(cq\& equal priority\&.
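.PP 
A few patterns illustrating these precedence rules (a sketch; each pattern is
followed by a description and examples of matching text):
.nf 

    ab+         a, followed by one or more b: ab, abb, abbb
    (ab)+       one or more repetitions of ab: ab, abab
    ab{2}       a, followed by two b: abb
    ab|cd       either ab or cd
        
.fi 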
.PP 
Name expansion has the same precedence as grouping (using parentheses to
influence the precedence of the other operators in the regular expression)\&.
Since the name expansion is treated as a group in \fBflexc++\fP, it is not allowed to
use the lookahead operator in a name definition (a named pattern, defined in
the definition section)\&.
.PP 
\fBPredefined sets of characters\fP
.PP 
Character classes can also contain character class expressions\&. These are
expressions enclosed inside \fI[:\fP and \fI:]\fP delimiters (which themselves
must appear between the \fI[\fP and \fI]\fP of the character class\&. Other elements
may occur inside the character class as well)\&. The character class expressions
are:
.nf 
     
     [:alnum:] [:alpha:] [:blank:]
     [:cntrl:] [:digit:] [:graph:]
     [:lower:] [:print:] [:punct:]
     [:space:] [:upper:] [:xdigit:]
        
.fi 

.PP 
Character class expressions designate a set of characters equivalent to
the corresponding standard \fBC\fP isXXX function\&. For example, \fI[:alnum:]\fP
designates those characters for which \fIisalnum\fP returns true \- i\&.e\&., any
alphabetic or numeric character\&.  For example, the following character classes
are all equivalent:
.nf 
 
    [[:alnum:]]
    [[:alpha:][:digit:]]
    [[:alpha:][0\-9]]
    [a\-zA\-Z0\-9]
        
.fi 

.PP 
A negated character class such as the example \fI[^A\-Z]\fP above will match a
newline unless \fI\en\fP (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class (e\&.g\&.,
\fI[^A\-Z\en]\fP)\&. This differs from the way many other regular expression tools
treat negated character classes, but unfortunately the inconsistency is
historically entrenched\&. Matching newlines means that a pattern like \fI[^\(dq\&]*\fP
can match the entire input unless there\(cq\&s another quote in the input\&.
.PP 
\fBFlexc++\fP allows negation of character class expressions by prepending \fI^\fP to
the POSIX character class name\&.
.nf 
                
    [:^alnum:] [:^alpha:] [:^blank:]
    [:^cntrl:] [:^digit:] [:^graph:]
    [:^lower:] [:^print:] [:^punct:]
    [:^space:] [:^upper:] [:^xdigit:]
        
.fi 
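.PP 
Both kinds of negation are sketched below\&. The sketch assumes that negated
character class expressions, like the non\-negated ones, are used inside a
character class; the return values are illustrative:
.nf 

    // a double\-quoted string not extending beyond the current line
    \e\(dq\&[^\(dq\&\en]*\e\(dq\&         return 1;

    // one or more characters that are not digits
    [[:^digit:]]+       return 2;
        
.fi 
In the first rule the double quotes outside of the character class are escaped,
since unescaped double quotes surround literal strings\&.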

.PP 
\fBCombining character sets\fP
.PP 
The \fI{\-}\fP operator computes the difference of two character classes\&. For
example, \fI[a\-c]{\-}[b\-z]\fP represents all the characters in the class
\fI[a\-c]\fP that are not in the class \fI[b\-z]\fP (which in this case, is just the
single character \fIa\fP)\&. The \fI{\-}\fP operator is left associative, so
\fI[abc]{\-}[b]{\-}[c]\fP is the same as \fI[a]\fP\&.
.PP 
The \fI{+}\fP operator computes the union of two character classes\&. For example,
\fI[a\-z]{+}[0\-9]\fP is the same as \fI[a\-z0\-9]\fP\&. This operator is useful when
preceded by the result of a difference operation, as in,
\fI[[:alpha:]]{\-}[[:lower:]]{+}[q]\fP, which is equivalent to \fI[A\-Zq]\fP in the
\fBC\fP locale\&.
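.PP 
For instance, the lower\-case consonants could be specified as the difference
of two character classes\&. In this sketch the definition name and the return
value are illustrative:
.nf 

CONSONANT       [a\-z]{\-}[aeiou]

%%

{CONSONANT}+    return 1;
        
.fi 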
.PP 
\fBTrailing context\fP
.PP 
A rule can have at most one instance of trailing context (the \fI/\fP operator
or the \fI$\fP operator)\&. The start condition, \fI^\fP, and \fI<<EOF>>\fP patterns
can only occur at the beginning of a pattern, and cannot be surrounded by
parentheses\&. The characters \fI^\fP and \fI$\fP only have their special properties
at, respectively, the beginning and end of regular expressions\&. In all other
cases they are treated as normal characters\&.
.PP 
.SH "8\&. SPECIFICATION EXAMPLE"

.PP 
.nf 

%debug

%x comment

NAME    [[:alpha:]][_[:alnum:]]*

%%

\(dq\&//\(dq\&\&.*          // ignore

\(dq\&/*\(dq\&            begin(StartCondition_::comment);

<comment>\&.|\en   // ignore
<comment>\(dq\&*/\(dq\&   begin(StartCondition_::INITIAL);

^a              return 1;
a               return 2;
a$              return 3;
{NAME}          return 4;

\&.|\en            // ignore
        
.fi 

.PP 
.SH "FILES"

.PP 
\fBFlexc++\fP\(cq\&s default skeleton files are in \fI/usr/share/flexc++\fP\&.
.br 
By default, \fBflexc++\fP generates the following files:
.IP o 
\fIScanner\&.h\fP: the header file containing the scanner class\(cq\&s
interface\&. 
.IP o 
\fIScannerbase\&.h\fP: the header file containing the interface of the 
scanner class\(cq\&s base class\&.
.IP o 
\fIScanner\&.ih\fP: the internal header file that is meant to be included
by the scanner class\(cq\&s source files (e\&.g\&., it is included by
\fIlex\&.cc\fP, see the next item\(cq\&s file), and that should contain all
declarations required for compiling the scanner class\(cq\&s sources\&.
.IP o 
\fIlex\&.cc\fP: the source file implementing the scanner class member
function \fIlex\fP (and support functions), performing the lexical
scan\&.

.PP 
.SH "SEE ALSO"

.PP 
\fBflexc++\fP(1), \fBflexc++api\fP(3)
.PP 
.SH "BUGS"

.PP 
.IP o 
The priority of interval expressions (\fI{\&.\&.\&.}\fP) equals the priority
of other multiplicative operators (like \fI*\fP)\&.

.PP 
.SH "COPYRIGHT"
This is free software, distributed under the terms of the 
GNU General Public License (GPL)\&.
.PP 
.SH "AUTHOR"
Frank B\&. Brokken (\fBf\&.b\&.brokken@rug\&.nl\fP),
.br 
Jean\-Paul van Oosten (\fBj\&.p\&.van\&.oosten@rug\&.nl\fP),
.br 
Richard Berendsen (\fBrichardberendsen@xs4all\&.nl\fP) (until 2010)\&.
.br 

.PP