.TH "flexc++input" "7" "2008\-2023" "flexc++\&.2\&.14\&.00" "flexc++ input file organization"

.PP 
.SH "NAME"
flexc++input \- Organization of flexc++\(cq\&s input files
.PP 
.SH "DESCRIPTION"

.PP 
\fBFlexc++\fP(1) was designed after \fBflex\fP(1) and \fBflex++\fP(1)\&. Like these
two programs, \fBflexc++\fP generates code performing pattern\-matching on text,
possibly executing actions when certain \fIregular expressions\fP are
recognized\&.
.PP 
Refer to \fBflexc++\fP(1) for a general overview\&. This manual page describes
how \fBflexc++\fP\(cq\&s input files should be organized\&. It contains the following
sections:
.PP 
.IP o 
\fB1\&. SPECIFICATION FILE(S)\fP: the format and contents of \fBflexc++\fP input
files, specifying the Scanner\(cq\&s characteristics
.IP o 
\fB2\&. FILE SWITCHING\fP: how to switch to another input specification
file
.IP o 
\fB3\&. DIRECTIVES\fP: directives that can be used in input
specification files
.IP o 
\fB4\&. MINI SCANNERS\fP: how to declare mini\-scanners
.IP o 
\fB5\&. DEFINITIONS\fP: how to define symbolic names for regular
expressions
.IP o 
\fB6\&. %% SEPARATOR\fP: the separator between the input specification
sections 
.IP o 
\fB7\&. REGULAR EXPRESSIONS\fP: regular expressions supported by \fBflexc++\fP
.IP o 
\fB8\&. SPECIFICATION EXAMPLE\fP: an example of a specification file

.PP 
.SH "UNDERSCORES"
Starting with version 2\&.07\&.00 \fBflexc++\fP reserved identifiers no longer end in two
underscore characters, but in one\&. This modification was necessary because
according to the \fBC++\fP standard identifiers having two or more consecutive
underscore characters are reserved by the language\&. In practice this could
require some minor modifications of existing source files  using \fBflexc++\fP\(cq\&s
facilities, most likely limited to changing \fIStartCondition__\fP into
\fIStartCondition_\fP and changing \fIPostEnum__\fP into \fIPostEnum_\fP\&. 
.PP 
The complete list of affected names is:
.IP "Enums:"
.RS 
ActionType_, Leave_, StartCondition_, PostEnum_;
.RE
.IP "Member functions:"
.RS 
actionType_, continue_, echoCh_, echoFirst_,
executeAction_, getRange_, get_, istreamName_, lex_, lop1_, 
lop2_, lop3_, lop4_, lopf_, matched_, noReturn_, print_, 
pushFront_, reset_, return_;
.RE
.IP "Protected data members:"
.RS 
d_in_, d_token_, s_finIdx_, s_interactive_, 
s_maxSizeofStreamStack_, s_nRules_, s_rangeOfEOF_, 
s_ranges_, s_rf_\&.
.RE

.PP 
.SH "1\&. SPECIFICATION FILE(S)"

.PP 
\fBFlexc++\fP expects an input file containing directives and the regular
expressions that should be recognized by objects of the scanner class
generated by \fBflexc++\fP\&. This man page describes the elements and organization
of \fBflexc++\fP\(cq\&s input file\&. 
.PP 
\fBFlexc++\fP\(cq\&s input file consists of two sections, separated from each other by
a line merely containing two consecutive percent characters:
.nf 

%%
    
.fi 
The section before this separator contains directives; the section
following this separator contains regular expressions and possibly actions to
perform when these regular expressions are matched by the object of the
scanner class generated by \fBflexc++\fP\&. If a second line starting with two
consecutive percent characters is encountered, then that line ends \fBflexc++\fP\(cq\&s
input file processing\&. See also section 6 (%% SEPARATOR) below\&.
.PP 
White space is usually ignored, as are comments\&. Comments may be of the
traditional \fBC\fP form (i\&.e\&., \fI/*\fP, followed by (possibly multi\-line)
comment text, followed by \fI*/\fP), or they may be \fBC++\fP end\-of\-line comments:
two consecutive slashes (\fI//\fP) start the comment, which continues up to
the next newline character\&.
.PP 
.SH "2\&. FILE SWITCHING"

.PP 
\fBFlexc++\fP\(cq\&s input file may be split into multiple files\&. This allows for the
definition of logically separate elements of the specifications in different
files\&. To switch files an include directive is used\&. Include directives must be
specified on separate lines and may not contain any other information than the
(path)name of the file to switch to\&. File names may be surrounded by double
quotes, but these double quotes are optional and are ignored (removed) when
encountered\&. All remaining characters define the name of the file that is
subsequently processed by \fBflexc++\fP\&.  White space characters following
\fI//include\fP and preceding the end of the line are ignored\&. The following
stanza switches files:
.nf 

//include file\-location
        
.fi 
The \fI//include\fP directive must start in the line\(cq\&s first column\&. File
locations can be absolute or relative to the location of the file containing
the \fI//include\fP directive\&. Once \fBflexc++\fP has switched to another file that
file\(cq\&s directory becomes \fBflexc++\fP\(cq\&s working directory\&.
.PP 
Once the end of file of a file has been reached, processing continues at
the line beyond the \fI//include\fP directive of the previously scanned file,
and \fBflexc++\fP\(cq\&s working directory is reset to the working directory of the file to
which \fBflexc++\fP returns\&. The end\-of\-file of the file that was initially specified
when \fBflexc++\fP started indicates the end of \fBflexc++\fP\(cq\&s rules specification\&.
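.PP 
As an illustration, the sketch below splits a specification over two files\&.
The file names \fIscanner\fP and \fIrules/comments\fP, as well as the rules
themselves, are hypothetical and merely chosen for this example\&. The two
\fI// file:\fP comment lines indicate where the contents of each file start:
.nf 

// file: scanner (the file passed to flexc++)
%x comment
%%
//include rules/comments
\&.|\en               // ignore

// file: rules/comments
\(dq\&/*\(dq\&            begin(StartCondition_::comment);
<comment>\(dq\&*/\(dq\&   begin(StartCondition_::INITIAL);
<comment>\&.|\en   // ignore
        
.fi 
Since \fIrules/comments\fP is specified relative to the location of \fIscanner\fP,
\fBflexc++\fP looks for it in the \fIrules\fP subdirectory of the directory
containing \fIscanner\fP\&. Once the end of \fIrules/comments\fP has been reached,
processing continues at the line following the \fI//include\fP directive\&.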
.PP 
.SH "3\&. DIRECTIVES"

.PP 
The first section of \fBflexc++\fP\(cq\&s input file consists of directives\&. In
addition it may associate regular expressions with symbolic names, allowing
you to use these identifiers in the rules section\&. Each directive is defined
on a line of its own\&. When a corresponding command line option is available
and is specified, it overrides the directive\&.
.PP 
Some directives require arguments, which usually follow a separating (but
optional) \fI=\fP character\&. Arguments of directives are text,
surrounded by double quotes (strings), or embedded in raw string literals
(rawstrings)\&.  Double quotes or backslashes inside strings must themselves be
preceded by backslashes; these backslashes are not required when rawstrings
are used\&. 
.PP 
The \fI%s\fP and \fI%x\fP directives are immediately followed by name lists,
consisting of identifiers separated by blanks\&.  Here is an example of the
definition of a directive:
.nf 

    %class\-name = \(dq\&MyScanner\(dq\&
        
.fi 

.PP 
Directives accepting a `filename\(cq\& do not accept path names, i\&.e\&., they
cannot contain directory separators (\fI/\fP); directives accepting a `pathname\(cq\&
may contain directory separators\&. A `pathname\(cq\& containing blank characters should
be surrounded by double quotes\&.
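.PP 
The following lines sketch the argument forms described above\&. The names and
paths are hypothetical, and the rawstring is assumed to use the same
\fIR\(dq\&(\&.\&.\&.)\(dq\&\fP form that is shown for regular expressions in section 7:
.nf 

    %class\-name \(dq\&MyScanner\(dq\&
    %target\-directory = \(dq\&generated files\(dq\&
    %skeleton\-directory = R\(dq\&(/opt/flexc++ skeletons)\(dq\&
        
.fi 
The first line omits the optional \fI=\fP character; the second line quotes a
pathname containing a blank; the third line passes a pathname as a rawstring\&.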
.PP 
Some directives may generate errors\&. This happens when a directive conflicts
with the contents of an existing file which \fBflexc++\fP cannot modify (e\&.g\&., a
scanner class header file exists and does not define a namespace, while a
\fI%namespace\fP directive was provided)\&. To solve the error the offending
directive could be omitted, the existing file could be removed, or the
existing file could be hand\-edited according to the directive\(cq\&s specification\&.
Note that \fBflexc++\fP currently does not handle the opposite error condition: if a
previously used directive is omitted, then \fBflexc++\fP does not detect the
inconsistency\&. In those cases you may encounter compilation errors\&.
.PP 
.IP o 
\fB%baseclass\-header\fP \fI= \(dq\&filename\(dq\&\fP 
.br 
Defines the name of the file to contain the scanner class\(cq\&s base
class interface\&. Corresponding command\-line option:
\fI\-\-baseclass\-header\fP\&.
.IP 
It is an error if this directive is used and an already
existing scanner\-class header file does not include
\fI`filename\(cq\&\fP\&. 
.IP 
.IP o 
\fB%baseclass\-preinclude\fP \fI= \(dq\&filename\(dq\&\fP 
.br 

.IP 
\fIFilename\fP defines the path to the file preincluded by the
scanner\(cq\&s base\-class header\&.  It may (only) be required when a
user\-defined \fIInput\fP class is used\&. See the description of the
\fI\-\-baseclass\-preinclude\fP option for details about this
directive\&. Instead of surrounding \fIfilename\fP by double quotes
pointed brackets (like \fI<filename>\fP) can also be used\&.
.IP 
.IP o 
\fB%case\-insensitive\fP
.br 
Generates a scanner which \fIcase insensitively\fP matches regular
expressions\&. All regular expressions specified in \fBflexc++\fP\(cq\&s input
file are interpreted case insensitively and the resulting scanner
object will case insensitively interpret its input\&.
.IP 
Corresponding command\-line option: \fI\-\-case\-insensitive\fP\&.
.IP 
When this directive is specified the resulting scanner does not
distinguish between the following rules:
.nf 

        First       // initial F is transformed to f
        first
        FIRST       // all capitals are transformed to lower case chars
                
.fi 
With a case\-insensitive scanner only the first rule can be matched,
and \fBflexc++\fP will issue warnings for the second and third rule about
rules that cannot be matched\&.
.IP 
Input processed by a case\-insensitive scanner is also handled case
insensitively\&. The above mentioned \fIFirst\fP rule is matched for
all of the following input words: \fIfirst First FIRST firST\fP\&. 
.IP 
Although the matching process proceeds case insensitively, the
matched text (as returned by the scanner\(cq\&s \fImatched()\fP member)
always contains the original, unmodified text\&. So, with the above
input \fImatched()\fP returns, respectively \fIfirst, First, FIRST\fP
and \fIfirST\fP, while matching the rule \fIFirst\fP\&.
.IP 
.IP o 
\fB%class\-header\fP \fI= \(dq\&filename\(dq\&\fP 
.br 
Defines the name of the file to contain the scanner class\(cq\&s
interface\&. Corresponding command\-line option: \fI\-\-class\-header\fP\&.
.IP 
.IP o 
\fB%class\-name\fP \fI = \(dq\&className\(dq\&\fP 
.br 
Declares the name of the scanner class generated by \fBflexc++\fP\&. This
directive corresponds to the \fI%name\fP directive used by
\fBflex++\fP(1)\&. Contrary to \fBflex++\fP\(cq\&s \fI%name\fP declaration,
\fI%class\-name\fP may appear anywhere in the first section of the
grammar specification file\&. It may be defined only once\&. If no
\fI%class\-name\fP directive is specified the default class name (\fIScanner\fP)
is used\&. Corresponding command\-line option:
\fI\-\-class\-name\fP\&.
.IP 
It is an error if this directive is used and an already
existing scanner\-class header file does not define \fIclass
`className\(cq\&\fP\&.
.IP 
.IP o 
\fB%debug\fP 
.br 
Provide \fIlex\fP and its support functions with debugging code,
showing the actual parsing process on the standard output
stream\&. When included, the debugging output is active by default,
but its activity may be controlled using the \fIsetDebug(bool
on\-off)\fP member\&. Note that no \fI#ifdef DEBUG\fP macros are used in
the generated code\&. 
.IP 
.IP o 
\fB%filenames\fP \fI= \(dq\&basename\(dq\&\fP 
.br 
Defines the basename of the \fIScanner\&.h, Scanner\&.ih,\fP and
\fIScannerbase\&.h\fP files\&. E\&.g\&., when using the directive
.nf 

    %filenames = \(dq\&scanner\(dq\&
                
.fi 
the names of the generated files are, respectively, \fIscanner\&.h,
scanner\&.ih,\fP and \fIscannerbase\&.h\fP\&.  Corresponding command\-line
option: \fI\-\-filenames\fP\&. The name of the source file (by default
\fIlex\&.cc\fP) is controlled by the \fI%lex\-source\fP directive\&.
.IP 
.IP o 
\fB%implementation\-header\fP \fI= \(dq\&filename\(dq\&\fP 
.br 
Defines the name of the file to contain the implementation header\&.
Corresponding command\-line option: \fI\-\-implementation\-header\fP\&.
.IP 
It is an error if this directive is used and an already
existing \fI`filename\(cq\&\fP file does not include the scanner class header
file\&.
.IP 
.IP o 
\fB%input\-implementation\fP \fI= \(dq\&sourcefile\(dq\&\fP 
.br 
Defines the pathname of the file containing the implementation of a
user\-defined \fIInput\fP class\&.  See section \fB17\&. THE CLASS INPUT\fP
in the \fBflexc++api\fP(3) manual page for additional information
about user\-defined \fIInput\fP classes\&.
.IP 
.IP o 
\fB%input\-inline\fP \fI= \(dq\&sourcefile\(dq\&\fP 
.br 
Defines the pathname of the file containing inline implementations
of a user\-defined \fIInput\fP class\&. 
.IP 
.IP o 
\fB%input\-interface\fP \fI= \(dq\&interface\(dq\&\fP 
.br 
Defines the pathname of the file containing the interface of a
user\-defined \fIInput\fP class\&. 
.IP 
.IP o 
\fB%interactive\fP
.br 
Generate an interactive scanner\&. An interactive scanner reads lines
from the input stream, and then returns the tokens encountered on
that line\&. The interactive scanner implemented by \fBflexc++\fP only
predefines the \fIScanner(std::istream &in, std::ostream &out)\fP
constructor, by default assuming that input is read from
\fIstd::cin\fP\&. See also the \fI1\&. INTERACTIVE SCANNER\fP section
in the \fBflexc++api\fP(3) manual page\&.
.IP 
.IP o 
\fB%lex\-function\-name\fP \fI= \(dq\&funname\(dq\&\fP 
.br 
Defines the name of the scanner class\(cq\&s member to perform the
lexical scanning\&. If this directive is omitted the default name
(\fIlex\fP) is used\&. Corresponding command\-line option:
\fI\-\-lex\-function\-name\fP\&.
.IP 
.IP o 
\fB%lex\-source\fP \fI= \(dq\&filename\(dq\&\fP 
.br 
Defines the name of the file to contain the scanner member
\fIlex\fP\&. Corresponding command\-line option: \fI\-\-lex\-source\fP\&.
.IP 
.IP o 
\fB%no\-lines\fP 
.br 
Do not put \fI#line\fP preprocessor directives in the file containing
the scanner\(cq\&s \fIlex\fP function\&. If omitted \fI#line\fP directives
are added to this file, unless overridden by the command line
options \fI\-\-lines\fP and \fI\-\-no\-lines\fP\&.
.IP 
.IP o 
\fB%namespace\fP \fI= \(dq\&identifier\(dq\&\fP 
.br 
Define the scanner class in the namespace \fIidentifier\fP\&. By
default no namespace is used\&. If this directive is used the
implementation header is provided with a commented out \fIusing
namespace\fP declaration for the requested namespace\&.  In addition,
the scanner and scanner base class header files also use the
specified namespace to define their include guard directives\&.
.IP 
It is an error if this directive is used and an already existing
scanner\-class header file does not define \fInamespace
identifier\fP\&.
.IP 
.IP o 
\fB%print\-tokens\fP 
.br 
This directive results in the tokens as well as the matched text
being displayed on the standard output stream, just before returning
the token to \fIlex\fP\(cq\&s caller\&. Displaying is suppressed again when
the \fIlex\&.cc\fP file is generated without using this directive\&. The
function showing the tokens (\fIScannerBase::print_\fP) is called
from \fIScanner::print()\fP, which is defined in\-line in
\fIScanner\&.h\fP\&. Calling \fIScannerBase::print_\fP, therefore, can
also easily be controlled by an option of the program
using the scanner object\&.
.IP 
This directive does \fInot\fP show the tokens returned and text
matched by \fBflexc++\fP itself when reading its input files\&. If that is
what you want, use the \fI\-\-own\-tokens\fP option\&.
.IP 
.IP o 
\fB%s\fP \fInamelist\fP 
.br 
The \fI%s\fP directive is followed by a list of one or more
identifiers, separated by blanks\&. Each identifier is the name of
an \fIinclusive start condition\fP\&.
.IP 
.IP o 
\fB%skeleton\-directory\fP \fI= \(dq\&pathname\(dq\&\fP 
.br 
Use \fIpathname\fP rather than the default (e\&.g\&.,
\fI/usr/share/flexc++\fP) path when looking for \fBflexc++\fP\(cq\&s skeleton
files\&. Corresponding command\-line option:
\fI\-\-skeleton\-directory\fP\&.
.IP 
.IP o 
\fB%startcondition\-name\fP \fI = \(dq\&startconditionName\(dq\&\fP 
.br 
By default, \fBflexc++\fP defines the enum \fIStartCondition_\fP defining
the names of start\-conditions\&. The \fI%startcondition\-name\fP
directive can be used to configure another name for the enum
containing the names of the start\-conditions\&.  It may be defined
only once\&. 
.IP 
The name of the startcondition\-enum may be modified, and the
directive can also be omitted again after it has been specified
before\&. When changing the name of the startcondition\-enum or when
reverting to the default name newly generated \fIlex\&.cc\fP and
\fIScannerbase\&.h\fP files will use the currently defined
startcondition\-enum name\&. Be advised, though, that the
startcondition\-enum name may also be used in user\-defined members
of the scanner\-class, or in the scanner\(cq\&s header and internal
header files\&. If so, the user is responsible for updating those
files to the currently defined name of the startcondition\-enum\&.
.IP 
.IP o 
\fB%target\-directory\fP \fI= \(dq\&pathname\(dq\&\fP 
.br 
\fIPathname\fP defines the directory where generated files should be
written\&.  By default this is the directory where \fBflexc++\fP is
called\&. This directive is overruled by the \fI\-\-target\-directory\fP
command\-line option\&.
.IP 
.IP o 
\fB%x\fP \fInamelist\fP 
.br 
The \fI%x\fP directive is followed by a list of one or more
identifiers, separated by blanks\&. Each identifier is the name of
an \fIexclusive start condition\fP\&.
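.PP 
As an illustration, a directive section combining several of the above
directives might look as follows\&. All names used here (\fITokenizer\fP,
\fItokenizer\fP, \fICalc\fP) are hypothetical:
.nf 

    %class\-name = \(dq\&Tokenizer\(dq\&
    %filenames = \(dq\&tokenizer\(dq\&
    %lex\-source = \(dq\&tokenizer\&.cc\(dq\&
    %namespace = \(dq\&Calc\(dq\&
    %no\-lines
    %x string comment
        
.fi 
According to the descriptions above, this should result in a class
\fITokenizer\fP, defined in the namespace \fICalc\fP, whose files are named
\fItokenizer\&.h, tokenizer\&.ih, tokenizerbase\&.h,\fP and \fItokenizer\&.cc\fP,
without \fI#line\fP directives in \fItokenizer\&.cc\fP\&.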

.PP 
.SH "4\&. MINI SCANNERS"

.PP 
Mini scanners come in two flavors: inclusive mini scanners and exclusive
mini scanners\&. The rules that apply to an inclusive mini scanner are the mini
scanner\(cq\&s own rules as well as the rules which apply to no mini scanners in
particular (i\&.e\&., the rules that apply to the default (or \fIINITIAL\fP) mini
scanner)\&. Exclusive mini scanners only use the rules that were defined for
them\&. 
.PP 
To define an inclusive mini scanner use \fI%s\fP, followed by one
or more identifiers specifying the name(s) of the mini\-scanner(s)\&. To define
an exclusive mini scanner use \fI%x\fP, followed by one or more identifiers
specifying the name(s) of the mini\-scanner(s)\&. The following example defines
two exclusive mini scanners, \fIstring\fP and \fIcomment\fP: 
.nf 

    %x string comment 
        
.fi 
Following this, rules defined in the context of the \fIstring\fP mini
scanner (see below) will only be used when that mini scanner is active\&.
.PP 
A \fBflexc++\fP input file may contain multiple \fI%s\fP and \fI%x\fP
specifications\&.
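.PP 
The difference between both flavors can be sketched as follows\&. The mini
scanner names, the patterns and the returned values are hypothetical; the
\fIbegin\fP calls follow the example in section 8:
.nf 

%s quoted
%x comment

%%

\(dq\&<<\(dq\&              begin(StartCondition_::quoted);
<quoted>\(dq\&>>\(dq\&      begin(StartCondition_::INITIAL);
<quoted>[0\-9]+    return 1;

\(dq\&/*\(dq\&              begin(StartCondition_::comment);
<comment>\(dq\&*/\(dq\&     begin(StartCondition_::INITIAL);
<comment>\&.|\en     // ignore

[a\-z]+            return 2;
        
.fi 
While \fIquoted\fP (inclusive) is active, its own two rules as well as the
rules without a start condition (e\&.g\&., \fI[a\-z]+\fP) can be matched\&. While
\fIcomment\fP (exclusive) is active, only the two \fI<comment>\fP rules can be
matched\&.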
.PP 
.SH "5\&. DEFINITIONS"

.PP 
Definitions are of the form
.nf 

identifier  regular\-expression
        
.fi 
Each definition must be entered on a line of its own\&. Definitions
associate identifiers with regular expressions, allowing the use of
\fI{identifier}\fP as synonym for its regular expression in the rules section
of \fBflexc++\fP\(cq\&s input file\&. Once defined, the identifiers representing regular
expressions can also be used in subsequent definitions\&.
.PP 
Example:
.nf 

FIRST                   [A\-Za\-z_]
NAME                    {FIRST}[\-A\-Za\-z0\-9_]*
        
.fi 
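.PP 
The fragment below sketches how such definitions might be used in the rules
section\&. The names \fIDIGIT\fP and \fIINT\fP and the returned values are
hypothetical:
.nf 

DIGIT       [0\-9]
INT         {DIGIT}+

%%

{INT}                   return 1;
{INT}\(dq\&\&.\(dq\&{DIGIT}*       return 2;
        
.fi 
Since a name expansion is treated as a group (see section 7), the \fI*\fP in
the second rule applies to the complete \fI{DIGIT}\fP expansion\&.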

.PP 
.SH "6\&. %% SEPARATOR"

.PP 
Following directives and definitions a line merely containing two consecutive
\fI%\fP characters is expected\&. Following this line the rules are defined\&. Rules
consist of regular expressions which should be recognized, possibly followed
by actions to be executed once a rule\(cq\&s regular expression has been matched\&.
.PP 
If the rule section contains a line starting with two consecutive \fI%\fP
characters, then any remaining input is ignored\&. Note that this second \fI%%\fP
separator does not have to be specified\&. It is purely optional\&. To specify a
regular expression starting with \fI%%\fP surround the \fI%%\fP with double quotes
(\fI\(dq\&%%\(dq\&\fP) or prefix the \fI%%\fP with a blank space: the \fI%%\fP\-characters are
only considered a separator if they are encountered at the very beginning of a
line\&. 
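.PP 
The following fragment (using hypothetical patterns and return values)
sketches these cases:
.nf 

%%
\(dq\&%%\(dq\&        return 1;
[0\-9]+      return 2;
%%
Anything following the second separator is ignored\&.
        
.fi 
Here the first \fI%%\fP line starts the rules section, the quoted \fI\(dq\&%%\(dq\&\fP is
an ordinary regular expression matching two percent characters, and the second
\fI%%\fP line ends the specification\&.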
.PP 
.SH "7\&. REGULAR EXPRESSIONS"

.PP 
The regular expressions defined in \fBflexc++\fP\(cq\&s rules files are matched against 
the information passed to the scanner\(cq\&s \fIlex\fP function\&. 
.PP 
Regular expressions begin as the first non\-blank character on a line\&. Comment
is interpreted as comment as long as it isn\(cq\&t part of the regular
expression\&. To define a regular expression starting with two slashes, at
least the first slash should be escaped or double quoted (e\&.g\&., \fI\(dq\&//\(dq\&\&.*\fP
defines \fBC++\fP comment to end\-of\-line)\&.
.PP 
Regular expressions end at the first blank character (to add a blank character,
e\&.g\&., a space character, to a regular expression, prefix it by a backslash or
put it in a double\-quoted string)\&.
.PP 
Actions may be associated with regular expressions\&.  At a match the action
that is associated with the regular expression is executed, after which
scanning continues when the lexical scanning function (e\&.g\&., \fIlex\fP) is
called again\&. Actions are not required, and regular expressions can be defined
without any actions at all\&. If such action\-less regular expressions are
matched then the match is performed silently, after which processing
continues\&.
.PP 
\fBFlexc++\fP tries to match as many characters of the input file as possible (i\&.e\&.,
it uses `greedy matching\(cq\&)\&. Non\-greedy matching is accomplished by a
combination of a scanner and parser and/or by using the `lookahead\(cq\& operator
(\fI/\fP)\&.
.PP 
The following regular expression `building blocks\(cq\& are available\&. More complex
regular expressions are created by combining them:
.PP 
.IP "\fIx\fP"
the character `x\(cq\&;
.IP "\fI\&.\fP"
any character (byte) except newline;
.IP "\fI[xyz]\fP"
a character class; in this case, the pattern matches either an `x\(cq\&,
a `y\(cq\&, or a `z\(cq\&\&. See also the paragraph about character classes below;
.IP "\fI[abj\-oZ]\fP"
a character class containing a range; matches an `a\(cq\&, a `b\(cq\&, any
letter from `j\(cq\& through `o\(cq\&, or a `Z\(cq\&\&. See also the paragraph about
character classes below;
.IP "\fI[^A\-Z]\fP"
a negated character class, i\&.e\&., any character except
for those in the class\&.  In this example, any non\-capital character\&. See
also the paragraph about character classes below; 
.IP "\fI\(dq\&[xyz]\e\(dq\&foo\(dq\&\fP"
text between double quotes matches the literal string: \fI[xyz]\(dq\&foo\fP;
.IP "R\(dq\&([xyz]\e\(dq\&foo)\(dq\&"
the literal string  `\fI[xyz]\e\(dq\&foo\fP\(cq\& (using a raw string literal);
.IP "\fI\eX\fP"
if X is `a\(cq\&, `b\(cq\&, `f\(cq\&, `n\(cq\&, `r\(cq\&, `t\(cq\&, or `v\(cq\&, then the ANSI\-C
interpretation of `\eX\(cq\& is matched\&. Otherwise, a literal `X\(cq\& is matched
(this is used to escape operators such as `*\(cq\&);
.IP "\fI\e0\fP"
a NUL character (ASCII code 0);
.IP "\fI\e123\fP"
the character with octal value 123;
.IP "\fI\ex2a\fP"
the character with hexadecimal value 2a;
.IP "\fI(r)\fP"
the regular expression `r\(cq\&; parentheses are used to override
precedence (see below);
.IP "\fI{name}\fP"
the expansion of the `name\(cq\& definition;
.IP "\fIr*\fP"
zero or more regular expressions `r\(cq\&\&. This also matches the empty
string;
.IP "\fIr+\fP"
one or more regular expressions `r\(cq\&;
.IP "\fIr?\fP"
zero or one regular expression `r\(cq\&\&.  This also matches the empty
string;
.IP "\fIrs\fP"
the regular expression `r\(cq\& followed by the regular expression `s\(cq\&;
called concatenation;
.IP "\fIr{m, n}\fP"
regular expression `r\(cq\& at least m, but at most n times (\fI0 <= m
<= n\fP)\&.  A regular expression to which \fI{0, 0}\fP is appended
is ignored, and a warning message is shown\&.
.IP "\fIr{m,}\fP"
regular expression `r\(cq\& m or more times (\fI0 <= m\fP);
.IP "\fIr{m}\fP"
regular expression `r\(cq\& exactly m times (\fI0 <= m\fP)\&.  A regular expression
to which \fI{0}\fP is appended is ignored, and a warning message is shown; 
.IP "\fIr|s\fP"
either regular expression `r\(cq\& or regular expression `s\(cq\&;
.IP "\fIr/s\fP"
regular expression `r\(cq\& if it is followed by regular expression
`s\(cq\&\&. The text matched by `s\(cq\& is included when determining whether this
rule results in the longest match, but `s\(cq\& is then returned to the input
before the rule\(cq\&s action (if defined) is executed\&.
.IP 
If \fBflexc++\fP detects patterns potentially not matching any text it generates 
warnings like this:
.nf 

    [Warning] input, line 7: null\-matching regular expression
        
.fi 
By placing the comment
.nf 

    //%nowarn
        
.fi 
on the line just before a regular expression that potentially does not
match any text, the warning for that regular expression is suppressed;
.IP 
.IP "\fI^r\fP"
a regular expression `r\(cq\& at the beginning of a line or file;
.IP "\fIr$\fP"
a regular expression `r\(cq\&, occurring  at the end of a line\&. This
pattern is identical to `r/\en\(cq\&;
.IP "\fI<s>r\fP"
a regular expression `r\(cq\& in start condition `s\(cq\&;
.IP "\fI<s1,s2,s3>r\fP"
a regular expression `r\(cq\& in start conditions s1, s2, or s3;
.IP "\fI<*>r\fP"
a regular expression `r\(cq\& in all start conditions;
.IP "\fI<\fP\fI<EOF>\fP\fI>\fP"
an end\-of\-file;
.IP "\fI<s1,s2><\fP\fI<EOF>\fP\fI>\fP"
an end\-of\-file when in start conditions s1 or s2\&.
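.PP 
As an illustration, the following rules combine some of the above building
blocks\&. The patterns and the returned values are hypothetical:
.nf 

// one to three digits
[0\-9]{1,3}          return 1;

// lower case letters, but only when followed by an open parenthesis
[a\-z]+/\(dq\&(\(dq\&         return 2;

// a # at the beginning of a line, up to the end of that line
^\(dq\&#\(dq\&\&.*              // ignore

// this pattern potentially matches no text: suppress the warning
//%nowarn
[A\-Z]*              return 3;
        
.fi 
In the second rule the open parenthesis takes part in determining the longest
match, but it is returned to the input before the rule\(cq\&s action is executed\&.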

.PP 
\fBCharacter classes\fP
.PP 
Inside a character class all regular expression operators lose their special
meanings, except for the escape character (\fI\e\fP), the character range
operator \fI\-\fP, the end of character class operator \fI]\fP, and, at the
beginning of the class, \fI^\fP\&. All ordinary escape sequences are supported,
all other escaped characters are interpreted as literal characters (e\&.g\&.,
\fI\ec\fP is a literal \fIc\fP)\&.
.PP 
To add a closing bracket to a character class use \fI[]\fP or \fI\e]\fP\&. To add a
closing bracket to a negated character class use \fI[^]\fP (or use \fI[^\fP
followed by \fI\e]\fP somewhere within the character class)\&. Minus characters are
used to define character ranges (e\&.g\&., \fI[a\-d]\fP, defining \fI[abcd]\fP) except
in the following cases, where \fBflexc++\fP recognizes a literal minus character:
\fI[\-\fP, or \fI[^\-\fP (a minus at the very beginning of a character class); 
\fI\-]\fP (a minus at the very end of a character class); 
or \fI\e\-\fP (an escaped minus character)\&.
Once a character class has started, all
subsequent characters (or character ranges) are added to the set, until the final closing
bracket (\fI]\fP) has been reached\&.
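.PP 
For example (a sketch: each pattern is followed by a comment describing the
characters it is expected to match according to the above rules):
.nf 

[]a]        // a closing bracket or an a
[^]a]       // any character except a closing bracket or an a
[\-+]        // a minus or a plus
[+\-]        // same: the minus is the last character of the class
[a\e\-z]      // an a, a minus, or a z
[a\-z]       // any character from a through z
        
.fi 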
.PP 
\fBCharacter constants\fP
.PP 
Character constants are surrounded by single quote characters\&. They match
single characters which, however, can be specified in various ways\&.
.IP o 
The simplest form consists of just a single character: the pattern
\fI\(cq\&a\(cq\&\fP matches the character \fIa\fP, the pattern \fI\(cq\&\&.\(cq\&\fP matches the
dot\-character (\fI\&.\fP thus loses its meaning of `any character but
the newline character\(cq\&);
.IP o 
Standard escape characters (like \fI\(cq\&\en\(cq\&, \(cq\&\ef\(cq\&, \(cq\&\eb\(cq\&\fP) are converted
to their (single character) ascii\-values, matching those characters
when they are encountered in the input\&. Therefore, of the following
two rules the second is never matched (with \fBflexc++\fP generating a
corresponding warning, since both match the newline character):
.nf 

    \(cq\&\en\(cq\&    return 1;
    \en      return 2;
       
.fi 
.IP o 
Octal numbers, starting with a backslash and consisting of three
octal digits are converted to a number matching input characters of
those numbers\&. E\&.g\&., \fI\(cq\&\e101\(cq\&\fP is converted to 65, matching ascii
character \fIA\fP;
.IP o 
Likewise, hexadecimal numbers, starting with \fI\ex\fP and followed by
two hexadecimal digits are converted to a number matching input
characters whose values equal those numbers\&. E\&.g\&., \fI\(cq\&\ex41\(cq\&\fP also
matches ascii character \fIA\fP;
.IP o 
Other escaped single characters match those characters\&. E\&.g\&.,
\fI\(cq\&\e\e\(cq\&\fP matches the single backslash, \fI\(cq\&\e\(cq\&\(cq\&\fP matches the single
quote character\&. But also: \fI\(cq\&\eF\(cq\&\fP matches the single \fIF\fP
character, since no special escaped meaning is associated with \fIF\fP\&.

.PP 
Considering the above, to match a single character (in this example: any
character except for the newline character) including its surrounding quotes, a
regular expression consisting of an escaped quote character, followed by any
character, followed by a quote character can be used:
.nf 

    \e\(cq\&\&.\(cq\&        // matches characters surrounded by quotes
        
.fi 

.PP 
\fBOperator precedence\fP
.PP 
The regular expressions listed above are grouped according to precedence, from
highest precedence at the top to lowest at the bottom\&. From lowest to highest
precedence, the operators are:
.IP o 
\fI|\fP: the or\-operator at the end of a line (instead of an action)
indicates that this expression\(cq\&s action is identical to the action of the next
rule\&. 
.IP o 
\fI/\fP: the look\-ahead operator;
.IP o 
\fI|\fP: the or\-operator within a regular expression;
.IP o 
\fICHAR\fP: individual elements of the regular expression: characters,
strings, quoted characters, escaped characters, character sets etc\&. are all
considered \fICHAR\fP elements\&. Multiple \fICHAR\fP elements can be combined by
enclosing them in parentheses (e\&.g\&., \fI(abc)+\fP indicates sequences of \fIabc\fP
characters, like \fIabcabcabc\fP);
.IP o 
\fI*, ?, +, {\fP: multipliers:
.br 
\fI?\fP: zero or one occurrence  of the previous element;
.br 
\fI+\fP: one or more repetitions of the previous element;
.br 
\fI*\fP: zero or more repetitions of the previous element;
.br 
\fI{\&.\&.\&.}\fP: interval specification: a specified number of
repetitions of the previous element (see above for specific
forms of the interval specification);
.IP o 
\fI{+}, {\-}\fP: set operators (\fI{+}\fP computing the union of two sets,
\fI{\-}\fP computing the difference of the left\-hand side set
minus the elements in the right\-hand side set);

.PP 
The lex standard defines concatenation as having a higher precedence than the
interval expression\&. This is different from many other regular expression
engines, and \fBflexc++\fP follows these latter engines, giving all `multiplication
operators\(cq\& equal priority\&.
.PP 
Name expansion has the same precedence as grouping (using parentheses to
influence the precedence of the other operators in the regular expression)\&.
Since the name expansion is treated as a group in \fBflexc++\fP, it is not allowed to
use the lookahead operator in a name definition (a named pattern, defined in
the definition section)\&.
.PP 
\fBPredefined sets of characters\fP
.PP 
Character classes can also contain character class expressions\&. These are
expressions enclosed inside \fI[:\fP and \fI:]\fP delimiters (which themselves
must appear between the \fI[\fP and \fI]\fP of the character class\&. Other elements
may occur inside the character class as well)\&. The character class expressions
are:
.nf 
     
     [:alnum:] [:alpha:] [:blank:]
     [:cntrl:] [:digit:] [:graph:]
     [:lower:] [:print:] [:punct:]
     [:space:] [:upper:] [:xdigit:]
        
.fi 

.PP 
Character class expressions designate a set of characters equivalent to
the corresponding standard \fBC\fP isXXX function\&. For example, \fI[:alnum:]\fP
designates those characters for which \fIisalnum\fP returns true \- i\&.e\&., any
alphabetic or numeric character\&.  For example, the following character classes
are all equivalent:
.nf 
 
    [[:alnum:]]
    [[:alpha:][:digit:]]
    [[:alpha:][0\-9]]
    [a\-zA\-Z0\-9]
        
.fi 

.PP 
A negated character class such as the example \fI[^A\-Z]\fP above will match a
newline unless \fI\en\fP (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class (e\&.g\&.,
\fI[^A\-Z\en]\fP)\&. This differs from the way many other regular expression tools
treat negated character classes, but unfortunately the inconsistency is
historically entrenched\&. Matching newlines means that a pattern like \fI[^\(dq\&]*\fP
can match the entire input unless there\(cq\&s another quote in the input\&.
.PP 
\fBFlexc++\fP allows negation of character class expressions by prepending \fI^\fP to
the POSIX character class name\&.
.nf 
                
    [:^alnum:] [:^alpha:] [:^blank:]
    [:^cntrl:] [:^digit:] [:^graph:]
    [:^lower:] [:^print:] [:^punct:]
    [:^space:] [:^upper:] [:^xdigit:]
        
.fi 

.PP 
\fBCombining character sets\fP
.PP 
The \fI{\-}\fP operator computes the difference of two character classes\&. For
example, \fI[a\-c]{\-}[b\-z]\fP represents all the characters in the class
\fI[a\-c]\fP that are not in the class \fI[b\-z]\fP (which in this case, is just the
single character \fIa\fP)\&. The \fI{\-}\fP operator is left associative, so
\fI[abc]{\-}[b]{\-}[c]\fP is the same as \fI[a]\fP\&.
.PP 
The \fI{+}\fP operator computes the union of two character classes\&. For example,
\fI[a\-z]{+}[0\-9]\fP is the same as \fI[a\-z0\-9]\fP\&. This operator is useful when
preceded by the result of a difference operation, as in,
\fI[[:alpha:]]{\-}[[:lower:]]{+}[q]\fP, which is equivalent to \fI[A\-Zq]\fP in the
\fBC\fP locale\&.
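.PP 
As a sketch, the set operators might be used in patterns like the following
(the \fBC\fP locale is assumed; each pattern is followed by a comment describing
the characters it is expected to match):
.nf 

[a\-z]{\-}[aeiou]             // a lower case consonant
[0\-9]{+}[a\-fA\-F]            // a hexadecimal digit, like [[:xdigit:]]
[[:alnum:]]{\-}[[:digit:]]   // a letter, like [[:alpha:]]
        
.fi 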
.PP 
\fBTrailing context\fP
.PP 
A rule can have at most one instance of trailing context (the \fI/\fP operator
or the \fI$\fP operator)\&. The start condition, \fI^\fP, and \fI<<EOF>>\fP patterns
can only occur at the beginning of a pattern, and cannot be surrounded by
parentheses\&. The characters \fI^\fP and \fI$\fP only have their special properties
at, respectively, the beginning and end of regular expressions\&. In all other
cases they are treated as normal characters\&.
.PP 
.SH "8\&. SPECIFICATION EXAMPLE"

.PP 
.nf 

%debug

%x comment

NAME    [[:alpha:]][_[:alnum:]]*

%%

\(dq\&//\(dq\&\&.*          // ignore

\(dq\&/*\(dq\&            begin(StartCondition_::comment);

<comment>\&.|\en   // ignore
<comment>\(dq\&*/\(dq\&   begin(StartCondition_::INITIAL);

^a              return 1;
a               return 2;
a$              return 3;
{NAME}          return 4;

\&.|\en            // ignore
        
.fi 

.PP 
.PP 
.SH "FILES"

.PP 
\fBFlexc++\fP\(cq\&s default skeleton files are in \fI/usr/share/flexc++\fP\&.
.br 
By default, \fBflexc++\fP generates the following files:
.IP o 
\fIScanner\&.h\fP: the header file containing the scanner class\(cq\&s
interface\&. 
.IP o 
\fIScannerbase\&.h\fP: the header file containing the interface of the 
scanner class\(cq\&s base class\&.
.IP o 
\fIScanner\&.ih\fP: the internal header file that is meant to be included
by the scanner class\(cq\&s source files (e\&.g\&., it is included by
\fIlex\&.cc\fP, see the next item), and that should contain all
declarations required for compiling the scanner class\(cq\&s sources\&.
.IP o 
\fIlex\&.cc\fP: the source file implementing the scanner class member
function \fIlex\fP (and support functions), performing the lexical
scan\&.

.PP 
.SH "SEE ALSO"

.PP 
\fBflexc++\fP(1), \fBflexc++api\fP(3)
.PP 
.SH "BUGS"

.PP 
.IP o 
The priority of interval expressions (\fI{\&.\&.\&.}\fP) equals the priority
of other multiplicative operators (like \fI*\fP)\&.

.PP 
.SH "COPYRIGHT"
This is free software, distributed under the terms of the 
GNU General Public License (GPL)\&.
.PP 
.SH "AUTHOR"
Frank B\&. Brokken (\fBf\&.b\&.brokken@rug\&.nl\fP),
.br 
Jean\-Paul van Oosten (\fBj\&.p\&.van\&.oosten@rug\&.nl\fP),
.br 
Richard Berendsen (\fBrichardberendsen@xs4all\&.nl\fP) (until 2010)\&.
.br 

.PP