.TH "MANDOC_ESCAPE" "3" "$Mdocdate: January 21 2015 $" "Debian" "Library Functions Manual"
.nh
.if n .ad l
.SH "NAME"
\fBmandoc_escape\fR
\- parse roff escape sequences
.SH "SYNOPSIS"
\fB#include <sys/types.h>\fR
.br
\fB#include <mandoc.h>\fR
.sp
\fIenum mandoc_esc\fR
.PD 0
.HP 4n
\fBmandoc_escape\fR(\fIconst\ char\ **end\fR,
\fIconst\ char\ **start\fR,
\fIint\ *sz\fR);
.PD
.SH "DESCRIPTION"
This function scans a
roff(7)
escape sequence.
.PP
An escape sequence consists of
.PD 0
.TP 4n
\fB-\fR
an initial backslash character
(\(oq\e\(cq),
.TP 4n
\fB-\fR
a single ASCII character called the escape sequence identifier,
.TP 4n
\fB-\fR
and, with only a few exceptions, an argument.
.PD
.PP
Arguments can be given in the following forms; some escape sequence
identifiers only accept some of these forms as specified below.
The first three forms are called the standard forms.
.TP 4n
\&In brackets: \fB\&[\fR\fIargument\fR\fB\&]\fR
The argument starts after the initial
\(oq\&[\(cq,
ends before the final
\(oq\&]\(cq,
and the escape sequence ends with the final
\(oq\&]\(cq.
.TP 4n
Two-character argument short form: \fB\&(\fR\fIar\fR
This form can only be used for arguments
consisting of exactly two characters.
It has the same effect as
\fB\&[\fR\fIar\fR\fB\&]\fR.
.TP 4n
One-character argument short form: \fIa\fR
This form can only be used for arguments
consisting of exactly one character.
It has the same effect as
\fB\&[\fR\fIa\fR\fB\&]\fR.
.TP 4n
Delimited form: \fIC\fR\fIargument\fR\fIC\fR
The argument starts after the initial delimiter character
\fIC\fR,
ends before the next occurrence of the delimiter character
\fIC\fR,
and the escape sequence ends with that second
\fIC\fR.
Some escape sequences allow arbitrary characters
\fIC\fR
as quoting characters; others restrict the range of characters
that can be used.
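.PP
As an illustration of the four argument forms, the following C sketch
scans one argument starting at the character after the identifier.
The helpers
\fBscan_standard\fR()
and
\fBscan_delimited\fR()
are hypothetical and not part of the mandoc library; they ignore
nested escape sequences and any per-identifier delimiter restrictions.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: scan one argument in
 * standard form.  On entry, *pp points just past the identifier.
 * On success, store the argument in *start and *sz, advance *pp
 * past the whole sequence, and return 1.  Return 0 if the input
 * ends before the argument does.
 */
static int
scan_standard(const char **pp, const char **start, size_t *sz)
{
    const char *p, *stop;

    assert(pp != NULL && *pp != NULL);
    p = *pp;
    if (*p == '[') {                /* bracketed form */
        if ((stop = strchr(p + 1, ']')) == NULL)
            return 0;
        *start = p + 1;
        *sz = (size_t)(stop - *start);
        *pp = stop + 1;
    } else if (*p == '(') {         /* two-character short form */
        if (p[1] == 0 || p[2] == 0)
            return 0;
        *start = p + 1;
        *sz = 2;
        *pp = p + 3;
    } else {                        /* one-character short form */
        if (*p == 0)
            return 0;
        *start = p;
        *sz = 1;
        *pp = p + 1;
    }
    return 1;
}

/* Same contract for the delimited form: the first character is C. */
static int
scan_delimited(const char **pp, const char **start, size_t *sz)
{
    const char *p, *stop;

    assert(pp != NULL && *pp != NULL);
    p = *pp;
    if (*p == 0 || (stop = strchr(p + 1, *p)) == NULL)
        return 0;
    *start = p + 1;
    *sz = (size_t)(stop - *start);
    *pp = stop + 1;
    return 1;
}
```
.fi
.PP
For example, given the text
\(oq[BI]x\(cq,
\fBscan_standard\fR()
reports the two-character argument
\(oqBI\(cq
and leaves the position at
\(oqx\(cq.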
.PP
Upon function entry,
\fIend\fR
is expected to point to the escape sequence identifier.
The values passed in as
\fIstart\fR
and
\fIsz\fR
are ignored and overwritten.
.PP
By design, this function cannot handle those
roff(7)
escape sequences that require in-place expansion, in particular
user-defined strings
\fB\e*\fR,
number registers
\fB\en\fR,
width measurements
\fB\ew\fR,
and numerical expression control
\fB\eB\fR.
These are handled by
\fBroff_res\fR(),
a private preprocessor function called from
\fBroff_parseln\fR();
see the file
\fIroff.c\fR.
.PP
The function
\fBmandoc_escape\fR()
is used
.PD 0
.TP 4n
\fB-\fR
recursively by itself, because some escape sequence arguments can
in turn contain other escape sequences,
.TP 4n
\fB-\fR
for error detection internally by the
roff(7)
parser part of the
mandoc(3)
library, see the file
\fIroff.c\fR,
.TP 4n
\fB-\fR
above all externally by the
mandoc
formatting modules, in particular
\fB\-Tascii\fR
and
\fB\-Thtml\fR,
for formatting purposes, see the files
\fIterm.c\fR
and
\fIhtml.c\fR,
.TP 4n
\fB-\fR
and rarely externally by high-level utilities using the mandoc library,
for example
makewhatis(8),
to purge escape sequences from text.
.PD
.SH "RETURN VALUES"
Upon function return, the pointer
\fIend\fR
is set to the character after the end of the escape sequence,
so that the calling higher-level parser can easily continue.
.PP
For escape sequences taking an argument, the pointer
\fIstart\fR
is set to the beginning of the argument and
\fIsz\fR
is set to the length of the argument.
For escape sequences not taking an argument,
\fIstart\fR
is set to the character after the end of the sequence and
\fIsz\fR
is set to 0.
Both
\fIstart\fR
and
\fIsz\fR
may be
\fRNULL\fR;
in that case, the argument and the length are not returned.
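.PP
The pointer contract described above can be sketched with a toy stand-in.
The function
\fBtoy_escape\fR()
below is hypothetical and is not the mandoc implementation: it returns a
plain
\fIint\fR
instead of
\fIenum mandoc_esc\fR,
only recognizes
\fBf\fR
with a bracketed argument, and treats every other identifier as taking
no argument.
Leaving
\fIend\fR
untouched on error is a choice of this sketch, not documented behaviour.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy stand-in mirroring the end/start/sz contract.
 * On entry, *end points to the escape sequence identifier.
 * Returns 1 on success and 0 on error.
 */
static int
toy_escape(const char **end, const char **start, int *sz)
{
    const char *id, *arg, *stop;
    int len;

    assert(end != NULL && *end != NULL);
    id = *end;
    if (*id == 0)
        return 0;
    if (id[0] == 'f' && id[1] == '[') {
        /* Argument runs from after the bracket to before its match. */
        arg = id + 2;
        if ((stop = strchr(arg, ']')) == NULL)
            return 0;
        len = (int)(stop - arg);
        *end = stop + 1;
    } else {
        /* No argument: start is the character after the sequence. */
        arg = id + 1;
        len = 0;
        *end = id + 1;
    }
    /* NULL means the caller does not want that value. */
    if (start != NULL)
        *start = arg;
    if (sz != NULL)
        *sz = len;
    return 1;
}
```
.fi
.PP
A caller that only wants to skip a sequence can pass
\fRNULL\fR
for both
\fIstart\fR
and
\fIsz\fR
and resume parsing at the updated
\fIend\fR.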
.PP
For sequences taking an argument, the function
\fBmandoc_escape\fR()
returns one of the following values:
.TP 4n
\fRESCAPE_FONT\fR
The escape sequence
\fB\ef\fR
taking an argument in standard form:
\fB\ef[\fR, \fB\ef(\fR, \fB\ef\fR\fIa\fR.
Two-character arguments starting with the character
\(oqC\(cq
are reduced to one-character arguments by skipping the
\(oqC\(cq.
More specific values are returned for the most commonly used arguments:
.TS
l l.
argument	return value
\fBR\fR or \fB1\fR	\fRESCAPE_FONTROMAN\fR
\fBI\fR or \fB2\fR	\fRESCAPE_FONTITALIC\fR
\fBB\fR or \fB3\fR	\fRESCAPE_FONTBOLD\fR
\fBP\fR	\fRESCAPE_FONTPREV\fR
\fBBI\fR	\fRESCAPE_FONTBI\fR
.TE
.TP 4n
\fRESCAPE_SPECIAL\fR
The escape sequence
\fB\eC\fR
taking an argument delimited with the single quote character
and, as a special exception, the escape sequences
\fInot\fR
having an identifier, that is, those where the argument, in standard
form, directly follows the initial backslash:
\fB\eC'\fR, \fB\e[\fR, \fB\e(\fR, \fB\e\fR\fIa\fR.
Note that the one-character argument short form can only be used for
argument characters that do not clash with escape sequence identifiers.
.sp
If the argument matches one of the forms described below under
\fRESCAPE_UNICODE\fR,
that value is returned instead.
.sp
The
\fRESCAPE_SPECIAL\fR
special character escape sequences can be rendered using the functions
\fBmchars_spec2cp\fR()
and
\fBmchars_spec2str\fR()
described in the
mchars_alloc(3)
manual.
.TP 4n
\fRESCAPE_UNICODE\fR
Escape sequences of the same format as described above under
\fRESCAPE_SPECIAL\fR,
but with an argument of the forms
\fBu\fR\fIXXXX\fR,
\fBu\fR\fIYXXXX\fR,
or
\fBu10\fR\fIXXXX\fR
where
\fIX\fR
and
\fIY\fR
are hexadecimal digits and
\fIY\fR
is not zero:
\fB\eC'u\fR, \fB\e[u\fR.
As a special exception,
\fIstart\fR
is set to the character after the
\fBu\fR,
and the
\fIsz\fR
return value does not include the
\fBu\fR
either.
.sp
Such Unicode character escape sequences can be rendered using the function
\fBmchars_num2uc\fR()
described in the
mchars_alloc(3)
manual.
.TP 4n
\fRESCAPE_NUMBERED\fR
The escape sequence
\fB\eN\fR
followed by a delimited argument.
The delimiter character is arbitrary except that digits cannot be used.
If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
\fRESCAPE_IGNORE\fR
is returned.
.sp
Such ASCII character escape sequences can be rendered using the function
\fBmchars_num2char\fR()
described in the
mchars_alloc(3)
manual.
.TP 4n
\fRESCAPE_OVERSTRIKE\fR
The escape sequence
\fB\eo\fR
followed by an argument delimited by an arbitrary character.
.TP 4n
\fRESCAPE_IGNORE\fR
.PP
.RS 4n
.PD 0
.TP 4n
\fB\(bu\fR
The escape sequence
\fB\es\fR
followed by an argument in standard form or by an argument delimited
by the single quote character:
\fB\es'\fR, \fB\es[\fR, \fB\es(\fR, \fB\es\fR\fIa\fR.
As a special exception, an optional
\(oq+\(cq
or
\(oq\-\(cq
character is allowed after the
\(oqs\(cq
for all forms.
.PD
.TP 4n
\fB\(bu\fR
The escape sequences
\fB\eF\fR,
\fB\eg\fR,
\fB\ek\fR,
\fB\eM\fR,
\fB\em\fR,
\fB\en\fR,
\fB\eV\fR,
and
\fB\eY\fR
followed by an argument in standard form.
.TP 4n
\fB\(bu\fR
The escape sequences
\fB\eA\fR,
\fB\eb\fR,
\fB\eD\fR,
\fB\eR\fR,
\fB\eX\fR,
and
\fB\eZ\fR
followed by an argument delimited by an arbitrary character.
.TP 4n
\fB\(bu\fR
The escape sequences
\fB\eH\fR,
\fB\eh\fR,
\fB\eL\fR,
\fB\el\fR,
\fB\eS\fR,
\fB\ev\fR,
and
\fB\ex\fR
followed by an argument delimited by a character that cannot occur
in numerical expressions.
However, if any character that can occur in numerical expressions
is found instead of a delimiter, the sequence is considered to end
with that character, and
\fRESCAPE_ERROR\fR
is returned.
.PD 0
.PP
.RE
.PD
.TP 4n
\fRESCAPE_ERROR\fR
Escape sequences taking an argument but not matching any of the above patterns.
In particular, that happens if the end of the logical input line
is reached before the end of the argument.
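.PP
Two of the classification rules above, the font argument table and the
Unicode argument forms, can be sketched in C as follows.
The enumeration and both functions are hypothetical stand-ins that do not
use
\fI<mandoc.h>\fR;
.br
\fRTOY_FONT\fR
plays the role of the generic
\fRESCAPE_FONT\fR
value.
Accepting hexadecimal digits in either case is an assumption of this
sketch.
.PP
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for the font-related enum mandoc_esc values. */
enum toy_font {
    TOY_FONT,           /* any other font argument */
    TOY_FONTROMAN,
    TOY_FONTITALIC,
    TOY_FONTBOLD,
    TOY_FONTPREV,
    TOY_FONTBI
};

/* Classify a font argument of sz bytes according to the table. */
static enum toy_font
classify_font(const char *arg, size_t sz)
{
    assert(arg != NULL);
    /* A two-character argument starting with C loses the C. */
    if (sz == 2 && arg[0] == 'C') {
        arg++;
        sz--;
    }
    if (sz == 2 && strncmp(arg, "BI", 2) == 0)
        return TOY_FONTBI;
    if (sz != 1)
        return TOY_FONT;
    switch (arg[0]) {
    case 'R': case '1': return TOY_FONTROMAN;
    case 'I': case '2': return TOY_FONTITALIC;
    case 'B': case '3': return TOY_FONTBOLD;
    case 'P': return TOY_FONTPREV;
    default: return TOY_FONT;
    }
}

/* Return 1 if arg has the form uXXXX, uYXXXX or u10XXXX, else 0. */
static int
is_unicode_arg(const char *arg, size_t sz)
{
    size_t i;

    assert(arg != NULL);
    if (sz < 5 || sz > 7 || arg[0] != 'u')
        return 0;
    for (i = 1; i < sz; i++)
        if (!isxdigit((unsigned char)arg[i]))
            return 0;
    if (sz == 6 && arg[1] == '0')       /* Y must not be zero */
        return 0;
    if (sz == 7 && (arg[1] != '1' || arg[2] != '0'))
        return 0;
    return 1;
}
```
.fi
.PP
With these helpers, the argument
\(oqCB\(cq
classifies as bold after the
\(oqC\(cq
is skipped, and
\(oqu02014\(cq
is rejected because its leading digit is zero.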
.PP
For sequences that do not take an argument, the function
\fBmandoc_escape\fR()
returns one of the following values:
.TP 4n
\fRESCAPE_SKIPCHAR\fR
The escape sequence
"\ez".
.TP 4n
\fRESCAPE_NOSPACE\fR
The escape sequence
"\ec".
.TP 4n
\fRESCAPE_IGNORE\fR
The escape sequences
"\ed"
and
"\eu".
.SH "FILES"
This function is implemented in
\fImandoc.c\fR.
.SH "SEE ALSO"
mchars_alloc(3),
mandoc_char(7),
roff(7)
.SH "HISTORY"
This function has been available since mandoc 1.11.2.
.SH "AUTHORS"
Kristaps Dzonsons <\fIkristaps@bsd.lv\fR>
.br
Ingo Schwarze <\fIschwarze@openbsd.org\fR>
.SH "BUGS"
The function doesn't cleanly distinguish between sequences that are
valid and supported, valid and ignored, valid and unsupported,
syntactically invalid, or undefined.
For sequences that are ignored or unsupported, it doesn't tell
whether that deficiency is likely to cause major formatting problems
and/or loss of document content.
The function is already rather complicated and still parses some
sequences incorrectly.