.TH "MANDOC" "3" "$Mdocdate: January 15 2015 $" "Debian" "Library Functions Manual"
.nh
.if n .ad l
.SH "NAME"
\fBmandoc\fR,
\fBman_deroff\fR,
\fBman_meta\fR,
\fBman_mparse\fR,
\fBman_node\fR,
\fBmdoc_deroff\fR,
\fBmdoc_meta\fR,
\fBmdoc_node\fR,
\fBmparse_alloc\fR,
\fBmparse_free\fR,
\fBmparse_getkeep\fR,
\fBmparse_keep\fR,
\fBmparse_open\fR,
\fBmparse_readfd\fR,
\fBmparse_reset\fR,
\fBmparse_result\fR,
\fBmparse_strerror\fR,
\fBmparse_strlevel\fR,
\fBmparse_wait\fR
\- mandoc macro compiler library
.SH "SYNOPSIS"
\fB#include <sys/types.h>\fR
.br
\fB#include <mandoc.h>\fR
.PP
\fB#define ASCII_NBRSP\fR
.br
\fB#define ASCII_HYPH\fR
.br
\fB#define ASCII_BREAK\fR
.sp
\fIstruct mparse *\fR
.PD 0
.HP 4n
\fBmparse_alloc\fR(\fIint\ options\fR, \fIenum\ mandoclevel\ wlevel\fR, \fImandocmsg\ mmsg\fR, \fIconst\ struct\ mchars\ *mchars\fR, \fIchar\ *defos\fR);
.PD
.PP
\fIvoid\fR
.PD 0
.HP 4n
\fB(*mandocmsg)\fR(\fIenum\ mandocerr\ errtype\fR, \fIenum\ mandoclevel\ level\fR, \fIconst\ char\ *file\fR, \fIint\ line\fR, \fIint\ col\fR, \fIconst\ char\ *msg\fR);
.PD
.PP
\fIvoid\fR
.PD 0
.HP 4n
\fBmparse_free\fR(\fIstruct\ mparse\ *parse\fR);
.PD
.PP
\fIconst char *\fR
.PD 0
.HP 4n
\fBmparse_getkeep\fR(\fIconst\ struct\ mparse\ *parse\fR);
.PD
.PP
\fIvoid\fR
.PD 0
.HP 4n
\fBmparse_keep\fR(\fIstruct\ mparse\ *parse\fR);
.PD
.PP
\fIenum mandoclevel\fR
.PD 0
.HP 4n
\fBmparse_open\fR(\fIstruct\ mparse\ *parse\fR, \fIint\ *fd\fR, \fIconst\ char\ *fname\fR);
.PD
.PP
\fIenum mandoclevel\fR
.PD 0
.HP 4n
\fBmparse_readfd\fR(\fIstruct\ mparse\ *parse\fR, \fIint\ fd\fR, \fIconst\ char\ *fname\fR);
.PD
.PP
\fIvoid\fR
.PD 0
.HP 4n
\fBmparse_reset\fR(\fIstruct\ mparse\ *parse\fR);
.PD
.PP
\fIvoid\fR
.PD 0
.HP 4n
\fBmparse_result\fR(\fIstruct\ mparse\ *parse\fR, \fIstruct\ mdoc\ **mdoc\fR, \fIstruct\ man\ **man\fR, \fIchar\ **sodest\fR);
.PD
.PP
\fIconst char *\fR
.PD 0
.HP 4n
\fBmparse_strerror\fR(\fIenum\ mandocerr\fR);
.PD
.PP
\fIconst char *\fR
.PD 0
.HP 4n
\fBmparse_strlevel\fR(\fIenum\ mandoclevel\fR);
.PD
.PP
\fIenum mandoclevel\fR
.PD 0
.HP 4n
\fBmparse_wait\fR(\fIstruct\ mparse\ *parse\fR);
.PD
.PP
\fB#include <sys/types.h>\fR
.br
\fB#include <mandoc.h>\fR
.br
\fB#include <mdoc.h>\fR
.sp
\fIvoid\fR
.PD 0
.HP 4n
\fBmdoc_deroff\fR(\fIchar\ **dest\fR, \fIconst\ struct\ mdoc_node\ *node\fR);
.PD
.PP
\fIconst struct mdoc_meta *\fR
.PD 0
.HP 4n
\fBmdoc_meta\fR(\fIconst\ struct\ mdoc\ *mdoc\fR);
.PD
.PP
\fIconst struct mdoc_node *\fR
.PD 0
.HP 4n
\fBmdoc_node\fR(\fIconst\ struct\ mdoc\ *mdoc\fR);
.PD
.PP
\fIextern const char * const * mdoc_argnames;\fR
.br
\fIextern const char * const * mdoc_macronames;\fR
.sp
\fB#include <sys/types.h>\fR
.br
\fB#include <mandoc.h>\fR
.br
\fB#include <man.h>\fR
.sp
\fIvoid\fR
.PD 0
.HP 4n
\fBman_deroff\fR(\fIchar\ **dest\fR, \fIconst\ struct\ man_node\ *node\fR);
.PD
.PP
\fIconst struct man_meta *\fR
.PD 0
.HP 4n
\fBman_meta\fR(\fIconst\ struct\ man\ *man\fR);
.PD
.PP
\fIconst struct mparse *\fR
.PD 0
.HP 4n
\fBman_mparse\fR(\fIconst\ struct\ man\ *man\fR);
.PD
.PP
\fIconst struct man_node *\fR
.PD 0
.HP 4n
\fBman_node\fR(\fIconst\ struct\ man\ *man\fR);
.PD
.PP
\fIextern const char * const * man_macronames;\fR
.SH "DESCRIPTION"
The
\fBmandoc\fR
library parses a UNIX manual into an abstract syntax tree (AST).
UNIX manuals are composed of mdoc(7) or man(7), and may be mixed with roff(7), tbl(7), and eqn(7) invocations.
.PP
The following describes a general parse sequence:
.TP 5n
1.\&
initiate a parsing sequence with mchars_alloc(3) and \fBmparse_alloc\fR();
.TP 5n
2.\&
open a file with open(2) or \fBmparse_open\fR();
.TP 5n
3.\&
parse it with \fBmparse_readfd\fR();
.TP 5n
4.\&
retrieve the syntax tree with \fBmparse_result\fR();
.TP 5n
5.\&
iterate over parse nodes with \fBmdoc_node\fR() or \fBman_node\fR();
.TP 5n
6.\&
free all allocated memory with \fBmparse_free\fR() and mchars_free(3), or invoke \fBmparse_reset\fR() and parse new files.
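.PP
Step 5 of the sequence above amounts to a depth-first walk over the
\fIchild\fR
and
\fInext\fR
links of each node.
The following is a minimal sketch in C, not library code:
\fIstruct demo_node\fR,
\fIdemo_visit\fR
and
\fBdemo_walk\fR()
are hypothetical names, and the structure is a simplified stand-in for
\fIstruct mdoc_node\fR
and
\fIstruct man_node\fR
that models only those two links.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct mdoc_node or struct man_node.
 * Only the child and next links are modelled here.
 */
struct demo_node {
	const char		*label;	/* illustrative name only */
	struct demo_node	*child;	/* first child, or NULL */
	struct demo_node	*next;	/* next sibling, or NULL */
};

/* Called once per node with the depth of that node. */
typedef void (*demo_visit)(const struct demo_node *, int, void *);

/*
 * Visit n and its following siblings in document order,
 * descending into the children of a node before moving on
 * to its next sibling.  fn may be NULL.
 * Returns the number of nodes visited.
 */
size_t
demo_walk(const struct demo_node *n, int depth, demo_visit fn, void *arg)
{
	size_t count = 0;

	assert(depth >= 0);
	for (; n != NULL; n = n->next) {
		if (fn != NULL)
			fn(n, depth, arg);
		count += 1 + demo_walk(n->child, depth + 1, fn, arg);
	}
	return count;
}
```
.fi
.PP
With the library itself, such a walk would presumably start from the node returned by
\fBmdoc_node\fR()
or
\fBman_node\fR()
and inspect the
\fItype\fR
field of each node in the callback.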
.SH "REFERENCE"
This section documents the functions, types, and variables available via
<\fImandoc.h\fR>,
with the exception of those documented in mandoc_escape(3) and mchars_alloc(3).
.SS "Types"
.PP
.R "\fIenum mandocerr\fR
.br
An error or warning message during parsing.
.PP
.R "\fIenum mandoclevel\fR
.br
A classification of an
\fIenum mandocerr\fR
as regards system operation.
.PP
.R "\fIstruct mchars\fR
.br
An opaque pointer to a character table.
Created with mchars_alloc(3) and freed with mchars_free(3).
.PP
.R "\fIstruct mparse\fR
.br
An opaque pointer to a running parse sequence.
Created with \fBmparse_alloc\fR() and freed with \fBmparse_free\fR().
This may be used across parsed input if \fBmparse_reset\fR() is called between parses.
.PP
.R "\fImandocmsg\fR
.br
A prototype for a function to handle error and warning messages emitted by the parser.
.SS "Functions"
.PP
.R "\fBman_deroff\fR()
.br
Obtain a text-only representation of a
\fIstruct man_node\fR,
including text contained in its child nodes.
To be used on children of the pointer returned from \fBman_node\fR().
When it is no longer needed, the pointer stored in \fI*dest\fR by \fBman_deroff\fR() can be passed to free(3).
.PP
.R "\fBman_meta\fR()
.br
Obtain the meta-data of a successful man(7) parse.
This may only be used on a pointer returned by \fBmparse_result\fR().
Declared in
<\fIman.h\fR>,
implemented in
\fIman.c\fR.
.PP
.R "\fBman_mparse\fR()
.br
Get the parser used for the current output.
Declared in
<\fIman.h\fR>,
implemented in
\fIman.c\fR.
.PP
.R "\fBman_node\fR()
.br
Obtain the root node of a successful man(7) parse.
This may only be used on a pointer returned by \fBmparse_result\fR().
Declared in
<\fIman.h\fR>,
implemented in
\fIman.c\fR.
.PP
.R "\fBmdoc_deroff\fR()
.br
Obtain a text-only representation of a
\fIstruct mdoc_node\fR,
including text contained in its child nodes.
To be used on children of the pointer returned from \fBmdoc_node\fR().
When it is no longer needed, the pointer stored in \fI*dest\fR by \fBmdoc_deroff\fR() can be passed to free(3).
.PP
.R "\fBmdoc_meta\fR()
.br
Obtain the meta-data of a successful mdoc(7) parse.
This may only be used on a pointer returned by \fBmparse_result\fR().
Declared in
<\fImdoc.h\fR>,
implemented in
\fImdoc.c\fR.
.PP
.R "\fBmdoc_node\fR()
.br
Obtain the root node of a successful mdoc(7) parse.
This may only be used on a pointer returned by \fBmparse_result\fR().
Declared in
<\fImdoc.h\fR>,
implemented in
\fImdoc.c\fR.
.PP
.R "\fBmparse_alloc\fR()
.br
Allocate a parser.
The arguments have the following effect:
.RS 5n
.TP 9n
\fIoptions\fR
When the
\fRMPARSE_MDOC\fR
or
\fRMPARSE_MAN\fR
bit is set, only that parser is used.
Otherwise, the document type is automatically detected.
.sp
When the
\fRMPARSE_SO\fR
bit is set, roff(7)
\fB\&so\fR
file inclusion requests are always honoured.
Otherwise, if the request is the only content in an input file, only the file name is remembered, to be returned in the
\fIsodest\fR
argument of \fBmparse_result\fR().
.sp
When the
\fRMPARSE_QUICK\fR
bit is set, parsing is aborted after the NAME section.
This is useful, for example, in makewhatis(8)
\fB\-Q\fR
to quickly build minimal databases.
.TP 9n
\fIwlevel\fR
Can be set to
\fRMANDOCLEVEL_BADARG\fR,
\fRMANDOCLEVEL_ERROR\fR,
or
\fRMANDOCLEVEL_WARNING\fR.
Messages below the selected level will be suppressed.
.TP 9n
\fImmsg\fR
A callback function to handle errors and warnings.
See
\fImain.c\fR
for an example.
.TP 9n
\fImchars\fR
An opaque pointer to a character table obtained from mchars_alloc(3).
.TP 9n
\fIdefos\fR
A default string for the mdoc(7)
\(oq\&Os\(cq
macro, overriding the
\fROSNAME\fR
preprocessor definition and the results of uname(3).
.RE
.sp
The same parser may be used for multiple files so long as \fBmparse_reset\fR() is called between parses.
\fBmparse_free\fR() must be called to free the memory allocated by this function.
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_free\fR()
.br
Free all memory allocated by \fBmparse_alloc\fR().
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_getkeep\fR()
.br
Acquire the keep buffer.
Must follow a call of \fBmparse_keep\fR().
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_keep\fR()
.br
Instruct the parser to retain a copy of its parsed input.
This can be acquired with subsequent \fBmparse_getkeep\fR() calls.
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_open\fR()
.br
If the
\fIfname\fR
ends in
\fI.gz\fR,
open with gunzip(1); otherwise, with open(2).
If open(2) fails, append
\fI.gz\fR
and try with gunzip(1).
Return a file descriptor open for reading in
\fIfd\fR,
or -1 on failure.
It can be passed to \fBmparse_readfd\fR() or used directly.
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_readfd\fR()
.br
Parse a file descriptor opened with open(2) or \fBmparse_open\fR().
Pass the associated filename in
\fIfname\fR.
Calls \fBmparse_wait\fR() before returning.
This function may be called multiple times with different parameters; however, \fBmparse_reset\fR() should be invoked between parses.
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_reset\fR()
.br
Reset a parser so that \fBmparse_readfd\fR() may be used again.
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_result\fR()
.br
Obtain the result of a parse.
One of the three pointers will be filled in.
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_strerror\fR()
.br
Return a statically-allocated string representation of an error code.
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_strlevel\fR()
.br
Return a statically-allocated string representation of a level code.
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.PP
.R "\fBmparse_wait\fR()
.br
Bury a gunzip(1) child process that was spawned with \fBmparse_open\fR().
To be called after the parse sequence is complete.
Not needed after \fBmparse_readfd\fR(), but does no harm in that case, either.
Returns
\fRMANDOCLEVEL_OK\fR
on success and
\fRMANDOCLEVEL_SYSERR\fR
on failure, that is, when wait(2) fails, or when gunzip(1) died from a signal or exited with non-zero status.
Declared in
<\fImandoc.h\fR>,
implemented in
\fIread.c\fR.
.SS "Variables"
.PP
.R "\fIman_macronames\fR
.br
The string representation of a man macro as indexed by
\fIenum mant\fR.
.PP
.R "\fImdoc_argnames\fR
.br
The string representation of an mdoc macro argument as indexed by
\fIenum mdocargt\fR.
.PP
.R "\fImdoc_macronames\fR
.br
The string representation of an mdoc macro as indexed by
\fIenum mdoct\fR.
.SH "IMPLEMENTATION NOTES"
This section consists of structural documentation for mdoc(7) and man(7) syntax trees and strings.
.SS "Man and Mdoc Strings"
Strings may be extracted from mdoc and man meta-data, or from text nodes (MDOC_TEXT and MAN_TEXT, respectively).
These strings have special non-printing formatting cues embedded in the text itself, as well as roff(7) escapes preserved from input.
Implementing systems will need to handle both situations to produce human-readable text.
In general, strings may be assumed to consist of 7-bit ASCII characters.
.PP
The following non-printing characters may be embedded in text strings:
.TP 8n
\fRASCII_NBRSP\fR
A non-breaking space character.
.TP 8n
\fRASCII_HYPH\fR
A soft hyphen.
.TP 8n
\fRASCII_BREAK\fR
A breakable zero-width space.
.PP
Escape sequences are also passed verbatim into text strings.
An escape sequence is a sequence of characters beginning with the backslash
(\(oq\e\(cq).
To construct human-readable text, these should be intercepted with mandoc_escape(3) and converted with one of the functions described in mchars_alloc(3).
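.PP
The handling of the three non-printing cues can be illustrated with a minimal sketch in C.
It is not library code:
\fBdemo_plain_text\fR()
is a hypothetical helper, and the numeric values given for
\fRASCII_NBRSP\fR,
\fRASCII_HYPH\fR
and
\fRASCII_BREAK\fR
are assumptions used only when
<\fImandoc.h\fR>
has not been included.
Escape sequences are copied through unchanged; a real consumer would pass them to mandoc_escape(3).
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Assumed fallback values; the real ones come from <mandoc.h>. */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP 31	/* non-breaking space */
#endif
#ifndef ASCII_HYPH
#define ASCII_HYPH 30	/* soft hyphen */
#endif
#ifndef ASCII_BREAK
#define ASCII_BREAK 29	/* breakable zero-width space */
#endif

/*
 * Copy src to dst, mapping the formatting cues to plain ASCII:
 * ASCII_NBRSP becomes a space, ASCII_HYPH becomes a hyphen-minus,
 * and ASCII_BREAK is dropped.  All other bytes, including roff
 * escape sequences, are copied unchanged.
 * dst must have room for at least strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the terminator.
 */
size_t
demo_plain_text(char *dst, const char *src)
{
	size_t n = 0;

	assert(dst != NULL && src != NULL);
	for (; *src != 0; src++) {
		if (*src == ASCII_NBRSP)
			dst[n++] = ' ';
		else if (*src == ASCII_HYPH)
			dst[n++] = '-';
		else if (*src != ASCII_BREAK)
			dst[n++] = *src;
	}
	dst[n] = 0;
	return n;
}
```
.fi
.PP
A formatter that performs its own line breaking would more likely keep the cues and act on them, rather than flatten them as this sketch does.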
.SS "Man Abstract Syntax Tree"
This AST is governed by the ontological rules dictated in man(7) and derives its terminology accordingly.
.PP
The AST is composed of
\fIstruct man_node\fR
nodes with element, root and text types as declared by the
\fItype\fR
field.
Each node also provides its parse point (the
\fIline\fR,
\fIsec\fR,
and
\fIpos\fR
fields), its position in the tree (the
\fIparent\fR,
\fIchild\fR,
\fInext\fR
and
\fIprev\fR
fields) and some type-specific data.
.PP
The tree itself is arranged according to the following normal form, where capitalised non-terminals represent nodes.
.TP 11n
ROOT
\(<- mnode+
.PD 0
.TP 11n
mnode
\(<- ELEMENT | TEXT | BLOCK
.TP 11n
BLOCK
\(<- HEAD BODY
.TP 11n
HEAD
\(<- mnode*
.TP 11n
BODY
\(<- mnode*
.TP 11n
ELEMENT
\(<- ELEMENT | TEXT*
.TP 11n
TEXT
\(<- [[:ascii:]]*
.PD
.PP
The only elements capable of nesting other elements are those with next-line scope as documented in man(7).
.SS "Mdoc Abstract Syntax Tree"
This AST is governed by the ontological rules dictated in mdoc(7) and derives its terminology accordingly.
"In-line" elements described in mdoc(7) are described simply as "elements".
.PP
The AST is composed of
\fIstruct mdoc_node\fR
nodes with block, head, body, element, root and text types as declared by the
\fItype\fR
field.
Each node also provides its parse point (the
\fIline\fR,
\fIsec\fR,
and
\fIpos\fR
fields), its position in the tree (the
\fIparent\fR,
\fIchild\fR,
\fInchild\fR,
\fInext\fR
and
\fIprev\fR
fields) and some type-specific data, in particular, for nodes generated from macros, the generating macro in the
\fItok\fR
field.
.PP
The tree itself is arranged according to the following normal form, where capitalised non-terminals represent nodes.
.TP 11n
ROOT
\(<- mnode+
.PD 0
.TP 11n
mnode
\(<- BLOCK | ELEMENT | TEXT
.TP 11n
BLOCK
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
.TP 11n
ELEMENT
\(<- TEXT*
.TP 11n
HEAD
\(<- mnode*
.TP 11n
BODY
\(<- mnode* [ENDBODY mnode*]
.TP 11n
TAIL
\(<- mnode*
.TP 11n
TEXT
\(<- [[:ascii:]]*
.PD
.PP
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length string, in the specific case of
\(oq\&.Bd \-literal\(cq,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
\(oq\&Bl \-column\(cq,
where a new body introduces a new phrase.
.PP
The mdoc(7) syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated with a given block before the physical end of that block.
It has a non-null
\fIend\fR
field, is of the BODY
\fItype\fR,
has the same
\fItok\fR
as the BLOCK it is ending, and has a
\fIpending\fR
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node and has no children of its own.
.PP
An ENDBODY node is generated when a block ends while one of its child blocks is still open, as in the following example:
.nf
.sp
.RS 6n
\&.Ao ao
\&.Bo bo ac
\&.Ac bc
\&.Bc end
.RE
.fi
.PP
This example results in the following block structure:
.nf
.sp
.RS 6n
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
.RE
.fi
.PP
Here, the formatting of the
\(oq\&Ao\(cq
block extends from TEXT ao to TEXT ac, while the formatting of the
\(oq\&Bo\(cq
block extends from TEXT bo to TEXT bc.
It renders as follows in
\fB\-T\fR\fBascii\fR
mode:
.PP
.RS 6n
<ao [bo ac> bc] end
.RE
.PP
Support for badly-nested blocks is only provided for backward compatibility with some older mdoc(7) implementations.
Using badly-nested blocks is
\fIstrongly discouraged\fR;
for example, the
\fB\-T\fR\fBhtml\fR
and
\fB\-T\fR\fBxhtml\fR
front-ends to mandoc(1) are unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not consistent across troff implementations, especially when using multiple levels of badly-nested blocks.
.SH "SEE ALSO"
mandoc(1),
mandoc_escape(3),
mandoc_malloc(3),
mchars_alloc(3),
eqn(7),
man(7),
mandoc_char(7),
mdoc(7),
roff(7),
tbl(7)
.SH "AUTHORS"
The
\fBmandoc\fR
library was written by
Kristaps Dzonsons <\fIkristaps@bsd.lv\fR>.