'\" t .\" Title: doclifter .\" Author: [see the "Author" section] .\" Generator: DocBook XSL Stylesheets v1.79.1 .\" Date: 09/03/2018 .\" Manual: Documentation Tools .\" Source: doclifter .\" Language: English .\" .TH "DOCLIFTER" "1" "09/03/2018" "doclifter" "Documentation Tools" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" doclifter \- translate troff requests into DocBook .SH "SYNOPSIS" .HP \w'\fBdoclifter\fR\ 'u \fBdoclifter\fR [\-o\ \fIoutput\-location\fR] [\-e\ \fIoutput\-encoding\fR] [\-i\ \fIinput\-encodings\fR] [\-h\ \fIhintfile\fR] [\-q] [\-x] [\-v] [\-w] [\-V] [\-D\ \fItoken=type\fR] [\-I\ \fIpath\fR] [\-S\ \fIspoofname\fR] \fIfile\fR... .SH "DESCRIPTION" .PP \fBdoclifter\fR translates documents written in troff macros to DocBook\&. Structural subsets of the requests in \fBman\fR(7), \fBmdoc\fR(7), \fBms\fR(7), \fBme\fR(7), \fBmm\fR(7), and \fBtroff\fR(1) are supported\&. .PP The translation brings over all the structure of the original document at section, subsection, and paragraph level\&. Command and C function synopses are translated into DocBook markup, not just a verbatim display\&. 
Tables (TBL markup) are translated into DocBook table markup\&. PIC diagrams are translated into SVG\&. Troff\-level information that might have structural implications is preserved in XML comments\&. .PP Where possible, font\-change macros are translated into structural markup\&. \fBdoclifter\fR recognizes stereotyped patterns of markup and content (such as the use of italics in a FILES section to mark filenames) and lifts them\&. A means to edit, add, and save semantic hints about highlighting is supported\&. .PP Some cliches are recognized and lifted to structural markup even without highlighting\&. Patterns recognized include such things as URLs, email addresses, man page references, and C program listings\&. .PP The \fB\&.in\fR and \fB\&.ti\fR requests are passed through with complaints\&. They indicate presentation\-level markup that \fBdoclifter\fR cannot translate into structure; the output will require hand\-fixing\&. .PP The \fB\&.ta\fR request is passed through with a complaint unless the immediately following text lines contain a tab, in which case the following span of lines containing tabs is lifted to a table\&. .PP Under some circumstances, \fBdoclifter\fR can even lift formatted manual pages and the text output produced by \fBlynx\fR(1) from HTML\&. If it finds no macros in the input, but does find a NAME section header, it tries to interpret the plain text as a manual page (skipping boilerplate headers and footers generated by \fBlynx\fR(1))\&. Translations produced in this way will be prone to miss structural features, but this fallback is good enough for simple man pages\&. .PP \fBdoclifter\fR does not do a perfect job, merely a surprisingly good one\&. Final polish should be applied by a human being capable of recognizing patterns too subtle for a computer\&. But \fBdoclifter\fR will almost always produce translations that are good enough to be usable before hand\-hacking\&. 
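.PP
The plain\-text fallback can be pictured with a minimal Python sketch\&. It is an illustration under stated assumptions, not the code \fBdoclifter\fR actually runs: the function name looks_like_formatted_manpage is hypothetical, a macro is assumed to be any line whose first character is a period or an apostrophe, and a NAME header is assumed to be a line consisting only of the word NAME\&.

```python
def looks_like_formatted_manpage(text):
    """Return True if text has no troff requests but does have a NAME header.

    Hypothetical helper: it only illustrates the documented condition.
    """
    lines = text.splitlines()
    # Assumption: a troff request or macro call starts with a control
    # character (period or apostrophe) in the first column.
    has_macros = any(line[:1] in (".", "'") for line in lines)
    # Assumption: a formatted page shows the header as a line reading NAME,
    # possibly indented.
    has_name_header = any(line.strip() == "NAME" for line in lines)
    return not has_macros and has_name_header
```

.PP
Input that passes a test of this kind is a candidate for the fallback; input containing request lines goes through the normal macro translation\&.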
.PP See the Troubleshooting section for discussion of how to solve document conversion problems\&. .SH "OPTIONS" .PP If called without arguments, \fBdoclifter\fR acts as a filter, translating troff source input on standard input to DocBook markup on standard output\&. If called with arguments, each argument file is translated separately (but hints are retained, see below); the suffix \&.xml is given to the translated output\&. .PP \-o .RS 4 Set the output location where files will be saved\&. Defaults to the current working directory\&. .RE .PP \-h .RS 4 Name a file to which information on semantic hints gathered during analysis should be written\&. .RE .PP \-D .RS 4 The \fB\-D\fR option allows you to post a hint\&. This may be useful, for example, if \fBdoclifter\fR is mis\-parsing a synopsis because it doesn\*(Aqt recognize a token as a command\&. This hint is merged after hints in the input source have been read\&. .RE .PP \-I .RS 4 The \fB\-I\fR option adds its argument to the include path used when \fBdoclifter\fR searches for inclusions\&. The include path is initially just the current directory\&. .RE .PP \-S .RS 4 Set the filename to be used in error and warning messages\&. This is mainly intended for use by test scripts\&. .RE .PP \-e .RS 4 The \fB\-e\fR option allows you to set the output encoding of the XML and the encoding field to be emitted in its header\&. It defaults to UTF\-8\&. .RE .PP \-i .RS 4 The \fB\-i\fR option allows you to set a comma\-separated list of encodings to be looked for in the input\&. The default is "ISO\-8859\-1,UTF\-8", which should cover almost all cases\&. .RE .PP \-q .RS 4 Normally, requests that \fBdoclifter\fR could not interpret (usually because they\*(Aqre presentation\-level) are passed through to XML comments in the output\&. The \-q option suppresses this\&. It also suppresses listing of macros\&. Messages about requests that are unrecognized or cannot be translated go to standard error whatever the state of this option\&. 
This option is intended to reduce clutter when you believe you have a clean lift of a document and want to lose the troff legacy\&. .RE .PP \-x .RS 4 The \-x option requests that \fBdoclifter\fR generate DocBook version 5 compatible XML content, rather than its default DocBook version 4\&.4 output\&. Inclusions and entities may not be handled correctly with this switch enabled\&. .RE .PP \-v .RS 4 The \-v option makes \fBdoclifter\fR noisier about what it\*(Aqs doing\&. This is mainly useful for debugging\&. .RE .PP \-w .RS 4 Enable strict portability checking\&. Multiple instances of \-w increase the strictness\&. See the section called \(lqPORTABILITY CHECKING\(rq\&. .RE .PP \-V .RS 4 With this option, the program emits a version message and exits\&. .RE .SH "TRANSLATION RULES" .PP Overall, you can expect that font changes will be turned into Emphasis tags with a Remap attribute taken from the troff font name\&. The basic font names are R, I, B, U, CW, and SM\&. .PP Troff and macro\-package special character escapes are mapped into ISO character entities\&. .PP When \fBdoclifter\fR encounters a \fB\&.so\fR directive, it searches for the file\&. If it can get read access to the file, and open it, and the file consists entirely of command lines and comments, then it is included\&. If any of these conditions fails, an entity reference for it is generated\&. .PP \fBdoclifter\fR performs special parsing when it recognizes a display such as is generated by \fB\&.DS/\&.DE\fR\&. It repeatedly tries to parse first a function synopsis, and then plain text off what remains in the display\&. Thus, most inline C function prototypes will be lifted to structured markup\&. .PP Some notes on specific translations: .SS "Man Translation" .PP \fBdoclifter\fR does a good job on most man pages\&. It knows about the extended \fBUR\fR/\fBUE\fR/\fBUN\fR and \fBURL\fR requests supported under Linux\&. 
If any \fB\&.UR\fR request is present, it will translate these but not wrap URLs outside them with Ulink tags\&. It also knows about the extended \fB\&.L\fR (literal) font markup from Bell Labs Version 8, and its friends\&. .PP The \fB\&.TH\fR macro is used to generate a RefMeta section\&. If present, the date/source/manual arguments (see \fBman\fR(7)) are wrapped in RefMiscInfo tag pairs with those class attributes\&. Note that \fBdoclifter\fR does not change the date\&. .PP \fBdoclifter\fR performs special parsing when it recognizes a synopsis section\&. It repeatedly tries to parse first a function synopsis, then a command synopsis, and then plain text off what remains in the section\&. .PP The following man macros are translated into emphasis tags with a remap attribute: \fB\&.B\fR, \fB\&.I\fR, \fB\&.L\fR, \fB\&.BI\fR, \fB\&.BR\fR, \fB\&.BL\fR, \fB\&.IB\fR, \fB\&.IR\fR, \fB\&.IL\fR, \fB\&.RB\fR, \fB\&.RI\fR, \fB\&.RL\fR, \fB\&.LB\fR, \fB\&.LI\fR, \fB\&.LR\fR, \fB\&.SB\fR, \fB\&.SM\fR\&. Some stereotyped patterns involving these macros are recognized and turned into semantic markup\&. .PP The following macros are translated into paragraph breaks: \fB\&.LP\fR, \fB\&.PP\fR, \fB\&.P\fR, \fB\&.HP\fR, and the single\-argument form of \fB\&.IP\fR\&. .PP The two\-argument form of \fB\&.IP\fR is translated either as a VariableList (usually) or ItemizedList (if the tag is the troff bullet or square character)\&. .PP The following macros are translated semantically: \fB\&.SH\fR, \fB\&.SS\fR, \fB\&.TP\fR, \fB\&.UR\fR, \fB\&.UE\fR, \fB\&.UN\fR, \fB\&.IX\fR\&. A \fB\&.UN\fR call just before \fB\&.SH\fR or \fB\&.SS\fR sets the ID for the new section\&. .PP The \fB\e*R\fR, \fB\e*(Tm\fR, \fB\e*(lq\fR, and \fB\e*(rq\fR symbols are translated\&. .PP The following (purely presentation\-level) macros are ignored: \fB\&.PD\fR, \fB\&.DT\fR\&. .PP The \fB\&.RS\fR/\fB\&.RE\fR macros are translated differently depending on whether or not they precede list markup\&. 
When \fB\&.RS\fR occurs just before \fB\&.TP\fR or \fB\&.IP\fR the result is nested lists\&. Otherwise, the \fB\&.RS\fR/\fB\&.RE\fR pair is translated into a Blockquote tag\-pair\&. .PP \fB\&.DS\fR/\fB\&.DE\fR is not part of the documented man macro set, but is recognized because it shows up with some frequency on legacy man pages from older Unixes\&. .PP Certain extension macros originally defined under Ultrix are translated structurally, including those that occasionally show up on the manual pages of Linux and other open\-source Unixes\&. \fB\&.EX\fR/\fB\&.EE\fR (and the synonyms \fB\&.Ex\fR/\fB\&.Ee\fR), \fB\&.Ds\fR/\fB\&.De\fR, \fB\&.NT\fR/\fB\&.NE\fR, \fB\&.PN\fR, and \fB\&.MS\fR are translated structurally\&. .PP The following extension macros used by the X distribution are also recognized and translated structurally: \fB\&.FD\fR, \fB\&.FN\fR, \fB\&.IN\fR, \fB\&.ZN\fR, \fB\&.hN\fR, and \fB\&.C{\fR/\fB\&.C}\fR\&. The \fB\&.TA\fR and \fB\&.IN\fR requests are ignored\&. .PP When the man macros are active, any \fB\&.Pp\fR macro definition containing the request \fB\&.PP\fR will be ignored, and all instances of \fB\&.Pp\fR will be replaced with \fB\&.PP\fR\&. Similarly, \fB\&.Tp\fR will be replaced with \fB\&.TP\fR\&. This is the least painful way to deal with some frequently\-encountered stereotyped wrapper definitions that would otherwise cause serious interpretation problems\&. .PP Known problem areas with man translation: .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} Weird uses of \fB\&.TP\fR\&. These will sometimes generate invalid XML and sometimes result in a FIXME comment in the generated XML (a warning message will also go to standard error)\&. .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} It is debatable how the man macros \fB\&.HP\fR and \fB\&.IP\fR without tag should be translated\&. We treat them as an ordinary paragraph break\&. 
We could visually simulate a hanging paragraph with list markup, but this would not be a structural translation\&. .RE .SS "Pod2man Translation" .PP \fBdoclifter\fR recognizes the extension macros produced by \fBpod2man\fR (\fB\&.Sh\fR, \fB\&.Sp\fR, \fB\&.Ip\fR, \fB\&.Vb\fR, \fB\&.Ve\fR) and translates them structurally\&. .PP The results of lifting pages produced by \fBpod2man\fR should be checked carefully by eyeball, especially the rendering of command and function synopses\&. \fBPod2man\fR generates rather perverse markup; \fBdoclifter\fR\*(Aqs struggle to untangle it is sometimes in vain\&. .PP If possible, generate your DocBook from the POD sources\&. There is a pod2docbook module on CPAN that does this\&. .SS "Tkman Translation" .PP \fBdoclifter\fR recognizes the extension macros used by the Tcl/Tk documentation system: \fB\&.AP\fR, \fB\&.AS\fR, \fB\&.BS\fR, \fB\&.BE\fR, \fB\&.CS\fR, \fB\&.CE\fR, \fB\&.DS\fR, \fB\&.DE\fR, \fB\&.SO\fR, \fB\&.SE\fR, \fB\&.UL\fR, \fB\&.VS\fR, \fB\&.VE\fR\&. The \fB\&.AP\fR, \fB\&.CS\fR, \fB\&.CE\fR, \fB\&.SO\fR, \fB\&.SE\fR, \fB\&.UL\fR, \fB\&.QW\fR and \fB\&.PQ\fR macros are translated structurally\&. .SS "Mandoc Translation" .PP \fBdoclifter\fR should be able to do an excellent job on most \fBmdoc\fR(7) pages, because this macro package expresses a lot of semantic structure\&. .PP Known problems with mandoc translation: All \fB\&.Bd\fR/\fB\&.Ed\fR display blocks are translated as LiteralLayout tag pairs \&. .SS "Ms Translation" .PP \fBdoclifter\fR does a good job on most ms pages\&. One weak spot to watch out for is the generation of Author and Affiliation tags\&. The heuristics used to mine this information out of the \fB\&.AU\fR section work for authors who format their names in the way usual for English (e\&.g\&. "M\&. E\&. Lesk", "Eric S\&. Raymond") but are quite brittle\&. .PP For a document to be recognized as containing ms markup, it must have the extension \&.ms\&. This avoids problems with false positives\&. 
.PP The \fB\&.TL\fR, \fB\&.AU\fR, \fB\&.AI\fR, and \fB\&.AE\fR macros turn into article metainformation in the expected way\&. The \fB\&.PP\fR, \fB\&.LP\fR, \fB\&.SH\fR, and \fB\&.NH\fR macros turn into paragraph and section structure\&. The tagged form of \fB\&.IP\fR is translated either as a VariableList (usually) or ItemizedList (if the tag is the troff bullet or square character); the untagged version is treated as an ordinary paragraph break\&. .PP The \fB\&.DS\fR/\fB\&.DE\fR pair is translated to a LiteralLayout tag pair \&. The \fB\&.FS\fR/\fB\&.FE\fR pair is translated to a Footnote tag pair\&. The \fB\&.QP\fR/\fB\&.QS\fR/\fB\&.QE\fR requests define BlockQuotes\&. .PP The \fB\&.UL\fR font change is mapped to U\&. \fB\&.SM\fR and \fB\&.LG\fR become numeric plus or minus size steps suffixed to the Remap attribute\&. .PP The \fB\&.B1\fR and \fB\&.B2\fR box macros are translated to a Sidebar tag pair\&. .PP All macros relating to page footers, multicolumn mode, and keeps are ignored (\fB\&.ND\fR, \fB\&.DA\fR, \fB\&.1C\fR, \fB\&.2C\fR, \fB\&.MC\fR, \fB\&.BX\fR, \fB\&.KS\fR, \fB\&.KE\fR, \fB\&.KF\fR)\&. The \fB\&.R\fR, \fB\&.RS\fR, and \fB\&.RE\fR macros are ignored as well\&. .SS "Me Translation" .PP Translation of me documents tends to produce crude results that need a lot of hand\-hacking\&. The format has little usable structure, and documents written in it tend to use a lot of low\-level troff macros; both these properties tend to confuse \fBdoclifter\fR\&. .PP For a document to be recognized as containing me markup, it must have the extension \&.me\&. This avoids problems with false positives\&. .PP The following macros are translated into paragraph breaks: \fB\&.lp\fR, \fB\&.pp\fR\&. The \fB\&.ip\fR macro is translated into a VariableList\&. The \fB\&.bp\fR macro is translated into an ItemizedList\&. The \fB\&.np\fR macro is translated into an OrderedList\&. .PP The b, i, and r fonts are mapped to emphasis tags with B, I, and R Remap attributes\&. 
The \fB\&.rb\fR ("real bold") font is treated the same as \fB\&.b\fR\&. .PP \fB\&.q(\fR/\fB\&.q)\fR is translated structurally\&. .PP Most other requests are ignored\&. .SS "Mm Translation" .PP Memorandum Macros documents translate well, as these macros carry a lot of structural information\&. The translation rules are tuned for Memorandum or Released Paper styles; information associated with external\-letter style will be preserved in comments\&. .PP For a document to be recognized as containing mm markup, it must have the extension \&.mm\&. This avoids problems with false positives\&. .PP The following highlight macros are translated into Emphasis tags: \fB\&.B\fR, \fB\&.I\fR, \fB\&.R\fR, \fB\&.BI\fR, \fB\&.BR\fR, \fB\&.IB\fR, \fB\&.IR\fR, \fB\&.RB\fR, \fB\&.RI\fR\&. .PP The following macros are structurally translated: \fB\&.AE\fR, \fB\&.AF\fR, \fB\&.AL\fR, \fB\&.RL\fR, \fB\&.APP\fR, \fB\&.APPSK\fR, \fB\&.AS\fR, \fB\&.AT\fR, \fB\&.AU\fR, \fB\&.B1\fR, \fB\&.B2\fR, \fB\&.BE\fR, \fB\&.BL\fR, \fB\&.ML\fR, \fB\&.BS\fR, \fB\&.BVL\fR, \fB\&.VL\fR, \fB\&.DE\fR, \fB\&.DL\fR, \fB\&.DS\fR, \fB\&.FE\fR, \fB\&.FS\fR, \fB\&.H\fR, \fB\&.HU\fR, \fB\&.IA\fR, \fB\&.IE\fR, \fB\&.IND\fR, \fB\&.LB\fR, \fB\&.LC\fR, \fB\&.LE\fR, \fB\&.LI\fR, \fB\&.P\fR, \fB\&.RF\fR, \fB\&.SM\fR, \fB\&.TL\fR, \fB\&.VERBOFF\fR, \fB\&.VERBON\fR, \fB\&.WA\fR, \fB\&.WE\fR\&. 
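.PP
The extension rule stated for ms, me, and mm documents can be sketched in Python\&. This is a sketch only: the table EXTENSION_MACRO_SETS and the function macro_set_from_extension are hypothetical names introduced for illustration, and exact, case\-sensitive matching of the extension is an assumption\&.

```python
import os

# Hypothetical table: file extension -> macro set, following the rule that
# ms, me, and mm markup is recognized only by the document's extension.
EXTENSION_MACRO_SETS = {".ms": "ms", ".me": "me", ".mm": "mm"}


def macro_set_from_extension(filename):
    """Return "ms", "me", or "mm" for a matching extension, else None."""
    _, ext = os.path.splitext(filename)
    # Assumption: the match is exact and case-sensitive.
    return EXTENSION_MACRO_SETS.get(ext)
```

.PP
A result of None only means the extension rule does not apply; how the other macro sets are recognized is not covered by this sketch\&.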
.PP The following macros are ignored: .PP \ \&\fB\&.)E\fR, \fB\&.1C\fR, \fB\&.2C\fR, \fB\&.AST\fR, \fB\&.AV\fR, \fB\&.AVL\fR, \fB\&.COVER\fR, \fB\&.COVEND\fR, \fB\&.EF\fR, \fB\&.EH\fR, \fB\&.EDP\fR, \fB\&.EPIC\fR, \fB\&.FC\fR, \fB\&.FD\fR, \fB\&.HC\fR, \fB\&.HM\fR, \fB\&.GETR\fR, \fB\&.GETST\fR, \fB\&.INITI\fR, \fB\&.INITR\fR, \fB\&.INDP\fR, \fB\&.ISODATE\fR, \fB\&.MT\fR, \fB\&.NS\fR, \fB\&.ND\fR, \fB\&.OF\fR, \fB\&.OH\fR, \fB\&.OP\fR, \fB\&.PGFORM\fR, \fB\&.PGNH\fR, \fB\&.PE\fR, \fB\&.PF\fR, \fB\&.PH\fR, \fB\&.RP\fR, \fB\&.S\fR, \fB\&.SA\fR, \fB\&.SP\fR, \fB\&.SG\fR, \fB\&.SK\fR, \fB\&.TAB\fR, \fB\&.TB\fR, \fB\&.TC\fR, \fB\&.VM\fR, \fB\&.WC\fR\&. .PP The following macros generate warnings: \fB\&.EC\fR, \fB\&.EX\fR, \fB\&.GETHN\fR, \fB\&.GETPN\fR, \fB\&.GETR\fR, \fB\&.GETST\fR, \fB\&.LT\fR, \fB\&.LD\fR, \fB\&.LO\fR, \fB\&.MOVE\fR, \fB\&.MULB\fR, \fB\&.MULN\fR, \fB\&.MULE\fR, \fB\&.NCOL\fR, \fB\&.nP\fR, \fB\&.PIC\fR, \fB\&.RD\fR, \fB\&.RS\fR, \fB\&.RE\fR, \fB\&.SETR\fR\&. .PP Pairs of \fB\&.DS\fR/\fB\&.DE\fR are interpreted as informal figures\&. If an \fB\&.FG\fR is present it becomes a caption element\&. .PP \ \&\fB\&.BS\fR/\fB\&.BE\fR and \fB\&.IA\fR/\fB\&.IE\fR pairs are passed through\&. The text inside them may need to be deleted or moved\&. .PP The mark argument of \fB\&.ML\fR is ignored; the following list is formatted as a normal ItemizedList\&. .PP The contents of \fB\&.DS\fR/\fB\&.DE\fR or \fB\&.DF\fR/\fB\&.DE\fR get turned into a Screen display\&. Arguments controlling presentation\-level formatting are ignored\&. .SS "Mwww Translation" .PP The mwww macros are an extension to the man macros supported by \fBgroff\fR(1) for producing web pages\&. .PP The \fBURL\fR, \fBFTP\fR, \fBMAILTO\fR, \fBIMAGE\fR, and \fBTAG\fR tags are translated structurally\&. The \fBHTMLINDEX\fR, \fBBODYCOLOR\fR, \fBBACKGROUND\fR, \fBHTML\fR, and \fBLINE\fR tags are ignored\&. 
.SS "TBL Translation" .PP All structural features of TBL tables are translated, including both horizontal and vertical spanning with \(oqs\(cq and \(oq^\(cq\&. The \(oql\(cq, \(oqr\(cq, and \(oqc\(cq formats are supported; the \(oqn\(cq column format is rendered as \(oqr\(cq\&. Line continuations with T{ and T} are handled correctly\&. So is \fB\&.TH\fR\&. .PP The \fBexpand\fR, \fBbox\fR, \fBdoublebox\fR, \fBallbox\fR, \fBcenter\fR, \fBleft\fR, and \fBright\fR options are supported\&. The GNU synonyms \fBframe\fR and \fBdoubleframe\fR are also recognized\&. But the distinction between single and double rules and boxes is lost\&. .PP Table continuations (\&.T&) are not supported\&. .PP If the first nonempty line of text immediately before a table is boldfaced, it is interpreted as a title for the table and the table is generated using a table and title\&. Otherwise the table is translated with informaltable\&. .PP Most other presentation\-level TBL commands are ignored\&. The \(oqb\(cq format qualifier is processed, but point size and width qualifiers are not\&. .SS "Pic Translation" .PP PIC sections are translated to SVG\&. doclifter calls out to \fBpic2plot\fR(1) to accomplish this; you must have that utility installed for PIC translation to work\&. .SS "Eqn Translation" .PP EQN sections are filtered into embedded MathML with \fBeqn \-TMathML\fR if possible, otherwise passed through enclosed in LiteralLayout tags\&. After a delim statement has been seen, inline eqn delimiters are translated into an XML processing instruction\&. Exception: inline eqn equations consisting of a single character are translated to an Emphasis with a Role attribute of eqn\&. .SS "Troff Translation" .PP The troff translation is meant only to support interpretation of the macro sets\&. It is not useful standalone\&. .PP The \fB\&.nf\fR and \fB\&.fi\fR macros are interpreted as literal\-layout boundaries\&. 
Calls to the \fB\&.so\fR macro either cause inclusion or are translated into XML entity inclusions (see above)\&. Calls to the \fB\&.ul\fR and \fB\&.cu\fR macros cause following lines to be wrapped in an Emphasis tag with a Remap attribute of "U"\&. Calls to \fB\&.ft\fR generate corresponding start or end emphasis tags\&. Calls to \fB\&.tr\fR cause character translation on output\&. Calls to \fB\&.bp\fR generate a BeginPage tag (in paragraphed text only)\&. Calls to \fB\&.sp\fR generate a paragraph break (in paragraphed text only)\&. Calls to \fB\&.ti\fR wrap the following line in a BlockQuote\&. These are the only troff requests we translate to DocBook\&. The rest of the troff emulation exists because macro packages use it internally to expand macros into elements that might be structural\&. .PP Requests relating to macro definitions and strings (\fB\&.ds\fR, \fB\&.as\fR, \fB\&.de\fR, \fB\&.am\fR, \fB\&.rm\fR, \fB\&.rn\fR, \fB\&.em\fR) are processed and expanded\&. The \fB\&.ig\fR macro is also processed\&. .PP Conditional macros (\fB\&.if\fR, \fB\&.ie\fR, \fB\&.el\fR) are handled\&. The built\-in conditions o, n, t, e, and c are evaluated as if for nroff on page one of a document\&. The m, d, and r troff conditionals are also interpreted\&. String comparisons are evaluated by straight textual comparison\&. All numeric expressions evaluate to true\&. .PP The extended groff requests \fBcc\fR, \fBc2\fR, \fBab\fR, \fBals\fR, \fBdo\fR, \fBnop\fR, \fBreturn\fR, and \fBshift\fR are interpreted\&. The groff \fB\&.PSPIC\fR extension is translated into a MediaObject\&. .PP The \fB\&.tm\fR macro writes its arguments to standard error (with \fB\-t\fR)\&. The \fB\&.pm\fR macro reports on defined macros and strings\&. These facilities may aid in debugging your translation\&. .PP Some troff escape sequences are lifted: .sp .RS 4 .ie n \{\ \h'-04' 1.\h'+01'\c .\} .el \{\ .sp -1 .IP " 1." 4.2 .\} The \ee and \e\e escapes become a bare backslash, \e\&. a period, and \e\- a bare dash\&. 
.RE .sp .RS 4 .ie n \{\ \h'-04' 2.\h'+01'\c .\} .el \{\ .sp -1 .IP " 2." 4.2 .\} The troff escapes \e^, \e`, \e\*(Aq, \e&, \e0, and \e| are lifted to equivalent ISO special spacing characters\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 3.\h'+01'\c .\} .el \{\ .sp -1 .IP " 3." 4.2 .\} A \e followed by space is translated to an ISO non\-breaking space entity\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 4.\h'+01'\c .\} .el \{\ .sp -1 .IP " 4." 4.2 .\} A \e~ is also translated to an ISO non\-breaking space entity; properly this should be a space that can\*(Aqt be used for a linebreak but stretches like ordinary whitespace during line adjustment, but there is no ISO or Unicode entity for that\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 5.\h'+01'\c .\} .el \{\ .sp -1 .IP " 5." 4.2 .\} The \eu and \ed half\-line vertical motion escapes, when paired, become \fBSuperscript\fR or \fBSubscript\fR tags\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 6.\h'+01'\c .\} .el \{\ .sp -1 .IP " 6." 4.2 .\} The \ec escape is handled as a line continuation in circumstances where that matters (e\&.g\&. for token\-pasting)\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 7.\h'+01'\c .\} .el \{\ .sp -1 .IP " 7." 4.2 .\} The \ef escape for font changes is translated in various context\-dependent ways\&. First, \fBdoclifter\fR looks for cliches involving font changes that have semantic meaning, and lifts to a structural tag\&. If it can\*(Aqt do that, it generates an Emphasis tag\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 8.\h'+01'\c .\} .el \{\ .sp -1 .IP " 8." 4.2 .\} The \em[] extension is translated into a phrase span with a remap attribute carrying the color\&. Note: Stylesheets typically won\*(Aqt render this! .RE .sp .RS 4 .ie n \{\ \h'-04' 9.\h'+01'\c .\} .el \{\ .sp -1 .IP " 9." 
4.2 .\} Some uses of the \eo escape are translated: pairs with a letter followed by one of the characters ` \*(Aq : ^ o ~ are translated to combining forms with diacriticals grave, acute, umlaut, circumflex, ring, and tilde respectively if the corresponding Latin\-1 or Latin\-2 character exists as an ISO literal\&. .RE .PP Escapes other than these will yield warnings or errors\&. .PP All other troff requests are ignored but passed through into XML comments\&. A few (such as \fB\&.ce\fR) also trigger a warning message\&. .SH "PORTABILITY CHECKING" .PP When portability checking is enabled, \fBdoclifter\fR emits portability warnings about markup which it can handle but which will break various other viewers and interpreters\&. .sp .RS 4 .ie n \{\ \h'-04' 1.\h'+01'\c .\} .el \{\ .sp -1 .IP " 1." 4.2 .\} At level 1, it will warn about constructions that would break \fBman2html\fR(1) (the C program distributed with Linux \fBman\fR(1), not the older and much less capable Perl script)\&. A close derivative of this code is used in GNOME yelp\&. This should be the minimum level of portability you aim for, and corresponds to what is recommended on the \fBgroff_man\fR(7) manual page\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 2.\h'+01'\c .\} .el \{\ .sp -1 .IP " 2." 4.2 .\} At level 2, it will warn about constructions that will break portability back to the Unix classic tools (including long macro names and glyph references with \e[])\&. .RE .SH "SEMANTIC ANALYSIS" .PP \fBdoclifter\fR keeps two lists of semantic hints that it picks up from analyzing source documents (especially from parsing command and function synopses)\&. The local list includes: .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} Names of function formal arguments .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} Names of command options .RE .PP Local hints are used to mark up the individual page from which they are gathered\&. 
The global list includes: .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} Names of functions .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} Names of commands .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} Names of function return types .RE .PP If \fBdoclifter\fR is applied to multiple files, the global list is retained in memory\&. You can dump a report of global hints at the end of the run with the \fB\-h\fR option\&. The format of the hints is as follows: .sp .if n \{\ .RS 4 .\} .nf \ \&\&.\e" | mark \fItoken\fR as \fImarkup\fR .fi .if n \{\ .RE .\} .PP where \fItoken\fR is an item of text and \fImarkup\fR is the DocBook markup text it should be wrapped with whenever it appears either highlighted or as a word surrounded by whitespace in the source text\&. .PP Hints derived from earlier files are also applied to later ones\&. This behavior may be useful when lifting collections of documents that apply to a function or command library\&. What should be more useful is the fact that a hints file dumped with \fB\-h\fR can be one of the file arguments to \fBdoclifter\fR; the code detects this special case and does not write XML output for such a file\&. Thus, a good procedure for lifting a large library is to generate a hints file with a first run, inspect it to delete false positives, and use it as the first input to a second run\&. .PP It is also possible to include a hints file directly in a troff source file\&. This may be useful if you want to enrich the file by stages before converting to XML\&. .SH "TROUBLESHOOTING" .PP \fBdoclifter\fR tries to warn about problems that it can diagnose but not fix by itself\&. When it says "look for FIXME", do that in the generated XML; the markup around that token may be wrong\&. .PP Occasionally (less than 2% of the time) \fBdoclifter\fR will produce invalid DocBook markup even from correct troff markup\&. 
Usually this results from strange constructions in the source page, or macro calls that are beyond the ability of \fBdoclifter\fR\*(Aqs macro processor to get right\&. Here are some things to watch for, and how to fix them: .SS "Malformed command synopses\&." .PP If you get a message that says "command synopsis parse failed", try rewriting the synopsis in your manual page source\&. The most common cause of failure is unbalanced [] groupings, a bug that can be very difficult to notice by eyeball\&. To assist with this, the error message includes a token number in parentheses indicating on which token the parse failed\&. .PP For more information, use the \-v option\&. This will trigger a dump telling you what the command synopsis looked like after preprocessing, and indicate on which token the parse failed (both with a token number and a caret sign inserted in the dump of the synopsis tokens)\&. To help locate unbalanced groupings, the error token dump also tries to insert \(oq$\(cq at the point of the last nesting\-depth increase, but the code that does this is failure\-prone\&. .SS "Confusing macro calls\&." .PP Some manual page authors replace standard requests (like \fB\&.PP\fR, \fB\&.SH\fR and \fB\&.TP\fR) with versions that do different things in \fBnroff\fR and \fBtroff\fR environments\&. While \fBdoclifter\fR tries to cope and usually does a good job, the quirks of [nt]roff are legion and confusing macro calls sometimes lead to bad XML being generated\&. A common symptom of such problems is unclosed Emphasis tags\&. .SS "Malformed list syntax\&." .PP The manual\-page parser can be confused by \fB\&.TP\fR constructs that have header tags but no following body\&. If the XML produced doesn\*(Aqt validate, and the problem seems to be a misplaced listitem tag, try using the verbose (\-v) option\&. 
This will enable line\-numbered warnings that may help you zero in on the problem\&. .SS "Section nesting problems with SS\&." .PP The message "possible section nesting error" means that the program has seen two adjacent subsection headers\&. In man pages, subsections don\*(Aqt have a depth argument, so \fBdoclifter\fR cannot be certain how subsections should be nested\&. Any subsection heading between the indicated line and the beginning of the next top\-level section might be wrong and require correcting by hand\&. .SS "Bad output with no doclifter error message" .PP If you\*(Aqre translating a page that uses user\-defined macros, and doclifter fails to complain about it but you get bad output, the first thing to do is simplify or eliminate the user\-defined macros\&. Replace them with stock requests where possible\&. .SH "IMPROVING TRANSLATION QUALITY" .PP There are a few constructions that are a good idea to check by hand after lifting a page\&. .PP Look near the BlockQuote tags\&. The troff temporary indent request (\fB\&.ti\fR) is translated into a BlockQuote wrapper around the following line\&. Sometimes LiteralLayout or ProgramListing would be a better translation, but \fBdoclifter\fR has no way to know this\&. .PP It is not possible to unambiguously detect candidates for wrapping in a DocBook option tag in running text\&. If you care, you\*(Aqll have to check for these and fix them by hand\&. .SH "BUGS AND LIMITATIONS" .PP About 3% of man pages will either make this program throw error status 1 or generate invalid XML\&. In almost all such cases the misbehavior is triggered by markup bugs in the source that are too severe to be coped with\&. .PP Equation number arguments of EQN calls are ignored\&. .PP Semicolon used as a TBL field separator will lead to garbled tables\&. The easiest way to fix this is by patching the source\&. .PP The function\-synopsis parser is crude (it\*(Aqs not a compiler) and prone to errors\&. 
Function\-synopsis markup should be checked carefully by a human\&. .PP If a man page has both paragraphed text in a Synopsis section and also a body section before the Synopsis section, bad things will happen\&. .PP Running text (e\&.g\&., explanatory notes) at the end of a Synopsis section cannot reliably be distinguished from synopsis\-syntax markup\&. (This problem is AI\-complete\&.) .PP Some firewalls put in to cope with common malformations in troff code mean that the tail end of a span between two \fB\ef{B,I,U,(CW}\fR or \fB\&.ft\fR highlight changes may not be completely covered by corresponding Emphasis tags if (for example) the span crosses a boundary between filled and unfilled (\fB\&.nf\fR/\fB\&.fi\fR) text\&. .PP The treatment of conditionals relies on the assumption that conditional macros never generate structural or font\-highlight markup that differs between the if and else branches\&. This appears to be true of all the standard macro packages, but if you roll any of your own macros you\*(Aqre on your own\&. .PP Macro definitions in a manual page NAME section are not interpreted\&. .PP Uses of \ec for line continuation sometimes are not translated, leaving the \ec in the output XML\&. The program will print a warning when this occurs\&. .PP The line numbers in \fBdoclifter\fR error messages are unreliable in the presence of \fB\&.EQ/\&.EN\fR, \fB\&.PS/\&.PE\fR, and quantum fluctuations\&. .SH "OLD MACRO SETS" .PP There is a conflict between Berkeley ms\*(Aqs documented \fB\&.P1\fR print\-header\-on\-page request and an undocumented Bell Labs use for displayed program and equation listings\&. The \fBms\fR translator uses the Bell Labs interpretation when \fB\&.P2\fR is present in the document, and otherwise ignores the request\&. 
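.PP
The rule for choosing between the two readings of \fB\&.P1\fR can be sketched in Python\&. This is an illustration of the stated rule, not the translator\*(Aqs own code; the function name p1_is_bell_labs_display is hypothetical, and treating any line whose first field is \&.P2 as a use of that request is an assumption\&.

```python
def p1_is_bell_labs_display(source):
    """Return True when .P1 should be read as opening a display.

    Hypothetical helper: the rule keys on whether a .P2 request occurs
    anywhere in the document.
    """
    for line in source.splitlines():
        fields = line.split()
        # Assumption: a request line has the request name as its first field.
        if fields and fields[0] == ".P2":
            return True
    # No .P2 anywhere: the .P1 request is ignored.
    return False
```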
.SH "RETURN VALUES" .PP On successful completion, the program returns status 0\&. It returns 1 if some file or standard input could not be translated\&. It returns 2 if one of the input sources was a \fB\&.so\fR inclusion\&. It returns 3 if there is an error in reading or writing files\&. It returns 4 to indicate an internal error\&. It returns 5 when aborted by a keyboard interrupt\&. .PP Note that a zero return does not guarantee that the output is valid DocBook\&. It will almost always (as in, more than 98% of cases) be syntactically valid XML, but in some rare cases fixups by hand may be necessary to meet the semantics of the DocBook DTD\&. Validation problems are most likely to occur with complicated list markup\&. .SH "REQUIREMENTS" .PP The \fBpic2plot\fR(1) utility must be installed in order to translate PIC diagrams to SVG\&. .SH "SEE ALSO" .PP \fBman\fR(7), \fBmdoc\fR(7), \fBms\fR(7), \fBme\fR(7), \fBmm\fR(7), \fBmwww\fR(7), \fBtroff\fR(1)\&. .SH "AUTHOR" .PP Eric S\&. Raymond .PP There is a project web page at \m[blue]\fBhttp://www\&.catb\&.org/~esr/doclifter/\fR\m[]\&.
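.PP
The exit statuses listed under RETURN VALUES can be tabulated in a short Python sketch, for example as the core of a wrapper script\&. The names EXIT_STATUS_MEANINGS and describe_exit_status are hypothetical; only the status numbers and their meanings come from this page\&.

```python
# Exit statuses as listed in the RETURN VALUES section of this page.
EXIT_STATUS_MEANINGS = {
    0: "successful completion",
    1: "some file or standard input could not be translated",
    2: "one of the input sources was a .so inclusion",
    3: "error in reading or writing files",
    4: "internal error",
    5: "aborted by a keyboard interrupt",
}


def describe_exit_status(status):
    """Return a description for a status, flagging undocumented values."""
    return EXIT_STATUS_MEANINGS.get(status, "undocumented status %d" % status)
```

.PP
A status of 0 still calls for validating the output, since a zero return does not guarantee valid DocBook\&.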