.\" Copyright (c) 2002 Bill C. Riemers .\" .\" This is free documentation; you can redistribute it and/or .\" modify it under the terms of the GNU General Public License as .\" GNU General Public License, either Version 2 of the license, .\" or (at your option) any later version. The license should have .\" published by the Free Software Foundation; either version 2 of .\" the License, or (at your option) any later version. .\" .\" The GNU General Public License's references to "object code" .\" and "executables" are to be interpreted as the output of any .\" document formatting or typesetting system, including .\" intermediate and printed output. .\" .\" This manual is distributed in the hope that it will be useful, .\" but WITHOUT ANY WARRANTY; without even the implied warranty of .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the .\" GNU General Public License for more details. .\" .\" You should have received a copy of the GNU General Public .\" License along with this manual. Otherwise check the web site .\" of the Free Software Foundation at http://www.fsf.org. .\" .\" I, Bill C. Riemers, hereby grant all rights to this code, .\" provided usage complies with the GPL or a written exception to .\" the GPL granted by any of Bill C. Riemers, Leon Bottou, .\" Yann Le Cun, or the Free Source Foundation. .\" .\" ------------------------------------------------------------------ .\" DjVuLibre-3.5 is derived from the DjVu(r) Reference Library from .\" Lizardtech Software. Lizardtech Software has authorized us to .\" replace the original DjVu(r) Reference Library notice by the following .\" text (see doc/lizard2002.djvu and doc/lizardtech2007.djvu): .\" .\" ------------------------------------------------------------------ .\" | DjVu (r) Reference Library (v. 3.5) .\" | Copyright (c) 1999-2001 LizardTech, Inc. All Rights Reserved. .\" | The DjVu Reference Library is protected by U.S. Pat. No. .\" | 6,058,214 and patents pending. .\" | .\" | This software is subject to, and may be distributed under, the .\" | GNU General Public License, either Version 2 of the license, .\" | or (at your option) any later version. The license should have .\" | accompanied the software or you may obtain a copy of the license .\" | from the Free Software Foundation at http://www.fsf.org . .\" | .\" | The computer code originally released by LizardTech under this .\" | license and unmodified by other parties is deemed "the LIZARDTECH .\" | ORIGINAL CODE." Subject to any third party intellectual property .\" | claims, LizardTech grants recipient a worldwide, royalty-free, .\" | non-exclusive license to make, use, sell, or otherwise dispose of .\" | the LIZARDTECH ORIGINAL CODE or of programs derived from the .\" | LIZARDTECH ORIGINAL CODE in compliance with the terms of the GNU .\" | General Public License. This grant only confers the right to .\" | infringe patent claims underlying the LIZARDTECH ORIGINAL CODE to .\" | the extent such infringement is reasonably necessary to enable .\" | recipient to make, have made, practice, sell, or otherwise dispose .\" | of the LIZARDTECH ORIGINAL CODE (or portions thereof) and not to .\" | any greater extent that may be necessary to utilize further .\" | modifications or combinations. .\" | .\" | The LIZARDTECH ORIGINAL CODE is provided "AS IS" WITHOUT WARRANTY .\" | OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED .\" | TO ANY WARRANTY OF NON-INFRINGEMENT, OR ANY IMPLIED WARRANTY OF .\" | MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. .\" +------------------------------------------------------------------ .TH DJVUXML 1 "11/15/2002" "DjVuLibre XML Tools" "DjVuLibre XML Tools" .de SS .SH \\0\\0\\0\\$* .. .SH NAME djvutoxml, djvuxmlparser \- DjVuLibre XML Tools. .SH SYNOPSIS .BI "djvutoxml [" options "] " inputdjvufile " [" outputxmlfile "]" .br .BI "djvuxmlparser [ -o " djvufile " ] " inputxmlfile .SH DESCRIPTION The DjVuLibre XML Tools provide for editing the metadata, hyperlinks and hidden text associated with DjVu files. Unlike .BR djvused (1) the DjVuLibre XML Tools rely on the XML technology and can take advantage of XML editors and verifiers. .SH DJVUTOXML Program .B djvutoxml creates a XML file .I outputxmlfile containing a reference to the original DjVu document .I inputdjvufile as well as tags describing the metadata, hyperlinks, and hidden text associated with the DjVu file. The following options are supported: .TP .BI "--page " "pagenum" Select a page in a multi-page document. Without this option, .B djvutoxml outputs the XML corresponding to all pages of the document. .TP .BI "--with-text" Specifies the .B HIDDENTEXT element for each page should be included in the output. If specified without the .B --with-anno flag then the .B --without-anno is implied. If none of the .B --with-text, .B --without-text, .B --with-anno, or .B --without-anno, flags are specified, then the .B --with-text and .B --with-anno flags are implied. .TP .BI "--without-text" Specifies not to output the .B HIDDENTEXT element for each page. If specified without the .B --without-anno flag then the .B --with-anno flag is implied. .TP .BI "--with-anno" Specifies the area .B MAP element for each page should be included in the output. If specified without the .B --with-text flag then the .B --without-text flag is implied. .TP .BI "--without-anno" Specifies the area .B MAP element for each page should not be included in the output. If specified without the .B --without-text flag then the .B --with-text flag is implied. .SH DJVUXMLPARSER Files produced by .B djvutoxml can then be modified using either a text editor or a XML editor. Program .B djvuxmlparser parses the XML file .I inputxmlfile in order to modify the metadata of the corresponding DjVu file. .TP .BI "-o " "djvufile" In principle the target DjVu file is the file referenced by the .I OBJECT element of the XML file. This option provides the means to override the filename specified in the .I OBJECT element. .SH DJVUXML DOCUMENT TYPE DEFINITION The document type definition file (DTD) .IP "" 2 .B /usr/share/djvu/pubtext/DjVuXML-s.dtd .PP defines the input and output of the DjVu XML tools. The DjVuXML-s DTD is a simplification of the HTML DTD: .IP "" 2 .B \%http://www.w3c.org/TR/1998/REC-html40-19980424/sgml/dtd.html .PP with a few new attributes added specific to DjVu. Each of the specified pages of a DjVu document are represented as .B OBJECT elements within the .B BODY element of the XML file. Each .B OBJECT element may contain multiple .B PARAM elements to specify attributes like page name, resolution, and gamma factor. Each .B OBJECT element may also contain one .B HIDDENTTEXT element to specify the hidden text (usually generated with an OCR engine) within the DjVu page. In addition each .B OBJECT element may reference a single area .B MAP element which contains multiple .B AREA elements to represent all the hyperlink and highlight areas within the DjVu document. .SS PARAM Elements Legal .B PARAM elements of a DjVu .B OBJECT include but are not limited to .B PAGE for specifying the page-name, .B GAMMA for specifying the gamma correction factor (normally 2.2), and .B DPI for specifying the page resolution. .SS HIDDENTEXT Elements The .B HIDDENTEXT elements consists of nested elements of .B PAGECOLUMNS, .B REGION, .B PARAGRAPH, .B LINE, and .B WORD. The most deeply nested element specified, should specify the bounding coordinates of the element in top-down orientation. The body of the most deeply nested element should contain the text. Most DjVu documents use either .B LINE or .B WORD as the lowest level element, but any element is legal as the lowest level element. A white space is always added between .B WORD elements and a line feed is always added between .B LINE elements. Since languages such as Japanese do not use spaces between words, it is quite common for Asian OCR engines to use .B WORD as characters instead. .SS MAP Elements The body of the .B MAP elements consist of .B AREA elements. In addition to the attributes listed in .IP "" 2 .BR \%http://www.w3.org/TR/1998/REC-html40-19980424/struct/objects.html#edef-AREA , .PP the attributes .BR bordertype , .BR bordercolor , .BR border , and .B highlight have been added to specify border type, border color, border width, and highlight colors respectively. Legal values for each of these attributes are listed in the DjVuXML-s DTD. In addition, the shape .B oval has been added to the legal list of shapes. An oval uses a rectangular bounding box. .SH BUGS Perhaps it would have been better to use CC2 style sheets with standard HTML elements instead of defining the .B HIDDENTEXT element. .SH CREDITS The DjVu XML tools and DTD were written by Bill C. Riemers and Fred Crary. .SH SEE ALSO .BR djvu (1), .BR djvused (1), and .BR utf8 (7).