NAME¶
djvused - Multi-purpose DjVu document editor.
SYNOPSIS¶
djvused [options] djvufile
DESCRIPTION¶
Program
djvused is a powerful command line tool for manipulating
multi-page documents, creating or editing annotation chunks, creating or
editing hidden text layers, pre-computing thumbnail images, and more. The
program first reads the DjVu document
djvufile and executes a number of
djvused commands.
Djvused commands can be read from a specific file (when option
-f is
specified), read from the command line (when option
-e is specified),
or read from the standard input (the default).
OPTIONS¶
- -v
- Cause djvused to print a command line prompt before
reading commands and a brief message describing how each command was
executed. This option is very useful for debugging djvused scripts and
also for interactively entering djvused commands on the standard
input.
- -f scriptfile
- Cause djvused to read commands from file
scriptfile.
- -e command
- Cause djvused to execute the commands specified by
the option argument command. It is advisable to surround the
djvused commands with single quotes in order to prevent unwanted shell
expansion.
- -s
- Cause djvused to save the file djvufile after
executing the specified commands. This is similar to executing command
save immediately before terminating the program.
- -u
- Cause djvused to print hidden text and annotations
as UTF-8 instead of encoding non-ASCII characters with octal escape
sequences for maximal portability. This option is convenient for manually
editing or viewing the djvused output. This option also causes the
emission of a UTF-8 BOM under Windows.
- -n
- Cause djvused to disregard save commands. This is
useful for debugging djvused scripts without overwriting files on your
disk.
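As an illustration of how these options combine, the following Python sketch assembles an argument list for djvused. The helper name djvused_argv and its parameters are hypothetical; only the options -v, -u, -n, -e, -f, and -s come from this page. Refusing to mix -e and -f is a conservative choice of this sketch, not a documented rule.

```python
def djvused_argv(djvufile, commands=None, scriptfile=None,
                 save=False, dry_run=False, utf8=False, verbose=False):
    """Build an argv list for djvused (hypothetical helper, not part of DjVuLibre)."""
    if commands is not None and scriptfile is not None:
        # Assumption of this sketch: use one command source at a time.
        raise ValueError("use either commands (-e) or scriptfile (-f), not both")
    argv = ["djvused", djvufile]
    if verbose:
        argv.append("-v")   # prompt and per-command messages
    if utf8:
        argv.append("-u")   # UTF-8 output instead of octal escapes
    if dry_run:
        argv.append("-n")   # disregard save commands
    if commands is not None:
        argv += ["-e", commands]
    if scriptfile is not None:
        argv += ["-f", scriptfile]
    if save:
        argv.append("-s")   # save djvufile after executing the commands
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True). Because no shell is involved in that case, the single quotes recommended for option -e are not needed.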
DJVUSED EXAMPLES¶
There are many ways to use program
djvused. The following examples
illustrate some common uses of this program.
Obtaining the size of a page¶
Command
size outputs the width and height of the selected pages using an
HTML-friendly syntax. For instance, the following command
prints the size of page
3 of document
myfile.djvu.
-
- djvused myfile.djvu -e 'select 3; size'
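A script that consumes this output might parse it as follows. This is a minimal Python sketch which assumes that each output line consists of attribute pairs such as width=2550 height=3300. That sample line is an assumption based on the HTML-friendly description above, not output captured from a real run, and the function name parse_size_line is hypothetical.

```python
import re

def parse_size_line(line):
    """Parse one line of attribute pairs, e.g. 'width=2550 height=3300'.

    Returns a dict mapping each attribute name to an integer value.
    The line format is an assumption, see the text above.
    """
    pairs = re.findall(r"(\w+)=(\d+)", line)
    return {key: int(value) for key, value in pairs}
```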
Command
print-pure-txt outputs the text associated with a page or a
document. For instance, the following shell command outputs the text for the
entire document. Lines and pages are delimited by the usual control
characters.
-
- djvused myfile.djvu -e 'print-pure-txt'
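The captured output can then be split back into pages and lines. The Python sketch below assumes that pages end with a form feed character and lines with a newline character, as described in this page; the function name split_pure_text is hypothetical. Other separators, such as vertical tabs, are left inside the lines untouched.

```python
def split_pure_text(output):
    """Split print-pure-txt output into pages, each a list of lines."""
    pages = output.split("\f")
    # A trailing form feed leaves an empty final chunk; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    result = []
    for page in pages:
        lines = page.split("\n")
        # A trailing newline leaves an empty final line; drop it.
        if lines and lines[-1] == "":
            lines.pop()
        result.append(lines)
    return result
```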
Command
print-txt produces a more extensive output describing the
structure and the location of the text components. The syntax of this output
is described later in this man page. For instance, the following shell command
outputs extended text information for page
3 of document
myfile.djvu.
-
- djvused myfile.djvu -e 'select 3; print-txt'
Annotation data can be extracted using command
print-ant. The syntax of
the annotation data is described later in this man page. For instance, the
following shell command outputs the annotation data for the first page of
document
myfile.djvu.
-
- djvused myfile.djvu -e 'select 1; print-ant'
Command
print-ant only prints the annotations stored in the selected
component file. Command
print-merged-ant also retrieves annotations
from all the component files referenced by the current page (using
INCL chunks) and prints the merged information.
Dumping/restoring annotations and text¶
Three commands,
output-txt,
output-ant, and
output-all,
produce djvused scripts. For instance, the following shell command produces a
djvused script,
myfile.dsed, that recreates all the text and annotation
data in document
myfile.djvu.
-
- djvused myfile.djvu -e 'output-all' > myfile.dsed
Script
myfile.dsed is a text file that can be easily edited. The
following shell command then recreates the text and annotation information in
file
myfile.djvu.
-
- djvused myfile.djvu -f myfile.dsed -s
Both commands
save-page and
save-page-with create a DjVu file
representing the selected component file of a document. The following shell
command, for instance, creates a file
p05.djvu containing page
5
of document
myfile.djvu.
-
- djvused myfile.djvu -e 'select 5; save-page p05.djvu'
Each page of a document might import data from another component file using the
so-called inclusion (INCL) chunks. Command
save-page
then produces a file with unresolved references to imported data. Such a file
should then be made part of a multi-page document containing the required data
in other component files. On the other hand, command
save-page-with
copies all the imported data into the output file. This file is directly
usable. Yet collecting several such files into a multi-page document might
lead to useless data replication.
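To extract a range of pages, the select and save commands can be chained with semicolons. The Python sketch below only builds the command string to pass to option -e. The function name extract_page_commands and the file name pattern are hypothetical choices for illustration.

```python
def extract_page_commands(first, last, pattern="p{:02d}.djvu", standalone=True):
    """Build a djvused command line that saves pages first..last, one file each.

    standalone=True uses save-page-with (imported data copied into each file);
    standalone=False uses save-page (references to imported data left unresolved).
    """
    save = "save-page-with" if standalone else "save-page"
    return "; ".join(
        "select {}; {} {}".format(n, save, pattern.format(n))
        for n in range(first, last + 1)
    )
```

For example, extract_page_commands(5, 5, standalone=False) yields the command string used in the shell example above.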
Pre-computing thumbnails¶
Command
set-thumbnails constructs thumbnails that can later be displayed
by DjVu viewers. The following shell command, for instance, computes
thumbnails of size 64x64 pixels for all pages of file
myfile.djvu.
-
- djvused myfile.djvu -e 'set-thumbnails 64' -s
DJVUSED COMMANDS¶
Command lines might contain zero, one, or more djvused commands and an optional
comment. Multiple djvused commands must be separated by a semicolon character
';'. Comments are introduced by the '#' character and extend until the end of
the command line.
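The splitting rule can be sketched in Python as follows. This is an illustration, not the parser used by djvused. It assumes that command arguments may be double-quoted strings in which ';' and '#' lose their special meaning, and it treats every unquoted '#' as the start of a comment. Neither assumption is confirmed by this page. The function name split_command_line is hypothetical.

```python
def split_command_line(line):
    """Split one command line into commands, dropping any trailing comment."""
    commands, current = [], []
    in_string = False   # inside a double-quoted string
    escaped = False     # previous character was a backslash inside a string
    for ch in line:
        if in_string:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch == "#":
            break  # the comment extends to the end of the line
        elif ch == ";":
            commands.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    commands.append("".join(current).strip())
    # Empty pieces (for instance from a comment-only line) are discarded.
    return [c for c in commands if c]
```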
Selection commands¶
Multi-page DjVu documents are composed of a number of component files. Most
component files describe a specific page of a document. Some component files
contain information shared by several pages such as shared image data, shared
annotations or thumbnails. Many djvused commands operate on selected component
files. All component files are initially selected. The following commands are
useful for changing the selection.
- n
- Print the total number of pages in the document.
- ls
- List all component files in the document. Each line
contains an optional page number, a letter describing the component file
type, the size of the component file, and the identifier of the component
file. Component file type letters P, I, A, and
T respectively stand for page data, shared image data, shared
annotation data, and thumbnail data. Page numbers are only listed for
component files containing page data. When it is set, the optional page
title (see command set-page-title below) is displayed after the
component file identifier.
- select [fileid]
- Select the component file identified by argument
fileid. Argument fileid must be either a page number or a
component file identifier. The select command selects all component
files when the argument fileid is omitted.
- select-shared-ant
- Select a component file containing shared annotations. Only
one such component file is supported by the current DjVu software. This
component file usually contains annotations pertaining to the whole
document as opposed to specific pages. An error message is displayed if
there is no such component file.
- create-shared-ant
- Create and select a component file containing shared
annotations. This command only selects the shared annotation component
file if such a component file already exists. Otherwise it creates a new
shared annotation component file and makes sure that it is imported by all
pages in the document.
- showsel
- Shows the currently selected component files with the same
format as command ls.
Text and annotation commands¶
- print-pure-txt
- Print the text stored in the hidden text layer of the
selected pages. A similar capability is offered by program djvutxt.
Structural information is sometimes represented by control characters.
Text from different pages is delimited by form feed characters
("\f"). Lines are delimited by newline characters
("\n"). Columns, regions, and paragraphs are sometimes delimited
by vertical tab ("\013"), group separators ("\035")
and unit separators ("\037") respectively.
- print-txt
- Prints extensive hidden text information for the selected
pages. This information describes the structure of the text on the
document page and locates the structural elements in the page image. The
syntax of this output is described later in this man page.
- remove-txt
- Remove the hidden text information from the selected
component files. For instance, executing commands select and
remove-txt removes all hidden text information from the DjVu
document.
- set-txt [djvusedtxtfile]
- Insert hidden text information into the selected pages. The
optional argument djvusedtxtfile names a file containing the hidden
text information. This file must contain data similar to what is produced
by command print-txt. When the optional argument is omitted, the
program reads the hidden text information from the djvused script until
reaching an end-of-file or a line containing a single period.
- output-txt
- Prints a djvused script that reconstructs the hidden text
information for the selected pages. This script can later be edited and
executed by invoking program djvused with option -f.
- print-ant
- Prints the annotations of the selected component file. The
annotation data is represented using a simple syntax described later in
this document.
- print-merged-ant
- Merge the annotations stored in the selected component
files with the annotations imported from other component files such as the
shared annotation component file. The annotation data is represented
using a simple syntax described later in this document.
- remove-ant
- Remove the annotation information from the selected
component files. For instance, executing commands select and
remove-ant removes all annotation information from the DjVu
document.
- set-ant [djvusedantfile]
- Insert annotations into the selected component file. The
optional argument djvusedantfile names a file containing the
annotation data. This file must contain data similar to what is produced
by command print-ant. When the optional argument is omitted, the
program reads the annotation data from the djvused script itself until
reaching an end-of-file or a line containing a single period.
- output-ant
- Print a djvused script that reconstructs the annotation
information for the selected pages. This script can later be edited and
executed by invoking program djvused with option -f.
- print-meta
- Print the meta-data part of the annotations for the
selected component file. This command displays a subset of the information
printed by command print-ant using a different syntax. Meta-data
are organized as key-value pairs. Each printed line contains the key name
such as author, title, etc., followed by a tab character
("\t") and a double-quoted string representing the
UTF-8 encoded meta-data value.
- remove-meta
- Remove the meta-data part of the annotations of the
selected component files.
- set-meta [djvusedmetafile]
- Set the meta-data part of the annotations of the selected
component file. The remaining part of the annotations is left unchanged.
The optional argument djvusedmetafile names a file containing the
meta-data. This file must contain data similar to what is produced by
command print-meta. When the optional argument is omitted, the
program reads the meta-data from the djvused script itself until
reaching an end-of-file or a line containing a single period.
- print-xmp
- Print the XMP metadata string contained in the annotation
chunk of the selected component file. This command displays in fact a
subset of the information printed by command print-ant.
- remove-xmp
- Removes the XMP tag from the annotation chunk of the
selected component file.
- set-xmp [xmpfile]
- Set the XMP metadata part of the annotations of the
selected component file. The remaining part of the annotations is left
unchanged. The optional argument xmpfile names a file containing
the XMP metadata in a format similar to that produced by command
print-xmp. When the optional argument is omitted, the program reads
the XMP annotation data from the djvused script itself until reaching an
end-of-file or a line containing a single period.
- output-all
- Print a djvused script that reconstructs both the hidden
text and the annotation information for the selected pages. This script
can later be edited and executed by invoking program djvused with
option -f.
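Commands set-txt, set-ant, set-meta, and set-xmp can all read their data inline, up to a line containing a single period. The Python sketch below generates such a script for set-meta, using the key, tab, quoted-value layout described for print-meta. The function names are hypothetical, the sample values are invented for illustration, and selecting the shared annotation file with create-shared-ant is merely one plausible choice of target.

```python
def quote_ascii(text):
    """Quote printable-ASCII text as a djvused string (no octal escapes here)."""
    if any(not (32 <= ord(ch) < 127) for ch in text):
        raise ValueError("this sketch only handles printable ASCII")
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def set_meta_script(metadata, selection="create-shared-ant"):
    """Return a djvused script that sets meta-data from inline data."""
    lines = [selection, "set-meta"]
    for key, value in metadata.items():
        # One entry per line: key, tab character, double-quoted value.
        lines.append("{}\t{}".format(key, quote_ascii(value)))
    lines.append(".")  # a line holding a single period ends the inline data
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and executed with option -f, adding option -s to save the result.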
Outline/bookmarks commands¶
- print-outline
- Print the outline of the document. Nothing is printed if
the document contains no outline.
- remove-outline
- Removes the outline from the document.
- set-outline [djvusedoutlinefile]
- Insert outline information into the document. The optional
argument djvusedoutlinefile names a file containing the outline
information. This file must contain data similar to what is produced by
command print-outline. When the optional argument is omitted, the
program reads the outline information from the djvused script until
reaching an end-of-file or a line containing a single period.
Thumbnail commands¶
- set-thumbnails sz
- Compute thumbnails of size sz x sz pixels and
insert them into the document. DjVu viewers can later display these
thumbnails very efficiently without needing to download the data for each
page. Typical thumbnail sizes range from 48 to 128 pixels.
- remove-thumbnails
- Remove the pre-computed thumbnails from the DjVu document.
New thumbnails can then be computed using command set-thumbnails.
Save commands¶
The above commands only modify the memory image of the DjVu document. The
following commands provide means to save the modified data into the file
system.
- save
- Save the modified DjVu document back into the input file
djvufile specified by the arguments of the program djvused.
Nothing is done if the DjVu file was not modified. Passing option
-s to program djvused is equivalent to executing command
save before exiting the program.
- save-bundled filename
- Save the current DjVu document as a bundled multi-page DjVu
document named filename. A similar capability is offered by program
djvmcvt.
- save-indirect filename
- Save the current DjVu document as an indirect multi-page
DjVu document. The index file of the indirect document will be named
filename. All other files composing the indirect document will be
saved into the same directory as the index file. A similar capability is
offered by program djvmcvt.
- save-page filename
- Save the selected component file into DjVu file
filename. The selected component file might import data from
another component file using the so-called inclusion (INCL)
chunks. This command then produces a file with unresolved references to
imported data. Such a file should then be made part of a multi-page
document containing the required data in other component files.
- save-page-with filename
- Save the selected component file into DjVu file
filename. All data imported from other component files is copied
into the output file as well. This command always produces a usable DjVu
file. On the other hand, collecting several such files into a multi-page
document might lead to useless data replication.
Miscellaneous commands¶
- help
- Display a help message listing all commands supported by
djvused.
- dump
- Display the EA IFF 85 structure of the
document or of the selected component file. A similar capability is
offered by program djvudump.
- size
- Display the width and the height of the selected pages. The
dimensions of each page are displayed using a syntax suitable for direct
insertion into the <EMBED...></EMBED>
tags.
- set-page-title title
- Sets a page title for the selected page. When page titles
are available, recent versions of the DjVuLibre viewers display these page
titles instead of page numbers and also accept them in page selection
options. Command ls can be used to see both the page titles and
page identifiers. To unset a page title, simply make it equal to the page
identifier.
DJVUSED FILE FORMATS¶
Djvused uses a simple parenthesized syntax to represent both annotations and
hidden text.
- *
- This syntax is the native syntax used by DjVu for storing
annotations. Program djvused simply compresses the annotation data
using the bzz(1) algorithm.
- *
- This syntax differs from the native syntax used by DjVu for
storing the hidden text. Program djvused performs the translations
between the compact binary representation used by DjVu and the easily
modifiable parenthesized syntax.
General syntax¶
Djvused files are
ASCII text files. The legal characters in
djvused files are the printable
ASCII characters and the space,
tab, cr, and nl characters. Using other characters has undefined results.
Djvused files are composed of a sequence of expressions separated by blank
characters (space, tab, cr, or nl). There are four kinds of expressions, namely
integers, symbols, strings and lists.
- Integers:
- Integer numbers are represented by one or more digits, with
the usual interpretation.
- Symbols:
- Symbols, or identifiers, are sequences of printable ASCII
characters representing a name or a keyword. Acceptable characters are the
alpha-numeric characters, the underscore "_", the minus
character "-", and the hash character "#". Names
should not begin with a digit or a minus character.
- Strings:
- Strings denote an arbitrary sequence of bytes, usually
interpreted as a sequence of UTF-8 encoded characters.
Strings in djvused files are similar to strings in the C language. They
are surrounded by double quote characters. Certain sequences of characters
starting with a backslash ("\") have a special meaning. A
backslash followed by the letter "a", "b", "t",
"n", "v", "f", or "r", by a backslash ("\"),
or by a double quote stands for the ASCII character BEL (007), BS (010),
HT (011), LF (012), VT (013), FF (014), CR (015), BACKSLASH (134), or
DOUBLEQUOTE (042) respectively, where the codes are octal. A backslash
followed by one to three digits stands for the
byte whose octal code is expressed by the digits. All other backslash
sequences are illegal. All non-printable ASCII characters must be
escaped.
- Lists:
- Lists are sequences of expressions separated by blanks and
surrounded by parentheses. All expression types are acceptable within a
list, including sub-lists.
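The string rules can be illustrated with a small encoder. The Python sketch below converts text to UTF-8 bytes, uses the named backslash sequences where they exist, and falls back to octal escapes for every other byte outside printable ASCII. The function name djvused_quote is hypothetical. Octal escapes are always written with three digits so that a following digit cannot be mistaken for part of the escape.

```python
def djvused_quote(text):
    """Encode text as a djvused string: UTF-8 bytes, escaped where needed."""
    named = {
        0x07: "\\a", 0x08: "\\b", 0x09: "\\t", 0x0A: "\\n",
        0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r",
        0x5C: "\\\\",   # backslash
        0x22: '\\"',    # double quote
    }
    out = []
    for byte in text.encode("utf-8"):
        if byte in named:
            out.append(named[byte])
        elif 32 <= byte < 127:
            out.append(chr(byte))                 # printable ASCII, kept as is
        else:
            out.append("\\{:03o}".format(byte))   # three-digit octal escape
    return '"' + "".join(out) + '"'
```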
Hidden text syntax¶
The building blocks of the hidden text syntax are lists representing each
structural component of the hidden text. Structural components have the
following form:
-
- (type xmin ymin xmax ymax ... )
The symbol
type must be one of
page,
column,
region,
para,
line,
word, or
char, listed here in
decreasing order of importance. The integers
xmin,
ymin,
xmax, and
ymax represent the coordinates of a rectangle
indicating the position of the structural component in the page. Coordinates
are measured in pixels and have their origin at the bottom left corner of the
page. The remaining expressions in the list are either a single string
representing the encoded text associated with this structural component, or
a sequence of structural components with a lesser type.
The hidden text for each page is simply represented by a single structural
element of type
page. Various levels of structural information are
acceptable. For instance, the page level component might only specify a page
level string, or might only provide a list of lines, or might provide a full
hierarchy down to the individual characters.
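The nesting of structural components can be sketched in Python as follows. The helper text_zone is hypothetical and only handles printable ASCII text; the coordinates in the usage note are invented for illustration.

```python
ZONE_TYPES = ("page", "column", "region", "para", "line", "word", "char")

def text_zone(kind, box, content):
    """Render one hidden-text component.

    box is (xmin, ymin, xmax, ymax) in pixels, origin at the bottom left.
    content is either a string or a list of already rendered components.
    """
    if kind not in ZONE_TYPES:
        raise ValueError("unknown zone type: " + kind)
    xmin, ymin, xmax, ymax = box
    if isinstance(content, str):
        # Minimal quoting: only backslash and double quote are escaped.
        body = '"' + content.replace("\\", "\\\\").replace('"', '\\"') + '"'
    else:
        body = " ".join(content)
    return "({} {} {} {} {} {})".format(kind, xmin, ymin, xmax, ymax, body)
```

For instance, nesting a word inside a line inside a page:

```python
word = text_zone("word", (100, 3000, 300, 3060), "Hello")
line = text_zone("line", (100, 3000, 900, 3060), [word])
page = text_zone("page", (0, 0, 2550, 3300), [line])
```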
Outline/Bookmark syntax¶
The outline syntax is a single list of the form
-
- (bookmarks ...)
The first element of the list is symbol
bookmarks. The subsequent
elements are lists representing the toplevel outline entries. Each outline
entry is represented by a list with the following form:
-
- (title url ... )
The string
title is the title of the outline entry. The destination
string
url can be either an arbitrary percent encoded
URL, or composed of the hash character ("#") followed
by a page name or number, or composed of the question mark character
("?") followed by cgi-style arguments interpreted by the djvu
viewer. The remaining expressions in the list describe subentries of this
outline entry.
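An outline in this syntax can be generated from a nested structure. The Python sketch below is an illustration with hypothetical function names; it quotes only printable ASCII titles and destinations, and the sample titles are invented.

```python
def _quote(text):
    """Minimal djvused string quoting for printable ASCII text."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def outline_entry(title, url, children=()):
    """Render one outline entry; children is a sequence of entry tuples."""
    parts = [_quote(title), _quote(url)]
    parts += [outline_entry(*child) for child in children]
    return "(" + " ".join(parts) + ")"

def outline(entries):
    """Render the complete (bookmarks ...) list from top-level entry tuples."""
    parts = ["bookmarks"] + [outline_entry(*entry) for entry in entries]
    return "(" + " ".join(parts) + ")"
```

Each entry is a tuple (title, url) or (title, url, children), for example:

```python
toc = [("Chapter 1", "#1", [("Section 1.1", "#3")]), ("Chapter 2", "#10")]
text = outline(toc)
```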
Annotation syntax¶
Annotations are represented by a sequence of annotation expressions. The
following annotation expressions are recognized:
- (background color)
- Specify the color of the viewer area surrounding the DjVu
image. Colors are represented with the X11 hexadecimal syntax
#RRGGBB. For instance, #000000 is black and #FFFFFF
is white.
- (zoom zoomvalue)
- Specify the initial zoom factor of the image. Argument
zoomvalue can be one of stretch, one2one,
width, page, or composed of the letter d followed by
a number in range 1 to 999 representing a zoom factor (such as in
d300 or d150 for instance.)
- (mode modevalue)
- Specify the initial display mode of the image. Argument
modevalue is one of color, bw, fore, or
back.
- (align horzalign vertalign)
- Specify how the image should be aligned on the viewer
surface. By default the image is located in the center. Argument
horzalign can be one of left, center, or
right. Argument vertalign can be one of top,
center, or bottom.
- (maparea url comment area ...)
- Define a hyper-link for the specified destination.
Argument
url can have one of the following forms:
-
- href
(url href target)
where
href is a string representing the destination and
target is
a string representing the target frame for the hyper-link, as defined by the
HTML anchor tag
<A>. The destination
string
href can be either an arbitrary percent encoded
URL, or composed of the hash character ("#") followed
by a page name or number, or composed of the question mark character
("?") followed by cgi-style arguments interpreted by the djvu
viewer. Page numbers may be prefixed with an optional sign to represent a page
displacement. For instance the strings
"#-1" and
"#+1" can be used to access the previous page and the next
page.
Argument
comment is a string that might be displayed by the viewer when
the user moves the mouse over the hyper-link.
Argument
area defines the shape and the location of the hyperlink. The
following forms are recognized:
-
- (rect xmin ymin width height)
(oval xmin ymin width height)
(poly x0 y0 x1 y1 ... )
(text xmin ymin width height)
(line x0 y0 x1 y1)
All parameters are numbers representing coordinates. Coordinates are measured in
pixels and have their origin at the bottom left corner of the page.
The remaining expressions in the
maparea list represent the visual effect
associated with the hyper-link.
A first set of options defines how borders are drawn for
rect,
oval,
poly, or
text hyperlink areas.
-
- (none)
(xor)
(border color)
(shadow_in [thickness])
(shadow_out [thickness])
(shadow_ein [thickness])
(shadow_eout [thickness])
where parameter
color has syntax
#RRGGBB as described above, and
parameter thickness is an integer in range 1 to 32. The last four border
options are only supported for
rect hyperlink areas. The default border
is a simple black line. Border options do not apply to
line areas.
When a border option is specified, the border becomes visible when the user
moves the mouse over the hyperlink. The border may be made always visible by
using the following option:
-
- (border_avis)
The following two options may be used with
rect hyperlink areas. The
complete area will be highlighted using the specified color at the specified
opacity (0-100, default 50).
-
- (hilite color)
(opacity op)
This is often used with an empty
URL for simply emphasizing a
specific segment of an image.
The following three options may be used with line areas to specify an optional
ending arrow, the line width and color. The default is a black line with width
1 and without arrow.
-
- (arrow)
(width w)
(lineclr color)
Finally the following three options can be used with text areas. The default
background color is transparent. The default text color is black. The
pushpin option indicates that the text is symbolized by a small pushpin
icon. Clicking the icon reveals the text.
-
- (backclr bkcolor)
(textclr txtcolor)
(pushpin)
- (metadata ... (key value) ... )
- Define meta-data entries. Each entry is identified by a
symbol key representing the nature of the meta-data entry. The
string value represents the value associated with the corresponding
key. Two sets of keys are noteworthy: keys borrowed from the BibTeX
bibliography system, and keys borrowed from the PDF DocInfo metadata.
BibTeX keys are always expressed in lowercase, such as year,
booktitle, editor, author, etc. DocInfo keys start
with an uppercase letter, such as Title, Author,
Subject, Creator, Producer, Trapped,
CreationDate, and ModDate. The values associated with the
last two keys should be dates expressed according to RFC 3339.
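As an illustration of the maparea expression, the Python sketch below renders a rectangular hyper-link with an optional border color. The function name maparea_rect is hypothetical, only printable ASCII strings are handled, and the coordinates in the usage note are invented.

```python
import re

def _quote(text):
    """Minimal djvused string quoting for printable ASCII text."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def maparea_rect(url, comment, xmin, ymin, width, height, border=None):
    """Render a (maparea ...) expression with a rect area.

    border, when given, must use the #RRGGBB color syntax.
    """
    parts = ["maparea", _quote(url), _quote(comment),
             "(rect {} {} {} {})".format(xmin, ymin, width, height)]
    if border is not None:
        if not re.fullmatch(r"#[0-9A-Fa-f]{6}", border):
            raise ValueError("colors use the #RRGGBB syntax")
        parts.append("(border {})".format(border))
    return "(" + " ".join(parts) + ")"
```

For example, maparea_rect("#+1", "Next page", 100, 100, 200, 50, border="#0000FF") describes a link to the next page. The resulting expression can be supplied to command set-ant.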
LIMITATIONS¶
The current version of program
djvused only supports selecting one
component file or all component files. There is no way to select only a few
component files.
CREDITS¶
This program was initially written by Léon Bottou
<leonb@users.sourceforge.net> and was improved by Yann Le Cun
<profshadoko@users.sourceforge.net>, Florin Nicsa, Bill Riemers
<docbill@sourceforge.net> and many others.
SEE ALSO¶
djvu(1),
djvutxt(1),
djvmcvt(1),
djvudump(1),
bzz(1), Emacs djvused front end
djvu.el on the GNU ELPA repository.