DJVU2HOCR(1) | djvu2hocr manual | DJVU2HOCR(1)
NAME
djvu2hocr - DjVu to hOCR converter
SYNOPSIS
djvu2hocr [option...] djvu-file
djvu2hocr {--version | --help | -h}
DESCRIPTION
djvu2hocr converts hidden text from a DjVu file to the hOCR[1] format.
OPTIONS
Text segmentation options
--word-segmentation=simple
Use the word segmentation found in the DjVu file.
This is the default.
--word-segmentation=uax29
Use the Unicode Text Segmentation[2] algorithm to break lines into words, possibly correcting the word segmentation found in the DjVu file.
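When djvu2hocr is driven from a script, the choice between the two segmentation modes comes down to which value is passed to --word-segmentation. The following is a minimal Python sketch, not part of djvu2hocr: the helper names djvu2hocr_command and djvu_to_hocr are hypothetical, and it assumes that djvu2hocr is on PATH and writes the hOCR document to standard output encoded as UTF-8.

```python
import subprocess
from typing import List

# The two values of --word-segmentation documented in this manual.
SEGMENTATION_MODES = ("simple", "uax29")


def djvu2hocr_command(djvu_file: str, segmentation: str = "simple") -> List[str]:
    """Build the argument vector for one djvu2hocr run (hypothetical helper)."""
    if segmentation not in SEGMENTATION_MODES:
        raise ValueError(f"unknown word segmentation: {segmentation!r}")
    # The option takes its value after '=', as shown in the OPTIONS section.
    return ["djvu2hocr", f"--word-segmentation={segmentation}", djvu_file]


def djvu_to_hocr(djvu_file: str, segmentation: str = "simple") -> str:
    """Run djvu2hocr and return its standard output.

    Assumption: the hOCR document is written to stdout as UTF-8.
    """
    result = subprocess.run(
        djvu2hocr_command(djvu_file, segmentation),
        check=True,           # raise CalledProcessError on a non-zero exit status
        capture_output=True,  # collect stdout/stderr instead of printing them
        encoding="utf-8",     # assumed output encoding; decodes stdout to str
    )
    return result.stdout
```

For example, djvu_to_hocr("book.djvu", segmentation="uax29") would request Unicode word segmentation for a hypothetical book.djvu. Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names.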
Other options
--version
Output version information and exit.
-h, --help
Display help and exit.
PORTABILITY
djvu2hocr uses a custom extension to hOCR to retain characters that cannot be represented directly in an HTML/XML document. For example, the control character BEL (^G, U+0007) is converted into the following HTML chunk: <span class="djvu_char" title="#x07"> </span>
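A program that produces or consumes this extension has to map such characters to djvu_char spans and back. The following is a minimal Python sketch under stated assumptions, not djvu2hocr's own code: it treats as unrepresentable exactly the characters outside the XML 1.0 "Char" production, assumes the title holds the code point as at least two lowercase hexadecimal digits (modelled on the single BEL example above), and the function names is_xml_char, encode_text and decode_text are hypothetical.

```python
import html
import re


def is_xml_char(c: str) -> bool:
    """True if c matches the XML 1.0 'Char' production."""
    cp = ord(c)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def encode_text(text: str) -> str:
    """Escape text for hOCR, wrapping non-XML characters in djvu_char spans."""
    parts = []
    for c in text:
        if is_xml_char(c):
            # Escape &, < and > so the text is safe as HTML character data.
            parts.append(html.escape(c, quote=False))
        else:
            # Assumed title format, following the BEL example ("#x07").
            parts.append(f'<span class="djvu_char" title="#x{ord(c):02x}"> </span>')
    return "".join(parts)


# Matches one djvu_char span and captures the hexadecimal code point.
_DJVU_CHAR = re.compile(r'<span class="djvu_char" title="#x([0-9A-Fa-f]+)"> </span>')


def decode_text(fragment: str) -> str:
    """Inverse of encode_text for fragments holding only text and djvu_char spans."""
    out = []
    pos = 0
    for m in _DJVU_CHAR.finditer(fragment):
        out.append(html.unescape(fragment[pos:m.start()]))  # ordinary escaped text
        out.append(chr(int(m.group(1), 16)))                 # restored character
        pos = m.end()
    out.append(html.unescape(fragment[pos:]))
    return "".join(out)
```

With these definitions, encode_text("\x07") yields the chunk shown above, and decode_text(encode_text(s)) returns s for any string s. Unescaping the ordinary text piece by piece keeps entity decoding separate from span decoding.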
AUTHOR
Jakub Wilk <jwilk@jwilk.net>
Author.
NOTES
1. hOCR
2. Unicode Text Segmentation
03/10/2012 | djvu2hocr 0.7.9