HOCR2PDF(1) ExactImage Manual HOCR2PDF(1)


hocr2pdf - hOCR to PDF converter of the ExactImage toolkit


hocr2pdf [option...] {-i | --input} input-file {-o | --output} output-file

hocr2pdf {-h | --help}


ExactImage is a fast C++ image processing library. Unlike many other library frameworks, it operates natively in several color spaces and bit depths, which keeps memory and computational requirements low.

hocr2pdf creates well laid-out, searchable PDF files from hOCR (annotated HTML) input obtained from an OCR system.


-i file, --input file

Read image from the specified file. Note that input hOCR is read from the standard input.

-o file, --output file

Save output PDF to the specified file.

-n, --no-image

Don't place the image over the text. By default the text layer is hidden behind the image.

-s, --sloppy-text

Place text sloppily: group words instead of drawing single glyphs.

-r n, --resolution n

Override resolution of the input image to n dpi. The default resolution (if not specified in the input file) is 300 dpi.


Quality setting used for writing compressed images. Integer in the range 0-100; the default is 75.


Compression method for writing images, e.g. ascii85, hex, flate, jpeg, jpeg2000, ... The default is based on the bit depth.

-h, --help

Display help text and exit.
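
The resolution given with -r (or the 300 dpi default) presumably determines how the pixel dimensions of the image map to the physical page size in the PDF. Assuming the usual PDF convention of 72 points per inch, the arithmetic can be sketched as follows. px_to_pt is a hypothetical helper for illustration only and is not part of hocr2pdf.

```shell
# Hypothetical helper (not part of hocr2pdf): convert a pixel length to
# PDF points, assuming 72 points per inch. Shell integer arithmetic
# truncates the result.
px_to_pt() {
    pixels=$1
    dpi=${2:-300}   # 300 dpi mirrors the documented default
    echo $(( pixels * 72 / dpi ))
}

px_to_pt 2480 300   # 2480 px wide page at 300 dpi -> 595
px_to_pt 2480 600   # same pixels declared as 600 dpi -> 297
```

Under this assumption, declaring a higher resolution for the same image yields a physically smaller page, which is why a wrong or missing dpi value in the input file is worth overriding.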


$ hocr2pdf -i scan.tiff -o test.pdf < cuneiform-out.hocr
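
When the same invocation is repeated for many pages, it can be wrapped in a small shell function. The following is a minimal sketch, not something shipped with ExactImage: the function name hocr_to_pdf and the HOCR2PDF variable are hypothetical, and it assumes each page has an image file and a separate hOCR file on disk.

```shell
# Hypothetical wrapper around the documented invocation:
#   hocr2pdf -i IMAGE -o PDF < HOCR
# HOCR2PDF may be set to use a binary outside of PATH (assumption).
HOCR2PDF=${HOCR2PDF:-hocr2pdf}

hocr_to_pdf() {
    image=$1
    hocr=$2
    pdf=$3
    # hocr2pdf reads the hOCR from standard input, so check the file
    # before redirecting it.
    if [ ! -r "$hocr" ]; then
        echo "hocr_to_pdf: cannot read $hocr" >&2
        return 1
    fi
    "$HOCR2PDF" -i "$image" -o "$pdf" < "$hocr"
}

# Usage: hocr_to_pdf scan.tiff cuneiform-out.hocr test.pdf
```

Checking the hOCR file first gives a clearer error message than a failed redirection, and the exit status of hocr2pdf itself is passed through to the caller.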




