
OCR4GAMERA(1)

NAME

ocr4gamera - OCR system using the Gamera framework

USAGE

ocr4gamera -x <traindata> [options] <imagefile>
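
As a rough illustration of the usage line above, the following Python sketch assembles an argument list for ocr4gamera. Only the -x and -o options, which this page mentions by name, are used; the helper build_command and all file names are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_command(traindata, imagefile, outfile=None):
    """Assemble an ocr4gamera argument list following the usage line.

    build_command is a hypothetical helper, not part of ocr4gamera.
    """
    cmd = ["ocr4gamera", "-x", traindata]  # -x: training data (required)
    if outfile is not None:
        cmd += ["-o", outfile]  # -o: write recognized text to a file
    cmd.append(imagefile)  # the image file comes last
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_command("traindata.xml", "page.png", outfile="result.txt")
print(shlex.join(cmd))
# To actually run it, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list avoids shell quoting problems when file names contain spaces.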

OPTIONS

Set verbosity level to <int>. Possible values are 0 (default): silent operation; 1: information on progress; >2: segmentation info is written to PNG files with prefix debug_.
Display help and exit.
Print version and exit.
Do a skew correction (recommended).

Smooth the input image with a median filter with window size <ws>. Default is <ws>=0, which means no smoothing.
Remove all speckles of size <= <s>. Default is <s>=0, which means no despeckling.
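
To make the despeckling threshold above concrete, here is a minimal Python sketch that erases small connected components from a binary image. It is not the Gamera implementation: it assumes that "size" means the number of black pixels in a component and that components are 8-connected, and the function name despeckle and the list-of-lists image format are chosen only for illustration.

```python
def despeckle(image, s):
    """Return a copy of a binary image (1 = black, 0 = white) in which every
    connected component with at most s black pixels has been erased.

    Assumptions: size = pixel count, 8-connectivity.
    """
    if not image:
        return []
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if out[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill to collect one connected component.
            seen[r][c] = True
            stack = [(r, c)]
            component = []
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and out[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            # Erase the component if it is small enough to count as a speckle.
            if len(component) <= s:
                for y, x in component:
                    out[y][x] = 0
    return out


image = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
cleaned = despeckle(image, 1)  # the isolated pixel goes, the 2x2 block stays
```

With s = 0 no component qualifies, which matches the documented default of no despeckling.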

Filter out very large (images) and very small components (noise).
Autogroup glyphs with classifier.
Read training data from <file>.
Write recognized text to file <xml> (otherwise it is written to stdout).

For each input image <img>, write the recognized text to <dir>/<img>.txt. This option cannot be combined with -o (--outfile).

Read additional class name conversions from file <csv>. <csv> must contain one conversion per line.
Apply heuristic rules <rules> to disambiguate some characters. <rules> can be roman (default) or none (for no rules).
Correct words using a dictionary (requires aspell or ispell).
Use <lang> as language for aspell (when option -D is set).
Correct words only when the edit distance is not more than <int>.
Write output as an hOCR file (only works with the -o option).
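
The edit-distance limit on dictionary correction described above can be sketched as follows. This page does not say which edit distance ocr4gamera uses; the sketch assumes the common Levenshtein distance (insertions, deletions, and substitutions each cost 1). The names edit_distance and correct_word are hypothetical, and the suggestion would in practice come from aspell or ispell.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (assumed metric)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct_word(word, suggestion, max_distance):
    """Accept a dictionary suggestion only if it is within max_distance edits."""
    if edit_distance(word, suggestion) <= max_distance:
        return suggestion
    return word


# "recieve" and "receive" differ by two substitutions.
print(correct_word("recieve", "receive", 2))
print(correct_word("recieve", "receive", 1))
```

A small limit keeps the spell checker from replacing a correctly recognized but unusual word with an unrelated dictionary entry.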

Use an hOCR input file for text line segmentation.