


ocr4gamera - OCR system using the Gamera framework


ocr4gamera -x <traindata> [options] <imagefile>


-v <int>, --verbosity=<int>
Set verbosity level to <int>. Possible values are 0 (default): silent operation; 1: information on progress; >2: segmentation info is written to PNG files with prefix debug_.
-h, --help
Display help and exit.
Print version and exit.
-d, --deskew
Do a skew correction (recommended).

-mf <ws>, --median_filter=<ws>
Smooth the input image with a median filter of window size <ws>. Default is <ws> = 0, which means no smoothing.
-ds <s>, --despeckle=<s>
Remove all speckles with size <= <s>. Default is <s> = 0, which means no despeckling.

-f, --filter
Filter out very large components (images) and very small components (noise).
-a, --automatic-group
Automatically group glyphs using the classifier.
-x <file>, --xmlfile=<file>
Read training data from <file>.
-o <xml>, --output=<xml>
Write recognized text to file <xml> (otherwise it is written to stdout).

-od <dir>, --output_directory=<dir>
For each input image <img>, write the recognized text to <dir>/<img>.txt. Note that this option cannot be combined with -o (--output).

-c <csv>, --extra_chars_csvfile=<csv>
Read additional class name conversions from file <csv>. <csv> must contain one conversion per line.
-R <rules>, --heuristic_rules=<rules>
Apply heuristic rules <rules> to disambiguate certain characters. <rules> can be roman (default) or none (no rules).
-D, --dictionary-correction
Correct words using a dictionary (requires aspell or ispell).
-L <lang>, --dictionary-language=<lang>
Use <lang> as the language for aspell (when option -D is set).
-e <int>, --edit-distance=<int>
Correct words only when the edit distance is not more than <int>.
-ho, --hocr_out
Write the output as an hOCR file (only works with the -o option).

-hi <hocrfile>, --hocr_in=<hocrfile>
Use the hOCR input file <hocrfile> for text line segmentation.
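To illustrate what the -e threshold means when combined with -D, here is a minimal Python sketch. It assumes that "edit distance" is the Levenshtein distance (single-character insertions, deletions and substitutions) and that the spell checker returns an ordered list of suggestions; the helper names levenshtein and correct_word are hypothetical and this is not ocr4gamera's actual implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (assumed edit distance)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def correct_word(word, suggestions, max_distance):
    """Hypothetical helper: return the first suggestion whose edit
    distance to word is at most max_distance (the -e value),
    otherwise return word unchanged."""
    for candidate in suggestions:
        if levenshtein(word, candidate) <= max_distance:
            return candidate
    return word


if __name__ == "__main__":
    # Hypothetical spell-checker suggestions for the OCR result "tbe"
    suggestions = ["the", "tube", "tribe"]
    print(correct_word("tbe", suggestions, 1))  # prints: the
    print(correct_word("tbe", suggestions, 0))  # prints: tbe
```

With a threshold of 0 no suggestion is accepted, so the recognized word is kept as is; larger thresholds allow more aggressive corrections at the risk of replacing correctly recognized rare words.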