OCR4GAMERA(1)

NAME

ocr4gamera - OCR system using the Gamera framework

USAGE

ocr4gamera -x <traindata> [options] <imagefile>

-v <int>, --verbosity=<int>: Set verbosity level to <int>. Possible values are 0 (default): silent operation; 1: information on progress; >2: segmentation info is written to PNG files with prefix debug_.
-h, --help: Display help and exit.
--version: Print version and exit.
-d, --deskew: Do a skew correction (recommended).

-mf <ws>, --median_filter=<ws>: Smooth the input image with a median filter of window size <ws>. Default is <ws> = 0, which means no smoothing.
-ds <s>, --despeckle=<s>: Remove all speckles of size <= <s>. Default is <s> = 0, which means no despeckling.
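
To illustrate what a median filter with window size <ws> does, here is a minimal pure-Python sketch. It is not Gamera's implementation: the function name median_filter, the nested-list image representation, and the clipping of the window at the image border are all assumptions made for this example, and it assumes an odd <ws>.

```python
from statistics import median_low


def median_filter(image, ws):
    """Return a smoothed copy of `image` (a list of rows of ints).

    Hypothetical helper for illustration only; assumes an odd window size.
    """
    if ws <= 1:
        # ws = 0 (the documented default) means no smoothing.
        return [row[:] for row in image]
    half = ws // 2
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # Collect the ws x ws neighbourhood, clipped at the image border.
            window = [
                image[yy][xx]
                for yy in range(max(0, y - half), min(height, y + half + 1))
                for xx in range(max(0, x - half), min(width, x + half + 1))
            ]
            out_row.append(median_low(window))
        result.append(out_row)
    return result


# A single isolated black pixel (1) on a white background (0).
noisy = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
smoothed = median_filter(noisy, 3)
```

In this sketch the isolated pixel is outvoted by its eight white neighbours and disappears, which is the kind of noise suppression that motivates smoothing before segmentation.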

-f, --filter: Filter out very large components (images) and very small components (noise).
-a, --automatic-group: Automatically group glyphs using the classifier.
-x <file>, --xmlfile=<file>: Read training data from <file>.
-o <xml>, --output=<xml>: Write recognized text to file <xml> (otherwise it is written to stdout).

-od <dir>, --output_directory=<dir>: For each input image <img>, write the recognized text to <dir>/<img>.txt. This option cannot be combined with -o (--output).

-c <csv>, --extra_chars_csvfile=<csv>: Read additional class name conversions from file <csv>. <csv> must contain one conversion per line.
-R <rules>, --heuristic_rules=<rules>: Apply heuristic rules <rules> to disambiguate some characters. <rules> can be roman (default) or none (for no rules).
-D, --dictionary-correction: Correct words using a dictionary (requires aspell or ispell).
-L <lang>, --dictionary-language=<lang>: Use <lang> as language for aspell (when option -D is set).
-e <int>, --edit-distance=<int>: Correct words only when the edit distance is not more than <int>.
-ho, --hocr_out: Write output as an hOCR file (only works with the -o option).
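
The threshold given with -e can be read as a limit on how far a dictionary suggestion may differ from the recognized word. The sketch below assumes the classic Levenshtein distance (insertions, deletions, and substitutions each cost 1); this page does not state which distance ocr4gamera actually uses, and the names edit_distance and should_correct are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (assumed metric)."""
    # previous[j] holds the distance between the current prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute ca by cb (or match)
                )
            )
        previous = current
    return previous[-1]


def should_correct(word, suggestion, max_distance):
    """Accept a suggestion only if it is within max_distance edits."""
    return edit_distance(word, suggestion) <= max_distance
```

Under this assumption, a small limit such as 1 would accept a correction of "tbe" to "the" but reject suggestions that rewrite most of the word.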

-hi <hocrfile>, --hocr_in=<hocrfile>: Use an hOCR input file for text line segmentation.
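
When ocr4gamera is called from a script, the constraints between the output options are easy to get wrong. The following sketch builds an argument list from the long option names documented above and checks two of those constraints before anything is run. The helper build_command and the file names are hypothetical, and the resulting list is meant to be passed to something like subprocess.run on a system where ocr4gamera is installed.

```python
def build_command(traindata, imagefile, deskew=False, output=None,
                  output_directory=None, hocr_out=False, verbosity=None):
    """Assemble an ocr4gamera argument list (hypothetical helper)."""
    if output is not None and output_directory is not None:
        # The page states that -od cannot be combined with -o.
        raise ValueError("--output and --output_directory are mutually exclusive")
    if hocr_out and output is None:
        # The page states that -ho only works together with -o.
        raise ValueError("--hocr_out requires --output")

    command = ["ocr4gamera", "--xmlfile=" + traindata]
    if verbosity is not None:
        command.append("--verbosity=" + str(verbosity))
    if deskew:
        command.append("--deskew")
    if output is not None:
        command.append("--output=" + output)
    if output_directory is not None:
        command.append("--output_directory=" + output_directory)
    if hocr_out:
        command.append("--hocr_out")
    command.append(imagefile)
    return command


# Example file names are placeholders, not files shipped with the toolkit.
example = build_command("train.xml", "page.png", deskew=True, output="out.txt")
```

Failing early in the helper keeps an invalid option combination from reaching the program at all.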

Source file:	ocr4gamera.1.en.gz (from python-gamera.toolkits.ocr 1.2.2-6)
Source last updated:	2018-12-30T18:03:33Z