'\" t
.\"     Title: classifier_tester
.\"    Author: [see the "AUTHOR" section]
.\" Generator: DocBook XSL Stylesheets v1.79.1
.\"      Date: 01/21/2019
.\"    Manual: \ \&
.\"    Source: \ \&
.\"  Language: English
.\"
.TH "CLASSIFIER_TESTER" "1" "01/21/2019" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
classifier_tester \- test a character classifier on \&.tr data (for the legacy Tesseract engine)\&.
.SH "SYNOPSIS"
.sp
\fBclassifier_tester\fR \-U \fIunicharset_file\fR \-F \fIfont_properties_file\fR \-X \fIxheights_file\fR \-classifier \fIx\fR \-lang \fIlang\fR [\-output_trainer trainer] *\&.tr
.SH "DESCRIPTION"
.sp
classifier_tester(1) runs Tesseract in a special mode\&. It takes a list of \&.tr files and tests a character classifier on them\&. The data must be formatted as for training, but it does not have to be the same data that was used for training\&.
.SH "IN/OUT ARGUMENTS"
.sp
A list of \&.tr files\&.
.SH "OPTIONS"
.PP
\-l \fIlang\fR
.RS 4
(Input) Three\-character language code; default value
\fIeng\fR\&.
.RE
.PP
\-classifier \fIx\fR
.RS 4
(Input) One of "pruner", "full"\&.
.RE
.PP
\-U \fIunicharset\fR
.RS 4
(Input) The unicharset for the language\&.
.RE
.PP
\-F \fIfont_properties_file\fR
.RS 4
(Input) Font properties file\&. Each line has the following form, where each field other than the font name is 0 or 1:
.sp
.if n \{\
.RS 4
.\}
.nf
*font_name* *italic* *bold* *fixed_pitch* *serif* *fraktur*
.fi
.if n \{\
.RE
.\}
.RE
.PP
\-X \fIxheights_file\fR
.RS 4
(Input) X\-heights file\&. Each line has the following form, where xheight is the x\-height in pixels of a character drawn at 32pt at 300 dpi\&. [ That is, if base x height + ascenders + descenders = 133, how much of that is x height? ]
.sp
.if n \{\
.RS 4
.\}
.nf
*font_name* *xheight*
.fi
.if n \{\
.RE
.\}
.RE
.PP
\-output_trainer \fItrainer\fR
.RS 4
(Output, Optional) Filename for output trainer\&.
.RE
.SH "SEE ALSO"
.sp
tesseract(1)
.SH "COPYING"
.sp
Copyright (C) 2012 Google, Inc\&. Licensed under the Apache License, Version 2\&.0\&.
.SH "AUTHOR"
.sp
The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985\-1995) and Google (2006\-present)\&.
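.SH "EXAMPLES"
.sp
The line formats described for \-F and \-X under OPTIONS can be checked before running classifier_tester\&. The following is a minimal Python sketch and is not part of Tesseract\&. The names FontProperties, parse_font_properties_line and parse_xheights_line are hypothetical, and the sketch assumes whitespace\-separated fields and an integer xheight value\&.
.sp
.nf
```python
from typing import NamedTuple, Tuple


class FontProperties(NamedTuple):
    # Hypothetical record mirroring one line of the font properties file.
    name: str
    italic: bool
    bold: bool
    fixed_pitch: bool
    serif: bool
    fraktur: bool


def parse_font_properties_line(line: str) -> FontProperties:
    # Split from the right so the five flags are always the last five tokens
    # (assumption: fields are separated by whitespace).
    fields = line.strip().rsplit(None, 5)
    if len(fields) != 6:
        raise ValueError("expected a font name and five flags: " + repr(line))
    name, *flags = fields
    # Every field other than the font name must be 0 or 1.
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError("each flag must be 0 or 1: " + repr(line))
    return FontProperties(name, *(flag == "1" for flag in flags))


def parse_xheights_line(line: str) -> Tuple[str, int]:
    # One line of the x-heights file: font name, then the x-height in pixels
    # (assumption: the value is written as an integer).
    fields = line.strip().rsplit(None, 1)
    if len(fields) != 2:
        raise ValueError("expected a font name and an xheight: " + repr(line))
    name, xheight = fields
    return name, int(xheight)
```
.fi
.sp
For instance, parse_font_properties_line("arial 0 1 0 0 0") returns a record whose bold field is true and whose other flags are false, while a line with a flag other than 0 or 1 raises ValueError\&.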