NAME
patgen - generate patterns for TeX hyphenation
SYNOPSIS
patgen dictionary_file pattern_file patout_file translate_file
DESCRIPTION
This manual page is not meant to be exhaustive. See also the Info file or the
manual Web2C: A TeX implementation.
The patgen program reads the dictionary_file, containing a list of hyphenated
words, and the pattern_file, containing previously generated patterns (if any)
for a particular language (not a complete TeX source file; see below). It
produces the patout_file with the previously generated plus the newly
generated hyphenation patterns for that language. The translate_file defines
language-specific values for the parameters left_hyphen_min and
right_hyphen_min used by TeX's hyphenation algorithm, as well as the external
representation of the lower- and upper-case version(s) of all `letters' of
that language. Further details of the pattern generation process, such as
hyphenation levels and pattern lengths, are requested interactively from the
user's terminal. Optionally, patgen creates a new dictionary file pattmp.n
showing the good and bad hyphens found by the generated patterns, where n is
the highest hyphenation level.
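A pattmp.n file marks each hyphen with the same three characters used in dictionary files: `*' for a good hyphen, `.' for a bad one, and `-' for one the patterns have not yet found. A quick way to judge a run is to count these markers. The following is a minimal Python sketch that does so; it assumes the default markers (no replacements from the translate file), and the function name tally_hyphens is hypothetical, not part of patgen.

```python
# Default hyphen markers (assumed; a translate file may replace them).
MARKERS = {"*": "good", ".": "bad", "-": "missed"}


def tally_hyphens(lines):
    """Count good, bad, and missed hyphens in pattmp.n-style lines.

    Digits (word or position weights) and letters are simply skipped.
    """
    counts = {"good": 0, "bad": 0, "missed": 0}
    for line in lines:
        for ch in line:
            kind = MARKERS.get(ch)
            if kind is not None:
                counts[kind] += 1
    return counts
```

Because the function accepts any iterable of lines, an open file such as pattmp.2 can be passed to it directly.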
The patterns generated by patgen can be read by initex for use in hyphenating
words. For a real-life example of patgen's output, see
$TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which contains the patterns TeX uses
for English by default. At some sites, patterns for (many) other languages may
be available, and the local tex programs may have them preloaded.
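To see how TeX uses such patterns, recall the rule from The TeXbook, Appendix H. Every pattern that matches part of the word (with `.' standing for the word edges) contributes its digits to the gaps between letters. The highest digit at each gap wins, and an odd value allows a hyphen there. The minimal Python sketch below illustrates that rule. It assumes single-character, lower-case letters and applies left_hyphen_min and right_hyphen_min with the defaults 2 and 3. The names parse_pattern and hyphenate are hypothetical and are not part of patgen or TeX.

```python
import re


def parse_pattern(pat):
    """Split a pattern such as '.ach4' into its letters and gap values.

    values[k] is the digit in the gap before letters[k] (0 if none).
    """
    letters = re.sub(r"\d", "", pat)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Split word at the gaps whose highest matching pattern digit is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    w = "." + word.lower() + "."      # dots mark the edges of the word
    levels = [0] * (len(w) + 1)       # levels[g]: gap before w[g]
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            values = table.get(w[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    levels[i + k] = max(levels[i + k], v)
    pieces, start = [], 0
    # The gap before word[i] is levels[i + 1] because of the leading dot.
    for i in range(left_min, len(word) - right_min + 1):
        if levels[i + 1] % 2 == 1:
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])
    return pieces
```

For example, with the small pattern set hy3ph, he2n, hena4, hen5at, 1na, n2at, 1tio, 2io, and o2n, hyphenate("hyphenation", ...) yields ['hy', 'phen', 'ation']. The two odd values, 3 and 5, fall after "hy" and after "phen".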
All filenames must be complete; no adding of default extensions or path
searching is done.
- Letters
- When initex digests hyphenation patterns, TeX first expands macros, and the
result must consist entirely of digits (hyphenation levels), dots (`.',
marking the edge of a word), and letters. In pattern files for non-English
languages, letters are often represented by macros or other expandable
constructs. For the purpose of patgen, these are just character sequences,
subject to the condition that no such sequence is a prefix of another one.
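Because patgen treats each letter representation as a plain character sequence, the prefix condition can be checked mechanically before a run. The following is a minimal Python sketch. It assumes that the representations have already been collected into a list of strings, for example from a translate file. The helper name prefix_conflicts is hypothetical and is not part of patgen.

```python
def prefix_conflicts(reps):
    """Return pairs (a, b) where representation a is a proper prefix of b."""
    ordered = sorted(set(reps))
    conflicts = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # In sorted order, all strings starting with a directly follow a.
            if not b.startswith(a):
                break
            conflicts.append((a, b))
    return conflicts
```

Sorting first keeps the check short. Any string that extends a must sort immediately after a, so the inner loop can stop at the first string that does not.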
- Dictionary file
- A dictionary file contains a weighted list of hyphenated words, one word per
line, starting in column 1. A digit in column 1 sets a global word weight
(initially 1) that applies to all following words up to the next global word
weight. A digit at an intercharacter position sets a weight for that position
only.
The hyphens in a word are marked by `-', `*', or `.' (or their replacements as
defined in the translate file) for hyphens yet to be found, `good' hyphens
(correctly found by the patterns), and `bad' hyphens (erroneously found by the
patterns), respectively. When a dictionary file is read, `*' is treated like
`-', and `.' is ignored.
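The reading rules above can be sketched as a small parser. The following Python version is an illustration under several assumptions. The function read_dictionary is hypothetical. Letters are single characters. The default hyphen markers are in use. A global weight digit in column 1 precedes the word on the same line and applies to that word as well as the following ones. Each word becomes a tuple of the bare word, the gaps that carry a hyphen, and a weight for every gap.

```python
def read_dictionary(lines):
    """Parse dictionary lines into (word, hyphen_gaps, gap_weights) tuples.

    Assumes single-character letters and the default hyphen markers.
    gap_weights[i] is the weight of the gap before word[i].
    """
    global_wt = 1  # global word weight, initially 1
    entries = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if line[0].isdigit():  # column 1: new global word weight
            global_wt = int(line[0])
            line = line[1:]
        letters, hyphens, local = [], [], {}
        for ch in line:
            gap = len(letters)  # gap before the next letter
            if ch in "-*":  # '*' (good hyphen) is read like '-'
                hyphens.append(gap)
            elif ch == ".":  # '.' (bad hyphen) is ignored when reading
                continue
            elif ch.isdigit():  # weight for this position only
                local[gap] = int(ch)
            else:
                letters.append(ch)
        word = "".join(letters)
        weights = [local.get(i, global_wt) for i in range(len(word) + 1)]
        entries.append((word, hyphens, weights))
    return entries
```

The global weight is kept across iterations, so a line without a leading digit inherits the most recent one.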
- Pattern file
- A pattern file contains only patterns in the format described above, e.g.,
from a previous run of patgen. It must not contain any TeX comments or control
sequences. For instance, this is not a valid pattern file:
% this is a pattern file read by TeX.
\patterns{%
...
}
It can only contain the actual patterns, i.e., the `...'.
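A pattern file can be screened for such mistakes before patgen reads it. The following is a minimal Python sketch under assumed rules drawn from the description above. Each whitespace-separated token may contain only letters, dots, and digits. Dots may appear only at the word edges, and at most one digit may stand in each gap. It also assumes single-character letters (a...z by default), so multi-character representations are not handled. The name invalid_tokens is hypothetical.

```python
import re
import string

DEFAULT_LETTERS = frozenset(string.ascii_lowercase)


def invalid_tokens(text, letters=DEFAULT_LETTERS):
    """Return the whitespace-separated tokens that are not plain patterns."""
    bad = []
    for token in text.split():
        core = re.sub(r"\d", "", token)  # drop the hyphenation levels
        ok = (
            core.strip(".") != ""                      # at least one letter
            and all(ch in letters or ch == "." for ch in core)
            and "." not in core[1:-1]                  # dots only at the edges
            and re.search(r"\d\d", token) is None      # one digit per gap
        )
        if not ok:
            bad.append(token)
    return bad
```

Running it on the invalid example above flags tokens such as `%', `\patterns{%', and `}', while the actual patterns pass.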
- Translate file
- A translate file starts with a line containing the value of left_hyphen_min
in columns 1-2, the value of right_hyphen_min in columns 3-4, and, in columns
5, 6, and 7, either a blank or the replacement for the "hyphen" characters
`-', `*', and `.', respectively. (Input lines are padded with blanks, as in
many TeX-related programs.)
Each following line defines one `letter': an arbitrary delimiter character in
column 1, followed by one or more external representations of that letter
(the `lower' case one first, which is used for output), each terminated by the
delimiter, with the whole sequence terminated by another delimiter.
If the translate file is empty, the values left_hyphen_min=2,
right_hyphen_min=3, and the 26 lower-case letters a...z with their upper-case
representations A...Z are assumed.
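The column layout described above can be read with a short parser. The following is a minimal Python sketch, not patgen's own code. The function name read_translate is hypothetical. The sketch assumes that columns 1-4 of the first line hold valid integers and that every letter line follows the delimiter layout just described. It returns the two minimum values, the effective hyphen markers, and each letter's representations, with the output (`lower' case) form first.

```python
import string


def read_translate(text):
    """Parse translate-file text into (left_min, right_min, markers, letters)."""
    lines = text.splitlines()
    if not lines:  # empty file: built-in defaults
        letters = [[c, c.upper()] for c in string.ascii_lowercase]
        return 2, 3, {"-": "-", "*": "*", ".": "."}, letters
    head = lines[0].ljust(7)  # pad with blanks, as TeX-related programs do
    left_min, right_min = int(head[0:2]), int(head[2:4])
    # Columns 5-7: a blank keeps the default marker, anything else replaces it.
    markers = {}
    for default, col in zip("-*.", head[4:7]):
        markers[default] = default if col == " " else col
    letters = []
    for line in lines[1:]:
        if not line:
            continue
        delim = line[0]  # column 1: this line's delimiter
        reps = []
        for rep in line[1:].split(delim):
            if rep == "":  # empty field: the closing delimiter was reached
                break
            reps.append(rep)
        letters.append(reps)
    return left_min, right_min, markers, letters
```

Splitting on the line's own delimiter keeps the parser independent of which delimiter a given line chooses. The first empty field marks the doubled delimiter that ends the sequence.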
- Terminal input
- After reading the translate_file and any previously generated patterns from
pattern_file, patgen requests input from the user's terminal.
First, it asks for the integer values of hyph_start and hyph_finish, the lowest
and highest hyphenation levels for which patterns are to be generated. The
value of hyph_start should be larger than any hyphenation level already
present in pattern_file.
Then, for each hyphenation level, it asks for the integer values of pat_start
and pat_finish, the smallest and largest pattern lengths to be analyzed, as
well as good weight, bad weight, and threshold: the weights for good and bad
hyphens and a weight threshold for useful patterns.
Finally, it asks whether or not to produce a hyphenated word list (`y' or `Y'
for yes, anything else for no).
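The three weights decide which candidate patterns are kept. This manual page does not spell out the exact rule, so the Python sketch below assumes one, based on a reading of patgen.web and Liang's thesis. A candidate whose good hyphens alone cannot reach the threshold (good weight × good < threshold) is treated as hopeless. A candidate whose weighted net gain (good weight × good − bad weight × bad) reaches the threshold is accepted. Anything else is left undecided. The function name classify is hypothetical, and the sketch should be read as an approximation rather than patgen's definitive logic.

```python
def classify(good, bad, good_wt, bad_wt, thresh):
    """Classify a candidate pattern from its good and bad hyphen counts.

    Assumed rule: 'hopeless' if even its good hyphens cannot reach the
    threshold, 'accept' if the weighted net gain reaches it, otherwise
    'undecided'.
    """
    if good_wt * good < thresh:
        return "hopeless"
    if good_wt * good - bad_wt * bad >= thresh:
        return "accept"
    return "undecided"
```

Under this assumed rule, raising bad weight makes patterns that introduce bad hyphens harder to accept. Raising threshold discards patterns that are supported by only a few words.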
FILES
- $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
- The original hyphenation patterns for English, by Donald
Knuth and Frank Liang.
- $TEXMFMAIN/tex/generic/hyphen/ushyphmax.tex
- Maximal hyphenation patterns for English, extended by
Gerard Kuiken.
- http://www.ctan.org/tex-archive/language/
- Patterns and support for many other languages.
SEE ALSO
Frank Liang and Peter Breitenlohner, patgen.web.
Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford
University Ph.D. thesis, 1983, http://tug.org/docs/liang.
Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN 0-201-13447-0,
Appendix H.
AUTHORS
Frank Liang wrote the first version of this program. Peter Breitenlohner made a
substantial revision in 1991 for TeX 3. The first version was published as the
appendix to the TeXware technical report. Howard Trickey originally ported it
to Unix.