NAME
Text::Reflow - Perl module for reflowing text files using Knuth's paragraphing
algorithm.
SYNOPSIS
use Text::Reflow qw(reflow_file reflow_string reflow_array);
reflow_file($infile, $outfile, key => value, ...);
$output = reflow_string($input, key => value, ...);
$output = reflow_array(\@input, key => value, ...);
DESCRIPTION
These routines will reflow the paragraphs in the given file, filehandle, string
or array using Knuth's paragraphing algorithm (as used in TeX) to pick
"good" places to break the lines.
Each routine takes ASCII text data with paragraphs separated by blank lines and
reflows the paragraphs. If two or more lines in a row are "indented"
then they are assumed to be a quoted poem and are passed through unchanged
(but see the "skipindented" option below).
The reflow algorithm tries to keep the lines the same length, but it also tries to
break at punctuation and to avoid breaking within a proper name or after certain
connectives ("a", "the", etc.). The result is a
file with a more "ragged" right margin than is produced by
"fmt" or "Text::Wrap" but it is easier to read since fewer
phrases are broken across line breaks.
For "reflow_file", if $infile is the empty string, then the input is
taken from STDIN and if $outfile is the empty string, the output is written to
STDOUT. Otherwise, $infile and $outfile may be a string, a FileHandle
reference or a FileHandle glob.
A typical invocation is:
reflow_file("myfile", "");
which reflows the whole of myfile and prints the result to STDOUT.
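For text that is already in memory, "reflow_string" from the SYNOPSIS can be used instead of "reflow_file". The following is a minimal sketch, not a definitive recipe: the sample text and the particular "maximum" and "optimum" values are illustrative assumptions, and the exact line breaks chosen will depend on the installed version of the module.

```perl
use strict;
use warnings;
use Text::Reflow qw(reflow_string);

# Two unevenly broken paragraphs separated by a blank line
# (sample text invented for this illustration).
my $input = join "\n",
    "The quick brown fox jumps over the lazy dog, and then",
    "the dog, who was not really",
    "lazy at all, chased the fox back into the woods.",
    "",
    "A second paragraph follows the blank line.",
    "";

# Options passed here apply to this call only.
my $output = reflow_string($input, maximum => 50, optimum => 45);

print $output;
```

Only the line breaks and inter-word spacing should change; the words themselves are expected to come through in the same order.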
KEYWORD OPTIONS
The behaviour of Reflow can be adjusted by setting various keyword options.
These can be set globally by referencing the appropriate variable in the
Text::Reflow package, for example:
$Text::Reflow::maximum = 80;
$Text::Reflow::optimum = 75;
will set the maximum line length to 80 characters and the optimum line length to
75 characters for all subsequent reflow operations. Or they can be passed to a
reflow_ function as a keyword parameter, for example:
$out = reflow_string($in, maximum => 80, optimum => 75);
in which case the new options only apply to this call.
The following options are currently implemented, with their default values:
- optimum => [65]
- The optimum line length in characters. This can be either a
number or a reference to an array of numbers: in the latter case, each
optimal line length is tried in turn for each paragraph, and the one which
leads to the best overall paragraph is chosen. This results in less ragged
paragraphs, but some paragraphs will be wider or narrower overall than
others.
- maximum => 75
- The maximum allowed line length.
- indent => ""
- Each line of output has this string prepended. "indent
=> string" is equivalent to "indent1 => string, indent2
=> string".
- indent1 => ""
- A string which is used to indent the first line in any
paragraph.
- indent2 => ""
- A string which is used to indent the second and subsequent
lines in any paragraph.
- quote => ""
- Characters to strip from the beginning of a line before
processing. To reflow a quoted email message and then restore the quotes
you might want to use
quote => "> ", indent => "> "
- skipto => ""
- Skip to the first line starting with the given pattern
before starting to reflow. This is useful for skipping Project Gutenberg
headers or contents tables.
- skipindented => 2
- If "skipindented" = 0 then all indented lines are
flowed in with the surrounding paragraph. If "skipindented" = 1
then any indented line will not be reflowed. If "skipindented" =
2 then any two or more adjacent indented lines will not be reflowed. The
purpose of the default value is to allow poetry to pass through unchanged,
without letting the indentation of a paragraph's first line prevent that
line from being reflowed.
- noreflow => ""
- A pattern to indicate that certain lines should not be
reflowed. For example, a table of contents might have a line of dots. The
option:
noreflow => '(\.\s*){4}\.'
will not reflow any line containing five or more dots in a row, optionally
separated by whitespace.
- frenchspacing => 'n'
- Normally two spaces are put at the end of a sentence or a
clause. The "frenchspacing" option (taken from the TeX macro of
the same name) disables this feature.
- oneparagraph => 'n'
- Set this to 'y' if you want the whole input to be flowed
into a single paragraph, ignoring blank lines in the input.
- semantic => 30
- This parameter indicates the extent to which semantic
factors matter (breaking on punctuation, avoiding a break within a clause
etc.). Set this to zero to minimise the raggedness of the right margin, at
the expense of readability.
- namebreak => 10
- Penalty for splitting up a name.
- sentence => 20
- Penalty for sentence widows and orphans (i.e. splitting a
line immediately after the first word in a sentence, or before the last
word in a sentence).
- independent => 10
- Penalty for independent clause widows and orphans.
- dependent => 6
- Penalty for dependent clause widows and orphans.
- shortlast => 5
- Penalty for a short last line in a paragraph (one or two
words).
- connpenalty => 1
- Multiplier for the "negative penalty" for
breaking at a connective. In other words, increasing this value makes
connectives an even more attractive place to break a line.
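To see which lines the "noreflow" pattern shown above would select, the pattern can be tried on its own with plain Perl before passing it to a reflow function. This sketch does not call Text::Reflow at all; the sample lines are invented for illustration, and it only demonstrates what the regular expression matches, not how the module applies it internally.

```perl
use strict;
use warnings;

# The pattern from the noreflow example: four dots, each optionally
# followed by whitespace, and then a fifth dot.
my $noreflow = qr/(\.\s*){4}\./;

# Hypothetical sample lines.
my @lines = (
    "Chapter 1 . . . . . . . . 12",   # spaced leader dots
    "Chapter 2.....15",               # five adjacent dots
    "Wait... what?",                  # only three dots
    "An ordinary line of prose.",     # a single dot
);

for my $line (@lines) {
    my $verdict = $line =~ $noreflow ? "matches" : "does not match";
    printf "%-32s %s\n", $line, $verdict;
}
```

The first two lines match the pattern and the last two do not, so an ellipsis in ordinary prose would not by itself keep a line from being reflowed.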
EXPORT
None by default.
AUTHOR
Original "reflow" perl script written by Michael Larsen,
larsen@edu.upenn.math.
Modified, enhanced and converted to a perl module with XSUB by Martin Ward,
martin@gkc.org.uk
SEE ALSO
perl(1).
See "TeX the Program" by Donald Knuth for a description of the
algorithm used.