NAME
unifuzz - Emit strings designed to test Unicode handling
SYNOPSIS
unifuzz [option flags]
DESCRIPTION
unifuzz emits strings designed to test whether programs that accept
Unicode input can handle unexpected input. The strings include: characters
from all Unicode ranges, Private Use characters, surrogates, undefined
characters, non-characters, control characters, exotic space characters,
sequences violating normalization rules, unexpected sequences (e.g. a base
character from one range followed by a combining character from another
range), and long sequences of combining characters. It can also generate very
long lines, strings containing embedded nulls, and ill-formed UTF-8.
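To make some of these categories concrete, the following Python sketch builds one hand-picked example of each. It is an illustration only and does not reproduce unifuzz's output or algorithm; the specific code points and the names samples, ill_formed_utf8 and is_well_formed_utf8 are assumptions chosen for this example.

```python
# Hand-picked illustrations of some input categories; NOT unifuzz output.
samples = {
    # Latin base letter followed by a combining mark from the Devanagari block
    "cross_range_combining": "a\u093c",
    # One base letter followed by 50 combining acute accents
    "long_combining_run": "e" + "\u0301" * 50,
    # U+FFFE is a noncharacter
    "noncharacter": "\ufffe",
    # U+E000 is the first Private Use Area code point
    "private_use": "\ue000",
    # U+2003 EM SPACE between two ordinary letters
    "exotic_space": "a\u2003b",
    # NUL in the middle of otherwise ordinary text
    "embedded_null": "abc\x00def",
    # A lone high surrogate; a Python str can hold it, but it cannot be
    # encoded as UTF-8
    "lone_surrogate": "\ud800",
}

# Ill-formed UTF-8 has to be expressed as bytes: 0xC3 opens a two-byte
# sequence, but 0x28 is not a valid continuation byte.
ill_formed_utf8 = b"\xc3\x28"


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for name, text in samples.items():
        # Print code points rather than raw characters so the terminal
        # never has to render (or encode) the problematic text itself.
        print(name, [f"U+{ord(ch):04X}" for ch in text[:4]])
    print("ill_formed_utf8 well-formed?", is_well_formed_utf8(ill_formed_utf8))
```

A program under test would typically be fed each value (encoded where possible) and checked for crashes, hangs, or silent data corruption.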
COMMAND LINE FLAGS
-b
    Restrict the output to the Basic Multilingual Plane (Plane 0).
-g
    Do not emit specific characters.
-h
    Print usage information.
-l
    Emit very long lines.
-n
    Emit strings with embedded nulls.
-q
    Be quiet. Omit commentary.
-r <number>
    Set the number of random characters to emit.
-S
    Scan ranges: emit a character from each range.
-s <seed>
    Set the seed for the random number generator.
-u
    Emit ill-formed UTF-8.
-v
    Print version information.
The sequence of random characters is determined by a pseudorandom number
generator, so the same sequence can be obtained by setting the seed to the
same value. If not set on the command line, a seed is chosen based on the time
of execution. The seed used is included in the output in a line of the form
"Seed = NNNNNN" immediately preceding the random character sequence.
Note that reproducing a sequence also requires the same setting for
restricting output to the BMP (the -b flag).
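The reproducibility rule above can be sketched in Python. The helper names extract_seed and rerun_argv are hypothetical, the sample text is made up to match the documented "Seed = NNNNNN" line, and the assumption that the seed is a decimal integer is not confirmed by this page. Only the -s, -r and -b flags described above are used.

```python
import re

# Matches the documented "Seed = NNNNNN" line.
# Assumption: the seed is printed as a decimal integer.
SEED_LINE = re.compile(r"^Seed = (\d+)\s*$", re.MULTILINE)


def extract_seed(output: str):
    """Return the seed reported in unifuzz output, or None if absent."""
    match = SEED_LINE.search(output)
    return int(match.group(1)) if match else None


def rerun_argv(seed: int, count: int, bmp_only: bool):
    """Build an argument list that requests the same random sequence.

    The BMP restriction (-b) must match the original run, so it is an
    explicit parameter rather than a default.
    """
    argv = ["unifuzz", "-s", str(seed), "-r", str(count)]
    if bmp_only:
        argv.append("-b")
    return argv


if __name__ == "__main__":
    sample_output = "Seed = 123456\n"  # made-up sample, not real output
    seed = extract_seed(sample_output)
    print(rerun_argv(seed, 100, bmp_only=True))
    # prints ['unifuzz', '-s', '123456', '-r', '100', '-b']
```

Because the output may contain ill-formed UTF-8 or embedded nulls, a real harness would probably capture it as bytes and decode with errors="replace" before searching for the seed line.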
REFERENCES
Unicode Standard, version 5.0
AUTHOR
Bill Poser
billposer@alum.mit.edu
LICENSE
GNU General Public License