NAME

dirconv — locate and transcode mixed-encoding file names

SYNOPSIS

dirconv [-078dFhnpruvw] [-f charset] [-x regex] [path ...]

DESCRIPTION

The dirconv utility recursively scans the specified path(s) and classifies files and directories according to whether their names are pure 7-bit ASCII, non-ASCII but valid UTF-8, double-UTF-8 (WTF-8), or neither.

Names in the last category are assumed to be Latin-1, unless a different encoding is specified with the -f option.

By default, the dirconv utility then prints the names that are neither pure 7-bit ASCII nor valid UTF-8.

The following options are available:

-0: Print a NUL character rather than a newline after each path. This option has no effect if the -n option was also specified.
-7: Select names that are pure 7-bit ASCII.
-8: Select names that contain non-ASCII characters but are not valid UTF-8. This is the default unless the -7, -u, and/or -w options are specified.
-d: Show debugging information. This option can be specified multiple times to increase the level of detail.
-F: In conjunction with the -r option, force renaming a file when the target already exists.
-f charset: Specify the assumed character set for non-ASCII, non-UTF-8 names. The default is “iso8859-1”.
-h: Print a usage message and exit.
-n: In conjunction with the -r option, show what would have happened, but do not actually rename any files.
-p: Print the selected names.
-r: Attempt to convert the selected names to UTF-8 and rename the files and directories.
-u: Select names that contain non-ASCII characters and are valid UTF-8 but not WTF-8.
-v: Print the source revision number and exit.
-w: Select names that seem to be WTF-8-encoded.
-x regex: Do not inspect files and directories whose unconverted names match the specified POSIX extended regular expression.
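
The interaction of the -r, -n, and -F options can be pictured with a short Python sketch. This is not the dirconv source: the function name rename_to_utf8 and its parameters are hypothetical, and the behavior when a target exists without -F is an assumption (here it raises an error).

```python
import os
from typing import Optional


def rename_to_utf8(directory: bytes, name: bytes, charset: str = "iso8859-1",
                   dry_run: bool = False, force: bool = False) -> Optional[bytes]:
    """Re-encode one raw file name to UTF-8 and rename it (hypothetical helper).

    Returns the new name, or None if the name would not change.
    """
    # Interpret the stored bytes in the assumed charset, then re-encode.
    target = name.decode(charset).encode("utf-8")
    if target == name:
        return None  # pure ASCII names come out unchanged
    src = os.path.join(directory, name)
    dst = os.path.join(directory, target)
    # Assumption: without "force", an existing target is left alone.
    if os.path.lexists(dst) and not force:
        raise FileExistsError(os.fsdecode(dst))
    if not dry_run:
        os.rename(src, dst)
    return target
```

With dry_run set, the function only reports the name it would have produced, which mirrors the intent of combining -r with -n.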

AUTHORS

The dirconv utility and this manual page were written by Dag-Erling Smørgrav ⟨des@des.no⟩ for the University of Oslo.

NOTES

The dirconv utility works by attempting to decode each name as if it were a sequence of UTF-8 characters. It is possible, but highly unlikely, that a random string of characters in a single-byte encoding such as Latin-1 would look like a valid UTF-8 sequence.
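
The decode-and-see approach described above can be sketched in Python. This is an illustration only, not the actual dirconv implementation: the function name classify and the category labels are made up here, and the double-UTF-8 test (decode, re-encode in the assumed charset, decode again) is one plausible way to perform that check.

```python
def classify(name: bytes, charset: str = "iso8859-1") -> str:
    """Return 'ascii', 'utf8', 'wtf8', or 'other' for a raw file name."""
    if name.isascii():
        return "ascii"
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Non-ASCII and not valid UTF-8: assumed to be in `charset`.
        return "other"
    try:
        # If the decoded characters map back to bytes that are themselves
        # valid UTF-8, the name was probably encoded twice.
        text.encode(charset).decode("utf-8")
    except UnicodeError:
        return "utf8"
    return "wtf8"


# The same name in three encodings, plus a pure ASCII name.
utf8_name = "blåbær".encode("utf-8")
latin1_name = "blåbær".encode("iso8859-1")
double_name = utf8_name.decode("iso8859-1").encode("utf-8")
for raw in (b"readme.txt", utf8_name, latin1_name, double_name):
    print(raw, classify(raw))
```

The false positive mentioned above corresponds to a Latin-1 name such as "Ã©", whose two bytes happen to form the valid UTF-8 sequence for "é".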

Reliable detection of WTF-8 is only possible if the original 8-bit encoding is known.
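
A small Python experiment shows why the original encoding matters. The helper looks_double_encoded is hypothetical, and "cp1252" is simply the Python codec name for Windows-1252; whether dirconv accepts that name for -f is not stated here. The name below was double-encoded via Windows-1252, so a check that assumes Latin-1 misses it.

```python
def looks_double_encoded(name: bytes, charset: str) -> bool:
    """Guess whether `name` is UTF-8 that was misread as `charset` and re-encoded."""
    try:
        inner = name.decode("utf-8").encode(charset)
        inner.decode("utf-8")
    except UnicodeError:
        return False
    # A pure ASCII name round-trips trivially and is not double-encoded.
    return not inner.isascii()


# "Å" is C3 85 in UTF-8; Windows-1252 reads byte 0x85 as U+2026 (ellipsis),
# which has no Latin-1 equivalent.
mangled = "Århus".encode("utf-8").decode("cp1252").encode("utf-8")
print(looks_double_encoded(mangled, "cp1252"))
print(looks_double_encoded(mangled, "iso8859-1"))
```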

The exclusion filter is applied before name conversion. Character classes are unlikely to work as expected on unconverted names.
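
The effect can be reproduced by analogy in Python. Note that Python's re module is not a POSIX extended regular expression engine, so this only illustrates the general problem of matching character classes against raw bytes; the helper names are hypothetical.

```python
import re

WORD_BYTES = re.compile(rb"^\w+$")  # byte pattern: \w covers ASCII only
WORD_TEXT = re.compile(r"^\w+$")    # text pattern: \w is Unicode-aware


def matches_raw(name: bytes) -> bool:
    """Match against the unconverted bytes, as the exclusion filter does."""
    return WORD_BYTES.match(name) is not None


def matches_converted(name: bytes, charset: str = "iso8859-1") -> bool:
    """Match against the name after decoding it in the assumed charset."""
    return WORD_TEXT.match(name.decode(charset)) is not None


raw = "Smørgrav".encode("iso8859-1")
print(matches_raw(raw), matches_converted(raw))
```

The byte 0xF8 ("ø" in Latin-1) is not a word character to the byte pattern, so the raw match fails even though the decoded name consists entirely of letters.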

November 18, 2014

Source file:	dirconv.1.en.gz (from conv-tools 20160905-5)
Source last updated:	2024-04-14T01:59:53Z
Converted to HTML:	2024-04-14T09:35:40Z
