UNICODE(1)

General Commands Manual

UNICODE(1)

NAME

unicode - command line unicode database query tool

SYNOPSIS

unicode [options] string

DESCRIPTION

This manual page documents the unicode command.

unicode is a command-line tool for querying the Unicode character database.

OPTIONS

-h: --help
Show help and exit.

-x: --hexadecimal
Assume string to be a hexadecimal number

-d: --decimal
Assume string to be a decimal number

-o: --octal
Assume string to be an octal number

-b: --binary
Assume string to be a binary number

-r: --regexp
Assume string to be a regular expression

-s: --string
Assume string to be a sequence of characters

-a: --auto
Try to guess type of string from one of the above (default)

-mMAXCOUNT: --max=MAXCOUNT
Maximum number of codepoints to display (default: 20); use 0 for unlimited.

-iCHARSET: --io=IOCHARSET
I/O character set. For best results, run unicode on a UTF-8 capable terminal and set IOCHARSET to UTF-8. unicode tries to guess this value from your locale, so with a properly configured locale you should not need to specify it.

--fcp=CHARSET: --fromcp=CHARSET
Convert numerical arguments from this encoding, default: no conversion. Multibyte encodings are supported. This is ignored for non-numerical arguments.

-cADDCHARSET: --charset-add=ADDCHARSET
Show the hexadecimal representation of displayed characters in this additional charset.

-CUSE_COLOUR: --colour=USE_COLOUR
USE_COLOUR is one of on, off, or auto.
--colour=on uses ANSI colour codes to colourise the output.
--colour=off disables colours.
--colour=auto tests whether standard output is a tty and uses colours only when it is.
--color is a synonym for --colour.

-v: --verbose
Be more verbose about displayed characters, e.g. display Unihan information, if available.

-w: --wikipedia
Open a browser at the Wikipedia entry for the character.

--list
List (approximately) all known encodings.
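
The number-base options above (-x, -d, -o, -b) can be pictured with a short Python sketch. It is not taken from unicode's source: the helper name codepoint_from_number and the BASES table are hypothetical, and only Python's standard unicodedata module is assumed.

```python
import unicodedata

# Hypothetical mapping from the short option letters to numeric bases.
BASES = {"x": 16, "d": 10, "o": 8, "b": 2}

def codepoint_from_number(text, option):
    """Interpret text as a number in the base selected by option."""
    return int(text, BASES[option])

# Four spellings of the same codepoint, one per base option.
for option, text in [("x", "e1"), ("d", "225"), ("o", "341"), ("b", "11100001")]:
    cp = codepoint_from_number(text, option)
    print(f"-{option} {text}: U+{cp:04X} {unicodedata.name(chr(cp))}")
```

All four inputs resolve to the same codepoint, which is why an explicit base option is useful when a string such as 225 could be read in more than one base.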

USAGE

unicode tries to guess the type of each argument. In particular, if an argument looks like a valid hexadecimal representation of a Unicode codepoint, it is treated as one. Using

unicode face

will display information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE, and it will not search for 'face' in character descriptions - for the latter, use:

unicode -r face

For example, you can use any of the following to display information about U+00E1 LATIN SMALL LETTER A WITH ACUTE (á):

unicode 00E1

unicode U+00E1

unicode á

unicode 'latin small letter a with acute'
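
The guessing described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function name guess_characters and the order of the checks are hypothetical, and the real tool's heuristics may differ in detail.

```python
import re
import unicodedata

def guess_characters(arg):
    """Rough guess at what arg denotes; returns a list of characters."""
    # 1. A hexadecimal codepoint, optionally prefixed with "U+".
    match = re.fullmatch(r"(?:[Uu]\+)?([0-9A-Fa-f]{1,6})", arg)
    if match and int(match.group(1), 16) <= 0x10FFFF:
        return [chr(int(match.group(1), 16))]
    # 2. An exact character name (upper-cased before the lookup).
    try:
        return [unicodedata.lookup(arg.upper())]
    except KeyError:
        pass
    # 3. Otherwise, a literal sequence of characters.
    return list(arg)

for arg in ["00E1", "U+00E1", "á", "latin small letter a with acute", "face"]:
    found = " ".join(f"U+{ord(ch):04X}" for ch in guess_characters(arg))
    print(f"{arg!r}: {found}")
```

Because the hexadecimal check runs first, the argument face is read as the codepoint U+FACE instead of a search term, which mirrors the behaviour described above.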

You can specify a range of characters as an argument; unicode will show these characters in a tabular format, aligned to 256-codepoint boundaries. Use two dots ".." to indicate the range, e.g.

unicode 0450..0520

will display the whole Cyrillic and Hebrew blocks along with everything between them (characters from U+0400 to U+05FF).

unicode 0400..

will display just the characters from U+0400 to U+04FF.
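
The alignment in the two range examples above amounts to rounding the start down and the end up to a multiple of 256. A minimal Python sketch, assuming a hypothetical helper named parse_range (not part of unicode itself):

```python
def parse_range(spec):
    """Return (first, last) codepoints for 'LOW..HIGH' or 'LOW..'.

    Both ends are widened to whole rows of 256 codepoints.
    """
    low_text, _, high_text = spec.partition("..")
    low = int(low_text, 16)
    # An open-ended range covers only the row containing LOW.
    high = int(high_text, 16) if high_text else low
    first = low & ~0xFF   # round down to a multiple of 256
    last = high | 0xFF    # round up to the last codepoint of the row
    return first, last

for spec in ["0450..0520", "0400.."]:
    first, last = parse_range(spec)
    print(f"{spec}: U+{first:04X} to U+{last:04X}")
```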

Use --fromcp to query codepoints from other encodings:

unicode --fromcp cp1250 -d 200

Multibyte encodings are supported:

unicode --fromcp big5 -x aff3

and multi-char strings are supported, too:

unicode --fromcp utf-8 -x c599c3adc5a5
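
What --fromcp does with a numerical argument can be illustrated in Python: the number is turned into bytes, and the bytes are decoded with the named encoding. This is a sketch, not unicode's actual code; the helper decode_from_codepage is hypothetical, and it assumes the value has no leading zero bytes.

```python
import unicodedata

def decode_from_codepage(number, encoding):
    """Turn a numeric value into big-endian bytes and decode them."""
    length = max(1, (number.bit_length() + 7) // 8)
    raw = number.to_bytes(length, "big")
    return raw.decode(encoding)

# The three --fromcp examples above, expressed as (value, encoding) pairs.
examples = [(200, "cp1250"), (0xAFF3, "big5"), (0xC599C3ADC5A5, "utf-8")]
for number, encoding in examples:
    text = decode_from_codepage(number, encoding)
    for ch in text:
        print(f"{encoding}: U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

The last example decodes to three separate characters, which is the multi-character case mentioned above.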

BUGS

The tabular format does not handle full-width, combining, control, and RTL characters well.

AUTHOR

Radovan Garabík <garabik @ kassiopeia.juls.savba.sk>

2003-01-31

Source file:	unicode.1.en.gz (from unicode 0.9.8)
Source last updated:	2014-08-28T09:06:53Z