.\" Hey, EMACS: -*- nroff -*-
.TH UNICODE 1 "2003-01-31"
.SH NAME
unicode \- command line unicode database query tool
.SH SYNOPSIS
.B unicode
.RI [ options "] " string
.SH DESCRIPTION
This manual page documents the
.B unicode
command.
.PP
\fBunicode\fP is a command line tool for querying the Unicode character database.
.SH OPTIONS
.TP
.BI \-h
.BI \-\-help
Show help and exit.
.TP
.BI \-x
.BI \-\-hexadecimal
Assume
.I string
to be a hexadecimal number.
.TP
.BI \-d
.BI \-\-decimal
Assume
.I string
to be a decimal number.
.TP
.BI \-o
.BI \-\-octal
Assume
.I string
to be an octal number.
.TP
.BI \-b
.BI \-\-binary
Assume
.I string
to be a binary number.
.TP
.BI \-r
.BI \-\-regexp
Assume
.I string
to be a regular expression.
.TP
.BI \-s
.BI \-\-string
Assume
.I string
to be a sequence of characters.
.TP
.BI \-a
.BI \-\-auto
Try to guess the type of
.I string
from one of the above (default).
.TP
.BI \-m MAXCOUNT
.BI \-\-max= MAXCOUNT
Maximal number of codepoints to display (default: 20); use 0 for unlimited.
.TP
.BI \-i IOCHARSET
.BI \-\-io= IOCHARSET
I/O character set. For maximal pleasure, run \fBunicode\fP on a UTF-8 capable terminal and specify IOCHARSET to be UTF-8. \fBunicode\fP tries to guess this value from your locale, so with a properly set up locale, you should not need to specify it.
.TP
.BI \-\-fcp= CHARSET
.BI \-\-fromcp= CHARSET
Convert numerical arguments from this encoding (default: no conversion). Multibyte encodings are supported. This option is ignored for non-numerical arguments.
.TP
.BI \-c ADDCHARSET
.BI \-\-charset\-add= ADDCHARSET
Show the hexadecimal representation of the displayed characters in this additional charset.
.TP
.BI \-C USE_COLOUR
.BI \-\-colour= USE_COLOUR
.I USE_COLOUR
is one of
.IR on ,
.IR off ,
or
.IR auto .
.B \-\-colour=on
will use ANSI colour codes to colourise the output;
.B \-\-colour=off
won't use colours;
.B \-\-colour=auto
will test whether standard output is a tty and use colours only when it is.
.BI \-\-color
is a synonym of
.BR \-\-colour .
.TP
.BI \-v
.BI \-\-verbose
Be more verbose about displayed characters, e.g. display Unihan information, if available.
.TP
.BI \-w
.BI \-\-wikipedia
Spawn a browser pointing to the English Wikipedia entry about the character.
.TP
.BI \-\-wt
.BI \-\-wiktionary
Spawn a browser pointing to the English Wiktionary entry about the character.
.TP
.BI \-\-brief
Display character information in brief format.
.TP
.BI \-\-format= fmt
Use your own format for character information display. See the README for details.
.TP
.BI \-\-list
List (approximately) all known encodings.
.SH USAGE
\fBunicode\fP tries to guess the type of an argument. In particular, if an argument looks like a valid hexadecimal representation of a Unicode codepoint, it is treated as one. For example, the following command displays information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE and does not search for 'face' in character descriptions:
.PP
.RS
\fBunicode\fP face
.RE
.PP
To search for 'face' in character descriptions instead, use:
.PP
.RS
\fBunicode\fP \-r face
.RE
.PP
For example, you can use any of the following to display information about U+00E1 LATIN SMALL LETTER A WITH ACUTE (\('a):
.PP
.RS
.nf
\fBunicode\fP 00E1
\fBunicode\fP U+00E1
\fBunicode\fP \('a
\fBunicode\fP 'latin small letter a with acute'
.fi
.RE
.PP
You can specify a range of characters as an argument; \fBunicode\fP will then show these characters in a tabular format, aligned to 256-codepoint (0x100) boundaries. Use two dots ".." to indicate the range. For example, the following displays characters from U+0400 to U+05FF, which include the whole Cyrillic and Hebrew blocks:
.PP
.RS
\fBunicode\fP 0450..0520
.RE
.PP
The following displays just the characters from U+0400 up to U+04FF:
.PP
.RS
\fBunicode\fP 0400..
.RE
.PP
Use \-\-fromcp to query codepoints from other encodings:
.PP
.RS
\fBunicode\fP \-\-fromcp cp1250 \-d 200
.RE
.PP
Multibyte encodings are supported:
.PP
.RS
\fBunicode\fP \-\-fromcp big5 \-x aff3
.RE
.PP
Multi-character strings are supported, too:
.PP
.RS
\fBunicode\fP \-\-fromcp utf-8 \-x c599c3adc5a5
.RE
.SH BUGS
The tabular format does not deal well with full-width, combining, control, and RTL characters.
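.SH NOTES
The following Python sketch illustrates two behaviours described under USAGE: how a range argument such as 0450..0520 is widened to whole rows of 256 codepoints, and how \-\-fromcp decodes bytes given in another encoding into Unicode codepoints. It is not taken from the \fBunicode\fP sources; the helper names expand_range and decode_from_codepage are hypothetical, and treating an open range such as 0400.. as ending in the same row is an assumption based on the 0400.. example under USAGE.
.PP
.nf
```python
# Hypothetical sketch only: this is NOT taken from the unicode sources.
# It illustrates two behaviours described in the USAGE section.


def expand_range(arg):
    """Widen a "START..END" hex range to whole rows of 256 codepoints."""
    start_s, _, end_s = arg.partition("..")
    start = int(start_s, 16)
    # Assumption: an open range such as "0400.." ends in the same row.
    end = int(end_s, 16) if end_s else start
    first = start - start % 256    # round down to a multiple of 0x100
    last = end - end % 256 + 255   # round up to the last codepoint of that row
    return range(first, last + 1)


def decode_from_codepage(data, encoding):
    """List the codepoints that raw bytes in `encoding` decode to."""
    return [f"U+{ord(ch):04X}" for ch in data.decode(encoding)]


r = expand_range("0450..0520")
print(f"U+{r[0]:04X}..U+{r[-1]:04X}")  # U+0400..U+05FF
r = expand_range("0400..")
print(f"U+{r[0]:04X}..U+{r[-1]:04X}")  # U+0400..U+04FF

# Decimal 200 is the single byte 0xC8, which is U+010C in cp1250
print(decode_from_codepage(bytes([200]), "cp1250"))
# ['U+010C']
print(decode_from_codepage(bytes.fromhex("c599c3adc5a5"), "utf-8"))
# ['U+0159', 'U+00ED', 'U+0165']
```
.fi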
.SH SEE ALSO
.BR ascii (1)
.SH AUTHOR
Radovan Garab\('ik