.TH KONWERT 1 "30 Jul 1998" "Konwert" "Linux User's Manual"
.SH NAME
konwert \- interface for various character encoding conversions
.SH SYNOPSIS
.B konwert
.I FILTER
.RI [ FILE ].\|.\|.\|
.RB [ \-o
.I DEST
|
.BR \-O ]
.SH DESCRIPTION
Konwert allows filtering multiple files through multiple filters.
It filters the specified
.IR FILE s,
or stdin if none are given.
.PP
A simple
.I FILTER
is the name of an executable file in the directory
.I ~/.konwert/filters
or in the system-wide one, normally
.IR /usr/share/konwert/filters .
Such a program itself filters stdin to stdout.
.PP
The filtering rule can be more complex:
.PP
.B konwert
.IB FILTER1 + FILTER2
means
.B konwert
.I FILTER1
|
.B konwert
.IR FILTER2 .
.PP
.B konwert
.IB FORMAT1 \- FORMAT2\fR,
unless such a filter exists, tries to find a common
.IR FORMAT3 ,
such that both filters
.IB FORMAT1 \- FORMAT3
and
.IB FORMAT3 \- FORMAT2
do exist.
.PP
.B konwert
.IB FILTER / ARG /\fR.\|.\|.\|
passes arguments to the filter. Arguments can also be specified here:
.IB FORMAT1 / ARGS \- FORMAT2\fR.
The meaning of arguments depends on the particular filter.
.PP
.B konwert
.BI '( "COMMAND ARGS\fR.\|.\|.\|" )'
executes this arbitrary shell command. This is useful with the
.B \-o
or
.B \-O
options. The command cannot contain the string
.BR )+ ,
which will terminate this filter's specification.
.SS OPTIONS
.TP 10
\fB\-o\fP \fIDEST\fP
output goes to this file/directory instead of stdout
.TP
.B \-O
every input file is replaced with its translation
.TP
.B \-\-help
display help and exit
.TP
.B \-\-version
output version information and exit
.PP
Redirecting output to one of the source files with either
.B \-o
or
.B >
instead of
.B \-O
will corrupt it! Option
.B \-O
creates a temporary file in
.I /tmp
and later copies it back onto the source.
.SH CHARACTER ENCODING CONVERSIONS
You can convert text between any two charsets, for example
.B konwert
.BR cp437\-iso2 .
.PP
Characters unavailable in the target charset will be replaced with
approximations built from available ones. The approximations need not be
single characters.
.PP
The following character sets are currently supported:
.TP
.B ascii
7bit ASCII
.TP 16
.B utf8\fR = \fPunicode
Unicode UTF-8
.TP 7
.PD 0
.B iso1\fR = \fPisolatin1
ISO-8859-1 aka ISO Latin 1 (Western European)
.TP
.B iso2\fR = \fPisolatin2
ISO-8859-2 aka ISO Latin 2 (Central European)
.TP
.B iso3\fR = \fPisolatin3
ISO-8859-3 aka ISO Latin 3 (Esperanto)
.TP
.B iso4\fR = \fPisolatin4
ISO-8859-4 aka ISO Latin 4 (Baltic)
.TP
.B iso5\fR = \fPisolatincyr
ISO-8859-5 (Cyrillic)
.TP
.B iso6\fR = \fPisolatinarabic
ISO-8859-6 (Arabic)
.TP
.B iso7\fR = \fPisolatingreek
ISO-8859-7 (Greek)
.TP
.B iso8\fR = \fPisolatinhebrew
ISO-8859-8 (Hebrew)
.TP
.B iso9\fR = \fPisolatin5\fR = \fPisolatintur
ISO-8859-9 aka ISO Latin 5 (Turkish)
.TP
.B iso10\fR = \fPisolatin6\fR = \fPisolatinnordic
ISO-8859-10 aka ISO Latin 6 (Nordic)
.TP
.B iso12\fR = \fPisolatin7\fR = \fPisolatinceltic
ISO-8859-12 aka ISO Latin 7 (Celtic) - Draft
.TP
.B iso13\fR = \fPisolatin8\fR = \fPisolatinbaltic
ISO-8859-13 aka ISO Latin 8 (Baltic) - Draft
.TP
.B iso14\fR = \fPisolatin9\fR = \fPisolatinsami
ISO-8859-14 aka ISO Latin 9 (Sámi) - Draft
.TP
.B iso15
ISO-8859-15 - Draft
.PD
.TP 9
.PD 0
.B koi8r
KOI8-R (Russian)
.TP
.B koi8u
KOI8-U (Ukrainian, Byelorussian)
.TP
.B koi8uni
KOI8-Uni (Cyrillic)
.PD
.TP 30
.PD 0
.B cp1250\fR = \fPwince\fR = \fPwinlatin2
Windows CP-1250 aka Win Latin 2 (Central European)
.TP
.B cp1251\fR = \fPwincyr
Windows CP-1251 (Cyrillic)
.TP
.B cp1252\fR = \fPwinwest\fR = \fPwinlatin1
Windows CP-1252 aka Win Latin 1 (Western European)
.TP
.B cp1253\fR = \fPwingr
Windows CP-1253 (Greek)
.TP
.B cp1254\fR = \fPwintur
Windows CP-1254 (Turkish)
.TP
.B cp1255\fR = \fPwinhebrew
Windows CP-1255 (Hebrew)
.TP
.B cp1256\fR = \fPwinarabic
Windows CP-1256 (Arabic)
.TP
.B cp1257\fR = \fPwinbaltic
Windows CP-1257 (Baltic)
.TP
.B cp1258\fR = \fPwinviet
Windows CP-1258 (Vietnamese)
.PD
.TP 29
.PD 0
.B cp437\fR = \fPicmeng
DOS CP-437 (English)
.TP
.B cp737\fR = \fPdosgreek
DOS CP-737 (Greek)
.TP
.B cp775\fR = \fPdosbaltic
DOS CP-775 (Baltic)
.TP
.B cp850\fR = \fPdoswest\fR = \fPdoslatin1
DOS CP-850 aka DOS Latin 1 (Western European)
.TP
.B cp852\fR = \fPdosce\fR = \fPdoslatin2
DOS CP-852 aka DOS Latin 2 (Central European)
.TP
.B cp855\fR = \fPdoscyr
DOS CP-855 (Cyrillic)
.TP
.B cp857\fR = \fPdostur
DOS CP-857 (Turkish)
.TP
.B cp860\fR = \fPdosportugal
DOS CP-860 (Portuguese)
.TP
.B cp861\fR = \fPdosiceland
DOS CP-861 (Icelandic)
.TP
.B cp862\fR = \fPdoshebrew
DOS CP-862 (Hebrew)
.TP
.B cp863\fR = \fPdoscanadfr
DOS CP-863 (Canadian French)
.TP
.B cp864\fR = \fPdosarabic
DOS CP-864 (Arabic)
.TP
.B cp865\fR = \fPdosnordic
DOS CP-865 (Nordic)
.TP
.B cp866\fR = \fPdosrussian
DOS CP-866 (Russian)
.TP
.B cp869\fR = \fPdosgreek2
DOS CP-869 (Greek2)
.TP
.B cp874\fR = \fPdosthai
DOS CP-874 (Thai)
.PD
.TP 12
.PD 0
.B mac
Macintosh Roman (Western European)
.TP
.B macce
Macintosh Central European
.TP
.B maccyr
Macintosh Cyrillic
.TP
.B macgreek
Macintosh Greek
.TP
.B maciceland
Macintosh Icelandic
.TP
.B mactur
Macintosh Turkish
.PD
.TP 13
.PD 0
.BR csk ,
.TP
.BR cyfromat ,
.TP
.BR dhn ,
.TP
.BR fidomazovia ,
.TP
.BR iea ,
.TP
.BR logic ,
.TP
.BR mazovia ,
.TP
.B microvex
DOS charsets for Polish
.PD
.TP
.PD 0
.BR amigapl ,
.TP
.BR fat ,
.TP 9
.B xjp
Amiga charsets for Polish
.PD
.TP 11
.B kamenicky
DOS charset for Czech and Slovak
.TP 10
.B wingreek
WinGreek (Windows font-based encoding for ancient Greek)
.TP 9
.PD 0
.B babelpl
TeX [polish]{babel}:
.I \&"a"c"e"l"n"o"s"z"r
.TP
.B ciachy
TeX \\prefixing:
.I /a/c/e/l/n/o/s/x/z
.PD
.TP 15
.PD 0
.B xmetodo
Esperanto:
.I cx gx hx jx sx ux
.RI ( vx\ w )
.TP
.B hmetodo
Esperanto:
.I ch gh hh jh sh u
.TP
.B antauxcxap
Esperanto:
.I ^c ^g ^h ^j ^s ^u
.RI ( ~u )
.TP
.B postcxap
Esperanto:
.I c^ g^ h^ j^ s^ u^
.RI ( u~ )
.TP
.B apostrofoj
Esperanto:
.I c' g' h' j' s' u'
.TP
.B malapostrofoj
Esperanto:
.I c` g` h` j` s` u`
.PD
.TP
.PD 0
.TP 8
.B viscii
VISCII (Vietnamese)
.TP
.B viqri
Vietnamese Quoted Readable Implicit
.PD
.TP 9
.PD 0
.B htmldec
SGML/HTML character references (decimal):
.I &#198; &#283; &#8594;
.TP
.B htmlhex
SGML/HTML character references (hexadecimal):
.I &#xC6; &#x11B; &#x2192;
.TP
.B htmlent
SGML/HTML character entities (names):
.I &AElig; &ecaron; &rarr;
.TP
.B html
All three above (only as input format)
.PD
.TP 7
.B tex
TeX with some LaTeX or AMS-TeX extensions. There is no distinction between
normal and math mode - you will probably have to insert some
.IR $ 's
manually.
.TP 11
.PD 0
.B mnemonic
RFC 1345 mnemonics preceded by
.I &
.TP
.B mnemonic1
RFC 1345 mnemonics preceded by
.I `
.PD
.TP 7
\fBany/\fILANGUAGE\fR (e.g. \fBany/pl-iso2\fP)
This special input format detects the encoding automatically, based on the
frequencies of characters found in the text. Every language is associated
with a set of possible encodings used for it and the average frequencies of
its letters (excluding ASCII letters). The best fitting encoding is used
for conversion. Currently supported languages are
.B cs
(Czech),
.B de
(German),
.B el
(Greek),
.B eo
(Esperanto),
.B es
(Spanish),
.B fr
(French),
.B he
(Hebrew),
.B it
(Italian),
.B pl
(Polish),
.B pt
(Portuguese),
.B ru
(Russian), and
.B sv
(Swedish).
.PD
.TP 7
.B varpl
Mixed Polish ISO-8859-2, CP-1250, and UTF-8. If you read Polish
newsgroups, I suggest setting it as a filter in your newsreader (for speed
it is better to call it directly rather than through konwert).
.TP
.B vareo
Mixed various Esperanto encodings.
.SH OPTIONS CONTROLLING THE ABOVE CONVERSIONS
.TP
\fB/1\fP (e.g. \fBkonwert iso2-ascii/1\fP)
Each unavailable character will be replaced with only a single approximate
character, not a string. This is useful with the filterm program or with
preformatted text. This option is automatically turned on when a filter is
used as output for filterm.
.TP
.B /html
Text is assumed to be HTML.
The characters
.I \&" & < >
resulting from other characters' approximations will be properly escaped as
.I &quot; &amp; &lt;
.IR &gt; .
The
.I META
header will be fixed if present.
.TP
.B /htmldec
Convert META as above. Unavailable characters will be encoded in &#Unicode;.
.TP
.B /htmlhex
Convert META as above. Unavailable characters will be encoded in
hexadecimal &#xUnicode;.
.TP
.B /tex
Unavailable characters will be described in TeX. Characters
.I # $ % & \\\\ ^ _ { | } ~
resulting from some characters' approximations will be properly escaped into
.I \\\\# \\\\$ \\\\% \\\\& $\\\\backslash$ \\\\^{} \\\\_ \\\\{ $|$ \\\\}
.IR \\\\\\\\~{} .
.TP
.B /asciichar
Recognizes some ASCII representations of characters, e.g.\|
.I (c) .\|.\|.\| 1/2
.IR >= .
.TP
.B /rosyjski
Russian text will be replaced with its Polish phonetic transcription.
.PP
Some output filters can use the language information to choose better
approximations of unavailable letters, for example
.B /de
(German):
.I \(:a
\(->
.I ae
instead of
.IR a .
.SH OTHER FILTERS
.TP
.B any/\fILANGUAGE\fP-test
Detects the encoding, but instead of converting the text only shows the
encoding's name. The additional option
.B /all
shows all possible encodings, sorted from best to worst.
.TP
.PD 0
.B cr
.TP
.B lf
.TP
.B crlf
.PD
Force a specific end-of-line marker convention.
.B cr
= Macintosh,
.B lf
= Unix and Amiga,
.B crlf
= Windows and DOS. The input convention is detected automatically.
.TP
.B expand
Expands tabs into spaces (uses the textutils program expand).
.TP
.B unexpand
Compresses spaces into tabs (uses the textutils program unexpand).
.TP
.B rmspacesateol
Removes spaces and tabs at end of line.
.TP
.PD 0
.B qp-8bit
.TP
.B 8bit-qp
.PD
MIME Quoted Printable encoding:
.IR =A3=F3d=BC .
.TP
.PD 0
.B rtf-8bit
.TP
.B 8bit-rtf
.PD
Rich Text Format:
.IR \\\\\\\\'a3\\\\\\\\'f3d\\\\\\\\'9f .
.TP
.B txt-htmlchar
Escapes
.I \&" & < >
into SGML/HTML entities
.I &quot; &amp; &lt;
.IR &gt; .
Useful for including a text file inside HTML
tags.
.TP
.B htmlchar-txt
The reverse conversion.
.TP
.B rot13
Guvf vf n qrzbafgengvba bs ebg13.
.TP
.PD 0
.B toupper
.TP
.B tolower
.PD
Self-explanatory. Currently ASCII only.
.TP
.B prn7pl
Converts Polish characters to control sequences for an EPSON-compatible
printer. Using only 7-bit characters, backspacing of the printer's head,
and the vertical positioning characters ,.'` it creates pseudo-Polish
glyphs. You can specify options:
.B /nlq
(default), which optimizes output for better-quality printers, and
.BR /draft ,
useful e.g. for 9-pin printers.
.SH FILES
.TP
.PD 0
.I /usr/share/konwert/filters/*
.TP
.I ~/.konwert/filters/*
.PD
.SH "SEE ALSO"
.BR trs (1),
.BR filterm (1)
.SH BUGS
The APPLE character in mac* charsets, and the CH and ch characters in
koi8cs, are not preserved in conversion even when they are available.
They also do not respect the /1 option. Reason: they are not in Unicode.
.SH COPYRIGHT
Konwert is a package for conversion between various character encodings.
.PP
Copyright (c) 1998 Marcin 'Qrczak' Kowalczyk
.PP
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
.PP
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.PP
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
.SH AUTHOR
.ft CW
.nf
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.home.ml.org/
\\__/ GCS/M d- s+:-- a21 C+++>+++$ UL++>++++$ P+++ L++>++++$ E->++
^^ W++ N+++ o? K? w(---) O? M- V? PS-- PE++ Y? PGP->+ t
QRCZAK 5? X- R tv-- b+>++ DI D- G+ e>++++ h! r--%>++ y-
.fi
.ft R