.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "NKF 3pm"
.TH NKF 3pm 2024-03-07 "perl v5.38.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.SH DESCRIPTION
.IX Header "DESCRIPTION"
\&\fBNkf\fR is yet another kanji code converter among networks, hosts and terminals.
It converts input kanji code to a designated kanji code
such as ISO\-2022\-JP, Shift_JIS, EUC-JP, UTF\-8, UTF\-16 or UTF\-32.
.PP
One of the most distinctive features of \fBnkf\fR is that it guesses the input kanji encoding.
It currently recognizes ISO\-2022\-JP, Shift_JIS, EUC-JP, UTF\-8, UTF\-16 and UTF\-32.
So users needn't set the input kanji code explicitly.
.PP
By default, X0201 kana is converted into X0208 kana.
For X0201 kana, SO/SI, SSO and ESC\-(\-I methods are supported.
For automatic code detection, nkf assumes no X0201 kana in Shift_JIS.
To accept X0201 in Shift_JIS, use \fB\-X\fR, \fB\-x\fR or \fB\-S\fR.
.PP
Multiple options are passed as separate string arguments;
every argument except the last is an option, and the last argument is the string to convert:
.PP
.Vb 1
\&  print nkf(\*(Aq\-\-ic=UTF8\-MAC\*(Aq, \*(Aq\-w\*(Aq, $string), "\en";
.Ve
.SH OPTIONS
.IX Header "OPTIONS"
.IP "\fB\-J \-S \-E \-W \-W16 \-W32 \-j \-s \-e \-w \-w16 \-w32\fR" 4
.IX Item "-J -S -E -W -W16 -W32 -j -s -e -w -w16 -w32"
Specify input and output encodings. Upper case is input; lower case is output.
cf. \-\-ic and \-\-oc.
.RS 4
.IP \fB\-J\fR 4
.IX Item "-J"
ISO\-2022\-JP (JIS code).
.IP \fB\-S\fR 4
.IX Item "-S"
Shift_JIS and JIS X 0201 kana.
EUC-JP is recognized as X0201 kana. Without the \fB\-x\fR flag,
JIS X 0201 Katakana (a.k.a. halfwidth kana) is converted into JIS X 0208.
If you use Windows, see Windows\-31J (CP932).
.IP \fB\-E\fR 4
.IX Item "-E"
EUC-JP.
.IP \fB\-W\fR 4
.IX Item "-W"
UTF\-8N.
.IP \fB\-W16[BL][0]\fR 4
.IX Item "-W16[BL][0]"
UTF\-16.
B or L selects Big Endian or Little Endian.
0 specifies whether a BOM is put or not.
.IP \fB\-W32[BL][0]\fR 4
.IX Item "-W32[BL][0]"
UTF\-32.
B or L selects Big Endian or Little Endian.
0 specifies whether a BOM is put or not.
.RE
.RS 4
.RE
.IP "\fB\-b \-u\fR" 4
.IX Item "-b -u"
Output is buffered (\fB\-b\fR, DEFAULT) or unbuffered (\fB\-u\fR).
.IP \fB\-t\fR 4
.IX Item "-t"
No conversion.
.IP \fB\-i[@B]\fR 4
.IX Item "-i[@B]"
Specify the escape sequence for JIS X 0208.
.RS 4
.IP \fB\-i@\fR 4
.IX Item "-i@"
Use ESC $ @. (JIS X 0208\-1978)
.IP \fB\-iB\fR 4
.IX Item "-iB"
Use ESC $ B. (JIS X 0208\-1983/1990 DEFAULT)
.RE
.RS 4
.RE
.IP \fB\-o[BJ]\fR 4
.IX Item "-o[BJ]"
Specify the escape sequence for US\-ASCII/JIS X 0201 Roman.
(DEFAULT B)
.IP \fB\-r\fR 4
.IX Item "-r"
{de/en}crypt ROT13/47
.IP "\fB\-h[123] \-\-hiragana \-\-katakana \-\-katakana\-hiragana\fR" 4
.IX Item "-h[123] --hiragana --katakana --katakana-hiragana"
.RS 4
.PD 0
.IP "\fB\-h1 \-\-hiragana\fR" 4
.IX Item "-h1 --hiragana"
.PD
Katakana to Hiragana conversion.
.IP "\fB\-h2 \-\-katakana\fR" 4
.IX Item "-h2 --katakana"
Hiragana to Katakana conversion.
.IP "\fB\-h3 \-\-katakana\-hiragana\fR" 4
.IX Item "-h3 --katakana-hiragana"
Katakana to Hiragana and Hiragana to Katakana conversion.
.RE
.RS 4
.RE
.IP \fB\-T\fR 4
.IX Item "-T"
Text mode output (MS-DOS)
.IP "\fB\-f[\fR\f(BIm\fR\fB [\- \fR\f(BIn\fR\fB]]\fR" 4
.IX Item "-f[m [- n]]"
Folding on \fIm\fR length with \fIn\fR margin in a line.
When \fIm\fR and \fIn\fR are omitted, the fold length is 60 and the fold margin is 10.
.IP \fB\-F\fR 4
.IX Item "-F"
New line preserving line folding.
.IP \fB\-Z[0\-3]\fR 4
.IX Item "-Z[0-3]"
Convert X0208 alphabet (Fullwidth Alphabets) to ASCII.
.RS 4
.IP "\fB\-Z \-Z0\fR" 4
.IX Item "-Z -Z0"
Convert X0208 alphabet to ASCII.
.IP \fB\-Z1\fR 4
.IX Item "-Z1"
Convert X0208 kankaku (fullwidth space) to a single ASCII space.
.IP \fB\-Z2\fR 4
.IX Item "-Z2"
Convert X0208 kankaku (fullwidth space) to two ASCII spaces.
.IP \fB\-Z3\fR 4
.IX Item "-Z3"
Replace fullwidth >, <, ", & with '&gt;', '&lt;', '&quot;', '&amp;' as in HTML.
.RE
.RS 4
.RE
.IP "\fB\-X \-x\fR" 4
.IX Item "-X -x"
With \fB\-X\fR or without this option, X0201 is converted into X0208 Kana.
With \fB\-x\fR, try to preserve X0201 kana and do not convert X0201 kana to X0208.
In JIS output, ESC\-(\-I is used. In EUC output, SS2 is used.
.IP \fB\-B[0\-2]\fR 4
.IX Item "-B[0-2]"
Assume broken JIS-Kanji input, which lost ESC.
Useful when your site is using the old B\-News Nihongo patch.
.RS 4
.IP \fB\-B1\fR 4
.IX Item "-B1"
Allow any characters after ESC\-( or ESC\-$.
.IP \fB\-B2\fR 4
.IX Item "-B2"
Force ASCII after NL.
.RE
.RS 4
.RE
.IP \fB\-I\fR 4
.IX Item "-I"
Replace non ISO\-2022\-JP characters with a geta character
(the substitute character in Japanese).
.IP \fB\-m[BQN0]\fR 4
.IX Item "-m[BQN0]"
MIME ISO\-2022\-JP/ISO8859\-1 decode. (DEFAULT)
To see ISO8859\-1 (Latin\-1), \-l is necessary.
.RS 4
.IP \fB\-mB\fR 4
.IX Item "-mB"
Decode MIME base64 encoded stream.
Remove the header or other parts before conversion.
.IP \fB\-mQ\fR 4
.IX Item "-mQ"
Decode MIME quoted stream.
\&'_' in a quoted stream is converted to a space.
.IP \fB\-mN\fR 4
.IX Item "-mN"
Non-strict decoding.
It allows line breaks in the middle of the base64 encoding.
.IP \fB\-m0\fR 4
.IX Item "-m0"
No MIME decode.
.RE
.RS 4
.RE
.IP \fB\-M\fR 4
.IX Item "-M"
MIME encode. Header style. All ASCII code and control characters are left intact.
.RS 4
.IP \fB\-MB\fR 4
.IX Item "-MB"
MIME encode Base64 stream.
Kanji conversion is performed before encoding, so this cannot be used as a picture encoder.
.IP \fB\-MQ\fR 4
.IX Item "-MQ"
Perform quoted encoding.
.RE
.RS 4
.RE
.IP \fB\-l\fR 4
.IX Item "-l"
Input and output code is ISO8859\-1 (Latin\-1) and ISO\-2022\-JP.
\&\fB\-s\fR, \fB\-e\fR and \fB\-x\fR are not compatible with this option.
.IP "\fB\-L[uwm] \-d \-c\fR" 4
.IX Item "-L[uwm] -d -c"
Convert line breaks.
.RS 4
.IP "\fB\-Lu \-d\fR" 4
.IX Item "-Lu -d"
unix (LF)
.IP "\fB\-Lw \-c\fR" 4
.IX Item "-Lw -c"
windows (CRLF)
.IP \fB\-Lm\fR 4
.IX Item "-Lm"
mac (CR)
.Sp
Without this option, nkf doesn't convert line breaks.
.RE
.RS 4
.RE
.IP "\fB\-\-fj \-\-unix \-\-mac \-\-msdos \-\-windows\fR" 4
.IX Item "--fj --unix --mac --msdos --windows"
Convert for these systems.
.IP "\fB\-\-jis \-\-euc \-\-sjis \-\-mime \-\-base64\fR" 4
.IX Item "--jis --euc --sjis --mime --base64"
Convert to the named code.
.IP "\fB\-\-jis\-input \-\-euc\-input \-\-sjis\-input \-\-mime\-input \-\-base64\-input\fR" 4
.IX Item "--jis-input --euc-input --sjis-input --mime-input --base64-input"
Assume the named input code.
.IP "\fB\-\-ic=\fR\f(BIinput codeset\fR\fB \-\-oc=\fR\f(BIoutput codeset\fR" 4
.IX Item "--ic=input codeset --oc=output codeset"
Set the input or output codeset.
NKF supports the following codesets; codeset names are case insensitive.
.RS 4
.IP ISO\-2022\-JP 4
.IX Item "ISO-2022-JP"
a.k.a. RFC1468, 7bit JIS, JUNET
.IP "EUC-JP (eucJP-nkf)" 4
.IX Item "EUC-JP (eucJP-nkf)"
a.k.a. AT&T JIS, Japanese EUC, UJIS
.IP eucJP-ascii 4
.IX Item "eucJP-ascii"
.PD 0
.IP eucJP-ms 4
.IX Item "eucJP-ms"
.IP CP51932 4
.IX Item "CP51932"
.PD
Microsoft Version of EUC-JP.
.IP Shift_JIS 4
.IX Item "Shift_JIS"
a.k.a. SJIS, MS_Kanji
.IP Windows\-31J 4
.IX Item "Windows-31J"
a.k.a. CP932
.IP UTF\-8 4
.IX Item "UTF-8"
same as UTF\-8N
.IP UTF\-8N 4
.IX Item "UTF-8N"
UTF\-8 without BOM
.IP UTF\-8\-BOM 4
.IX Item "UTF-8-BOM"
UTF\-8 with BOM
.IP "UTF8\-MAC (input only)" 4
.IX Item "UTF8-MAC (input only)"
decomposed UTF\-8
.IP UTF\-16 4
.IX Item "UTF-16"
same as UTF\-16BE
.IP UTF\-16BE 4
.IX Item "UTF-16BE"
UTF\-16 Big Endian without BOM
.IP UTF\-16BE\-BOM 4
.IX Item "UTF-16BE-BOM"
UTF\-16 Big Endian with BOM
.IP UTF\-16LE 4
.IX Item "UTF-16LE"
UTF\-16 Little Endian without BOM
.IP UTF\-16LE\-BOM 4
.IX Item "UTF-16LE-BOM"
UTF\-16 Little Endian with BOM
.IP UTF\-32 4
.IX Item "UTF-32"
same as UTF\-32BE
.IP UTF\-32BE 4
.IX Item "UTF-32BE"
UTF\-32 Big Endian without BOM
.IP UTF\-32BE\-BOM 4
.IX Item "UTF-32BE-BOM"
UTF\-32 Big Endian with BOM
.IP UTF\-32LE 4
.IX Item "UTF-32LE"
UTF\-32 Little Endian without BOM
.IP UTF\-32LE\-BOM 4
.IX Item "UTF-32LE-BOM"
UTF\-32 Little Endian with BOM
.RE
.RS 4
.RE
.IP "\fB\-\-fb\-{skip, html, xml, perl, java, subchar}\fR" 4
.IX Item "--fb-{skip, html, xml, perl, java, subchar}"
Specify how nkf handles unassigned characters.
Without this option, \-\-fb\-skip is assumed.
.IP "\fB\-\-prefix=\fR\f(BIescape character\fR\f(BItarget character\fR\fB..\fR" 4
.IX Item "--prefix=escape charactertarget character.."
When nkf converts to Shift_JIS,
it adds the specified escape character before the specified 2nd bytes of Shift_JIS characters.
The 1st byte of the argument is the escape character and the following bytes are the target characters.
.IP \fB\-\-no\-cp932ext\fR 4
.IX Item "--no-cp932ext"
Handle the characters extended in CP932 as unassigned characters.
.IP \fB\-\-no\-best\-fit\-chars\fR 4
.IX Item "--no-best-fit-chars"
When converting from Unicode to encoded bytes,
don't convert characters that are not round trip safe.
When converting from Unicode to Unicode,
nkf can be used as a UTF converter with this option and \-x.
(In other words, without this option and \-x, nkf doesn't preserve some characters.)
.Sp
When nkf converts strings that relate to paths, you should use this option.
.IP \fB\-\-cap\-input\fR 4
.IX Item "--cap-input"
Decode hex encoded characters.
.IP \fB\-\-url\-input\fR 4
.IX Item "--url-input"
Unescape percent escaped characters.
.IP \fB\-\-numchar\-input\fR 4
.IX Item "--numchar-input"
Decode character references, such as "&#....;".
.IP \fB\-\-\fR 4
.IX Item "--"
Ignore the rest of the \-options.
.SH AUTHOR
.IX Header "AUTHOR"
Copyright (c) 1987, Fujitsu LTD. (Itaru ICHIKAWA).
.PP
Copyright (c) 1996\-2018, The nkf Project.
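.SH EXAMPLES
.IX Header "EXAMPLES"
The input decoding described for \-\-url\-input and \-\-numchar\-input can be
sketched outside nkf. The following is a minimal Python illustration using only
the standard library; it is not nkf's implementation, and nkf's handling of
malformed input may differ. The function names url_input and numchar_input are
hypothetical and are not part of the NKF module, and UTF\-8 is assumed as the
encoding of the percent escaped bytes. \-\-cap\-input is not covered.
.PP
.nf
```python
import re
from urllib.parse import unquote

# Matches decimal (&#26085;) and hexadecimal (&#x65E5;) character references.
# Named references such as &amp; are deliberately left alone.
_NUMCHAR = re.compile("&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def url_input(text, encoding="utf-8"):
    """Unescape percent escaped bytes, then decode them as `encoding`.

    Assumption: the escaped bytes are UTF-8 unless stated otherwise.
    """
    return unquote(text, encoding=encoding)


def numchar_input(text):
    """Replace numeric character references with the characters they name."""

    def expand(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point > 0x10FFFF:
            # Not a valid Unicode code point: keep the reference as written.
            return match.group(0)
        return chr(code_point)

    return _NUMCHAR.sub(expand, text)


if __name__ == "__main__":
    print(url_input("%E6%97%A5%E6%9C%AC"))
    print(numchar_input("&#26085;&#x672C;"))
```
.fi
.PP
Both calls in the last two lines yield the same two-character string (U+65E5 U+672C).
Keeping the two decoders as separate functions mirrors the way the options are
listed separately above.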