.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "NKF 3pm"
.TH NKF 3pm 2024-03-07 "perl v5.38.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
NKF \- Perl extension for Network Kanji Filter
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.SH DESCRIPTION
.IX Header "DESCRIPTION"
\&\fBNkf\fR is yet another kanji code converter for use among networks, hosts and terminals.
It converts the input kanji encoding to a designated kanji encoding
such as ISO\-2022\-JP, Shift_JIS, EUC-JP, UTF\-8, UTF\-16 or UTF\-32.
.PP
One of the most distinctive features of \fBnkf\fR is that it guesses the input kanji encoding.
It currently recognizes ISO\-2022\-JP, Shift_JIS, EUC-JP, UTF\-8, UTF\-16 and UTF\-32,
so users do not need to specify the input kanji encoding explicitly.
.PP
By default, X0201 kana is converted into X0208 kana.
For X0201 kana, SO/SI, SSO and ESC\-(\-I methods are supported.
For automatic code detection, nkf assumes no X0201 kana in Shift_JIS.
To accept X0201 in Shift_JIS, use \fB\-X\fR, \fB\-x\fR or \fB\-S\fR.
.PP
Multiple options are specified as separate strings, such as
.PP
.Vb 1
\&  print nkf(\*(Aq\-\-ic=UTF8\-MAC\*(Aq, \*(Aq\-w\*(Aq, $string), "\en";
.Ve
.PP
Every argument except the last is an option; the last argument is the string to convert.
.SH OPTIONS
.IX Header "OPTIONS"
.IP "\fB\-J \-S \-E \-W \-W16 \-W32 \-j \-s \-e \-w \-w16 \-w32\fR" 4
.IX Item "-J -S -E -W -W16 -W32 -j -s -e -w -w16 -w32"
Specify the input and output encodings. Upper-case options select the input
encoding; lower-case options select the output encoding.
See also \fB\-\-ic\fR and \fB\-\-oc\fR.
.RS 4
.IP \fB\-J\fR 4
.IX Item "-J"
ISO\-2022\-JP (JIS code).
.IP \fB\-S\fR 4
.IX Item "-S"
Shift_JIS and JIS X 0201 kana.
Bytes that could also be read as EUC-JP are recognized as X0201 kana. Without the \fB\-x\fR flag,
JIS X 0201 Katakana (a.k.a. halfwidth kana) is converted into JIS X 0208.
If you use Windows, see Windows\-31J (CP932).
.IP \fB\-E\fR 4
.IX Item "-E"
EUC-JP.
.IP \fB\-W\fR 4
.IX Item "-W"
UTF\-8N.
.IP \fB\-W16[BL][0]\fR 4
.IX Item "-W16[BL][0]"
UTF\-16.
B or L selects Big Endian or Little Endian byte order.
0 specifies whether a BOM is used.
.IP \fB\-W32[BL][0]\fR 4
.IX Item "-W32[BL][0]"
UTF\-32.
B or L selects Big Endian or Little Endian byte order.
0 specifies whether a BOM is used.
.RE
.RS 4
.RE
.IP "\fB\-b \-u\fR" 4
.IX Item "-b -u"
With \fB\-b\fR, output is buffered (DEFAULT). With \fB\-u\fR, output is unbuffered.
.IP \fB\-t\fR 4
.IX Item "-t"
No conversion.
.IP \fB\-i[@B]\fR 4
.IX Item "-i[@B]"
Specify the escape sequence for JIS X 0208.
.RS 4
.IP \fB\-i@\fR 4
.IX Item "-i@"
Use ESC $ @. (JIS X 0208\-1978)
.IP \fB\-iB\fR 4
.IX Item "-iB"
Use ESC $ B. (JIS X 0208\-1983/1990 DEFAULT)
.RE
.RS 4
.RE
.IP \fB\-o[BJ]\fR 4
.IX Item "-o[BJ]"
Specify the escape sequence for US\-ASCII/JIS X 0201 Roman. (DEFAULT B)
.IP \fB\-r\fR 4
.IX Item "-r"
Encrypt or decrypt with ROT13/47.
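.Sp
As an illustration only, the two rotations can be sketched in Python.
The sketch shows the ROT13 and ROT47 ciphers themselves and is not taken from nkf;
the function names are hypothetical, and how nkf applies the rotations to
multibyte characters is not covered here.
.Sp
.Vb 30
```python
def rot13(text):
    # Rotate ASCII letters by 13 places; everything else passes through.
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def rot47(text):
    # Rotate the 94 printable ASCII characters (0x21-0x7E) by 47 places.
    out = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            out.append(chr(0x21 + (code - 0x21 + 47) % 94))
        else:
            out.append(ch)
    return "".join(out)


# Each rotation is its own inverse, so one call encrypts and a second decrypts.
print(rot13("Hello"))
print(rot47(rot47("Hello")))
```
.Ve
.Sp
Because 13 is half of 26 and 47 is half of 94, applying either rotation twice
returns the original text, which is why a single option covers both directions.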
.IP "\fB\-h[123] \-\-hiragana \-\-katakana \-\-katakana\-hiragana\fR" 4
.IX Item "-h[123] --hiragana --katakana --katakana-hiragana"
.RS 4
.PD 0
.IP "\fB\-h1 \-\-hiragana\fR" 4
.IX Item "-h1 --hiragana"
.PD
Katakana to Hiragana conversion.
.IP "\fB\-h2 \-\-katakana\fR" 4
.IX Item "-h2 --katakana"
Hiragana to Katakana conversion.
.IP "\fB\-h3 \-\-katakana\-hiragana\fR" 4
.IX Item "-h3 --katakana-hiragana"
Katakana to Hiragana and Hiragana to Katakana conversion.
.RE
.RS 4
.RE
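.IP "" 4
The three kana modes above can be pictured with a short Python sketch.
It assumes the Unicode layout, in which the Hiragana and Katakana blocks sit
0x60 code points apart; nkf itself works on JIS X 0208 internally, so edge cases
may differ. The function name and its parameters are hypothetical.
.Sp
.Vb 23
```python
KANA_OFFSET = 0x60  # distance between the Hiragana and Katakana blocks


def swap_kana(text, to_hiragana=True, to_katakana=True):
    # Shift each kana into the other block when that direction is enabled.
    out = []
    for ch in text:
        code = ord(ch)
        if to_hiragana and 0x30A1 <= code <= 0x30F6:
            out.append(chr(code - KANA_OFFSET))  # Katakana to Hiragana
        elif to_katakana and 0x3041 <= code <= 0x3096:
            out.append(chr(code + KANA_OFFSET))  # Hiragana to Katakana
        else:
            out.append(ch)
    return "".join(out)


mixed = "ひらがなカタカナ"
print(swap_kana(mixed, to_katakana=False))  # one direction, as with -h1
print(swap_kana(mixed, to_hiragana=False))  # one direction, as with -h2
print(swap_kana(mixed))                     # both directions, as with -h3
```
.Ve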
.IP \fB\-T\fR 4
.IX Item "-T"
Text mode output (MS-DOS)
.IP "\fB\-f[\fR\f(BIm\fR\fB [\- \fR\f(BIn\fR\fB]]\fR" 4
.IX Item "-f[m [- n]]"
Fold lines at length \fIm\fR with a margin of \fIn\fR.
When \fIm\fR and \fIn\fR are omitted, the fold length is 60 and the fold margin is 10.
.IP \fB\-F\fR 4
.IX Item "-F"
Line folding that preserves existing newlines.
.IP \fB\-Z[0\-3]\fR 4
.IX Item "-Z[0-3]"
Convert X0208 alphabet (Fullwidth Alphabets) to ASCII.
.RS 4
.IP "\fB\-Z \-Z0\fR" 4
.IX Item "-Z -Z0"
Convert X0208 alphabet to ASCII.
.IP \fB\-Z1\fR 4
.IX Item "-Z1"
Convert X0208 kankaku (the fullwidth space) to a single ASCII space.
.IP \fB\-Z2\fR 4
.IX Item "-Z2"
Convert X0208 kankaku (the fullwidth space) to two ASCII spaces.
.IP \fB\-Z3\fR 4
.IX Item "-Z3"
Replace fullwidth >, <, ", & with '&gt;', '&lt;', '&quot;', '&amp;' as in HTML.
.RE
.RS 4
.RE
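.IP "" 4
A rough Python sketch of the fullwidth conversions described above follows.
It assumes the Unicode fullwidth forms U+FF01 to U+FF5E and the ideographic
space U+3000 as stand-ins for the X0208 characters; nkf's own tables may cover
a different set. The function and parameter names are hypothetical.
.Sp
.Vb 24
```python
def fullwidth_to_ascii(text, space_width=0, html_escape=False):
    # space_width: 0 keeps U+3000; 1 or 2 replaces it with that many spaces.
    # html_escape: write converted <, >, ", & as HTML entities.
    entities = {"<": "&lt;", ">": "&gt;", chr(34): "&quot;", "&": "&amp;"}
    out = []
    for ch in text:
        code = ord(ch)
        if 0xFF01 <= code <= 0xFF5E:
            ascii_ch = chr(code - 0xFEE0)  # fullwidth form to ASCII
            if html_escape:
                ascii_ch = entities.get(ascii_ch, ascii_ch)
            out.append(ascii_ch)
        elif code == 0x3000 and space_width:
            out.append(" " * space_width)
        else:
            out.append(ch)
    return "".join(out)


sample = "ＡＢＣ" + chr(0x3000) + "１２３"
print(fullwidth_to_ascii(sample))                 # alphabet only
print(fullwidth_to_ascii(sample, space_width=1))  # plus single space
print(fullwidth_to_ascii(sample, space_width=2))  # plus double space
```
.Ve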
.IP "\fB\-X \-x\fR" 4
.IX Item "-X -x"
With \fB\-X\fR or without this option, X0201 is converted into X0208 Kana.
With \fB\-x\fR, try to preserve X0208 kana and do not convert X0201 kana to X0208.
In JIS output, ESC\-(\-I is used. In EUC output, SS2 is used.
.IP \fB\-B[0\-2]\fR 4
.IX Item "-B[0-2]"
Assume broken JIS-Kanji input that has lost its ESC characters.
Useful when your site is using the old B\-News Nihongo patch.
.RS 4
.IP \fB\-B1\fR 4
.IX Item "-B1"
Allow any characters after ESC\-( or ESC\-$.
.IP \fB\-B2\fR 4
.IX Item "-B2"
Force ASCII after NL.
.RE
.RS 4
.RE
.IP \fB\-I\fR 4
.IX Item "-I"
Replace characters that are not in ISO\-2022\-JP with a geta character
(the substitute character in Japanese).
.IP \fB\-m[BQN0]\fR 4
.IX Item "-m[BQN0]"
MIME ISO\-2022\-JP/ISO8859\-1 decode. (DEFAULT)
To see ISO8859\-1 (Latin\-1), \fB\-l\fR is necessary.
.RS 4
.IP \fB\-mB\fR 4
.IX Item "-mB"
Decode MIME base64 encoded stream. Remove header or other part before
conversion.
.IP \fB\-mQ\fR 4
.IX Item "-mQ"
Decode MIME quoted stream. '_' in quoted stream is converted to space.
.IP \fB\-mN\fR 4
.IX Item "-mN"
Non-strict decoding.
It allows line breaks in the middle of base64 encoded data.
.IP \fB\-m0\fR 4
.IX Item "-m0"
No MIME decode.
.RE
.RS 4
.RE
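.IP "" 4
To show what the B and Q forms look like, here is a small Python sketch that
decodes one encoded word of each kind with the standard \f(CWemail.header\fR module.
It illustrates the MIME header encodings only and is not nkf's decoder;
the sample strings are made up for the example.
.Sp
.Vb 13
```python
from email.header import decode_header

samples = [
    "=?ISO-2022-JP?B?GyRCRnxLXDhsGyhC?=",  # base64 (B) encoded word
    "=?ISO-8859-1?Q?caf=E9_au_lait?=",     # quoted (Q) encoded word
]

for sample in samples:
    # decode_header undoes the B or Q transfer encoding and reports the charset.
    raw, charset = decode_header(sample)[0]
    print(charset, raw.decode(charset))
```
.Ve
.Sp
In the Q sample the underscores come back as spaces, matching the \fB\-mQ\fR note above.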
.IP \fB\-M\fR 4
.IX Item "-M"
MIME encode. Header style. All ASCII and control characters are left intact.
.RS 4
.IP \fB\-MB\fR 4
.IX Item "-MB"
MIME encode Base64 stream.
Kanji conversion is performed before encoding, so this cannot be used to encode binary data such as pictures.
.IP \fB\-MQ\fR 4
.IX Item "-MQ"
Perform quoted encoding.
.RE
.RS 4
.RE
.IP \fB\-l\fR 4
.IX Item "-l"
Input and output code is ISO8859\-1 (Latin\-1) and ISO\-2022\-JP.
\&\fB\-s\fR, \fB\-e\fR and \fB\-x\fR are not compatible with this option.
.IP "\fB\-L[uwm] \-d \-c\fR" 4
.IX Item "-L[uwm] -d -c"
Convert line breaks.
.RS 4
.IP "\fB\-Lu \-d\fR" 4
.IX Item "-Lu -d"
unix (LF)
.IP "\fB\-Lw \-c\fR" 4
.IX Item "-Lw -c"
windows (CRLF)
.IP \fB\-Lm\fR 4
.IX Item "-Lm"
mac (CR)
.Sp
Without this option, nkf doesn't convert line breaks.
.RE
.RS 4
.RE
.IP "\fB\-\-fj \-\-unix \-\-mac \-\-msdos \-\-windows\fR" 4
.IX Item "--fj --unix --mac --msdos --windows"
Convert for these systems.
.IP "\fB\-\-jis \-\-euc \-\-sjis \-\-mime \-\-base64\fR" 4
.IX Item "--jis --euc --sjis --mime --base64"
Convert to named code.
.IP "\fB\-\-jis\-input \-\-euc\-input \-\-sjis\-input \-\-mime\-input \-\-base64\-input\fR" 4
.IX Item "--jis-input --euc-input --sjis-input --mime-input --base64-input"
Assume the named input format.
.IP "\fB\-\-ic=\fR\f(BIinput codeset\fR\fB \-\-oc=\fR\f(BIoutput codeset\fR" 4
.IX Item "--ic=input codeset --oc=output codeset"
Set the input or output codeset.
NKF supports the following codesets; codeset names are case insensitive.
.RS 4
.IP ISO\-2022\-JP 4
.IX Item "ISO-2022-JP"
a.k.a. RFC1468, 7bit JIS, JUNET
.IP "EUC-JP (eucJP-nkf)" 4
.IX Item "EUC-JP (eucJP-nkf)"
a.k.a. AT&T JIS, Japanese EUC, UJIS
.IP eucJP-ascii 4
.IX Item "eucJP-ascii"
.PD 0
.IP eucJP-ms 4
.IX Item "eucJP-ms"
.IP CP51932 4
.IX Item "CP51932"
.PD
Microsoft Version of EUC-JP.
.IP Shift_JIS 4
.IX Item "Shift_JIS"
a.k.a. SJIS, MS_Kanji
.IP Windows\-31J 4
.IX Item "Windows-31J"
a.k.a. CP932
.IP UTF\-8 4
.IX Item "UTF-8"
same as UTF\-8N
.IP UTF\-8N 4
.IX Item "UTF-8N"
UTF\-8 without BOM
.IP UTF\-8\-BOM 4
.IX Item "UTF-8-BOM"
UTF\-8 with BOM
.IP "UTF8\-MAC (input only)" 4
.IX Item "UTF8-MAC (input only)"
decomposed UTF\-8
.IP UTF\-16 4
.IX Item "UTF-16"
same as UTF\-16BE
.IP UTF\-16BE 4
.IX Item "UTF-16BE"
UTF\-16 Big Endian without BOM
.IP UTF\-16BE\-BOM 4
.IX Item "UTF-16BE-BOM"
UTF\-16 Big Endian with BOM
.IP UTF\-16LE 4
.IX Item "UTF-16LE"
UTF\-16 Little Endian without BOM
.IP UTF\-16LE\-BOM 4
.IX Item "UTF-16LE-BOM"
UTF\-16 Little Endian with BOM
.IP UTF\-32 4
.IX Item "UTF-32"
same as UTF\-32BE
.IP UTF\-32BE 4
.IX Item "UTF-32BE"
UTF\-32 Big Endian without BOM
.IP UTF\-32BE\-BOM 4
.IX Item "UTF-32BE-BOM"
UTF\-32 Big Endian with BOM
.IP UTF\-32LE 4
.IX Item "UTF-32LE"
UTF\-32 Little Endian without BOM
.IP UTF\-32LE\-BOM 4
.IX Item "UTF-32LE-BOM"
UTF\-32 Little Endian with BOM
.RE
.RS 4
.RE
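.IP "" 4
The BOM and byte-order variants listed above differ only in a few leading bytes
and in byte order. The following Python sketch builds each layout for one sample
character with the standard \f(CWcodecs\fR module; it illustrates the encodings
themselves and does not call nkf. The dictionary keys simply reuse the codeset
names from the list.
.Sp
.Vb 21
```python
import codecs

text = chr(0x6F22)  # U+6F22, a kanji used as sample input

layouts = {
    "UTF-8": text.encode("utf-8"),
    "UTF-8-BOM": codecs.BOM_UTF8 + text.encode("utf-8"),
    "UTF-16BE": text.encode("utf-16-be"),
    "UTF-16BE-BOM": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    "UTF-16LE": text.encode("utf-16-le"),
    "UTF-16LE-BOM": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "UTF-32BE": text.encode("utf-32-be"),
    "UTF-32BE-BOM": codecs.BOM_UTF32_BE + text.encode("utf-32-be"),
    "UTF-32LE": text.encode("utf-32-le"),
    "UTF-32LE-BOM": codecs.BOM_UTF32_LE + text.encode("utf-32-le"),
}

for name, data in layouts.items():
    print(name.ljust(13), data.hex())
```
.Ve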
.IP "\fB\-\-fb\-{skip, html, xml, perl, java, subchar}\fR" 4
.IX Item "--fb-{skip, html, xml, perl, java, subchar}"
Specify the way that nkf handles unassigned characters.
Without this option, \-\-fb\-skip is assumed.
.IP "\fB\-\-prefix=\fR\f(BIescape character\fR\f(BItarget character\fR\fB..\fR" 4
.IX Item "--prefix=escape charactertarget character.."
When nkf converts to Shift_JIS,
it inserts the specified escape character before the specified 2nd bytes of Shift_JIS characters.
The 1st byte of the argument is the escape character and the following bytes are the target characters.
.IP \fB\-\-no\-cp932ext\fR 4
.IX Item "--no-cp932ext"
Handle the characters extended in CP932 as unassigned characters.
.IP \fB\-\-no\-best\-fit\-chars\fR 4
.IX Item "--no-best-fit-chars"
When converting from Unicode to an encoded byte sequence,
do not convert characters that are not round-trip safe.
When converting from Unicode to Unicode,
this option together with \-x lets nkf be used as a UTF converter.
(In other words, without this and the \-x option, nkf does not preserve some characters.)
.Sp
When nkf converts strings related to paths, you should use this option.
.IP \fB\-\-cap\-input\fR 4
.IX Item "--cap-input"
Decode hex encoded characters.
.IP \fB\-\-url\-input\fR 4
.IX Item "--url-input"
Unescape percent escaped characters.
.IP \fB\-\-numchar\-input\fR 4
.IX Item "--numchar-input"
Decode character reference, such as "&#....;".
.IP \fB\-\-\fR 4
.IX Item "--"
Ignore the rest of the \- options; arguments after \fB\-\-\fR are not parsed as options.
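.PP
The input forms handled by \fB\-\-url\-input\fR and \fB\-\-numchar\-input\fR can be
illustrated with Python's standard library. This is only a sketch of the two
escape notations, not of nkf's decoder: \f(CWhtml.unescape\fR also accepts named
references, which the option description above does not mention, and the sample
strings are made up for the example.
.PP
.Vb 11
```python
import html
from urllib.parse import unquote

kanji = chr(0x6F22)  # U+6F22, the character every sample below encodes

# Percent-escaped UTF-8 bytes, the notation --url-input unescapes.
print(unquote("%E6%BC%A2") == kanji)

# Decimal and hexadecimal character references, as in --numchar-input.
print(html.unescape("&#28450;") == kanji)
print(html.unescape("&#x6F22;") == kanji)
```
.Ve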
.SH AUTHOR
.IX Header "AUTHOR"
Copyright (c) 1987, Fujitsu LTD. (Itaru ICHIKAWA).
.PP
Copyright (c) 1996\-2018, The nkf Project.