.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.32
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
'br\}
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. nr % 0
. rr F
.\}
.\"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.hy 0
.if n .na
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Jcode 3pm"
.TH Jcode 3pm "2005-02-19" "perl v5.8.8" "User Contributed Perl Documentation"
.SH "NAME"
Jcode \- Japanese Charset Handler
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 6
\&  use Jcode;
\&  #
\&  # traditional
\&  Jcode::convert(\e$str, $ocode, $icode, "z");
\&  # or OOP!
\&  print Jcode\->new($str)\->h2z\->tr($from, $to)\->utf8;
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Jcode.pm supports both the object and the traditional approach.
With the object approach, you can write:
.PP
.Vb 1
\&  $iso_2022_jp = Jcode\->new($str)\->h2z\->jis;
.Ve
.PP
which is more elegant than:
.PP
.Vb 2
\&  $iso_2022_jp = $str;
\&  &jcode::convert(\e$iso_2022_jp, 'jis', &jcode::getcode(\e$str), "z");
.Ve
.PP
For those unfamiliar with objects, Jcode.pm still supports
\f(CW\*(C`getcode()\*(C'\fR and \f(CW\*(C`convert()\*(C'\fR.
.PP
If the perl version is 5.8.1 or later, Jcode acts as a wrapper to Encode,
the standard charset handler module for Perl 5.8 or later.
.SH "Methods"
.IX Header "Methods"
All methods described here return a Jcode object unless otherwise noted.
.Sh "Constructors"
.IX Subsection "Constructors"
.ie n .IP "$j = Jcode\->new($str [, $icode])" 2
.el .IP "$j = Jcode\->new($str [, \f(CW$icode\fR])" 2
.IX Item "$j = Jcode->new($str [, $icode])"
Creates a Jcode object \f(CW$j\fR from \f(CW$str\fR. The input encoding is
detected automatically unless you set \f(CW$icode\fR explicitly. For the
available charsets, see getcode below.
.Sp
For perl 5.8.1 or better, \f(CW$icode\fR can be \fIany encoding name\fR
that Encode understands.
.Sp
.Vb 1
\&  $j = Jcode\->new($european, 'iso\-latin1');
.Ve
.Sp
When the object is stringified, it returns the EUC-converted string, so you
can write \f(CW\*(C`print $j\*(C'\fR instead of \f(CW\*(C`print $j\->euc\*(C'\fR.
.RS 2
.IP "Passing Reference" 2
.IX Item "Passing Reference"
Instead of a scalar value, you can pass a reference, as in
\f(CW\*(C`Jcode\->new(\e$str)\*(C'\fR.
.Sp
This saves a little time, at the cost of \f(CW$str\fR itself being converted
in place. (In a way, \f(CW$str\fR is now \*(L"tied\*(R" to the Jcode object.)
.RE
.RS 2
.RE
.ie n .IP "$j\->set($str [, $icode])" 2
.el .IP "$j\->set($str [, \f(CW$icode\fR])" 2
.IX Item "$j->set($str [, $icode])"
Sets \f(CW$j\fR's internal string to \f(CW$str\fR. Handy when you reuse a
Jcode object repeatedly (it saves the time and memory of creating new
objects).
.Sp
.Vb 6
\&  # converts mailbox to SJIS format
\&  my $jconv = Jcode\->new;
\&  $/ = "";    # paragraph mode, same as perl \-00
\&  while (<>) {
\&      print $jconv\->set(\e$_)\->mime_decode\->sjis;
\&  }
.Ve
.ie n .IP "$j\->append($str [, $icode]);" 2
.el .IP "$j\->append($str [, \f(CW$icode\fR]);" 2
.IX Item "$j->append($str [, $icode]);"
Appends \f(CW$str\fR to \f(CW$j\fR's internal string.
.ie n .IP "$j = jcode($str [, $icode]);" 2
.el .IP "$j = jcode($str [, \f(CW$icode\fR]);" 2
.IX Item "$j = jcode($str [, $icode]);"
Shortcut for Jcode\->\fInew()\fR, so you can write
\f(CW\*(C`$sjis = jcode($str)\->sjis\*(C'\fR.
.Sh "Encoded Strings"
.IX Subsection "Encoded Strings"
In general, you retrieve the string in a given encoding by calling the
method of that name, as in \f(CW$j\fR\->\fIencoding\fR.
.IP "$sjis = jcode($str)\->sjis" 2
.IX Item "$sjis = jcode($str)->sjis"
.PD 0
.ie n .IP "$euc = $j\->euc" 2
.el .IP "$euc = \f(CW$j\fR\->euc" 2
.IX Item "$euc = $j->euc"
.ie n .IP "$jis = $j\->jis" 2
.el .IP "$jis = \f(CW$j\fR\->jis" 2
.IX Item "$jis = $j->jis"
.ie n .IP "$sjis = $j\->sjis" 2
.el .IP "$sjis = \f(CW$j\fR\->sjis" 2
.IX Item "$sjis = $j->sjis"
.ie n .IP "$ucs2 = $j\->ucs2" 2
.el .IP "$ucs2 = \f(CW$j\fR\->ucs2" 2
.IX Item "$ucs2 = $j->ucs2"
.ie n .IP "$utf8 = $j\->utf8" 2
.el .IP "$utf8 = \f(CW$j\fR\->utf8" 2
.IX Item "$utf8 = $j->utf8"
.PD
What you code is what you get :)
.ie n .IP "$iso_2022_jp = $j\->iso_2022_jp" 2
.el .IP "$iso_2022_jp = \f(CW$j\fR\->iso_2022_jp" 2
.IX Item "$iso_2022_jp = $j->iso_2022_jp"
Same as \f(CW\*(C`$j\->h2z\->jis\*(C'\fR. Hankaku kana are forcibly converted
to Zenkaku.
.Sp
For perl 5.8.1 and better, you can also use any encoding name or alias that
Encode supports as a method name. For example:
.Sp
.Vb 1
\&  $european = $j\->iso_latin1; # replace '\-' with '_' for names.
.Ve
.Sp
\&\fB\s-1FYI\s0\fR: Encode::Encoder uses a similar trick.
.RS 2
.IP "$j\->fallback($fallback)" 2
.IX Item "$j->fallback($fallback)"
For perl 5.8.1 or better, Jcode stores the internal string in
\&\s-1UTF\-8\s0.
Any character that does not map to the requested encoding is replaced with
\&'?', which is Encode's default behavior.
.Sp
.Vb 2
\&  my $unistr = "\ex{262f}"; # YIN YANG
\&  my $j = jcode($unistr);   # $j\->euc is '?'
.Ve
.Sp
You can change this behavior by specifying a fallback, as with Encode; the
values are the same as Encode's. For convenience,
\f(CW\*(C`Jcode::FB_PERLQQ\*(C'\fR, \f(CW\*(C`Jcode::FB_XMLCREF\*(C'\fR and
\f(CW\*(C`Jcode::FB_HTMLCREF\*(C'\fR are aliased to their Encode counterparts.
.Sp
.Vb 3
\&  print $j\->fallback(Jcode::FB_PERLQQ)\->euc;   # '\ex{262f}'
\&  print $j\->fallback(Jcode::FB_XMLCREF)\->euc;  # '&#x262f;'
\&  print $j\->fallback(Jcode::FB_HTMLCREF)\->euc; # '&#9775;'
.Ve
.Sp
The global variable \f(CW$Jcode::FALLBACK\fR stores the default fallback;
assign to it to override the default.
.Sp
.Vb 1
\&  $Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme
.Ve
.RE
.RS 2
.RE
.ie n .IP "[@lines =] $jcode\->jfold([$width, $newline_str, $kref])" 2
.el .IP "[@lines =] \f(CW$jcode\fR\->jfold([$width, \f(CW$newline_str\fR, \f(CW$kref\fR])" 2
.IX Item "[@lines =] $jcode->jfold([$width, $newline_str, $kref])"
Folds the string in the Jcode object every \f(CW$width\fR columns
(default: 72), inserting the newline string given by \f(CW$newline_str\fR
(default: \*(L"\en\*(R"). \f(CW$width\fR counts \*(L"halfwidth\*(R"
characters; fullwidth characters count as two.
.Sp
Rudimentary kinsoku support is now available for Perl 5.8.1 and better.
.ie n .IP "$length = $jcode\->jlength();" 2
.el .IP "$length = \f(CW$jcode\fR\->\fIjlength()\fR;" 2
.IX Item "$length = $jcode->jlength();"
Returns the length in characters rather than in bytes.
.Sh "Methods that use MIME::Base64"
.IX Subsection "Methods that use MIME::Base64"
To use the methods below, you need MIME::Base64. To install it, simply run:
.PP
.Vb 1
\&  perl \-MCPAN \-e 'CPAN::Shell\->install("MIME::Base64")'
.Ve
.PP
If your perl is 5.6 or better, there is no need, since MIME::Base64 is
bundled.
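.PP
These methods depend on MIME::Base64 because a MIME \*(L"B\*(R" encoded-word
is Base64 applied to the encoded bytes, wrapped in
\f(CW\*(C`=?charset?B?...?=\*(C'\fR. The following is only a minimal sketch of
that idea, assuming perl 5.8 or better with the core Encode and MIME::Base64
modules; the helper \f(CW\*(C`b_encode_word()\*(C'\fR is hypothetical and not
part of Jcode.
.PP
.Vb 18
\&  use strict;
\&  use warnings;
\&  use Encode qw(encode);
\&  use MIME::Base64 qw(encode_base64);
\&
\&  # Hypothetical helper, not part of Jcode: wraps a Perl character
\&  # string as a single "B" encoded\-word. Unlike mime_encode(), it
\&  # does not fold long lines.
\&  sub b_encode_word {
\&      my ($unistr) = @_;
\&      my $bytes = encode('iso\-2022\-jp', $unistr);  # ISO\-2022\-JP bytes
\&      my $b64   = encode_base64($bytes, '');       # '' = no newline
\&      return "=?ISO\-2022\-JP?B?$b64?=";
\&  }
\&
\&  # "\ex{65e5}\ex{672c}\ex{8a9e}" is "Nihongo" written in kanji
\&  print b_encode_word("\ex{65e5}\ex{672c}\ex{8a9e}"), "\en";
\&  # prints =?ISO\-2022\-JP?B?GyRCRnxLXDhsGyhC?=
.Ve
.PP
A real header also has to be folded into lines, which is what the
\f(CW$lf\fR and \f(CW$bpl\fR arguments of \f(CW\*(C`mime_encode()\*(C'\fR control;
this sketch leaves folding out.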
.ie n .IP "$mime_header = $j\->mime_encode([$lf, $bpl])" 2
.el .IP "$mime_header = \f(CW$j\fR\->mime_encode([$lf, \f(CW$bpl\fR])" 2
.IX Item "$mime_header = $j->mime_encode([$lf, $bpl])"
Converts the object's string to a MIME header as documented in
\&\s-1RFC1522\s0. When \f(CW$lf\fR is specified, it uses \f(CW$lf\fR to fold
lines (default: \en). When \f(CW$bpl\fR is specified, it uses \f(CW$bpl\fR as
the number of bytes per line (default: 76; this number must not exceed 76).
.Sp
For Perl 5.8.1 or better, you can also encode a \s-1MIME\s0 header as:
.Sp
.Vb 1
\&  $mime_header = $j\->MIME_Header;
.Ve
.Sp
in which case the resulting \f(CW$mime_header\fR is MIME-B-encoded
\&\s-1UTF\-8\s0, whereas \f(CW\*(C`$j\->mime_encode()\*(C'\fR returns MIME-B-encoded
\&\s-1ISO\-2022\-JP\s0. Most modern MUAs support both.
.IP "$j\->mime_decode;" 2
.IX Item "$j->mime_decode;"
Decodes the MIME header in the Jcode object. For perl 5.8.1 or better, you
can also do the same with:
.Sp
.Vb 1
\&  Jcode\->new($str, 'MIME\-Header');
.Ve
.Sh "Hankaku vs. Zenkaku"
.IX Subsection "Hankaku vs. Zenkaku"
.IP "$j\->h2z([$keep_dakuten])" 2
.IX Item "$j->h2z([$keep_dakuten])"
Converts X201 kana (Hankaku) to X208 kana (Zenkaku). When
\f(CW$keep_dakuten\fR is set, it leaves dakuten as is (that is,
\&\*(L"ka + dakuten\*(R" is left as is instead of being converted to \*(L"ga\*(R").
.Sp
You can retrieve the number of matches via \f(CW$j\fR\->nmatch.
.IP "$j\->z2h" 2
.IX Item "$j->z2h"
Converts X208 kana (Zenkaku) to X201 kana (Hankaku).
.Sp
You can retrieve the number of matches via \f(CW$j\fR\->nmatch.
.Sh "Regexp emulators"
.IX Subsection "Regexp emulators"
To use \f(CW\*(C`\->m()\*(C'\fR and \f(CW\*(C`\->s()\*(C'\fR, you need perl 5.8.1 or better.
.ie n .IP "$j\->tr($from, $to, $opt);" 2
.el .IP "$j\->tr($from, \f(CW$to\fR, \f(CW$opt\fR);" 2
.IX Item "$j->tr($from, $to, $opt);"
Applies \f(CW\*(C`tr/$from/$to/\*(C'\fR to the Jcode object, where \f(CW$from\fR
and \f(CW$to\fR are EUC-JP strings.
On perl 5.8.1 or better, \f(CW$from\fR and \f(CW$to\fR can also be
flagged \s-1UTF\-8\s0 strings.
.Sp
If \f(CW$opt\fR is set, \f(CW\*(C`tr/$from/$to/$opt\*(C'\fR is applied.
\&\f(CW$opt\fR must be 'c', 'd', or a combination of the two.
.Sp
You can retrieve the number of matches via \f(CW$j\fR\->nmatch.
.Sp
The following methods are available only for perl 5.8.1 or better.
.ie n .IP "$j\->s($pattern, $replace, $opt);" 2
.el .IP "$j\->s($pattern, \f(CW$replace\fR, \f(CW$opt\fR);" 2
.IX Item "$j->s($pattern, $replace, $opt);"
Applies \f(CW\*(C`s/$pattern/$replace/$opt\*(C'\fR. \f(CW$pattern\fR and
\&\f(CW$replace\fR must be in EUC-JP or flagged \s-1UTF\-8\s0. \f(CW$opt\fR
takes the same modifiers as a Perl substitution; see perlre.
.Sp
Like \f(CW\*(C`$j\->tr()\*(C'\fR, \f(CW\*(C`$j\->s()\*(C'\fR returns the object itself,
so you can chain operations as follows:
.Sp
.Vb 1
\&  $j\->tr("a\-z", "A\-Z")\->s("foo", "bar");
.Ve
.ie n .IP "[@match = ] $j\->m($pattern, $opt);" 2
.el .IP "[@match = ] \f(CW$j\fR\->m($pattern, \f(CW$opt\fR);" 2
.IX Item "[@match = ] $j->m($pattern, $opt);"
Applies \f(CW\*(C`m/$pattern/$opt\*(C'\fR. Note that this method \s-1DOES\s0 \s-1NOT\s0
\&\s-1RETURN\s0 \s-1AN\s0 \s-1OBJECT\s0, so you can't chain it the way you can
\&\f(CW\*(C`$j\->s()\*(C'\fR.
.Sh "Instance Variables"
.IX Subsection "Instance Variables"
If you need to access the instance variables of a Jcode object, use the
access methods below instead of accessing them directly (that's what
\&\s-1OOP\s0 is all about).
.PP
\&\s-1FYI\s0, Jcode uses a reference to an array instead of a reference to a
hash (the common way) to optimize speed. (Actually, you don't need to know
this as long as you use the access methods; once again, that's \s-1OOP\s0.)
.IP "$j\->r_str" 2
.IX Item "$j->r_str"
Reference to the EUC-coded string.
.IP "$j\->icode" 2
.IX Item "$j->icode"
Input character code in the most recent operation.
.IP "$j\->nmatch" 2
.IX Item "$j->nmatch"
Number of matches (used in \f(CW$j\fR\->tr, etc.).
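.PP
To illustrate the array-based layout described above, here is a minimal,
hypothetical class. \f(CW\*(C`My::ArrayObj\*(C'\fR is an invented name and its
slot order is an assumption, not Jcode's actual internals; it keeps its
fields in an array reference and exposes them only through access methods.
.PP
.Vb 30
\&  package My::ArrayObj;  # hypothetical class, not Jcode's internals
\&  use strict;
\&  use warnings;
\&
\&  # Slot indices of the array\-based object (used instead of hash keys)
\&  use constant { R_STR => 0, ICODE => 1, NMATCH => 2 };
\&
\&  sub new {
\&      my ($class, $str, $icode) = @_;
\&      # [ reference to the string, input code, match count ]
\&      return bless [ \e$str, $icode, 0 ], $class;
\&  }
\&
\&  # Access methods: callers never index the array themselves.
\&  sub r_str  { $_[0][R_STR] }
\&  sub icode  { $_[0][ICODE] }
\&  sub nmatch { $_[0][NMATCH] }
\&
\&  # Counts occurrences of $char and records the count, like tr() does.
\&  sub count_char {
\&      my ($self, $char) = @_;
\&      my $n = () = ${ $self\->[R_STR] } =~ /\eQ$char\eE/g;
\&      $self\->[NMATCH] = $n;
\&      return $self;   # returning the object allows chaining
\&  }
\&
\&  package main;
\&  my $obj = My::ArrayObj\->new("banana", "ascii");
\&  print join(" ", ${ $obj\->r_str }, $obj\->icode,
\&                  $obj\->count_char("a")\->nmatch), "\en";   # banana ascii 3
.Ve
.PP
Because callers only go through \f(CW\*(C`r_str\*(C'\fR, \f(CW\*(C`icode\*(C'\fR and
\f(CW\*(C`nmatch\*(C'\fR, the slot layout could change without breaking them,
while indexing an array by constant avoids the hashing step of a hash lookup.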
.SH "Subroutines"
.IX Header "Subroutines"
.IP "($code, [$nmatch]) = getcode($str)" 2
.IX Item "($code, [$nmatch]) = getcode($str)"
Returns the character code of \f(CW$str\fR. The return codes are as follows:
.Sp
.Vb 7
\&  ascii   Ascii (Contains no Japanese Code)
\&  binary  Binary (Not Text File)
\&  euc     EUC\-JP
\&  sjis    SHIFT_JIS
\&  jis     JIS (ISO\-2022\-JP)
\&  ucs2    UCS2 (Raw Unicode)
\&  utf8    UTF8
.Ve
.Sp
In list context (instead of scalar context), it also returns
\&\f(CW$nmatch\fR, the number of matches found for that code. As mentioned
above, \f(CW$str\fR can be \e$str instead.
.Sp
\&\fBjcode.pl Users:\fR This function is 100% upward-compatible with
\&\fIjcode::getcode()\fR \*(-- well, almost:
.Sp
.Vb 2
\&  * When its return value is an array, the order is the opposite;
\&    jcode::getcode() returns $nmatch first.
.Ve
.Sp
.Vb 3
\&  * jcode::getcode() returns 'undef' when the number of EUC characters
\&    is equal to that of SJIS. Jcode::getcode() returns 'euc'. For
\&    Jcode.pm there are no in\-betweens.
.Ve
.ie n .IP "Jcode::convert($str, [$ocode, $icode, $opt])" 2
.el .IP "Jcode::convert($str, [$ocode, \f(CW$icode\fR, \f(CW$opt\fR])" 2
.IX Item "Jcode::convert($str, [$ocode, $icode, $opt])"
Converts \f(CW$str\fR to the character code specified by \f(CW$ocode\fR.
When \f(CW$icode\fR is also specified, it assumes \f(CW$icode\fR for the
input string instead of the code detected by \fIgetcode()\fR. As mentioned
above, \f(CW$str\fR can be \e$str instead.
.Sp
\&\fBjcode.pl Users:\fR This function is 100% upward-compatible with
\&\fIjcode::convert()\fR!
.SH "BUGS"
.IX Header "BUGS"
For perl 5.8.1 or later, Jcode acts as a wrapper to Encode, which means
Jcode is subject to any bugs therein.
.SH "ACKNOWLEDGEMENTS"
.IX Header "ACKNOWLEDGEMENTS"
This package owes a lot in motivation, design, and code to jcode.pl for
Perl4 by Kazumasa Utashiro.
.PP
Hiroki Ohzaki has helped me polish the regexps from the very first stage
of development.
.PP
JEncode by makamaka@donzoko.net inspired me to integrate Encode into Jcode.
He has also contributed the Japanese \s-1POD\s0.
.PP
And thanks to the folks at the Jcode mailing list. Without them, I couldn't
have coded this far.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Encode
.PP
Jcode::Nihongo
.PP
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright 1999\-2005 Dan Kogai
.PP
This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.