.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.40)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Map 3pm"
.TH Map 3pm "2020-11-09" "perl v5.32.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Unicode::Map V0.112 \- maps charsets from and to utf16 unicode
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.RS 4
use \fBUnicode::Map()\fR;
.Sp
\&\fI\f(CI$Map\fI\fR = new Unicode::Map(\*(L"\s-1ISO\-8859\-1\*(R"\s0);
.Sp
\&\fI\f(CI$utf16\fI\fR = \fI\f(CI$Map\fI\fR \-> to_unicode (\*(L"Hello world!\*(R");
=> \f(CW$utf16\fR == \*(L"\e0H\e0e\e0l\e0l\e0o\e0 \e0w\e0o\e0r\e0l\e0d\e0!\*(R"
.Sp
\&\fI\f(CI$locale\fI\fR = \fI\f(CI$Map\fI\fR \-> from_unicode (\fI\f(CI$utf16\fI\fR);
=> \f(CW$locale\fR == \*(L"Hello world!\*(R"
.RE
.PP
A more detailed description follows below.
.PP
To do: short note about perl's Unicode perspectives.
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module converts strings to and from the 2\-byte Unicode \s-1UCS2\s0 format. All
mappings go through 2\-byte \s-1UTF16\s0 encodings, not through \s-1UTF8\s0 encoding, which
uses 1\-byte code units. To convert between the two, use Unicode::String.
.PP
For historical reasons this module coexists with Unicode::Map8. Please use
Unicode::Map8 unless you need to handle two\-byte character sets, e.g.
Chinese \s-1GB2312.\s0 If you stick to the basic functionality (see
documentation), you can use both modules interchangeably.
.PP
In practice this module will disappear sooner or later, because
Unicode mapping support needs to get into perl's core somehow. If you would
like to work in this field, please don't hesitate to contact Gisle Aas!
.PP
This module can't deal directly with utf8. Use Unicode::String to convert
utf8 to utf16 and vice versa.
.PP
Character mapping follows the data in the binary mapfiles of the
Unicode::Map hierarchy. Binary mapfiles can also be created with this module,
enabling you to install your own character sets. Refer to mkmapfile or
to the file \s-1REGISTRY\s0 in the Unicode::Map hierarchy.
.SH "CONVERSION METHODS"
.IX Header "CONVERSION METHODS"
These are probably the only methods you will need from this module.
Their usage is compatible with Unicode::Map8.
.IP "new" 4
.IX Item "new"
\&\fI\f(CI$Map\fI\fR = new Unicode::Map(\*(L"\s-1GB2312\-80\*(R"\s0)
.Sp
Returns a new Map object for \s-1GB2312\-80\s0 encoding.
.IP "from_unicode" 4
.IX Item "from_unicode"
\&\fI\f(CI$dest\fI\fR = \fI\f(CI$Map\fI\fR \-> from_unicode (\fI\f(CI$src\fI\fR)
.Sp
Creates a string in the locale charset representation from the utf16\-encoded
string \fI\f(CI$src\fI\fR.
.IP "to_unicode" 4
.IX Item "to_unicode"
\&\fI\f(CI$dest\fI\fR = \fI\f(CI$Map\fI\fR \-> to_unicode (\fI\f(CI$src\fI\fR)
.Sp
Creates a string in utf16 representation from \fI\f(CI$src\fI\fR.
.IP "to8" 4
.IX Item "to8"
Alias for \fIfrom_unicode\fR. For compatibility with Unicode::Map8.
.IP "to16" 4
.IX Item "to16"
Alias for \fIto_unicode\fR. For compatibility with Unicode::Map8.
.SH "WARNINGS"
.IX Header "WARNINGS"
You can ask Unicode::Map to issue warnings on deprecated or incompatible
usage by means of the constants \s-1WARN_DEFAULT, WARN_DEPRECATION\s0 or
\&\s-1WARN_COMPATIBILITY.\s0 The latter two can be ORed together.
.IP "No special warnings:" 4
.IX Item "No special warnings:"
\&\f(CW$Unicode::Map::WARNINGS\fR = Unicode::Map::WARN_DEFAULT
.IP "Warnings for deprecated usage:" 4
.IX Item "Warnings for deprecated usage:"
\&\f(CW$Unicode::Map::WARNINGS\fR = Unicode::Map::WARN_DEPRECATION
.IP "Warnings for incompatible usage:" 4
.IX Item "Warnings for incompatible usage:"
\&\f(CW$Unicode::Map::WARNINGS\fR = Unicode::Map::WARN_COMPATIBILITY
.SH "MAINTENANCE METHODS"
.IX Header "MAINTENANCE METHODS"
\&\fINote:\fR These methods are solely for the maintenance of Unicode::Map.
Using any of these methods will lead to programs incompatible with
Unicode::Map8.
.IP "alias" 4
.IX Item "alias"
\&\fI\f(CI@list\fI\fR = \fI\f(CI$Map\fI\fR \-> alias (\fI\f(CI$csid\fI\fR)
.Sp
Returns a list of alias names of character set \fI\f(CI$csid\fI\fR.
.IP "mapping" 4
.IX Item "mapping"
\&\fI\f(CI$path\fI\fR = \fI\f(CI$Map\fI\fR \-> mapping (\fI\f(CI$csid\fI\fR)
.Sp
Returns the absolute path of the binary character mapping for character set
\&\fI\f(CI$csid\fI\fR according to the \s-1REGISTRY\s0 file of Unicode::Map.
.IP "id" 4
.IX Item "id"
\&\fI\f(CI$real_id\fI\fR||\f(CW""\fR = \fI\f(CI$Map\fI\fR \-> id (\fI\f(CI$test_id\fI\fR)
.Sp
Returns a valid character set identifier \fI\f(CI$real_id\fI\fR if \fI\f(CI$test_id\fI\fR is a valid
character set name or alias name according to the \s-1REGISTRY\s0 file of
Unicode::Map.
.IP "ids" 4
.IX Item "ids"
\&\fI\f(CI@ids\fI\fR = \fI\f(CI$Map\fI\fR \-> \fBids()\fR
.Sp
Returns a list of all character set names defined in the \s-1REGISTRY\s0 file.
.IP "read_text_mapping" 4
.IX Item "read_text_mapping"
\&\f(CW1\fR||\f(CW0\fR = \fI\f(CI$Map\fI\fR \-> read_text_mapping (\fI\f(CI$csid\fI\fR, \fI\f(CI$path\fI\fR, \fI\f(CI$style\fI\fR)
.Sp
Reads a text mapping of style \fI\f(CI$style\fI\fR named \fI\f(CI$csid\fI\fR from the file \fI\f(CI$path\fI\fR.
The mapping can then be saved to a file with the method write_binary_mapping.
\&\fI\f(CI$style\fI\fR can be:
.Sp
.Vb 1
\&   style       description
\&
\&   "unicode"   A text mapping as of ftp://ftp.unicode.org/MAPPINGS/
\&   ""          Same as "unicode"
\&   "reverse"   Similar to unicode, but both columns are switched
\&   "keld"      A text mapping as of ftp://dkuug.dk/i18n/charmaps/
.Ve
.IP "src" 4
.IX Item "src"
\&\fI\f(CI$path\fI\fR = \fI\f(CI$Map\fI\fR \-> src (\fI\f(CI$csid\fI\fR)
.Sp
Returns the path of the textual character mapping for character set \fI\f(CI$csid\fI\fR
according to the \s-1REGISTRY\s0 file of Unicode::Map.
.IP "style" 4
.IX Item "style"
\&\fI\f(CI$style\fI\fR = \fI\f(CI$Map\fI\fR \-> style (\fI\f(CI$csid\fI\fR)
.Sp
Returns the style of the textual character mapping for character set \fI\f(CI$csid\fI\fR
according to the \s-1REGISTRY\s0 file of Unicode::Map.
.IP "write_binary_mapping" 4
.IX Item "write_binary_mapping"
\&\f(CW1\fR||\f(CW0\fR = \fI\f(CI$Map\fI\fR \-> write_binary_mapping (\fI\f(CI$csid\fI\fR, \fI\f(CI$path\fI\fR)
.Sp
Stores a mapping that has been loaded via the method read_text_mapping in the
file \fI\f(CI$path\fI\fR.
.SH "DEPRECATED METHODS"
.IX Header "DEPRECATED METHODS"
Some functionality is no longer promoted.
.IP "noise" 4
.IX Item "noise"
Deprecated! Do not use it any longer.
.IP "reverse_unicode" 4
.IX Item "reverse_unicode"
Deprecated! Use Unicode::String::byteswap instead.
.SH "BINARY MAPPINGS"
.IX Header "BINARY MAPPINGS"
Structure of binary mapfiles
.PP
Unicode character mapping tables contain runs of sequential key codes with
sequential value codes. This property is used to crunch the maps easily.
n (0<n<256) sequential characters are represented as a byte count n and
the first character code key_start. The corresponding value sequences are
crunched in the same way. The value 0 is used to start an extended
information block.
.IP "<main>:" 4
.IX Item "<main>:"
.Vb 1
\&   offset  structure     value
\&
\&   0x00    word          0x27b8   (magic)
\&   0x02    @(<extended> || <submapping>)
.Ve
.Sp
The mapfile ends with extended mode <end> in the main stream.
.IP "<submapping>:" 4
.IX Item "<submapping>:"
.Vb 5
\&   0x00    byte != 0     charsize1 (bits)
\&   0x01    byte          n1 number of chars for one entry
\&   0x02    byte          charsize2 (bits)
\&   0x03    byte          n2 number of chars for one entry
\&   0x04    @(<extended> || <key_seq> || <key_val_seq>)
.Ve
.Sp
One submapping ends when a <mapend> entry occurs.
.IP "<key_val_seq>:" 4
.IX Item "<key_val_seq>:"
.Vb 6
\&   0x00    size=0|1|2|4  n, number of sequential characters
\&   size    bs1           key1
\&   +bs1    bs2           value1
\&   +bs2    bs1           key2
\&   +bs1    bs2           value2
\&   ...
.Ve
.Sp
key_val_seq ends when either the file ends (n = infinite mode) or n pairs
have been read.
.IP "<key_seq>:" 4
.IX Item "<key_seq>:"
.Vb 3
\&   0x00    byte          n, number of sequential characters
\&   0x01    bs1           key_start, first character of sequence
\&   1+bs1   @(<extended> || <val_seq>)
.Ve
.Sp
A key sequence starts with a byte count telling how long the sequence
is. It is followed by the key start code. After this comes a list of
value sequences. The list of value sequences ends when sum(m) equals n.
.IP "<val_seq>:" 4
.IX Item "<val_seq>:"
.Vb 2
\&   0x00    byte          m, number of sequential characters
\&   0x01    bs2           val_start, first character of sequence
.Ve
.IP "<extended>:" 4
.IX Item "<extended>:"
.Vb 4
\&   0x00    byte          0
\&   0x01    byte          ftype
\&   0x02    byte          fsize, size of following structure
\&   0x03    fsize bytes   something
.Ve
.Sp
For future extensions or private use one can insert streams of 1..255 bytes
here. ftype can have values 30..255; values 0..29 are reserved. The modes
are not fully defined yet and could change. They will be explained later.
.SH "TO BE DONE"
.IX Header "TO BE DONE"
.IP "\-" 4
Something clever, when a character has no translation.
.IP "\-" 4
Direct charset \-> charset mapping.
.IP "\-" 4
Better performance.
.IP "\-" 4
Support for mappings according to \s-1RFC 1345.\s0
.SH "SEE ALSO"
.IX Header "SEE ALSO"
.IP "\-" 4
File \f(CW\*(C`REGISTRY\*(C'\fR and binary mappings in directory \f(CW\*(C`Unicode/Map\*(C'\fR of your
perl library path
.IP "\-" 4
\&\fBrecode\fR\|(1), \fBmap\fR\|(1), \fBmkmapfile\fR\|(1), \fBUnicode::Map\fR\|(3), \fBUnicode::Map8\fR\|(3),
\&\fBUnicode::String\fR\|(3), \fBUnicode::CharName\fR\|(3), \fBmirrorMappings\fR\|(1)
.IP "\-" 4
\&\s-1RFC 1345\s0
.IP "\-" 4
Mappings at the Unicode Consortium ftp://ftp.unicode.org/MAPPINGS/
.IP "\-" 4
Registered Internet character sets ftp://dkuug.dk/i18n/charmaps/
.IP "\-" 4
To do: more references
.SH "AUTHOR"
.IX Header "AUTHOR"
Martin Schwartz <\fImartin@nacho.de\fR>
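.SH "EXAMPLE"
.IX Header "EXAMPLE"
The key sequence layout described under \s-1BINARY MAPPINGS\s0 can be
illustrated with a minimal Perl sketch. It is not taken from the module's
source and makes several assumptions that this page does not confirm: the
helper name expand_key_seq is hypothetical, bs1 and bs2 are taken to be the
key and value sizes in bytes, multi\-byte codes are taken to be stored
big\-endian, and extended blocks are not handled.
.PP
.Vb 8
\&   use strict;
\&   use warnings;
\&
\&   # Hypothetical helper: expand one key sequence into key => value pairs.
\&   # Assumption: $bs1 and $bs2 are sizes in bytes, codes are big\-endian.
\&   sub expand_key_seq {
\&       my ($buf, $bs1, $bs2) = @_;
\&       my $n   = unpack("C", substr($buf, 0, 1));           # byte count n
\&       my $key = hex(unpack("H*", substr($buf, 1, $bs1)));  # key_start
\&       my $pos = 1 + $bs1;
\&       my %map;
\&       my $done = 0;
\&       while ($done < $n) {                 # list ends when sum(m) == n
\&           my $m = unpack("C", substr($buf, $pos, 1));      # byte count m
\&           $pos += 1;
\&           last if $m == 0;                 # extended block: not handled here
\&           my $val = hex(unpack("H*", substr($buf, $pos, $bs2)));  # val_start
\&           $pos += $bs2;
\&           $map{$key++} = $val++ for 1 .. $m;
\&           $done += $m;
\&       }
\&       return \e%map;
\&   }
\&
\&   # Keys 0x41..0x43: a run of two values from 0x0041, then one value 0x00C7.
\&   my $record = pack("C C C n C n", 3, 0x41, 2, 0x0041, 1, 0x00C7);
\&   my $map    = expand_key_seq($record, 1, 2);
\&   printf "0x%02X => 0x%04X\en", $_, $map\->{$_}
\&       for sort { $a <=> $b } keys %$map;
.Ve
.PP
Under these assumptions the sample record expands to three pairs:
0x41 => 0x0041, 0x42 => 0x0042 and 0x43 => 0x00C7. A real mapfile reader
would also have to check the magic word and handle submapping headers and
extended blocks.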