.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.40)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "UM 3pm"
.TH UM 3pm "2021-01-05" "perl v5.32.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
XML::UM \- Convert UTF\-8 strings to any encoding supported by XML::Encoding
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&        use XML::UM;
\&
\&        # Set directory with .xml files that come with the XML::Encoding distribution
\&        # Always include the trailing slash!
\&        $XML::UM::ENCDIR = \*(Aq/home1/enno/perlModules/XML\-Encoding\-1.01/maps/\*(Aq;
\&
\&        # Create the encoding routine
\&        my $encode = XML::UM::get_encode (
\&              Encoding => \*(AqISO\-8859\-2\*(Aq,
\&              EncodeUnmapped => \e&XML::UM::encode_unmapped_dec);
\&
\&        # Convert a string from UTF\-8 to the specified Encoding
\&        my $encoded_str = $encode\->($utf8_str);
\&
\&        # Remove circular references for garbage collection
\&        XML::UM::dispose_encoding (\*(AqISO\-8859\-2\*(Aq);
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module provides methods to convert \s-1UTF\-8\s0 strings to any \s-1XML\s0
encoding that XML::Encoding supports. It creates mapping routines from the
\&.xml files in the maps/ directory of the XML::Encoding distribution. Note
that the XML::Encoding distribution installs the .enc files in your Perl
directory, but not the .xml files they were created from. That's why you have
to set \f(CW$ENCDIR\fR as shown in the \s-1SYNOPSIS.\s0
.PP
This implementation uses the XML::Encoding class to parse the .xml file and
creates a hash that maps \s-1UTF\-8\s0 characters (each consisting of up to 4
bytes) to their equivalent byte sequence in the specified encoding. Note that
large mappings may consume a lot of memory!
.PP
Future implementations may parse the .enc files directly, or do the
conversions entirely in \s-1XS\s0 (i.e., C code).
.SH "get_encode (Encoding => STRING, EncodeUnmapped => SUB)"
.IX Header "get_encode (Encoding => STRING, EncodeUnmapped => SUB)"
The central entry point to this module is the \fBXML::UM::get_encode()\fR method.
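.PP
Conceptually, the subroutine that \fBget_encode()\fR returns looks up each
character of its input in a conversion table and falls back to the
EncodeUnmapped routine for characters the table lacks. The sketch below shows
that table-driven technique in plain Perl. It is an illustration under stated
assumptions, not the XML::UM implementation: \fBmake_encoder()\fR and
\fBunmapped_dec()\fR are hypothetical helpers, the two-entry table
\f(CW%iso_8859_2\fR stands in for a full XML::Encoding map, the table is keyed
by code point rather than by \s-1UTF\-8\s0 byte sequence, and every \s-1ASCII\s0
character is assumed to map to itself (see \s-1CAVEATS\s0 for why that is
uncertain).
.PP
.Vb 32
\&  use strict;
\&  use warnings;
\&  use Encode qw(decode);
\&
\&  # Hypothetical two\-entry excerpt of an ISO\-8859\-2 table
\&  # (code point => output byte); a real table is far larger.
\&  my %iso_8859_2 = (0x0104 => chr(0xA1), 0x0141 => chr(0xA3));
\&
\&  # Hypothetical stand\-in for encode_unmapped_dec: decimal reference.
\&  sub unmapped_dec { return sprintf("&#%d;", $_[0]) }
\&
\&  # Hypothetical helper (not XML::UM API): returns a UTF\-8 converter.
\&  sub make_encoder {
\&      my ($map, $unmapped) = @_;
\&      return sub {
\&          my $chars = decode("UTF\-8", shift);
\&          my $out = "";
\&          for my $ch (split //, $chars) {
\&              my $cp = ord $ch;
\&              if ($cp < 128) { $out .= $ch }    # assumption: ASCII is identity
\&              elsif (exists $map\->{$cp}) { $out .= $map\->{$cp} }
\&              else { $out .= $unmapped\->($cp) }
\&          }
\&          return $out;
\&      };
\&  }
\&
\&  my $encode = make_encoder(\e%iso_8859_2, \e&unmapped_dec);
\&
\&  # "A", U+0104 and U+00AB as UTF\-8 bytes
\&  my $result = $encode\->(pack("C*", 0x41, 0xC4, 0x84, 0xC2, 0xAB));
\&  # $result is "A" . chr(0xA1) . "&#171;"
.Ve
.PP
With this table, the Latin capital A with ogonek (U+0104) becomes byte 0xA1,
while the left guillemet (U+00AB), which the excerpt lacks, becomes the decimal
reference '&#171;'.
.PP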
\fBget_encode()\fR forwards the call to the global \f(CW$XML::UM::FACTORY\fR,
which is an instance of XML::UM::SlowMapperFactory by default. Override this
variable to plug in your own mapper factory.
.PP
XML::UM::SlowMapperFactory creates an instance of XML::UM::SlowMapper (and
caches it for subsequent use) that reads in the .xml encoding file and creates
a hash that maps \s-1UTF\-8\s0 characters to encoded characters.
.PP
Finally, the \fBget_encode()\fR method of XML::UM::SlowMapper is called. It
generates an anonymous subroutine that uses the hash to convert
multi-character \s-1UTF\-8\s0 blocks to the proper encoding.
.PP
The parameters to the \fBget_encode()\fR method (defined as name/value pairs) are:
.IP "\(bu" 4
Encoding
.Sp
The name of the desired encoding, e.g. '\s-1ISO\-8859\-2\s0'.
.IP "\(bu" 4
EncodeUnmapped (Default: \e&XML::UM::encode_unmapped_dec)
.Sp
Defines how Unicode characters not found in the mapping file (of the
specified encoding) are printed. By default, they are converted to decimal
entity references, like '&#123;'.
.Sp
Use \e&XML::UM::encode_unmapped_hex for hexadecimal entity references, like '&#xAB;'.
.SH "dispose_encoding ($encoding_name)"
.IX Header "dispose_encoding ($encoding_name)"
Call this to free the memory used by the SlowMapper for a specific encoding.
Note that in order to free the big conversion hash, the user should no longer
hold references to the subroutines generated by \fBget_encode()\fR.
.SH "CAVEATS"
.IX Header "CAVEATS"
I'm not exactly sure which Unicode characters in the range (0 .. 127) should be
mapped to themselves. See the comments in \s-1XML/UM\s0.pm near
\&\f(CW%DEFAULT_ASCII_MAPPINGS\fR.
.PP
The encodings that expat supports by default (e.g. \s-1UTF\-16, ISO\-8859\-1\s0)
are currently not supported, because there are no .enc files available for
these encodings. This module needs some more work. If you have the time,
please help!
.SH "AUTHOR"
.IX Header "AUTHOR"
The original author is Enno Derksen.
.PP
Send bug reports, hints, tips, and suggestions to T.J. Mather at
<\fItjmather@tjmather.com\fR>.