.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "utf8::all 3pm"
.TH utf8::all 3pm "2022-11-20" "perl v5.36.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
utf8::all \- turn on Unicode \- all of it
.SH "VERSION"
.IX Header "VERSION"
version 0.024
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&    use utf8::all;                      # Turn on UTF\-8, all of it.
\&
\&    open my $in, \*(Aq<\*(Aq, \*(Aqcontains\-utf8\*(Aq;  # UTF\-8 already turned on here
\&    print length \*(Aqføø bār\*(Aq;             # 7 UTF\-8 characters
\&    my $utf8_arg = shift @ARGV;         # @ARGV is UTF\-8 too (only for main)
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
The \f(CW\*(C`use utf8\*(C'\fR pragma tells the Perl parser to allow \s-1UTF\-8\s0 in the
program text in the current lexical scope. This also means that you can
now use literal Unicode characters as part of strings, variable names,
and regular expressions.
.PP
\&\f(CW\*(C`utf8::all\*(C'\fR goes further:
.IP "\(bu" 4
\&\f(CW\*(C`charnames\*(C'\fR are imported so \f(CW\*(C`\eN{...}\*(C'\fR sequences can be used to compile
Unicode characters based on names.
.IP "\(bu" 4
On Perl \f(CW\*(C`v5.11.0\*(C'\fR or higher, \f(CW\*(C`use feature \*(Aqunicode_strings\*(Aq\*(C'\fR is
enabled.
.IP "\(bu" 4
\&\f(CW\*(C`use feature fc\*(C'\fR and \f(CW\*(C`use feature unicode_eval\*(C'\fR are enabled on Perl
\&\f(CW5.16.0\fR and higher.
.IP "\(bu" 4
Filehandles are opened with \s-1UTF\-8\s0 encoding turned on by default
(including \f(CW\*(C`STDIN\*(C'\fR, \f(CW\*(C`STDOUT\*(C'\fR, and \f(CW\*(C`STDERR\*(C'\fR when \f(CW\*(C`utf8::all\*(C'\fR is used from
the \f(CW\*(C`main\*(C'\fR package). This means that they automatically convert \s-1UTF\-8\s0
octets to characters and vice versa. If you \fIdon't\fR want \s-1UTF\-8\s0 for a
particular filehandle, you'll have to set \f(CW\*(C`binmode $filehandle\*(C'\fR.
.IP "\(bu" 4
\&\f(CW@ARGV\fR gets converted from \s-1UTF\-8\s0 octets to Unicode characters (when
\&\f(CW\*(C`utf8::all\*(C'\fR is used from the \f(CW\*(C`main\*(C'\fR package).
This is similar to the behaviour of the \f(CW\*(C`\-CA\*(C'\fR perl command-line switch (see
perlrun).
.IP "\(bu" 4
\&\f(CW\*(C`readdir\*(C'\fR, \f(CW\*(C`readlink\*(C'\fR, \f(CW\*(C`readpipe\*(C'\fR (including the \f(CW\*(C`qx//\*(C'\fR and backtick
operators), and \f(CW\*(C`glob\*(C'\fR (including the \f(CW\*(C`<>\*(C'\fR operator) now all work with
and return Unicode characters instead of (\s-1UTF\-8\s0) octets (again only
when \f(CW\*(C`utf8::all\*(C'\fR is used from the \f(CW\*(C`main\*(C'\fR package).
.SS "Lexical Scope"
.IX Subsection "Lexical Scope"
The pragma is lexically scoped, so you can do the following if you have
some reason to:
.PP
.Vb 10
\&    {
\&        use utf8::all;
\&        open my $out, \*(Aq>\*(Aq, \*(Aqoutfile\*(Aq;
\&        my $utf8_str = \*(Aqføø bār\*(Aq;
\&        print length $utf8_str, "\en"; # 7
\&        print $out $utf8_str;         # out as utf8
\&    }
\&    open my $in, \*(Aq<\*(Aq, \*(Aqoutfile\*(Aq;     # in as raw
\&    my $text = do { local $/; <$in>};
\&    print length $text, "\en";         # 10, not 7!
.Ve
.PP
Instead of lexical scoping, you can also use \f(CW\*(C`no utf8::all\*(C'\fR to turn
off the effects.
.PP
Note that the effect on \f(CW@ARGV\fR and the \f(CW\*(C`STDIN\*(C'\fR, \f(CW\*(C`STDOUT\*(C'\fR, and
\&\f(CW\*(C`STDERR\*(C'\fR file handles is always global and cannot be undone!
.SS "Enabling/Disabling Global Features"
.IX Subsection "Enabling/Disabling Global Features"
As described above, the default behaviour of \f(CW\*(C`utf8::all\*(C'\fR is to convert
\&\f(CW@ARGV\fR and to open the \f(CW\*(C`STDIN\*(C'\fR, \f(CW\*(C`STDOUT\*(C'\fR, and \f(CW\*(C`STDERR\*(C'\fR file handles
with \s-1UTF\-8\s0 encoding, and override the \f(CW\*(C`readlink\*(C'\fR and
\&\f(CW\*(C`readdir\*(C'\fR functions and \f(CW\*(C`glob\*(C'\fR operators when \f(CW\*(C`utf8::all\*(C'\fR is used from
the \f(CW\*(C`main\*(C'\fR package.
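.PP
As a rough illustration of what converting \s-1UTF\-8\s0 octets to characters
means for an argument such as those in \f(CW@ARGV\fR, the following minimal
sketch uses only the core Encode module. It is not taken from
\&\f(CW\*(C`utf8::all\*(C'\fR itself, and the sample argument and variable names are
made up for the example.
.PP
.Vb 10
\&    use strict;
\&    use warnings;
\&    use Encode ();
\&
\&    # Hypothetical argument: the five UTF\-8 octets that encode "føø".
\&    my @octets = ("f\exC3\exB8\exC3\exB8");
\&
\&    # FB_CROAK dies on malformed input; LEAVE_SRC keeps @octets intact.
\&    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
\&    my @chars = map { Encode::decode("UTF\-8", $_, $check) } @octets;
\&
\&    print length($octets[0]), "\en";   # 5 (octets)
\&    print length($chars[0]), "\en";    # 3 (characters)
.Ve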
.PP
If you want to disable these features even when \f(CW\*(C`utf8::all\*(C'\fR is used from
the \f(CW\*(C`main\*(C'\fR package, add the option \f(CW\*(C`NO\-GLOBAL\*(C'\fR (or
\&\f(CW\*(C`LEXICAL\-ONLY\*(C'\fR) to the use line. For example:
.PP
.Vb 1
\&    use utf8::all \*(AqNO\-GLOBAL\*(Aq;
.Ve
.PP
If, on the other hand, you want to enable these global effects even when
\&\f(CW\*(C`utf8::all\*(C'\fR is used from a package other than \f(CW\*(C`main\*(C'\fR, use the
option \f(CW\*(C`GLOBAL\*(C'\fR on the use line:
.PP
.Vb 1
\&    use utf8::all \*(AqGLOBAL\*(Aq;
.Ve
.SS "\s-1UTF\-8\s0 Errors"
.IX Subsection "UTF-8 Errors"
\&\f(CW\*(C`utf8::all\*(C'\fR treats invalid code points (i.e., \s-1UTF\-8\s0 that does not
map to a valid Unicode \*(L"character\*(R") as a fatal error.
.PP
For \f(CW\*(C`glob\*(C'\fR, \f(CW\*(C`readdir\*(C'\fR, and \f(CW\*(C`readlink\*(C'\fR, one can change this behaviour by
setting the attribute \*(L"$utf8::all::UTF8_CHECK\*(R".
.SH "ATTRIBUTES"
.IX Header "ATTRIBUTES"
.ie n .SS "$utf8::all::UTF8_CHECK"
.el .SS "\f(CW$utf8::all::UTF8_CHECK\fP"
.IX Subsection "$utf8::all::UTF8_CHECK"
By default \f(CW\*(C`utf8::all\*(C'\fR marks decoding errors as fatal (default value
for this setting is \f(CW\*(C`Encode::FB_CROAK\*(C'\fR). If you want, you can change this by
setting \f(CW$utf8::all::UTF8_CHECK\fR. The value \f(CW\*(C`Encode::FB_WARN\*(C'\fR reports
the decoding errors as warnings, and \f(CW\*(C`Encode::FB_DEFAULT\*(C'\fR will completely
ignore them. Please see Encode for details. Note: \f(CW\*(C`Encode::LEAVE_SRC\*(C'\fR is
\&\fIalways\fR enforced.
.PP
Important: this setting only controls the handling of decoding errors in
\&\f(CW\*(C`glob\*(C'\fR, \f(CW\*(C`readdir\*(C'\fR, and \f(CW\*(C`readlink\*(C'\fR.
.SH "INTERACTION WITH AUTODIE"
.IX Header "INTERACTION WITH AUTODIE"
If you use autodie, which is a great idea, you need to use at least
version \fB2.12\fR, released on June 26, 2012. Otherwise, autodie
obliterates the \s-1IO\s0 layers set by the open pragma.
See \s-1RT\s0 #54777 and \s-1GH\s0 #7.
.SH "BUGS"
.IX Header "BUGS"
Please report any bugs or feature requests on the bugtracker website.
.PP
When submitting a bug or request, please include a test file or a
patch to an existing test file that illustrates the bug or desired
feature.
.SH "COMPATIBILITY"
.IX Header "COMPATIBILITY"
The filesystems of \s-1DOS,\s0 Windows, and \s-1OS/2\s0 do not (fully) support
\&\s-1UTF\-8.\s0 The \f(CW\*(C`readlink\*(C'\fR and \f(CW\*(C`readdir\*(C'\fR functions and \f(CW\*(C`glob\*(C'\fR operators
will therefore not be replaced on these systems.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
.IP "\(bu" 4
File::Find::utf8 for fully \s-1UTF\-8\s0 aware File::Find functions.
.IP "\(bu" 4
Cwd::utf8 for fully \s-1UTF\-8\s0 aware Cwd functions.
.SH "AUTHORS"
.IX Header "AUTHORS"
.IP "\(bu" 4
Michael Schwern
.IP "\(bu" 4
Mike Doherty
.IP "\(bu" 4
Hayo Baan
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
This software is copyright (c) 2009 by Michael Schwern; he originated it.
.PP
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.