.\" Automatically generated by Pod::Man 2.28 (Pod::Simple 3.28) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{ . if \nF \{ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . 
ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "Net::IDN::Standards 3pm" .TH Net::IDN::Standards 3pm "2014-09-17" "perl v5.20.0" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l
.nh
.SH "NAME"
Net::IDN::Standards \-\- Internationalized Domain Names for Applications (IDNA)
.SH "INTRODUCTION"
.IX Header "INTRODUCTION"
Historically, domain names and host names were restricted to a limited
repertoire of \s-1ASCII\s0 characters, i.e. letters, digits and the hyphen
(\f(CW\*(C`/[A\-Z0\-9\-]/i\*(C'\fR). Words and names from languages that require
additional characters (such as diacritics or special characters) or other
scripts could not be used.
.PP
Internationalized Domain Names (IDNs) extend the character repertoire for
domain names from \s-1ASCII\s0 to Unicode while maintaining backwards
compatibility with software that only expects and handles \s-1ASCII\s0
characters.
.PP
In order to do so, Unicode domain names are converted to \s-1ASCII\s0 using an
ASCII-compatible encoding (\s-1ACE\s0) called Punycode. On the wire, each
converted label of a domain name starts with \f(CW\*(C`xn\-\-\*(C'\fR, followed by the
\&\s-1ASCII\s0 encoding of the Unicode string. The Unicode version is typically
only shown in applications presenting the domain to the user (hence
Internationalized Domain Names for Applications, \&\s-1IDNA\s0).
Internationalized Resource Identifiers (IRIs), the Unicode version of URLs,
may also include domain names in their Unicode form.
.PP
The \s-1IDNA\s0 specifications, however, cover not only the actual Punycode
conversion but also extensive rules for the preparation (mapping and/or
validation) of input strings. They typically define two functions,
\&\f(CW\*(C`ToASCII\*(C'\fR and \f(CW\*(C`ToUnicode\*(C'\fR, which prepare and convert a domain name to
the \s-1ACE\s0 version and the Unicode version, respectively.
.SH "DIFFERENT STANDARDS"
.IX Header "DIFFERENT STANDARDS"
.Vb 3
\&  "The nice thing about standards is that you have so many to
\&  choose from."
\&                                \-\- Andrew S. Tanenbaum
.Ve
.PP
While the actual Punycode conversion is stable, there are different
specifications regarding mapping and/or validation (preparation):
.SS "\s-1IDNA2003\s0"
.IX Subsection "IDNA2003"
\&\s-1IDNA2003,\s0 which is defined in \s-1RFC\s0\ 3490 and related documents, was the
original specification for the internationalization of domain names.
.PP
However, some issues were subsequently identified with \s-1IDNA2003:\s0 The
specification was tied to Unicode\ 3.2 and therefore did not allow characters
added in newer versions of Unicode (without updating the specifications).
.PP
Furthermore, a few characters were mapped to other characters or deleted
although they would carry meaning in some languages (i.e.
\&'\*8' was mapped to 'ss' and the Greek final sigma was mapped to the regular
lowercase sigma; \s-1ZWJ\s0 and \s-1ZWNJ\s0 were always mapped to nothing, although
some scripts like Arabic require them for correct display).
.SS "\s-1IDNA2008\s0"
.IX Subsection "IDNA2008"
\&\s-1IDNA2008,\s0 which is defined in \s-1RFC\s0\ 5890 and related documents, resolves
the issues found in \s-1IDNA2003.\s0
.PP
This was done by allowing some characters that \s-1IDNA2003\s0 would have mapped
to other characters, mapped to nothing, or rejected during preparation. The
new domain names would not be accessible by \s-1IDNA2003\s0 implementations, of
course.
.PP
However, \s-1IDNA2008\s0 also disallowed a large number of characters that had
been allowed in \s-1IDNA2003 \s0(mostly symbols).
An implementation of \s-1IDNA2008\s0 would therefore no longer be able to access
domain names such as \f(CW\*(C`X.com\*(C'\fR, which had been registered under
\&\s-1IDNA2003.\s0
.SS "\s-1UTS\s0\ #46"
.IX Subsection "UTS#46"
Unicode Technical Standard #46 (\s-1UTS\s0\ #46) solves this problem by allowing
domain names that are valid in either \s-1IDNA2003\s0 or \&\s-1IDNA2008.\s0
.PP
This makes \s-1UTS\s0\ #46 the perfect fit for domain lookup (be liberal in what
you accept) but unsuitable for validating domain names prior to registration
(be conservative in what you send).
.SH "AUTHOR"
.IX Header "AUTHOR"
Claus Fa\*:rber