.\" Copyright (C) 2001 Information-technology Promotion Agency (IPA) .\" Copyright (C) 2001-2011 .\" National Institute of Advanced Industrial Science and Technology (AIST) .\" This file is part of the m17n library documentation. .\" Permission is granted to copy, distribute and/or modify this document .\" under the terms of the GNU Free Documentation License, Version 1.2 or .\" any later version published by the Free Software Foundation; with no .\" Invariant Section, no Front-Cover Texts, .\" and no Back-Cover Texts. A copy of the license is included in the .\" appendix entitled "GNU Free Documentation License". .TH "m17nCharset" 3m17n "Mon Sep 25 2023" "Version 1.8.4" "The m17n Library" \" -*- nroff -*- .ad l .nh .SH NAME m17nCharset \- Charset objects and API for them\&. .SH SYNOPSIS .br .PP .SS "Macros" .in +1c .ti -1c .RI "#define \fBMCHAR_INVALID_CODE\fP" .br .RI "Invalid code-point\&. " .in -1c .SS "Functions" .in +1c .ti -1c .RI "MSymbol \fBmchar_define_charset\fP (const char *name, \fBMPlist\fP *plist)" .br .ti -1c .RI "MSymbol \fBmchar_resolve_charset\fP (MSymbol symbol)" .br .RI "Resolve charset name\&. " .ti -1c .RI "int \fBmchar_list_charset\fP (MSymbol **symbols)" .br .RI "List symbols representing charsets\&. " .ti -1c .RI "int \fBmchar_decode\fP (MSymbol charset_name, unsigned code)" .br .RI "Decode a code-point\&. " .ti -1c .RI "unsigned \fBmchar_encode\fP (MSymbol charset_name, int c)" .br .RI "Encode a character code\&. " .ti -1c .RI "int \fBmchar_map_charset\fP (MSymbol charset_name, void(*func)(int from, int to, void *arg), void *func_arg)" .br .RI "Call a function for all the characters in a specified charset\&. " .in -1c .SS "Variables" .in +1c .ti -1c .RI "MSymbol \fBMcharset\fP" .br .in -1c .SS "Variables: Symbols representing a charset\&." Each of the following symbols represents a predefined charset\&. .br .in +1c .ti -1c .RI "MSymbol \fBMcharset_ascii\fP" .br .RI "Symbol representing the charset ASCII\&. 
" .ti -1c .RI "MSymbol \fBMcharset_iso_8859_1\fP" .br .RI "Symbol representing the charset ISO/IEC 8859/1\&. " .ti -1c .RI "MSymbol \fBMcharset_unicode\fP" .br .RI "Symbol representing the charset Unicode\&. " .ti -1c .RI "MSymbol \fBMcharset_m17n\fP" .br .RI "Symbol representing the largest charset\&. " .ti -1c .RI "MSymbol \fBMcharset_binary\fP" .br .RI "Symbol representing the charset for ill-decoded characters\&. " .in -1c .SS "Variables: Parameter keys for mchar_define_charset()\&." These are the predefined symbols to use as parameter keys for the function \fBmchar_define_charset()\fP (which see)\&. .br .in +1c .ti -1c .RI "MSymbol \fBMmethod\fP" .br .ti -1c .RI "MSymbol \fBMdimension\fP" .br .ti -1c .RI "MSymbol \fBMmin_range\fP" .br .ti -1c .RI "MSymbol \fBMmax_range\fP" .br .ti -1c .RI "MSymbol \fBMmin_code\fP" .br .ti -1c .RI "MSymbol \fBMmax_code\fP" .br .ti -1c .RI "MSymbol \fBMascii_compatible\fP" .br .ti -1c .RI "MSymbol \fBMfinal_byte\fP" .br .ti -1c .RI "MSymbol \fBMrevision\fP" .br .ti -1c .RI "MSymbol \fBMmin_char\fP" .br .ti -1c .RI "MSymbol \fBMmapfile\fP" .br .ti -1c .RI "MSymbol \fBMparents\fP" .br .ti -1c .RI "MSymbol \fBMsubset_offset\fP" .br .ti -1c .RI "MSymbol \fBMdefine_coding\fP" .br .ti -1c .RI "MSymbol \fBMaliases\fP" .br .in -1c .SS "Variables: Symbols representing charset methods\&." These are the predefined symbols that can be a value of the \fBMmethod\fP parameter of a charset used in an argument to the \fBmchar_define_charset()\fP function\&. .PP A method specifies how code\-points and character codes are converted\&. See the documentation of the \fBmchar_define_charset()\fP function for the details\&. .br .in +1c .ti -1c .RI "MSymbol \fBMoffset\fP" .br .ti -1c .RI "MSymbol \fBMmap\fP" .br .RI "Symbol for the map type method of charset\&. " .ti -1c .RI "MSymbol \fBMunify\fP" .br .RI "Symbol for the unify type method of charset\&. 
" .ti -1c .RI "MSymbol \fBMsubset\fP" .br .ti -1c .RI "MSymbol \fBMsuperset\fP" .br .RI "Symbol for the superset type method of charset\&. " .in -1c .SH "Detailed Description" .PP Charset objects and API for them\&. The symbol \fCMcharset\fP\&. .PP The m17n library uses \fIcharset\fP objects to represent a coded character sets (CCS)\&. The m17n library supports many predefined coded character sets\&. r, application programs can add other charsets\&. A character can belong to multiple charsets\&. .PP The m17n library distinguishes the following three concepts: .PP .PD 0 .IP "\(bu" 2 A \fIcode\-point\fP is a number assigned by the CCS to each character\&. Code\-points may or may not be continuous\&. The type \fCunsigned\fP is used to represent a code\-point\&. An invalid code\-point is represented by the macro \fCMCHAR_INVALID_CODE\fP\&. .PP .PD 0 .IP "\(bu" 2 A \fIcharacter\fP \fIindex\fP is the canonical index of a character in a CCS\&. The character that has the character index N occupies the Nth position when all the characters in the current CCS are sorted by their code\-points\&. Character indices in a CCS are continuous and start with 0\&. .PP .PD 0 .IP "\(bu" 2 A \fIcharacter\fP \fIcode\fP is the internal representation in the m17n library of a character\&. A character code is a signed integer of 21 bits or longer\&. .PP Each charset object defines how characters are converted between code\-points and character codes\&. To \fIencode\fP means converting code\-points to character codes and to \fIdecode\fP means converting character codes to code\-points\&. .br .PP .br .PP .br .PP Any decoded M\-text has a text property whose key is the predefined symbol \fCMcharset\fP\&. The name of \fCMcharset\fP is \fC'charset'\fP\&. .br .SH "Macro Definition Documentation" .PP .SS "#define MCHAR_INVALID_CODE" .PP Invalid code\-point\&. The macro \fBMCHAR_INVALID_CODE\fP gives the invalid code\-point\&. 
.br .SH "Variable Documentation" .PP .SS "MSymbol Mcharset_ascii" .PP Symbol representing the charset ASCII\&. The symbol \fBMcharset_ascii\fP has name \fC'ascii'\fP and represents the charset ISO 646, USA Version X3\&.4\-1968 (ISO\-IR\-6)\&. .br .SS "MSymbol Mcharset_iso_8859_1" .PP Symbol representing the charset ISO/IEC 8859/1\&. The symbol \fBMcharset_iso_8859_1\fP has name \fC'iso\-8859\-1'\fP and represents the charset ISO/IEC 8859\-1:1998\&. .br .SS "MSymbol Mcharset_unicode" .PP Symbol representing the charset Unicode\&. The symbol \fBMcharset_unicode\fP has name \fC'unicode'\fP and represents the charset Unicode\&. .br .SS "MSymbol Mcharset_m17n" .PP Symbol representing the largest charset\&. The symbol \fBMcharset_m17n\fP has name \fC'm17n'\fP and represents the charset that contains all characters supported by the m17n library\&. .br .SS "MSymbol Mcharset_binary" .PP Symbol representing the charset for ill\-decoded characters\&. The symbol \fBMcharset_binary\fP has name \fC'binary'\fP and represents the fake charset which the decoding functions attach to an M\-text as a text property when they encounter an invalid byte (sequence)\&. .br .PP See \fBCode Conversion\fP for more details\&. .br .SS "MSymbol Mmethod" .SS "MSymbol Mdimension" .SS "MSymbol Mmin_range" .SS "MSymbol Mmax_range" .SS "MSymbol Mmin_code" .SS "MSymbol Mmax_code" .SS "MSymbol Mascii_compatible" .SS "MSymbol Mfinal_byte" .SS "MSymbol Mrevision" .SS "MSymbol Mmin_char" .SS "MSymbol Mmapfile" .SS "MSymbol Mparents" .SS "MSymbol Msubset_offset" .SS "MSymbol Mdefine_coding" .SS "MSymbol Maliases" .SS "MSymbol Moffset" .PP Symbol for the offset type method of charset\&. 
The symbol \fBMoffset\fP has the name \fC'offset'\fP and, when used as a value of \fBMmethod\fP parameter of a charset, it means that the conversion of code\-points and character codes of the charset is done by this calculation: .PP .PP .nf CHARACTER\-CODE = CODE\-POINT \- MIN\-CODE + MIN\-CHAR .fi .PP .PP where MIN\-CODE is the value of the \fBMmin_code\fP parameter of the charset, and MIN\-CHAR is the value of the \fBMmin_char\fP parameter\&. .br .SS "MSymbol Mmap" .PP Symbol for the map type method of charset\&. The symbol \fBMmap\fP has the name \fC'map'\fP and, when used as a value of \fBMmethod\fP parameter of a charset, it means that the conversion of code\-points and character codes of the charset is done by map lookup\&. The map must be given by \fBMmapfile\fP parameter\&. .br .SS "MSymbol Munify" .PP Symbol for the unify type method of charset\&. The symbol \fBMunify\fP has the name \fC'unify'\fP and, when used as a value of \fBMmethod\fP parameter of a charset, it means that the conversion of code\-points and character codes of the charset is done by map lookup and offsetting\&. The map must be given by \fBMmapfile\fP parameter\&. For this kind of charset, a unique continuous character code space for all characters is assigned\&. .PP If the map has an entry for a code\-point, the conversion is done by looking up the map\&. Otherwise, the conversion is done by this calculation: .PP .PP .nf CHARACTER\-CODE = CODE\-POINT \- MIN\-CODE + LOWEST\-CHAR\-CODE .fi .PP .PP where MIN\-CODE is the value of the \fBMmin_code\fP parameter of the charset, and LOWEST\-CHAR\-CODE is the lowest character code of the assigned code space\&. .br .SS "MSymbol Msubset" .PP Symbol for the subset type method of charset\&. The symbol \fBMsubset\fP has the name \fC'subset'\fP and, when used as a value of \fBMmethod\fP parameter of a charset, it means that the charset is a subset of a parent charset\&. The parent charset must be given by \fBMparents\fP parameter\&. 
The conversion of code\-points and character codes of the charset is done conceptually by this calculation: .PP .PP .nf CHARACTER\-CODE = PARENT\-CODE (CODE\-POINT) + SUBSET\-OFFSET .fi .PP .PP where PARENT\-CODE is a pseudo function that returns the character code of CODE\-POINT in the parent charset, and SUBSET\-OFFSET is the value given by the \fBMsubset_offset\fP parameter\&. .br .SS "MSymbol Msuperset" .PP Symbol for the superset type method of charset\&. The symbol \fBMsuperset\fP has the name \fC'superset'\fP and, when used as a value of \fBMmethod\fP parameter of a charset, it means that the charset is a superset of parent charsets\&. The parent charsets must be given by \fBMparents\fP parameter\&. .br .SS "MSymbol Mcharset" .SH "Author" .PP Generated automatically by Doxygen for The m17n Library from the source code\&. .SH COPYRIGHT Copyright (C) 2001 Information\-technology Promotion Agency (IPA) .br Copyright (C) 2001\-2011 National Institute of Advanced Industrial Science and Technology (AIST) .br Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License .