.\" Copyright (C) 2001 Information-technology Promotion Agency (IPA)
.\" Copyright (C) 2001-2011
.\" National Institute of Advanced Industrial Science and Technology (AIST)
.\" This file is part of the m17n library documentation.
.\" Permission is granted to copy, distribute and/or modify this document
.\" under the terms of the GNU Free Documentation License, Version 1.2 or
.\" any later version published by the Free Software Foundation; with no
.\" Invariant Section, no Front-Cover Texts,
.\" and no Back-Cover Texts. A copy of the license is included in the
.\" appendix entitled "GNU Free Documentation License".
.TH "m17nCharset" 3m17n "Mon Sep 25 2023" "Version 1.8.4" "The m17n Library" \" -*- nroff -*-
.ad l
.nh
.SH NAME
m17nCharset \- Charset objects and API for them\&.
.SH SYNOPSIS
.br
.PP
.SS "Macros"
.in +1c
.ti -1c
.RI "#define \fBMCHAR_INVALID_CODE\fP"
.br
.RI "Invalid code-point\&. "
.in -1c
.SS "Functions"
.in +1c
.ti -1c
.RI "MSymbol \fBmchar_define_charset\fP (const char *name, \fBMPlist\fP *plist)"
.br
.ti -1c
.RI "MSymbol \fBmchar_resolve_charset\fP (MSymbol symbol)"
.br
.RI "Resolve charset name\&. "
.ti -1c
.RI "int \fBmchar_list_charset\fP (MSymbol **symbols)"
.br
.RI "List symbols representing charsets\&. "
.ti -1c
.RI "int \fBmchar_decode\fP (MSymbol charset_name, unsigned code)"
.br
.RI "Decode a code-point\&. "
.ti -1c
.RI "unsigned \fBmchar_encode\fP (MSymbol charset_name, int c)"
.br
.RI "Encode a character code\&. "
.ti -1c
.RI "int \fBmchar_map_charset\fP (MSymbol charset_name, void(*func)(int from, int to, void *arg), void *func_arg)"
.br
.RI "Call a function for all the characters in a specified charset\&. "
.in -1c
.SS "Variables"
.in +1c
.ti -1c
.RI "MSymbol \fBMcharset\fP"
.br
.in -1c
.SS "Variables: Symbols representing a charset\&."
Each of the following symbols represents a predefined charset\&.
.br
.in +1c
.ti -1c
.RI "MSymbol \fBMcharset_ascii\fP"
.br
.RI "Symbol representing the charset ASCII\&. "
.ti -1c
.RI "MSymbol \fBMcharset_iso_8859_1\fP"
.br
.RI "Symbol representing the charset ISO/IEC 8859/1\&. "
.ti -1c
.RI "MSymbol \fBMcharset_unicode\fP"
.br
.RI "Symbol representing the charset Unicode\&. "
.ti -1c
.RI "MSymbol \fBMcharset_m17n\fP"
.br
.RI "Symbol representing the largest charset\&. "
.ti -1c
.RI "MSymbol \fBMcharset_binary\fP"
.br
.RI "Symbol representing the charset for ill-decoded characters\&. "
.in -1c
.SS "Variables: Parameter keys for mchar_define_charset()\&."
These are the predefined symbols to use as parameter keys for the function \fBmchar_define_charset()\fP (which see)\&.
.br
.in +1c
.ti -1c
.RI "MSymbol \fBMmethod\fP"
.br
.ti -1c
.RI "MSymbol \fBMdimension\fP"
.br
.ti -1c
.RI "MSymbol \fBMmin_range\fP"
.br
.ti -1c
.RI "MSymbol \fBMmax_range\fP"
.br
.ti -1c
.RI "MSymbol \fBMmin_code\fP"
.br
.ti -1c
.RI "MSymbol \fBMmax_code\fP"
.br
.ti -1c
.RI "MSymbol \fBMascii_compatible\fP"
.br
.ti -1c
.RI "MSymbol \fBMfinal_byte\fP"
.br
.ti -1c
.RI "MSymbol \fBMrevision\fP"
.br
.ti -1c
.RI "MSymbol \fBMmin_char\fP"
.br
.ti -1c
.RI "MSymbol \fBMmapfile\fP"
.br
.ti -1c
.RI "MSymbol \fBMparents\fP"
.br
.ti -1c
.RI "MSymbol \fBMsubset_offset\fP"
.br
.ti -1c
.RI "MSymbol \fBMdefine_coding\fP"
.br
.ti -1c
.RI "MSymbol \fBMaliases\fP"
.br
.in -1c
.SS "Variables: Symbols representing charset methods\&."
These are the predefined symbols that can be a value of the \fBMmethod\fP parameter of a charset used in an argument to the \fBmchar_define_charset()\fP function\&.
.PP
A method specifies how code\-points and character codes are converted\&. See the documentation of the \fBmchar_define_charset()\fP function for the details\&.
.br
.in +1c
.ti -1c
.RI "MSymbol \fBMoffset\fP"
.br
.RI "Symbol for the offset type method of charset\&. "
.ti -1c
.RI "MSymbol \fBMmap\fP"
.br
.RI "Symbol for the map type method of charset\&. "
.ti -1c
.RI "MSymbol \fBMunify\fP"
.br
.RI "Symbol for the unify type method of charset\&. "
.ti -1c
.RI "MSymbol \fBMsubset\fP"
.br
.RI "Symbol for the subset type method of charset\&. "
.ti -1c
.RI "MSymbol \fBMsuperset\fP"
.br
.RI "Symbol for the superset type method of charset\&. "
.in -1c
.SH "Detailed Description"
.PP
Charset objects and API for them\&.
The symbol \fCMcharset\fP\&.
.PP
The m17n library uses \fIcharset\fP objects to represent coded character sets (CCS)\&. The m17n library supports many predefined coded character sets\&. In addition, application programs can add other charsets\&. A character can belong to multiple charsets\&.
.PP
The m17n library distinguishes the following three concepts:
.PP
.PD 0
.IP "\(bu" 2
A \fIcode\-point\fP is a number assigned by the CCS to each character\&. Code\-points may or may not be continuous\&. The type \fCunsigned\fP is used to represent a code\-point\&. An invalid code\-point is represented by the macro \fCMCHAR_INVALID_CODE\fP\&.
.PP
.PD 0
.IP "\(bu" 2
A \fIcharacter\fP \fIindex\fP is the canonical index of a character in a CCS\&. The character that has the character index N occupies the Nth position when all the characters in the current CCS are sorted by their code\-points\&. Character indices in a CCS are continuous and start with 0\&.
.PP
.PD 0
.IP "\(bu" 2
A \fIcharacter\fP \fIcode\fP is the internal representation in the m17n library of a character\&. A character code is a signed integer of 21 bits or longer\&.
.PP
Each charset object defines how characters are converted between code\-points and character codes\&. To \fIdecode\fP means converting code\-points to character codes, and to \fIencode\fP means converting character codes to code\-points\&.
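.PP
The three concepts can be told apart with a small, self\-contained sketch\&. It does not use the m17n API: the toy CCS, its tables, and every \fCtoy_\fP name are hypothetical, and \fCTOY_INVALID_CODE\fP is only a stand\-in for \fCMCHAR_INVALID_CODE\fP whose value is chosen for this sketch\&. The direction of each conversion follows the synopsis, where \fBmchar_decode()\fP takes a code\-point and \fBmchar_encode()\fP takes a character code\&.
.PP
.nf
```c
#include <assert.h>

/* Hypothetical toy CCS: five characters whose code-points have a gap
   between 0x23 and 0x30.  The array is sorted by code-point, so the
   array position of a code-point is its character index. */
static const unsigned toy_code_points[] = { 0x21, 0x22, 0x23, 0x30, 0x31 };

/* Character codes chosen arbitrarily for this sketch, one per index. */
static const int toy_char_codes[] = { 0x41, 0x42, 0x43, 0x61, 0x62 };

enum { TOY_COUNT = 5 };

/* Sentinel used by this sketch for a failed encoding (assumption). */
#define TOY_INVALID_CODE 0xFFFFFFFFu

/* Return the character index of CODE, or -1 if it is not in the CCS. */
int toy_char_index(unsigned code)
{
    for (int i = 0; i < TOY_COUNT; i++)
        if (toy_code_points[i] == code)
            return i;
    return -1;
}

/* Decode: code-point to character code, or -1 on failure. */
int toy_decode(unsigned code)
{
    int idx = toy_char_index(code);
    return idx < 0 ? -1 : toy_char_codes[idx];
}

/* Encode: character code to code-point, or TOY_INVALID_CODE. */
unsigned toy_encode(int c)
{
    for (int i = 0; i < TOY_COUNT; i++)
        if (toy_char_codes[i] == c)
            return toy_code_points[i];
    return TOY_INVALID_CODE;
}

/* Usage: code-point 0x30 is the fourth character, so its index is 3. */
void toy_self_check(void)
{
    assert(toy_char_index(0x30) == 3);
    assert(toy_decode(0x30) == 0x61);
    assert(toy_encode(0x61) == 0x30);
    assert(toy_encode(0x7A) == TOY_INVALID_CODE);
}
```
.fi
.PP
Code\-points skip from 0x23 to 0x30, while character indices run from 0 to 4 without a gap\&.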
.PP
Any decoded M\-text has a text property whose key is the predefined symbol \fCMcharset\fP\&. The name of \fCMcharset\fP is \fC'charset'\fP\&.
.br
.SH "Macro Definition Documentation"
.PP
.SS "#define MCHAR_INVALID_CODE"
.PP
Invalid code\-point\&. The macro \fBMCHAR_INVALID_CODE\fP gives the invalid code\-point\&.
.br
.SH "Variable Documentation"
.PP
.SS "MSymbol Mcharset_ascii"
.PP
Symbol representing the charset ASCII\&. The symbol \fBMcharset_ascii\fP has name \fC'ascii'\fP and represents the charset ISO 646, USA Version X3\&.4\-1968 (ISO\-IR\-6)\&.
.br
.SS "MSymbol Mcharset_iso_8859_1"
.PP
Symbol representing the charset ISO/IEC 8859/1\&. The symbol \fBMcharset_iso_8859_1\fP has name \fC'iso\-8859\-1'\fP and represents the charset ISO/IEC 8859\-1:1998\&.
.br
.SS "MSymbol Mcharset_unicode"
.PP
Symbol representing the charset Unicode\&. The symbol \fBMcharset_unicode\fP has name \fC'unicode'\fP and represents the charset Unicode\&.
.br
.SS "MSymbol Mcharset_m17n"
.PP
Symbol representing the largest charset\&. The symbol \fBMcharset_m17n\fP has name \fC'm17n'\fP and represents the charset that contains all characters supported by the m17n library\&.
.br
.SS "MSymbol Mcharset_binary"
.PP
Symbol representing the charset for ill\-decoded characters\&. The symbol \fBMcharset_binary\fP has name \fC'binary'\fP and represents the fake charset that the decoding functions attach to an M\-text as a text property when they encounter an invalid byte or byte sequence\&.
.br
.PP
See \fBCode Conversion\fP for more details\&.
.br
.SS "MSymbol Mmethod"
.SS "MSymbol Mdimension"
.SS "MSymbol Mmin_range"
.SS "MSymbol Mmax_range"
.SS "MSymbol Mmin_code"
.SS "MSymbol Mmax_code"
.SS "MSymbol Mascii_compatible"
.SS "MSymbol Mfinal_byte"
.SS "MSymbol Mrevision"
.SS "MSymbol Mmin_char"
.SS "MSymbol Mmapfile"
.SS "MSymbol Mparents"
.SS "MSymbol Msubset_offset"
.SS "MSymbol Mdefine_coding"
.SS "MSymbol Maliases"
.SS "MSymbol Moffset"
.PP
Symbol for the offset type method of charset\&. The symbol \fBMoffset\fP has the name \fC'offset'\fP and, when used as a value of the \fBMmethod\fP parameter of a charset, it means that the conversion of code\-points and character codes of the charset is done by this calculation:
.PP
.nf
CHARACTER\-CODE = CODE\-POINT \- MIN\-CODE + MIN\-CHAR
.fi
.PP
where MIN\-CODE is the value of the \fBMmin_code\fP parameter of the charset, and MIN\-CHAR is the value of the \fBMmin_char\fP parameter\&.
.br
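.PP
The offset calculation above can be sketched in plain C\&. This is an illustration only, not the library's implementation: the \fCoffset_charset\fP structure, the function names, the \-1 failure value, and \fCSKETCH_INVALID_CODE\fP are hypothetical, and bounding the range by a maximum code\-point is an assumption about how \fBMmax_code\fP would be used\&.
.PP
.nf
```c
#include <assert.h>

/* Hypothetical description of an offset-type charset. */
struct offset_charset {
    unsigned min_code;  /* MIN-CODE: lowest code-point */
    unsigned max_code;  /* highest code-point (assumed bound) */
    int min_char;       /* MIN-CHAR: character code of MIN-CODE */
};

/* Sentinel used by this sketch for a failed encoding (assumption). */
#define SKETCH_INVALID_CODE 0xFFFFFFFFu

/* CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR, or -1 if the
   code-point lies outside the charset. */
int offset_decode(const struct offset_charset *cs, unsigned code)
{
    if (code < cs->min_code || code > cs->max_code)
        return -1;
    return (int) (code - cs->min_code) + cs->min_char;
}

/* Inverse: CODE-POINT = CHARACTER-CODE - MIN-CHAR + MIN-CODE. */
unsigned offset_encode(const struct offset_charset *cs, int c)
{
    if (c < cs->min_char)
        return SKETCH_INVALID_CODE;
    unsigned code = (unsigned) (c - cs->min_char) + cs->min_code;
    return code > cs->max_code ? SKETCH_INVALID_CODE : code;
}

/* Usage with made-up parameters: code-points 0x20..0x7F are placed
   at character codes starting from 0x1000. */
void offset_self_check(void)
{
    const struct offset_charset cs = { 0x20, 0x7F, 0x1000 };
    assert(offset_decode(&cs, 0x41) == 0x1021);
    assert(offset_encode(&cs, 0x1021) == 0x41);
    assert(offset_decode(&cs, 0x80) == -1);
}
```
.fi
.PP
Because the mapping is a pure shift, encoding is the same subtraction run in the opposite direction, and no table is needed\&.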
.SS "MSymbol Mmap"
.PP
Symbol for the map type method of charset\&. The symbol \fBMmap\fP has the name \fC'map'\fP and, when used as a value of the \fBMmethod\fP parameter of a charset, it means that the conversion of code\-points and character codes of the charset is done by map lookup\&. The map must be given by the \fBMmapfile\fP parameter\&.
.br
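.PP
A map lookup can be pictured as a search in a table of pairs\&. The sketch below is self\-contained and does not read a map file: the in\-memory table, its three illustrative entries, the function names, the \-1 failure value, and \fCSKETCH_INVALID_CODE\fP are all hypothetical stand\-ins\&.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

/* One entry of a hypothetical in-memory map, standing in for the
   data that the Mmapfile parameter would name. */
struct map_entry {
    unsigned code;  /* code-point */
    int c;          /* character code */
};

/* Illustrative data only. */
static const struct map_entry sketch_map[] = {
    { 0xC1, 0x0391 },
    { 0xC2, 0x0392 },
    { 0xC3, 0x0393 },
};

#define SKETCH_MAP_LEN (sizeof sketch_map / sizeof sketch_map[0])

/* Sentinel used by this sketch for a failed encoding (assumption). */
#define SKETCH_INVALID_CODE 0xFFFFFFFFu

/* Decode by lookup: return the character code paired with CODE,
   or -1 if the map has no entry for it. */
int map_decode(unsigned code)
{
    for (size_t i = 0; i < SKETCH_MAP_LEN; i++)
        if (sketch_map[i].code == code)
            return sketch_map[i].c;
    return -1;
}

/* Encode by reverse lookup over the same table. */
unsigned map_encode(int c)
{
    for (size_t i = 0; i < SKETCH_MAP_LEN; i++)
        if (sketch_map[i].c == c)
            return sketch_map[i].code;
    return SKETCH_INVALID_CODE;
}

/* Usage: look up one pair in both directions. */
void map_self_check(void)
{
    assert(map_decode(0xC2) == 0x0392);
    assert(map_encode(0x0392) == 0xC2);
    assert(map_decode(0x41) == -1);
}
```
.fi
.PP
A linear search keeps the sketch short; a real table of any size would more likely be indexed or sorted for binary search\&.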
.SS "MSymbol Munify"
.PP
Symbol for the unify type method of charset\&. The symbol \fBMunify\fP has the name \fC'unify'\fP and, when used as a value of the \fBMmethod\fP parameter of a charset, it means that the conversion of code\-points and character codes of the charset is done by map lookup and offsetting\&. The map must be given by the \fBMmapfile\fP parameter\&. For this kind of charset, a unique continuous character code space for all characters is assigned\&.
.PP
If the map has an entry for a code\-point, the conversion is done by looking up the map\&. Otherwise, the conversion is done by this calculation:
.PP
.nf
CHARACTER\-CODE = CODE\-POINT \- MIN\-CODE + LOWEST\-CHAR\-CODE
.fi
.PP
where MIN\-CODE is the value of the \fBMmin_code\fP parameter of the charset, and LOWEST\-CHAR\-CODE is the lowest character code of the assigned code space\&.
.br
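.PP
The two\-step rule for a unify type charset (map first, offset otherwise) can be sketched as follows\&. The structure, the function names, the sample values, and the range check against a maximum code\-point are assumptions made for illustration; the sketch applies the formula above literally and is not the library's implementation\&.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

/* One map entry: a code-point and its character code. */
struct unify_entry {
    unsigned code;
    int c;
};

/* Hypothetical description of a unify-type charset. */
struct unify_charset {
    const struct unify_entry *map;  /* stand-in for the map file */
    size_t map_len;
    unsigned min_code;              /* MIN-CODE */
    unsigned max_code;              /* assumed upper bound */
    int lowest_char_code;           /* LOWEST-CHAR-CODE */
};

/* Decode CODE: use the map entry if there is one, otherwise fall
   back to CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE.  Returns -1 for
   a code-point outside the charset. */
int unify_decode(const struct unify_charset *cs, unsigned code)
{
    if (code < cs->min_code || code > cs->max_code)
        return -1;
    for (size_t i = 0; i < cs->map_len; i++)
        if (cs->map[i].code == code)
            return cs->map[i].c;
    return (int) (code - cs->min_code) + cs->lowest_char_code;
}

/* Usage with made-up values: 0x41 is mapped, 0x42 is not. */
void unify_self_check(void)
{
    static const struct unify_entry map[] = { { 0x41, 0x3000 } };
    const struct unify_charset cs = { map, 1, 0x20, 0x7F, 0x110000 };
    assert(unify_decode(&cs, 0x41) == 0x3000);
    assert(unify_decode(&cs, 0x42) == 0x110022);
    assert(unify_decode(&cs, 0x1F) == -1);
}
```
.fi
.PP
Every code\-point in range gets a character code even when the map has no entry for it, because unmapped code\-points land in the assigned code space\&.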
.SS "MSymbol Msubset"
.PP
Symbol for the subset type method of charset\&. The symbol \fBMsubset\fP has the name \fC'subset'\fP and, when used as a value of the \fBMmethod\fP parameter of a charset, it means that the charset is a subset of a parent charset\&. The parent charset must be given by the \fBMparents\fP parameter\&. The conversion of code\-points and character codes of the charset is done conceptually by this calculation:
.PP
.nf
CHARACTER\-CODE = PARENT\-CODE (CODE\-POINT) + SUBSET\-OFFSET
.fi
.PP
where PARENT\-CODE is a pseudo function that returns the character code of CODE\-POINT in the parent charset, and SUBSET\-OFFSET is the value given by the \fBMsubset_offset\fP parameter\&.
.br
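.PP
The conceptual subset calculation can be sketched by treating PARENT\-CODE as a function pointer\&. Everything named here is hypothetical: the \fCsubset_charset\fP structure, the toy parent, the \-1 failure value, and the assumption that the subset is delimited by a minimum and maximum code\-point\&. It is an illustration of the formula, not of the library's internals\&.
.PP
.nf
```c
#include <assert.h>

/* PARENT-CODE: decode a code-point in the parent charset, returning
   a character code or -1 on failure. */
typedef int (*parent_code_fn)(unsigned code);

/* Hypothetical description of a subset-type charset. */
struct subset_charset {
    parent_code_fn parent_code;  /* decoder of the parent charset */
    unsigned min_code;           /* assumed lower bound of the subset */
    unsigned max_code;           /* assumed upper bound of the subset */
    int subset_offset;           /* SUBSET-OFFSET */
};

/* Toy parent for the sketch: code-points 0x00..0xFF decode to the
   character codes of the same value. */
static int toy_parent_code(unsigned code)
{
    return code <= 0xFF ? (int) code : -1;
}

/* CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET, limited
   to the code-points that belong to the subset. */
int subset_decode(const struct subset_charset *cs, unsigned code)
{
    if (code < cs->min_code || code > cs->max_code)
        return -1;
    int parent = cs->parent_code(code);
    return parent < 0 ? -1 : parent + cs->subset_offset;
}

/* Usage with made-up values: the subset 0x20..0x7E of the toy
   parent, shifted by 0x100. */
void subset_self_check(void)
{
    const struct subset_charset cs = { toy_parent_code, 0x20, 0x7E, 0x100 };
    assert(subset_decode(&cs, 0x41) == 0x141);
    assert(subset_decode(&cs, 0x80) == -1);
}
```
.fi
.PP
With an offset of 0 the subset yields the same character codes as its parent and only restricts which code\-points are accepted\&.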
.SS "MSymbol Msuperset"
.PP
Symbol for the superset type method of charset\&. The symbol \fBMsuperset\fP has the name \fC'superset'\fP and, when used as a value of \fBMmethod\fP parameter of a charset, it means that the charset is a superset of parent charsets\&. The parent charsets must be given by \fBMparents\fP parameter\&.
.br
.SS "MSymbol Mcharset"
.SH "Author"
.PP
Generated automatically by Doxygen for The m17n Library from the source code\&.
.SH COPYRIGHT
Copyright (C) 2001 Information\-technology Promotion Agency (IPA)
.br
Copyright (C) 2001\-2011 National Institute of Advanced Industrial Science and Technology (AIST)
.br
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License\&.