NAME¶
M-text - M-text objects and API for them.
Typedefs¶
typedef struct MText MText
    Type of M-texts.
Enumerations¶
enum MTextFormat {
MTEXT_FORMAT_US_ASCII,
MTEXT_FORMAT_UTF_8,
MTEXT_FORMAT_UTF_16LE,
MTEXT_FORMAT_UTF_16BE,
MTEXT_FORMAT_UTF_32LE,
MTEXT_FORMAT_UTF_32BE,
MTEXT_FORMAT_MAX }
    Enumeration for specifying the format of an M-text.
enum MTextLineBreakOption {
MTEXT_LBO_SP_CM = 1,
MTEXT_LBO_KOREAN_SP = 2,
MTEXT_LBO_AI_AS_ID = 4,
MTEXT_LBO_MAX }
    Enumeration for specifying a set of line breaking options.
Functions¶
int mtext_line_break (MText *mt, int pos, int option, int *after)
    Find a line break position in an M-text.
MText * mtext ()
    Allocate a new M-text.
MText * mtext_from_data (const void *data, int nitems, enum MTextFormat format)
    Allocate a new M-text with specified data.
void * mtext_data (MText *mt, enum MTextFormat *fmt, int *nunits, int *pos_idx, int *unit_idx)
    Get information about the text data in an M-text.
int mtext_len (MText *mt)
    Number of characters in an M-text.
int mtext_ref_char (MText *mt, int pos)
    Return the character at the specified position in an M-text.
int mtext_set_char (MText *mt, int pos, int c)
    Store a character into an M-text.
MText * mtext_cat_char (MText *mt, int c)
    Append a character to an M-text.
MText * mtext_dup (MText *mt)
    Create a copy of an M-text.
MText * mtext_cat (MText *mt1, MText *mt2)
    Append an M-text to another.
MText * mtext_ncat (MText *mt1, MText *mt2, int n)
    Append a part of an M-text to another.
MText * mtext_cpy (MText *mt1, MText *mt2)
    Copy an M-text to another.
MText * mtext_ncpy (MText *mt1, MText *mt2, int n)
    Copy the first few characters of an M-text to another.
MText * mtext_duplicate (MText *mt, int from, int to)
    Create a new M-text from a part of an existing M-text.
MText * mtext_copy (MText *mt1, int pos, MText *mt2, int from, int to)
    Copy characters in the specified range into an M-text.
int mtext_del (MText *mt, int from, int to)
    Delete characters in the specified range destructively.
int mtext_ins (MText *mt1, int pos, MText *mt2)
    Insert an M-text into another M-text.
int mtext_insert (MText *mt1, int pos, MText *mt2, int from, int to)
    Insert sub-text of an M-text into another M-text.
int mtext_ins_char (MText *mt, int pos, int c, int n)
    Insert a character into an M-text.
int mtext_replace (MText *mt1, int from1, int to1, MText *mt2, int from2, int to2)
    Replace sub-text of an M-text with another.
int mtext_character (MText *mt, int from, int to, int c)
    Search for a character in an M-text.
int mtext_chr (MText *mt, int c)
    Return the position of the first occurrence of a character in an M-text.
int mtext_rchr (MText *mt, int c)
    Return the position of the last occurrence of a character in an M-text.
int mtext_cmp (MText *mt1, MText *mt2)
    Compare two M-texts character-by-character.
int mtext_ncmp (MText *mt1, MText *mt2, int n)
    Compare initial parts of two M-texts character-by-character.
int mtext_compare (MText *mt1, int from1, int to1, MText *mt2, int from2, int to2)
    Compare specified regions of two M-texts.
int mtext_spn (MText *mt, MText *accept)
    Search an M-text for a set of characters.
int mtext_cspn (MText *mt, MText *reject)
    Search an M-text for the complement of a set of characters.
int mtext_pbrk (MText *mt, MText *accept)
    Search an M-text for any of a set of characters.
MText * mtext_tok (MText *mt, MText *delim, int *pos)
    Look for a token in an M-text.
int mtext_text (MText *mt1, int pos, MText *mt2)
    Locate an M-text in another.
int mtext_search (MText *mt1, int from, int to, MText *mt2)
    Locate an M-text in a specific range of another.
int mtext_casecmp (MText *mt1, MText *mt2)
    Compare two M-texts ignoring case.
int mtext_ncasecmp (MText *mt1, MText *mt2, int n)
    Compare initial parts of two M-texts ignoring case.
int mtext_case_compare (MText *mt1, int from1, int to1, MText *mt2, int from2, int to2)
    Compare specified regions of two M-texts ignoring case.
int mtext_lowercase (MText *mt)
    Lowercase an M-text.
int mtext_titlecase (MText *mt)
    Titlecase an M-text.
int mtext_uppercase (MText *mt)
    Uppercase an M-text.
Variables¶
MSymbol Mlanguage
Variables: Default Endian of UTF-16 and UTF-32¶
enum MTextFormat MTEXT_FORMAT_UTF_16
    Variable of value MTEXT_FORMAT_UTF_16LE or MTEXT_FORMAT_UTF_16BE.
const int MTEXT_FORMAT_UTF_32
    Variable of value MTEXT_FORMAT_UTF_32LE or MTEXT_FORMAT_UTF_32BE.
Detailed Description¶
M-text objects and API for them.
In the m17n library, text is represented as an object called an M-text
rather than as a C-string (char * or unsigned char *). An M-text is a sequence
of zero or more characters, and can be created from various character
sources, e.g. C-strings, files, character codes, etc.
M-texts are more useful than C-strings in the following respects.
- M-texts can handle a mixture of characters of various
scripts, including all Unicode characters and more. This is an
indispensable facility when handling multilingual text.
- Each character in an M-text can have properties called
text properties. Text properties store various kinds of
information attached to parts of an M-text to provide application programs
with a unified view of that information. Because rich information can be
stored in M-texts in the form of text properties, functions in application
programs can be simple.
In addition, the library provides many functions that manipulate an M-text in
much the same way as a C-string.
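To make the "sequence of characters" point concrete, the following sketch counts the characters in a UTF-8 C-string by hand. It uses only the C standard library and does not call the m17n library; demo_utf8_char_count is a hypothetical helper introduced for illustration, and it assumes well-formed UTF-8 input. With an M-text this bookkeeping is not needed, since mtext_len() reports a count of characters rather than bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the m17n library: count the
   characters (code points) in a NUL-terminated UTF-8 string by
   counting every byte that is not a continuation byte (10xxxxxx).
   Assumes the input is well-formed UTF-8.  */
size_t
demo_utf8_char_count (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}
```

For the 6-byte string "a\xC3\xA9\xE3\x81\x82" (the letter a, U+00E9 and U+3042) the helper counts 3 characters, which is the kind of count a character-based text object works with.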
Typedef Documentation¶
typedef struct MText MText¶
Type of M-texts. The type MText is for an M-text object.
Its internal structure is concealed from application programs.
Enumeration Type Documentation¶
enum MTextFormat¶
Enumeration for specifying the format of an M-text. The enum MTextFormat
is used as an argument of the mtext_from_data() function to specify the
format of data from which an M-text is created.
Enumerator:
- MTEXT_FORMAT_US_ASCII
- US-ASCII encoding
- MTEXT_FORMAT_UTF_8
- UTF-8 encoding
- MTEXT_FORMAT_UTF_16LE
- UTF-16LE encoding
- MTEXT_FORMAT_UTF_16BE
- UTF-16BE encoding
- MTEXT_FORMAT_UTF_32LE
- UTF-32LE encoding
- MTEXT_FORMAT_UTF_32BE
- UTF-32BE encoding
- MTEXT_FORMAT_MAX
-
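The LE and BE variants differ only in the order of the bytes inside each code unit. The sketch below shows this for UTF-16 using only the C standard library; it does not call the m17n library, and demo_encode_utf16 is a hypothetical helper introduced for illustration. Data laid out this way is what one would expect to hand to mtext_from_data() together with the matching format value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the m17n library: write the UTF-16
   encoding of code point CP into OUT (which must hold at least 4
   bytes), most significant byte first when BIG_ENDIAN_ORDER is nonzero
   and least significant byte first otherwise.  Returns the number of
   bytes written, or 0 if CP is a surrogate or lies beyond U+10FFFF.  */
size_t
demo_encode_utf16 (uint32_t cp, int big_endian_order, unsigned char *out)
{
  uint16_t units[2];
  size_t nunits, i;

  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  if (cp < 0x10000)
    {
      units[0] = (uint16_t) cp;
      nunits = 1;
    }
  else
    {
      /* Split into a high and a low surrogate.  */
      cp -= 0x10000;
      units[0] = (uint16_t) (0xD800 + (cp >> 10));
      units[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));
      nunits = 2;
    }
  for (i = 0; i < nunits; i++)
    {
      unsigned char hi = (unsigned char) (units[i] >> 8);
      unsigned char lo = (unsigned char) (units[i] & 0xFF);

      out[2 * i] = big_endian_order ? hi : lo;
      out[2 * i + 1] = big_endian_order ? lo : hi;
    }
  return 2 * nunits;
}
```

For U+3042 the helper produces the bytes 0x42 0x30 in little-endian order and 0x30 0x42 in big-endian order.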
enum MTextLineBreakOption¶
Enumeration for specifying a set of line breaking options. The enum
MTextLineBreakOption controls the line breaking algorithm of the
function mtext_line_break(). The members are combined with logical OR
and passed in the argument option.
Enumerator:
- MTEXT_LBO_SP_CM
- Enable the legacy support for a space character as the base for
combining marks. See section 8.3 of UAX#14.
- MTEXT_LBO_KOREAN_SP
- Use space characters for line breaking in Korean
text.
- MTEXT_LBO_AI_AS_ID
- Treat characters of the ambiguous line-breaking
class as belonging to the ideographic line-breaking class.
- MTEXT_LBO_MAX
-
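Because the option values are distinct bits (1, 2 and 4), several of them can be combined into one int and tested individually. The following is a minimal sketch of that pattern. It does not call the m17n library: the DEMO_LBO_* names and both functions are hypothetical stand-ins that merely mirror the documented values.

```c
/* Local stand-ins that mirror the enumerator values documented above.
   They are hypothetical names for this sketch; a real program would use
   the MTEXT_LBO_* enumerators from the m17n headers instead.  */
enum demo_lbo
{
  DEMO_LBO_SP_CM = 1,
  DEMO_LBO_KOREAN_SP = 2,
  DEMO_LBO_AI_AS_ID = 4
};

/* Return nonzero if FLAG is present in the combined OPTION value.  */
int
demo_lbo_is_set (int option, int flag)
{
  return (option & flag) != 0;
}

/* Build an option value that requests two behaviours at once.  With the
   real library the same kind of value would be passed as the third
   argument:
     mtext_line_break (mt, pos, option, &after);  */
int
demo_lbo_for_korean_text (void)
{
  return DEMO_LBO_KOREAN_SP | DEMO_LBO_AI_AS_ID;
}
```

Keeping each option on its own bit means new options can be added without changing the signature of mtext_line_break().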
Variable Documentation¶
enum MTextFormat MTEXT_FORMAT_UTF_16¶
Variable of value MTEXT_FORMAT_UTF_16LE or MTEXT_FORMAT_UTF_16BE. The global
variable MTEXT_FORMAT_UTF_16 is initialized to MTEXT_FORMAT_UTF_16LE on a
'Little Endian' system (storing words with the least significant byte first),
and to MTEXT_FORMAT_UTF_16BE on a 'Big Endian' system (storing words with the
most significant byte first).
SEE ALSO
mtext_from_data()
const int MTEXT_FORMAT_UTF_32¶
Variable of value MTEXT_FORMAT_UTF_32LE or MTEXT_FORMAT_UTF_32BE. The global
variable MTEXT_FORMAT_UTF_32 is initialized to MTEXT_FORMAT_UTF_32LE on a
'Little Endian' system (storing words with the least significant byte first),
and to MTEXT_FORMAT_UTF_32BE on a 'Big Endian' system (storing words with the
most significant byte first).
SEE ALSO
mtext_from_data()
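The byte-order distinction behind the two variables above can be illustrated with a small run-time probe. This is only a sketch using the C standard library: the demo_* names are hypothetical, and the library itself may well decide the byte order when it is built rather than at run time.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the LE and BE format values.  */
enum demo_byte_order
{
  DEMO_ORDER_LE,
  DEMO_ORDER_BE
};

/* Return 1 if the host stores the least significant byte of a word
   first, 0 otherwise.  The first byte in memory of the 16-bit value
   0x0102 is inspected.  */
int
demo_host_is_little_endian (void)
{
  const uint16_t probe = 0x0102;
  unsigned char first;

  memcpy (&first, &probe, 1);
  return first == 0x02;
}

/* Pick the byte order that matches the host, analogous to choosing
   between the LE and BE format values.  */
enum demo_byte_order
demo_native_order (void)
{
  return demo_host_is_little_endian () ? DEMO_ORDER_LE : DEMO_ORDER_BE;
}
```

Using the host-order variables spares the caller from writing such a probe when the data was produced on the same machine.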
MSymbol Mlanguage¶
The symbol whose name is 'language'.
Author¶
Generated automatically by Doxygen for The m17n Library from the source code.
COPYRIGHT¶
Copyright (C) 2001 Information-technology Promotion Agency (IPA)
Copyright (C) 2001-2011 National Institute of Advanced Industrial Science and
Technology (AIST)
Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License
<http://www.gnu.org/licenses/fdl.html>.