NAME¶
Text::Bidi - Unicode bidi algorithm using libfribidi
SYNOPSIS¶
use Text::Bidi;
$visual = log2vis($logical);
($visual, $paradir, $l2v, $v2l, $embedding) =
log2vis($logical, $paradir);
EXPORT¶
The following functions can be exported:
- •
- "log2vis()"
- •
- "caprtl_to_unicode()"
- •
- "unicode_to_caprtl()"
- •
- "get_width()"
- •
- "set_width()"
- •
- "get_reset()"
- •
- "set_reset()"
- •
- "get_clean()"
- •
- "set_clean()"
All of them can be exported together using the ":all" tag.
Description¶
This module provides basic support for the Unicode bidirectional text (Bidi)
algorithm, for displaying text that mixes left-to-right and
right-to-left scripts (such as Hebrew and Arabic). It does so using a
swig interface file to the libfribidi library.
Though several libfribidi functions are provided by the swig interface file, the
standard usage of this module is provided by one function, "log2vis()",
that translates a logical string into a visual one. In
addition, there are several utility functions, and some functions that
implement part of the algorithm (see "Comparison with libfribidi and
FriBidi.pm" for the reason this is needed).
The object oriented approach¶
All functions here can be called using either a procedural or an object oriented
approach. For example, you may do either
$visual = log2vis($logical);
or
$bidi = Text::Bidi->new;
$visual = $bidi->log2vis($logical);
The advantages of the second form are that it is easier to move to a sub-class,
and that two or more objects with different parameters can be used
simultaneously.
If you do sub-class this class, and want the procedural interface to use your
functions, put a line like
$Text::Bidi::GlobalClass = __PACKAGE__;
in your module.
Types and Namespaces¶
The following constants are imported from the fribidi library:
- •
- Constants of the form FRIBIDI_TYPE_FOO are available
as $Text::Bidi::Type::FOO (note that, though these are variables, they are
read-only)
- •
- Constants of the form FRIBIDI_MASK_FOO are converted
to $Text::Bidi::Mask::FOO.
- •
- Constants of the form UNI_FOO are converted to the
character they represent, and assigned to $Text::Bidi::Unicode::FOO.
In addition, the hash %Mirrored maps mirrored characters to their counterparts,
and the scalar $Mirrored is a pattern that matches one mirrored character.
Functions¶
The following functions are of interest to the user¶
new()
Create a new instance of a bidi translator. The following key-value parameters
are allowed:
- width
- The width, in characters, of the displayed string. This
affects the reordering algorithm. The default is undef, which will
assume that no line-breaking happens.
- reset
- A string of the characters that function as field (segment)
separators. The default is "\x{2029}\x{09}\x{11}", which
is (to my understanding) what the Unicode specification prescribes.
- clean
- If true, "log2vis()" will remove any
explicit bidi marks in the visual string, and adjust the mapping arrays
accordingly. Default is true.
These parameters can be accessed using "get_width()",
"set_width()" and similar functions.
get_width()
set_width()
get_reset()
set_reset()
get_clean()
set_clean()
Query or set the values of the corresponding parameters. See "new()"
for details about the parameters.
log2vis()
This function provides the main functionality. It can be called, similarly to
"fribidi_log2vis" of the
fribidi library, as follows:
($vis, $dir, $l2v, $v2l, $levels) =
log2vis($log[, $dir[, $width]])
The arguments are:
- $log
- The logical string
- $dir
- Override the base direction of the paragraph. The possible
values are $Text::Bidi::Type::RTL, $Text::Bidi::Type::LTR or
$Text::Bidi::Type::ON. The default, if not given, is
$Text::Bidi::Type::ON, which means that the direction should be determined
according to the bidi algorithm.
- $width
- The width at which the string is broken. This overrides,
and has the same meaning as, the width parameter set by
"set_width()". As with that parameter, a value of
"undef" means that no line breaking should be done.
The outputs are as follows:
- $vis
- The visual string. In scalar context, this is the only
parameter returned (and in this case the function may work slightly
faster.)
- $l2v
- An arrayref representing the map from the logical string to
the visual one, so the $i-th character of the logical string will be in
position "$l2v->[$i]" of the visual one.
- $v2l
- The inverse function, mapping characters in the visual
string to the logical one.
- $levels
- The embedding levels - an arrayref assigning to each
character of the logical string the nesting level of text of different
directions to which it belongs. Pure left-to-right text has embedding
level 0. A character is left-to-right (within this string) iff it has even
embedding level.
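The relationship between these outputs can be illustrated without calling the library. The following is a minimal sketch on hand-written sample data (not produced by the module), in which capital letters stand in for right-to-left characters, in the spirit of the CapRTL convention. The values assigned to $v2l and $levels are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hand-written sample data, NOT output of log2vis(): capitals play the
# role of right-to-left characters.
my $log    = 'abc DEF';
my $levels = [ 0, 0, 0, 0, 1, 1, 1 ];   # odd level = right-to-left
my $v2l    = [ 0, 1, 2, 3, 6, 5, 4 ];   # visual position -> logical index

# The visual string is the logical string read through $v2l.
my $vis = join '', map { substr($log, $_, 1) } @$v2l;

# $l2v is the inverse permutation of $v2l.
my @l2v;
$l2v[ $v2l->[$_] ] = $_ for 0 .. $#$v2l;

# Logical indices of the right-to-left characters (odd embedding level).
my @rtl = grep { $levels->[$_] % 2 } 0 .. $#$levels;

print "$vis\n";
print "logical $_ is shown at visual $l2v[$_]\n" for @rtl;
```

Here the character at logical index 4 ("D") ends up at visual position $l2v[4], the last position of the string.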
Functions implementing parts of the algorithm¶
The following functions, which implement parts of the algorithm, are used by
"log2vis()".
levels2intervals()
This function accepts an arrayref of embedding levels and returns an arrayref
that, at place $i, contains a hash of intervals (mapping the index of the start
of each interval to the index of its end), such that each of them is
a maximal interval of embedding levels at least $i. For example, for the
embedding levels:
0011122111333220001
we get
[
{ 0 => 18 },
{ 2 => 14, 18 => 18 },
{ 5 => 6, 10 => 14 },
{ 10 => 12 }
]
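The computation described above can be sketched in pure Perl. This is an illustrative re-implementation under the description given here, not the module's own code, and the name levels2intervals_sketch is hypothetical.

```perl
use strict;
use warnings;

# Illustrative sketch only; not the implementation used by Text::Bidi.
sub levels2intervals_sketch {
    my ($levels) = @_;

    # Highest embedding level present in the input.
    my $max = 0;
    for my $l (@$levels) { $max = $l if $l > $max }

    my @intervals;
    for my $min (0 .. $max) {
        my %runs;     # start index => end index
        my $start;    # start of the run currently being extended, if any
        for my $i (0 .. $#$levels) {
            if ($levels->[$i] >= $min) {
                $start = $i unless defined $start;
                $runs{$start} = $i;    # extend the current run to $i
            }
            else {
                undef $start;          # the run (if any) ends here
            }
        }
        push @intervals, \%runs;
    }
    return \@intervals;
}

my $intervals =
    levels2intervals_sketch([ split //, '0011122111333220001' ]);
```

Applied to the levels of the example above, $intervals should hold the same four hashes as listed there.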
reorder()
This function implements the reordering part of the bidi algorithm (section 3.4,
rules L1-L4). The input is the logical string, the (arrayref of) embedding levels,
the base direction of the paragraph, a position in the logical string, and a length.
The position defaults to 0, and the length defaults to the rest of the
string. The function returns the v2l mapping, the modified embedding
levels, the intervals for these levels (as computed by
"levels2intervals()") and the visual string, all for the part of
the string given by the position and the length, and assuming that the string
is broken after this segment. In scalar context, only the visual string is
returned.
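The core of the reordering step, rule L2 of the Unicode bidi algorithm, can be sketched in pure Perl. This is an illustration under simplifying assumptions and is not the module's "reorder()": it takes only the embedding levels, ignores rules L1, L3 and L4 as well as line breaking, and the name reorder_l2_sketch is hypothetical.

```perl
use strict;
use warnings;

# Illustrative sketch of rule L2 only; not the module's reorder().
# Returns a v2l map: visual position -> logical index.
sub reorder_l2_sketch {
    my ($levels) = @_;
    my @v2l = (0 .. $#$levels);

    my $max = 0;
    for my $l (@$levels) { $max = $l if $l > $max }

    # From the highest level down to 1, reverse every maximal run of
    # positions whose level is at least $min. (Going down to 1 gives the
    # same result as stopping at the lowest odd level, because the extra
    # reversals cancel in pairs.)
    for (my $min = $max; $min >= 1; $min--) {
        my $i = 0;
        while ($i <= $#$levels) {
            if ($levels->[$i] < $min) { $i++; next }
            my $j = $i;
            $j++ while $j < $#$levels && $levels->[ $j + 1 ] >= $min;
            @v2l[ $i .. $j ] = reverse @v2l[ $i .. $j ];
            $i = $j + 1;
        }
    }
    return \@v2l;
}

my $simple = reorder_l2_sketch([ 0, 0, 0, 0, 1, 1, 1 ]);
my $nested = reorder_l2_sketch([ 1, 1, 2, 2, 1 ]);
print "@$simple\n";
print "@$nested\n";
```

In the second call, the two level-2 characters (for instance a number inside right-to-left text) keep their left-to-right order while the surrounding level-1 run is reversed.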
invert()
Compute the inverse of a function given by an array. This is used to convert the
"$v2l" mapping to "$l2v".
Utility functions¶
The following functions are available mainly for testing. See also
Text::Bidi::CapRTL for a possibly simpler interface.
caprtl_to_unicode()
Convert a string where right-to-left text is represented by capital letters, and
bidi marks by control sequences, to a string with actual right-to-left
characters and bidi marks. The control sequences are of the form
"_C", where "C" is a character. Run
fribidi --charsetdesc CapRTL
for a description of the translation table.
unicode_to_caprtl()
Perform the inverse of "caprtl_to_unicode()".
Comparison with libfribidi and FriBidi.pm¶
The module has mostly the same interface as FriBidi, the module written
originally with the fribidi library. The main differences are:
- •
- The function "log2vis()" in the current
implementation returns the rest of the data returned by
"fribidi_log2vis", namely, the mappings between the strings and
the embedding levels.
- •
- The translation of the logical to visual strings optionally
takes into account the display width, for the purpose of line breaks. As
far as I can see, this functionality is not available in libfribidi. For
this reason, part of the implementation of the algorithm that deals with
reordering, and is not provided as a separate function in libfribidi, is
re-implemented here.
- •
- In this implementation, "log2vis()" works
with native perl strings. Functions like "iso88598_to_unicode"
are not provided, since their functionality is provided by the Encode
module.
- •
- The paragraph direction is given by fribidi constants
rather than strings.
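As an illustration of the point about native Perl strings, the conversion that "iso88598_to_unicode" would have performed can be done with the core Encode module. The sketch below uses a hand-picked four-byte ISO-8859-8 sample (the Hebrew word "shalom"); the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Four bytes of ISO-8859-8 text: shin, lamed, vav, final mem.
my $bytes = "\xF9\xEC\xE5\xED";

# Decode into a native Perl (character) string, the form that
# log2vis() is documented to work with.
my $logical = decode('iso-8859-8', $bytes);

# Encoding the character string back yields the original bytes.
my $round_trip = encode('iso-8859-8', $logical);

printf "U+%04X\n", ord($_) for split //, $logical;
```

The decoded string can then be passed to "log2vis()", and the visual result encoded again for output if a byte encoding is required.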
BUGS¶
The "caprtl_to_unicode()" and "unicode_to_caprtl()" functions currently do not
work, because of what appears to be a bug in libfribidi. The details are in
<https://bugs.freedesktop.org/show_bug.cgi?id=8040>.
SEE ALSO¶
Text::Bidi::CapRTL, Encode
The fribidi library: <http://fribidi.org/>,
<http://imagic.weizmann.ac.il/~dov/freesw/FriBidi/>
Swig: <http://www.swig.org>
The unicode bidi algorithm: <http://www.unicode.org/unicode/reports/tr9/>
AUTHOR¶
Moshe Kamensky, <mailto:kamensky@cpan.org>
COPYRIGHT & LICENSE¶
Copyright 2006 Moshe Kamensky, all rights reserved.
This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.