txt(3) | AFNIX Module | txt(3) |
NAME
txt - standard text processing module

STANDARD TEXT PROCESSING MODULE
The Standard Text Processing module is an original implementation of an object collection dedicated to text processing. Although text scanning is the most common operation in text processing, the module also provides specialized objects to store and index text data. Text sorting and transliteration are also part of this module.

Scanning concepts

    # create a pattern object
    const pat (afnix:txt:Pattern "$d+")
    pat:check "123" # true
    pat:match "123" # 123
    afnix:txt:pattern-p pat # true

    # create a balanced pattern
    const pat (afnix:txt:Pattern "ELEMENT" "<" ">")
    pat:check "<xml>" # true
    pat:match "<xml>" # xml

    # create a balanced pattern with an escape character
    const pat (afnix:txt:Pattern "STRING" "'" '\\')
    pat:check "'hello'" # true
    pat:match "'hello'" # "hello"

    # create a c-comment pattern
    const pat (afnix:txt:Pattern "STRING" "/*" "*/")

    # create an empty lexeme
    const lexm (afnix:txt:Lexeme)
    afnix:txt:lexeme-p lexm # true
    # check for the value
    lexm:set-value "hello"
    lexm:get-value # hello
    # check for the source
    lexm:set-source "world"
    lexm:get-source # world
    # check for the source index
    lexm:set-index 2000
    lexm:get-index # 2000

    # the default scanner
    const scan (afnix:txt:Scanner)
    afnix:txt:scanner-p scan # true
    # the length method
    scan:length # 0

    # create the scanner patterns
    const REAL    (afnix:txt:Pattern "REAL"    [$d+.$d*])
    const STRING  (afnix:txt:Pattern "STRING"  "\"" '\\')
    const INTEGER (afnix:txt:Pattern "INTEGER" [$d+|"0x"$x+])
    # add the patterns to the scanner
    scanner:add INTEGER REAL STRING

    while (trans valid (is:valid-p)) {
      # try to get the lexeme
      trans lexm (scanner:scan is)
      # check for nil lexeme and print the value
      if (not (nil-p lexm)) (println (lexm:get-value))
      # update the valid flag
      valid:= (and (is:valid-p) (not (nil-p lexm)))
    }

    # create an unsorted vector
    const v-i (Vector 7 5 3 4 1 8 0 9 2 6)
    # sort the vector in place
    afnix:txt:sort-ascent v-i
    # print the vector
    for (e) (v-i) (println e)

    # create a transliterate object
    const tl (afnix:txt:Literate)
    # check the object
    afnix:txt:literate-p tl # true

    # create a transliterate object by escape
    const tl (afnix:txt:Literate '\\')
tl:set-map '' ' '
tl:set-map '' ' '

    # set the mapping characters
    tl:set-map 'h' 'w'
    tl:set-map 'e' 'o'
    tl:set-map 'l' 'r'
    tl:set-map 'o' 'd'
    # translate a string
    tl:translate "helo" # word

STANDARD TEXT PROCESSING REFERENCE
Pattern
pattern-p
Inheritance
Object
Constructors
Pattern (none)
The Pattern constructor creates an empty pattern.
Pattern (String|Regex)
The Pattern constructor creates a pattern object associated with a regular
expression. The argument can be either a string or a regular expression
object. If the argument is a string, it is converted into a regular expression
object.
Pattern (String String)
The Pattern constructor creates a balanced pattern. The first argument is the
start pattern string. The second argument is the end balanced string.
Pattern (String String Character)
The Pattern constructor creates a balanced pattern with an escape character. The
first argument is the start pattern string. The second argument is the end
balanced string. The third argument is the escape character.
Pattern (String String Boolean)
The Pattern constructor creates a recursive balanced pattern. The first argument
is the start pattern string. The second argument is the end balanced
string.
Constants
REGEX
The REGEX constant indicates that the pattern is a regular expression.
BALANCED
The BALANCED constant indicates that the pattern is a balanced pattern.
RECURSIVE
The RECURSIVE constant indicates that the pattern is a recursive balanced
pattern.
Methods
check -> Boolean (String)
The check method checks the pattern against the input string. If the
verification is successful, the method returns true, false otherwise.
match -> String (String|InputStream)
The match method attempts to match an input string or an input stream. If a
match occurs, the matching string is returned. If the input is a string,
the end of string is used as an end condition. If the input is a stream,
the end of stream is used as an end condition.
set-tag -> none (Integer)
The set-tag method sets the pattern tag. The tag can be further used inside a
scanner.
get-tag -> Integer (none)
The get-tag method returns the pattern tag.
set-name -> none (String)
The set-name method sets the pattern name. The name is the symbolic identifier
for that pattern.
get-name -> String (none)
The get-name method returns the pattern name.
set-regex -> none (String|Regex)
The set-regex method sets the pattern regex either with a string or with a regex
object. If the method is successfully completed, the pattern type is switched
to the REGEX type.
set-escape -> none (Character)
The set-escape method sets the pattern escape character. The escape character is
used only in balanced mode.
get-escape -> Character (none)
The get-escape method returns the escape character.
set-balanced -> none (String|String String)
The set-balanced method sets the pattern balanced string. With one argument, the
same balanced string is used for starting and ending. With two arguments, the
first argument is the starting string and the second is the ending
string.
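The balanced mode described above can be illustrated outside AFNIX. The following Python sketch is only a model of the behavior, not the module implementation: the function name match_balanced is hypothetical, and it assumes that the delimiters are stripped from the result (as in the "<xml>" example) and that an escape character is dropped while the character after it is kept literally.

```python
def match_balanced(text, start, end, escape=None):
    """Return the text enclosed by start and end, or None if unbalanced."""
    if not text.startswith(start):
        return None
    out = []
    i = len(start)
    while i < len(text):
        # assumption: the escape character protects the next character
        if escape is not None and text[i] == escape and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
            continue
        # the first unescaped end string closes the match
        if text.startswith(end, i):
            return "".join(out)
        out.append(text[i])
        i += 1
    # the end string was never found
    return None


print(match_balanced("<xml>", "<", ">"))
print(match_balanced("'it\\'s'", "'", "'", "\\"))
```

With a single delimiter, as in the one-argument form of set-balanced, the same string would be passed for both start and end.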
Lexeme
lexeme-p
Inheritance
Literal
Constructors
Lexeme (none)
The Lexeme constructor creates an empty lexeme.
Lexeme (String)
The Lexeme constructor creates a lexeme by value. The string argument is the
lexeme value.
Methods
set-tag -> none (Integer)
The set-tag method sets the lexeme tag. The tag can be further used inside a
scanner.
get-tag -> Integer (none)
The get-tag method returns the lexeme tag.
set-value -> none (String)
The set-value method sets the lexeme value. The lexeme value is generally the
result of a matching operation.
get-value -> String (none)
The get-value method returns the lexeme value.
set-index -> none (Integer)
The set-index method sets the lexeme source index. The lexeme source index can
be for instance the source line number.
get-index -> Integer (none)
The get-index method returns the lexeme source index.
set-source -> none (String)
The set-source method sets the lexeme source name. The lexeme source name can be
for instance the source file name.
get-source -> String (none)
The get-source method returns the lexeme source name.
Scanner
scanner-p
Inheritance
Object
Constructors
Scanner (none)
The Scanner constructor creates an empty scanner.
Methods
add -> none (Pattern*)
The add method adds 0 or more pattern objects to the scanner. The priority of
the pattern is determined by the order in which the patterns are added.
length -> Integer (none)
The length method returns the number of pattern objects in this scanner.
get -> Pattern (Integer)
The get method returns a pattern object by index.
check -> Lexeme (String)
The check method checks that a string is matched by the scanner and returns the
associated lexeme.
scan -> Lexeme (InputStream)
The scan method scans an input stream until a pattern is matched. When a
match occurs, the associated lexeme is returned.
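The effect of pattern ordering can be shown with a small model. This Python sketch is not the AFNIX scanner: the class MiniScanner, the (name, regex) pairs and the returned (name, value) tuple are hypothetical stand-ins for Pattern and Lexeme objects, and it assumes that "priority" means the first added pattern that matches wins.

```python
import re


class MiniScanner:
    """Toy scanner: patterns are tried in the order they were added."""

    def __init__(self):
        self.patterns = []

    def add(self, *patterns):
        # each pattern is a (name, regex) pair; zero or more may be added
        for name, regex in patterns:
            self.patterns.append((name, re.compile(regex)))

    def length(self):
        return len(self.patterns)

    def check(self, text):
        # assumption: the first pattern matching the whole string wins
        for name, regex in self.patterns:
            if regex.fullmatch(text):
                return (name, text)
        return None


scanner = MiniScanner()
scanner.add(("INTEGER", r"\d+|0x[0-9a-fA-F]+"),
            ("REAL", r"\d+\.\d*"),
            ("WORD", r"\w+"))
print(scanner.check("123"))
print(scanner.check("3.14"))
print(scanner.check("hello"))
```

Here "123" is matched by both INTEGER and WORD; it is reported as INTEGER only because INTEGER was added first.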
Literate
literate-p
Inheritance
Object
Constructors
Literate (none)
The Literate constructor creates a default transliteration object.
Literate (Character)
The Literate constructor creates a default transliteration object with an escape
character. The argument is the escape character.
Methods
read -> Character (InputStream)
The read method reads a character from the input stream and translates it with
the help of the mapping table. A second character might be consumed from the
stream if the first character is an escape character.
getu -> Character (InputStream)
The getu method reads a Unicode character from the input stream and translates it
with the help of the mapping table. A second character might be consumed from
the stream if the first character is an escape character.
reset -> none (none)
The reset method resets the whole mapping table and installs a default identity
mapping.
set-map -> none (Character Character)
The set-map method sets the mapping table by using a source and target character.
The first character is the source character. The second character is the
target character.
get-map -> Character (Character)
The get-map method returns the mapping character by character. The source
character is the argument.
translate -> String (String)
The translate method translates a string by transliteration and returns a new
string.
set-escape -> none (Character)
The set-escape method sets the escape character.
get-escape -> Character (none)
The get-escape method returns the escape character.
set-escape-map -> none (Character Character)
The set-escape-map method sets the escape mapping table by using a source and
target character. The first character is the source character. The second
character is the target character.
get-escape-map -> Character (Character)
The get-escape-map method returns the escape mapping character by character. The
source character is the argument.
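The interplay between the mapping table and the escape mapping table can be modeled as follows. This Python sketch is an illustration under assumptions, not the AFNIX implementation: the class MiniLiterate is hypothetical, unmapped characters are assumed to map to themselves (the default identity table), and an escaped character without an escape mapping is assumed to be kept as is.

```python
class MiniLiterate:
    """Toy transliteration object with an optional escape character."""

    def __init__(self, escape=None):
        self.escape = escape
        self.table = {}         # plain mapping: source -> target
        self.escape_table = {}  # mapping used after the escape character

    def set_map(self, src, dst):
        self.table[src] = dst

    def set_escape_map(self, src, dst):
        self.escape_table[src] = dst

    def translate(self, text):
        out = []
        i = 0
        while i < len(text):
            c = text[i]
            if self.escape is not None and c == self.escape and i + 1 < len(text):
                # an escape consumes a second character
                nxt = text[i + 1]
                out.append(self.escape_table.get(nxt, nxt))
                i += 2
            else:
                # identity mapping unless a target was set
                out.append(self.table.get(c, c))
                i += 1
        return "".join(out)


tl = MiniLiterate()
for src, dst in zip("helo", "word"):
    tl.set_map(src, dst)
print(tl.translate("helo"))

esc = MiniLiterate("\\")
esc.set_escape_map("n", "\n")
print(repr(esc.translate("a\\nb")))
```

The second object shows why read and getu may consume two characters: the escape character and the character it modifies produce a single output character.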
Functions
sort-ascent -> none (Vector)
The sort-ascent function sorts the vector argument in ascending order. The
vector is sorted in place.
sort-descent -> none (Vector)
The sort-descent function sorts the vector argument in descending order. The
vector is sorted in place.
sort-lexical -> none (Vector)
The sort-lexical function sorts the vector argument in lexicographic order. The
vector is sorted in place.
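The difference between the three orderings can be seen on a small vector. The Python functions below are hypothetical equivalents, not the AFNIX functions; in particular, lexicographic order is assumed here to mean comparison of the string representation of each element, which the reference does not state.

```python
def sort_ascent(v):
    # in place, smallest first; returns nothing, like the AFNIX function
    v.sort()


def sort_descent(v):
    # in place, largest first
    v.sort(reverse=True)


def sort_lexical(v):
    # assumption: elements are compared by their string representation
    v.sort(key=str)


v = [7, 10, 3, 25, 1]
sort_ascent(v)
print(v)
sort_descent(v)
print(v)
sort_lexical(v)
print(v)
```

Under this assumption 10 sorts before 3 in lexical order, because "10" precedes "3" character by character.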
2012-03-26 | AFNIX |