'\" t
.\"     Title: yaz-icu
.\"    Author: Index Data
.\" Generator: DocBook XSL Stylesheets v1.79.1
.\"      Date: 03/25/2020
.\"    Manual: Commands
.\"    Source: YAZ 5.29.0
.\"  Language: English
.\"
.TH "YAZ\-ICU" "1" "03/25/2020" "YAZ 5.29.0" "Commands"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
yaz-icu \- YAZ ICU utility
.SH "SYNOPSIS"
.HP \w'\fByaz\-icu\fR\ 'u
\fByaz\-icu\fR [\-c\ \fIconfig\fR] [\-p\ \fItype\fR] [\-s] [\-x] [infile]
.SH "DESCRIPTION"
.PP
\fByaz\-icu\fR
is a utility that demonstrates the ICU chain module of YAZ (yaz/icu\&.h)\&.
.PP
The utility can be used in two ways\&. It can read text and show how that text is analyzed by an ICU chain, which is configured in XML\&. This mode is triggered by option \-c, which specifies the configuration to be used\&. The input is read from
infile
if specified, or from standard input otherwise\&.
.PP
The utility can also print information about the ICU system\&. This is triggered by option \-p\&.
.SH "OPTIONS"
.PP
\-c \fIconfig\fR
.RS 4
Specifies the file containing the ICU chain configuration, which is XML based\&.
.RE
.PP
\-p \fItype\fR
.RS 4
Specifies extra information to be printed about the ICU system\&. If
\fItype\fR
is c, then ICU converters are printed\&. If
\fItype\fR
is l, then available locales are printed\&. If
\fItype\fR
is t, then available transliterators are printed\&.
.RE
.PP
\-s
.RS 4
Specifies that output should include the sort key as well\&. Note that sort keys differ between ICU versions\&.
.RE
.PP
\-x
.RS 4
Specifies that output should be XML based rather than "text" based\&.
.RE
.SH "ICU CHAIN CONFIGURATION"
.PP
The ICU chain configuration specifies one or more rules to convert text data into tokens\&. The configuration format is XML based\&.
.PP
The top\-level element must be named icu_chain\&. The icu_chain element has one required attribute, locale, which specifies the ICU locale to be used in the conversion steps\&.
.PP
The icu_chain element must contain child elements, each of which specifies a conversion step\&. The conversion steps are performed in the order in which they are specified\&. Each conversion element takes one attribute, rule, which serves as the argument to the conversion step\&.
.PP
The following conversion elements are available:
.PP
casemap
.RS 4
Converts case (the rule specifies how):
.PP
l
.RS 4
Lower case using ICU function u_strToLower\&.
.RE
.PP
u
.RS 4
Upper case using ICU function u_strToUpper\&.
.RE
.PP
t
.RS 4
Title case using ICU function u_strToTitle\&.
.RE
.PP
f
.RS 4
Fold case using ICU function u_strFoldCase\&.
.RE
.sp
.RE
.PP
display
.RS 4
This is a meta step which specifies that a term/token is to be displayed\&. The displayed term is retrieved in an application using the function icu_chain_token_display (yaz/icu\&.h)\&.
.RE
.PP
transform
.RS 4
Specifies an ICU transform rule using a transliterator identifier\&. The rule attribute is the transliterator identifier\&. See
\m[blue]\fBICU Transforms\fR\m[]\&\s-2\u[1]\d\s+2
for more information\&.
.RE
.PP
transliterate
.RS 4
Specifies a rule\-based transliterator\&.
The rule attribute is the custom transformation rule to be used\&. See
\m[blue]\fBICU Transforms\fR\m[]\&\s-2\u[1]\d\s+2
for more information\&.
.RE
.PP
tokenize
.RS 4
Breaks (tokenizes) a string into components using the ICU break iterator functions ubrk_open, ubrk_setText, \&.\&.\&. The rule is one of:
.PP
l
.RS 4
Line\&. ICU: UBRK_LINE\&.
.RE
.PP
s
.RS 4
Sentence\&. ICU: UBRK_SENTENCE\&.
.RE
.PP
w
.RS 4
Word\&. ICU: UBRK_WORD\&.
.RE
.PP
c
.RS 4
Character\&. ICU: UBRK_CHARACTER\&.
.RE
.PP
t
.RS 4
Title\&. ICU: UBRK_TITLE\&.
.RE
.sp
.RE
.PP
join
.RS 4
Joins tokens into one string\&. The rule attribute is the joining string, which may be empty\&. The join conversion element was added in YAZ 4\&.2\&.49\&.
.RE
.SH "EXAMPLES"
.PP
The following command analyzes the text in the file text using the ICU chain configuration chain\&.xml:
.sp
.if n \{\
.RS 4
.\}
.nf
cat text | yaz\-icu \-c chain\&.xml
.fi
.if n \{\
.RE
.\}
.sp
The chain\&.xml might look as follows:
.sp
.if n \{\
.RS 4
.\}
.nf
.fi
.if n \{\
.RE
.\}
.sp
.SH "SEE ALSO"
.PP
\fByaz\fR(7)
.PP
\m[blue]\fBICU Home\fR\m[]\&\s-2\u[2]\d\s+2
.PP
\m[blue]\fBICU Transforms\fR\m[]\&\s-2\u[1]\d\s+2
.SH "AUTHORS"
.PP
\fBIndex Data\fR
.SH "NOTES"
.IP " 1." 4
ICU Transforms
.RS 4
\%http://userguide.icu-project.org/transforms/general
.RE
.IP " 2." 4
ICU Home
.RS 4
\%http://www.icu-project.org/
.RE
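.PP
The conversion steps described under ICU CHAIN CONFIGURATION are applied in order, each one working on the output of the previous one\&. The following C fragment is a minimal, hypothetical sketch of that model for a chain of tokenize (rule w), casemap (rule l) and join\&. It does not use ICU or the YAZ API: the names token_list, step_tokenize, step_casemap_lower and step_join are invented for illustration, word breaking is approximated by splitting on ASCII whitespace rather than by UBRK_WORD, and lowercasing is ASCII only rather than u_strToLower\&.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch only: this is NOT the YAZ or ICU API.
   It models a chain of three steps applied in order:
   tokenize rule="w" (approximated by splitting on ASCII whitespace),
   casemap rule="l" (ASCII-only lowercasing) and join (rule = separator). */

#define MAX_TOKENS 32
#define MAX_TOKEN_LEN 64

typedef struct {
    char tokens[MAX_TOKENS][MAX_TOKEN_LEN];
    size_t count;
} token_list;

/* Step 1: split text into tokens at ASCII whitespace.
   Overlong tokens are truncated; tokens beyond MAX_TOKENS are dropped. */
static void step_tokenize(const char *text, token_list *list)
{
    list->count = 0;
    while (*text) {
        while (*text && isspace((unsigned char) *text))
            text++;
        if (!*text)
            break;
        size_t len = 0;
        while (*text && !isspace((unsigned char) *text)) {
            if (list->count < MAX_TOKENS && len + 1 < MAX_TOKEN_LEN)
                list->tokens[list->count][len++] = *text;
            text++;
        }
        if (list->count < MAX_TOKENS) {
            list->tokens[list->count][len] = 0; /* terminate token */
            list->count++;
        }
    }
}

/* Step 2: lowercase every token (ASCII only, unlike u_strToLower). */
static void step_casemap_lower(token_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        for (char *p = list->tokens[i]; *p; p++)
            *p = (char) tolower((unsigned char) *p);
}

/* Step 3: join tokens with sep (which may be empty) into out,
   truncating safely if out_size is too small. */
static void step_join(const token_list *list, const char *sep,
                      char *out, size_t out_size)
{
    assert(out_size > 0);
    out[0] = 0;
    for (size_t i = 0; i < list->count; i++) {
        if (i > 0)
            strncat(out, sep, out_size - strlen(out) - 1);
        strncat(out, list->tokens[i], out_size - strlen(out) - 1);
    }
}
```

.PP
For example, the input Hello ICU World becomes the tokens hello, icu and world, which step_join combines into hello_icu_world when the separator is an underscore and into helloicuworld when the separator is empty\&.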