'\" t .\" Title: unicode_word_break .\" Author: Sam Varshavchik .\" Generator: DocBook XSL Stylesheets vsnapshot .\" Date: 11/25/2020 .\" Manual: Courier Unicode Library .\" Source: Courier Unicode Library .\" Language: English .\" .TH "UNICODE_WORD_BREAK" "3" "11/25/2020" "Courier Unicode Library" "Courier Unicode Library" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" unicode_wb_init, unicode_wb_next, unicode_wb_next_cnt, unicode_wb_end, unicode_wbscan_init, unicode_wbscan_next, unicode_wbscan_end \- calculate word breaks .SH "SYNOPSIS" .sp .ft B .nf #include .fi .ft .HP \w'unicode_wb_info_t\ unicode_wb_init('u .BI "unicode_wb_info_t unicode_wb_init(int\ (*" "cb_func" ")(int,\ void\ *), void\ *" "cb_arg" ");" .HP \w'int\ unicode_wb_next('u .BI "int unicode_wb_next(unicode_wb_info_t\ " "wb" ", char32_t\ " "c" ");" .HP \w'int\ unicode_wb_next_cnt('u .BI "int unicode_wb_next_cnt(unicode_wb_info_t\ " "wb" ", const\ char32_t\ *" "cptr" ", size_t\ " "cnt" ");" .HP \w'int\ unicode_wb_end('u .BI "int unicode_wb_end(unicode_wb_info_t\ " "wb" ");" .HP \w'unicode_wbscan_info_t\ unicode_wbscan_init('u .BI "unicode_wbscan_info_t unicode_wbscan_init(void);" .HP \w'int\ unicode_wbscan_next('u .BI "int unicode_wbscan_next(unicode_wbscan_info_t\ " "wbs" ", char32_t\ " "c" ");" .HP \w'size_t\ unicode_wbscan_end('u .BI "size_t unicode_wbscan_end(unicode_wbscan_info_t\ " "wbs" ");" .SH "DESCRIPTION" .PP These functions implement the unicode word breaking algorithm\&. Invoke \fBunicode_wb_init\fR() to initialize the word breaking algorithm\&. The first parameter is a callback function\&. The second parameter is an opaque pointer\&. The callback function gets invoked with two parameters\&. The second parameter is the opaque pointer that was given to \fBunicode_wb_init\fR(); and the opaque pointer is not subject to any further interpretation by these functions\&. .PP \fBunicode_wb_init\fR() returns an opaque handle\&. Repeated invocations of \fBunicode_wb_next\fR(), passing the handle, and one unicode character defines a sequence of unicode characters over which the word breaking algorithm calculation takes place\&. \fBunicode_wb_next_cnt\fR() is a shortcut for invoking \fBunicode_wb_next\fR() repeatedly over an array cptr containing cnt unicode characters\&. .PP \fBunicode_wb_end\fR() denotes the end of the unicode character sequence\&. After the call to \fBunicode_wb_end\fR() the word breaking unicode_wb_info_t handle is no longer valid\&. .PP Between the call to \fBunicode_wb_init\fR() and \fBunicode_wb_end\fR(), the callback function gets invoked exactly once for each unicode character given to \fBunicode_wb_next\fR() or \fBunicode_wb_next_cnt\fR()\&. Usually each call to \fBunicode_wb_next\fR() results in the callback function getting invoked immediately, but it does not have to be\&. It\*(Aqs possible that a call to \fBunicode_wb_next\fR() returns without invoking the callback function, and some subsequent call to \fBunicode_wb_next\fR() (or \fBunicode_wb_end\fR()) invokes the callback function more than once, to catch things up\&. The contract is that before \fBunicode_wb_end\fR() returns, the callback function gets invoked the exact number of times as the number of characters in the unicode sequence defined by the intervening calls to \fBunicode_wb_next\fR() and \fBunicode_wb_next_cnt\fR(), unless an error occurs\&. .PP Each call to the callback function reports the calculated wordbreaking status of the corresponding character in the unicode character sequence\&. If the parameter to the callback function is non zero, a word break is permitted \fIbefore\fR the corresponding character\&. A zero value indicates that a word break is prohibited \fIbefore\fR the corresponding character\&. .PP The callback function should return 0\&. A non\-zero value indicates to the word breaking algorithm that an error has occurred\&. \fBunicode_wb_next\fR() and \fBunicode_wb_next_cnt\fR() return zero either if they never invoked the callback function, or if each call to the callback function returned zero\&. A non zero return from the callback function results in \fBunicode_wb_next\fR() and \fBunicode_wb_next_cnt\fR() immediately returning the same value\&. .PP \fBunicode_wb_end\fR() must be invoked to destroy the word breaking handle even if \fBunicode_wb_next\fR() and \fBunicode_wb_next_cnt\fR() returned an error indication\&. It\*(Aqs also possible that, under normal circumstances, \fBunicode_wb_end\fR() invokes the callback function one or more times\&. The return value from \fBunicode_wb_end\fR() has the same meaning as from \fBunicode_wb_next\fR() and \fBunicode_wb_next_cnt\fR(); however in all cases after \fBunicode_wb_end\fR() returns the line breaking handle is no longer valid\&. .SS "Word scan" .PP \fBunicode_wbscan_init\fR(), \fBunicode_wbscan_next\fR() and \fBunicode_wbscan_end\fR scan for the next word boundary in a unicode character sequence\&. \fBunicode_wbscan_init\fR() obtains a handle, then \fBunicode_wbscan_next\fR() gets repeatedly invoked to define the unicode character sequence\&. \fBunicode_wbscan_end\fR() deallocates the handle and returns the number of leading characters in the unicode character sequence up to the first word break\&. .PP A non\-0 return value from \fBunicode_wbscan_next\fR() indicates that the word boundary is already known, and any further calls to \fBunicode_wbscan_next\fR() will be ignored\&. \fBunicode_wbscan_end\fR() must still be called, to obtain the unicode character count\&. .SH "SEE ALSO" .PP \m[blue]\fBTR\-29\fR\m[]\&\s-2\u[1]\d\s+2, \fBcourier-unicode\fR(7), \fBunicode::wordbreak\fR(3), \fBunicode_convert_tocase\fR(3), \fBunicode_line_break\fR(3), \fBunicode_grapheme_break\fR(3)\&. .SH "AUTHOR" .PP \fBSam Varshavchik\fR .RS 4 Author .RE .SH "NOTES" .IP " 1." 4 TR-29 .RS 4 \%http://www.unicode.org/reports/tr29/tr29-27.html .RE