UNICODE_LINE_BREAK(3)	    Courier Unicode Library	 UNICODE_LINE_BREAK(3)

NAME
       unicode_line_break, unicode_lb_init, unicode_lb_set_opts,
       unicode_lb_next, unicode_lb_next_cnt, unicode_lb_end, unicode_lbc_init,
       unicode_lbc_set_opts, unicode_lbc_next, unicode_lbc_next_cnt,
       unicode_lbc_end - calculate mandatory or allowed line breaks

SYNOPSIS
       #include <courier-unicode.h>

       unicode_lb_info_t unicode_lb_init(int (*cb_func)(int, void *),
					 void *cb_arg);

       void unicode_lb_set_opts(unicode_lb_info_t lb, int opts);

       int unicode_lb_next(unicode_lb_info_t lb, char32_t c);

       int unicode_lb_next_cnt(unicode_lb_info_t lb, const char32_t *cptr,
			       size_t cnt);

       int unicode_lb_end(unicode_lb_info_t lb);

       unicode_lbc_info_t unicode_lbc_init(int (*cb_func)(int, char32_t, void *),
                                           void *cb_arg);

       void unicode_lbc_set_opts(unicode_lbc_info_t lb, int opts);

       int unicode_lbc_next(unicode_lbc_info_t lb, char32_t c);

       int unicode_lbc_next_cnt(unicode_lbc_info_t lb, const char32_t *cptr,
                                size_t cnt);

       int unicode_lbc_end(unicode_lbc_info_t lb);

DESCRIPTION
       These functions implement the Unicode line breaking algorithm. Invoke
       unicode_lb_init() to initialize the line breaking algorithm. The first
       parameter is a callback function; the second parameter is an opaque
       pointer. The callback function gets invoked with two parameters. The
       first parameter is one of three values: UNICODE_LB_MANDATORY,
       UNICODE_LB_NONE, or UNICODE_LB_ALLOWED, as described below. The second
       parameter is the opaque pointer that was passed to unicode_lb_init();
       these functions do not interpret the opaque pointer in any way.

       unicode_lb_init() returns an opaque handle. Repeated invocations of
       unicode_lb_next(), passing the handle and one Unicode character at a
       time, define a sequence of Unicode characters over which the line
       breaking algorithm calculation takes place. unicode_lb_next_cnt() is a
       shortcut for invoking unicode_lb_next() repeatedly over an array cptr
       containing cnt Unicode characters.

       unicode_lb_end() denotes the end of the Unicode character sequence.
       After the call to unicode_lb_end(), the line breaking unicode_lb_info_t
       handle is no longer valid.

       Between the call to unicode_lb_init() and unicode_lb_end(), the
       callback function gets invoked exactly once for each Unicode character
       given to unicode_lb_next() or unicode_lb_next_cnt(). Usually each call
       to unicode_lb_next() results in the callback function getting invoked
       immediately, but it does not have to. A call to unicode_lb_next() may
       return without invoking the callback function, and some subsequent
       call to unicode_lb_next() (or unicode_lb_end()) then invokes the
       callback function more than once, to catch up. The contract is that,
       unless an error occurs, by the time unicode_lb_end() returns the
       callback function has been invoked exactly as many times as there are
       characters in the sequence defined by the intervening calls to
       unicode_lb_next() and unicode_lb_next_cnt().

       Each call to the callback function reports the calculated line breaking
       status of the corresponding character in the Unicode character
       sequence:

       UNICODE_LB_MANDATORY
           A line break is MANDATORY before the corresponding character.

       UNICODE_LB_NONE
           A line break is PROHIBITED before the corresponding character.

       UNICODE_LB_ALLOWED
           A line break is OPTIONAL before the corresponding character.

       The callback function should return 0. A non-zero value indicates to
       the line breaking algorithm that an error has occurred.
       unicode_lb_next() and unicode_lb_next_cnt() return zero if they never
       invoked the callback function, or if each call to the callback
       function returned zero. A non-zero return from the callback function
       causes unicode_lb_next() and unicode_lb_next_cnt() to return
       immediately with the same value.

       unicode_lb_end() must be invoked to destroy the line breaking handle
       even if unicode_lb_next() or unicode_lb_next_cnt() returned an error
       indication. Even under normal circumstances, unicode_lb_end() may
       invoke the callback function one or more times. The return value from
       unicode_lb_end() has the same meaning as from unicode_lb_next() and
       unicode_lb_next_cnt(); however, in all cases, the line breaking handle
       is no longer valid after unicode_lb_end() returns.

   Alternative callback function
       unicode_lbc_init(), unicode_lbc_next(), unicode_lbc_next_cnt(), and
       unicode_lbc_end() are alternative functions that implement the same
       algorithm. The only difference is that the callback function receives
       an extra parameter, passed as its second argument: the Unicode
       character to which the line breaking status applies, passed through
       from the input character sequence.

   Options
       unicode_lb_set_opts() and unicode_lbc_set_opts() enable non-default
       options for the line breaking algorithm. These functions must be called
       immediately after unicode_lb_init() or unicode_lbc_init(), and before
       any other function. opts is a bitmask that can contain the following
       values:

       UNICODE_LB_OPT_PRBREAK
           Enables a modified LB24 rule. This prevents plus signs, as in "C++",
           from breaking. This flag adds the following rules to the LB24 rule:

               PR x PR

               AL x PR

               ID x PR

       UNICODE_LB_OPT_SYBREAK
           Tailored breaking rules for the "/" character. This prevents
           breaking after the "/" character (think URLs), and includes an
           exception to the "x SY" rule in LB13. This flag adds the following
           rules to the LB24 rule:

               SY x EX

               SY x AL

               SY x ID

               SP / SY (a break is allowed between a space and a following
               SY character), which takes precedence over "x SY".

       UNICODE_LB_OPT_DASHWJ
           This flag reclassifies U+2013 (EN DASH) and U+2014 (EM DASH) as
           class WJ, prohibiting breaks before and after these characters.

SEE ALSO
       courier-unicode(7), unicode::linebreak(3), TR-14[1]

AUTHOR
       Sam Varshavchik
	   Author

NOTES
	1. TR-14
	   https://www.unicode.org/reports/tr14/tr14-51.html

Courier Unicode Library           05/18/2024          UNICODE_LINE_BREAK(3)
