TICKIT_UTF8_COUNT(3)	   Library Functions Manual	  TICKIT_UTF8_COUNT(3)

NAME
       tickit_utf8_count,  tickit_utf8_countmore - count characters in Unicode
       strings

SYNOPSIS
       #include <tickit.h>

       typedef struct {
	   size_t bytes;
	   int	  codepoints;
	   int	  graphemes;
	   int	  columns;
       } TickitStringPos;

       size_t tickit_utf8_count(const char *str, TickitStringPos *pos,
	   const TickitStringPos *limit);
       size_t tickit_utf8_countmore(const char *str, TickitStringPos *pos,
	   const TickitStringPos *limit);

       size_t tickit_utf8_ncount(const char *str, size_t len,
	   TickitStringPos *pos, const TickitStringPos *limit);
       size_t tickit_utf8_ncountmore(const char *str, size_t len,
	   TickitStringPos *pos, const TickitStringPos *limit);

       Link with -ltickit.

DESCRIPTION
       tickit_utf8_count() counts characters in the given Unicode string,
       which must be in UTF-8 encoding. It starts at the beginning of the
       string and counts forward over codepoints and graphemes, incrementing
       the counters in pos until it reaches a limit. It will not go further
       than any of the limits given by the limit structure (where the value
       -1 indicates no limit of that type). It will never split a codepoint
       in the middle of a UTF-8 sequence, nor will it split a grapheme
       between its codepoints; it is therefore possible that the function
       returns before any of the limits have been reached, if the next whole
       grapheme would involve going past at least one of the specified
       limits. The function will also stop when it reaches the end of str.
       It returns the total number of bytes it has counted over.

       The bytes member counts the UTF-8 bytes which encode individual
       codepoints. For example, the Unicode character U+00E9 is encoded by
       the two bytes 0xc3, 0xa9; it would increment the bytes counter by 2
       and the codepoints counter by 1.

       The codepoints member counts individual Unicode codepoints.
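
       The relationship between the bytes and codepoints counters can be
       illustrated with a minimal sketch. The function below is a
       hypothetical stand-in, not part of libtickit; it assumes valid,
       NUL-terminated UTF-8 input and ignores graphemes, columns, limits and
       control-byte checks.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of libtickit: counts bytes and codepoints
 * in a NUL-terminated string that is assumed to be valid UTF-8. */
void sketch_count_codepoints(const char *str, size_t *bytes, int *codepoints)
{
    const unsigned char *p;

    *bytes = 0;
    *codepoints = 0;

    for (p = (const unsigned char *)str; *p; p++) {
        (*bytes)++;
        /* Continuation bytes have the form 10xxxxxx; any other byte
         * begins a new codepoint. */
        if ((*p & 0xC0) != 0x80)
            (*codepoints)++;
    }
}
```

       Given the string "\xc3\xa9" (U+00E9), this sketch reports 2 bytes and
       1 codepoint, matching the example above.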

       The graphemes member counts whole composed graphical clusters of
       codepoints, where combining accents which count as individual
       codepoints do not count as separate graphemes. For example, the
       codepoint sequence U+0065 U+0301 would increment the codepoints
       counter by 2 and the graphemes counter by 1.
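
       The difference between codepoints and graphemes can be sketched as
       follows. This is a deliberately simplified illustration, not the
       library's algorithm: all function names are hypothetical, the input
       is assumed to be valid UTF-8, and only the Combining Diacritical
       Marks block (U+0300 to U+036F) is treated as extending the previous
       grapheme.

```c
#include <stddef.h>

/* Hypothetical helper: decodes one codepoint from valid UTF-8 at s and
 * stores its encoded length in *len. No validation is performed. */
unsigned long sketch_decode(const unsigned char *s, size_t *len)
{
    if (s[0] < 0x80) {
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *len = 2;
        return ((s[0] & 0x1FUL) << 6) | (s[1] & 0x3FUL);
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *len = 3;
        return ((s[0] & 0x0FUL) << 12) | ((s[1] & 0x3FUL) << 6)
             | (s[2] & 0x3FUL);
    }
    *len = 4;
    return ((s[0] & 0x07UL) << 18) | ((s[1] & 0x3FUL) << 12)
         | ((s[2] & 0x3FUL) << 6) | (s[3] & 0x3FUL);
}

/* Simplifying assumption: only U+0300..U+036F count as combining marks. */
int sketch_is_combining(unsigned long cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Counts codepoints and (simplified) graphemes in a NUL-terminated string. */
void sketch_count_graphemes(const char *str, int *codepoints, int *graphemes)
{
    const unsigned char *p = (const unsigned char *)str;

    *codepoints = 0;
    *graphemes = 0;

    while (*p) {
        size_t len;
        unsigned long cp = sketch_decode(p, &len);

        (*codepoints)++;
        /* A combining mark joins the grapheme before it, unless it is
         * the very first codepoint in the string. */
        if (*graphemes == 0 || !sketch_is_combining(cp))
            (*graphemes)++;
        p += len;
    }
}
```

       For the string "e\xcc\x81" (U+0065 U+0301) the sketch counts 2
       codepoints and 1 grapheme.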

       The  columns member counts the number of	screen columns consumed	by the
       graphemes. Most graphemes consume only 1	column,	but some  are  defined
       in Unicode to consume 2.

       tickit_utf8_countmore() is similar to tickit_utf8_count() except it
       will not zero any of the counters before it starts. It can continue
       counting where a previous call finished. In particular, it will assume
       that it is starting at the beginning of a UTF-8 sequence that begins a
       new grapheme; it will not check these facts and the behavior is
       undefined if these assumptions do not hold. It will begin at the
       offset given by pos->bytes.

       The tickit_utf8_ncount()	and tickit_utf8_ncountmore() variants are sim-
       ilar  except  that they read no more than len bytes from	the string and
       do not require it to be NUL terminated. They will still stop at	a  NUL
       byte if one is found before len bytes have been read.

       These functions will all immediately abort if any C0 or C1 control
       byte other than NUL is encountered, returning the value -1. Because
       the return type is size_t, callers should compare the result against
       (size_t)-1. In this circumstance, the pos structure will still be
       updated with the progress so far.

USAGE
       Typically, these functions would be used in either of two ways.

       When given a value in limit->bytes (or no limit and simply using
       string termination), tickit_utf8_count() will yield the width of the
       given string in terminal columns, in the pos->columns field.

       When given a value in limit->columns, tickit_utf8_count() will yield
       the number of bytes of that string that will consume the given space
       on the terminal.
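
       The two usage patterns and the limit semantics can be illustrated
       with a self-contained stand-in. Everything below is hypothetical and
       is not the libtickit implementation: SketchPos merely mirrors the
       fields of TickitStringPos, every codepoint is treated as its own
       grapheme, only U+4E00 to U+9FFF is assumed to be 2 columns wide, and
       control bytes are not checked. With the real library one would pass
       a TickitStringPos limit to tickit_utf8_count() instead.

```c
#include <stddef.h>

/* Local stand-in mirroring the fields of TickitStringPos. */
typedef struct {
    size_t bytes;
    int    codepoints;
    int    graphemes;
    int    columns;
} SketchPos;

/* Decodes one codepoint from valid UTF-8; stores its length in *len. */
unsigned long sketch_next(const unsigned char *s, size_t *len)
{
    if (s[0] < 0x80) { *len = 1; return s[0]; }
    if ((s[0] & 0xE0) == 0xC0) {
        *len = 2;
        return ((s[0] & 0x1FUL) << 6) | (s[1] & 0x3FUL);
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *len = 3;
        return ((s[0] & 0x0FUL) << 12) | ((s[1] & 0x3FUL) << 6)
             | (s[2] & 0x3FUL);
    }
    *len = 4;
    return ((s[0] & 0x07UL) << 18) | ((s[1] & 0x3FUL) << 12)
         | ((s[2] & 0x3FUL) << 6) | (s[3] & 0x3FUL);
}

/* Simplified counter: zeroes pos, then advances one whole unit at a time,
 * stopping before any unit that would pass a limit (-1 means no limit).
 * Returns the number of bytes counted over. */
size_t sketch_count(const char *str, SketchPos *pos, const SketchPos *limit)
{
    const unsigned char *p = (const unsigned char *)str;

    pos->bytes = 0;
    pos->codepoints = pos->graphemes = pos->columns = 0;

    while (*p) {
        size_t len;
        unsigned long cp = sketch_next(p, &len);
        int width = (cp >= 0x4E00 && cp <= 0x9FFF) ? 2 : 1;

        if (limit->bytes != (size_t)-1 && pos->bytes + len > limit->bytes)
            break;
        if (limit->codepoints != -1 && pos->codepoints + 1 > limit->codepoints)
            break;
        if (limit->graphemes != -1 && pos->graphemes + 1 > limit->graphemes)
            break;
        if (limit->columns != -1 && pos->columns + width > limit->columns)
            break;

        pos->bytes += len;
        pos->codepoints++;
        pos->graphemes++;
        pos->columns += width;
        p += len;
    }
    return pos->bytes;
}

/* First pattern: no limits, read the width from the columns counter. */
int sketch_width(const char *str)
{
    SketchPos pos, limit = { (size_t)-1, -1, -1, -1 };

    sketch_count(str, &pos, &limit);
    return pos.columns;
}

/* Second pattern: limit the columns, read back the bytes that fit. */
size_t sketch_bytes_for_columns(const char *str, int cols)
{
    SketchPos pos, limit = { (size_t)-1, -1, -1, cols };

    return sketch_count(str, &pos, &limit);
}
```

       Under these assumptions, a string consisting of "a", U+65E5 and "b"
       is 4 columns wide. Asking for the bytes that fit in 2 columns yields
       1, because the 2-column character would pass the limit and is never
       split; the count therefore stops with only 1 column used.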

RETURN VALUE
       tickit_utf8_count() and tickit_utf8_countmore() return the number of
       bytes they have skipped over in this call, or -1 if they encounter a
       C0 or C1 byte other than NUL.

SEE ALSO
       tickit_stringpos_zero(3),	      tickit_stringpos_limit_bytes(3),
       tickit_utf8_mbswidth(3),	tickit(7)

							  TICKIT_UTF8_COUNT(3)
