FreeBSD Manual Pages

  
 
  

UTF(3)			   Library Functions Manual			UTF(3)

NAME
       runetochar, chartorune, runelen, fullrune, utflen, utfrune, utfrrune,
       utfutf - Unicode Text Format functionality

SYNOPSIS
       #include	<utf.h>

       int runetochar(char *cp, Rune *rp);

       int chartorune(Rune *rp, char *cp);

       int runelen(long r);

       int fullrune(char *cp, int n);

       int utflen(char *s);

       int utfbytes(char *s);

       char *utfrune(char *cp, long r);

       char *utfrrune(char *cp, long r);

       char *utfutf(char *big, char *little);

       int utf_snprintf(char *buf, size_t size, char *format, ...);

       int utfcmp(char *s1, char *s2);

       int utfncmp(char *s1, char *s2, int rc);

       char *utfcpy(char *dst, char *src);

       char *utfncpy(char *dst, char *src, int nbytes);

       char *utfcat(char *src, char *append);

       char *utfncat(char *src, char *append, int nbytes);

DESCRIPTION
       The UTF routines are used to pack Unicode text into a standard byte
       stream. To do that effectively, the ASCII characters occupy the lowest
       128 code points (0 to 127) of UTF-8 and are encoded as single bytes,
       so these characters are interchangeable between the two character
       sets. A Rune, the type defined in the header file utf.h, holds a
       single Unicode character.

       runetochar translates the single Rune pointed to by rp into a UTF
       sequence stored at cp, and returns the number of bytes produced.
       chartorune is the inverse of this function: it decodes one Rune from
       cp into rp and returns the number of bytes consumed. runelen returns
       the number of bytes needed to encode the Rune r. fullrune checks
       whether the first n bytes of the UTF string cp contain a complete UTF
       encoding.
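
       As an illustration of what runetochar, chartorune, runelen and
       fullrune compute, the following is a minimal C sketch of standard
       UTF-8 encoding and decoding. It is not this library's source: the
       sk_ names, the 32-bit sk_rune type and the substitution of U+FFFD for
       malformed or unencodable input are assumptions made for the example.
       The decoder is also stricter than the IMPLEMENTATION section
       describes, since it rejects the two-byte encoding of the null
       character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the utf.h routines (sk_ names are
 * illustrative only). A rune is assumed to be a 32-bit code point. */
typedef unsigned int sk_rune;

#define SK_RUNE_ERROR 0xFFFDu  /* assumed substitute for bad input */

/* Bytes needed to encode r (cf. runelen). */
int sk_runelen(long r)
{
    if (r < 0 || r > 0x10FFFF)
        return 3;              /* encoded as U+FFFD, which takes 3 bytes */
    if (r < 0x80)
        return 1;
    if (r < 0x800)
        return 2;
    if (r < 0x10000)
        return 3;
    return 4;
}

/* Encode *rp at cp; returns the number of bytes written (cf. runetochar). */
int sk_runetochar(char *cp, const sk_rune *rp)
{
    assert(cp != NULL && rp != NULL);
    unsigned long r = *rp;
    unsigned char *p = (unsigned char *)cp;

    if (r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF))
        r = SK_RUNE_ERROR;     /* not encodable: substitute U+FFFD */
    if (r < 0x80) {
        p[0] = (unsigned char)r;
        return 1;
    }
    if (r < 0x800) {
        p[0] = (unsigned char)(0xC0 | (r >> 6));
        p[1] = (unsigned char)(0x80 | (r & 0x3F));
        return 2;
    }
    if (r < 0x10000) {
        p[0] = (unsigned char)(0xE0 | (r >> 12));
        p[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
        p[2] = (unsigned char)(0x80 | (r & 0x3F));
        return 3;
    }
    p[0] = (unsigned char)(0xF0 | (r >> 18));
    p[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
    p[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
    p[3] = (unsigned char)(0x80 | (r & 0x3F));
    return 4;
}

/* Decode one rune at cp into *rp; returns bytes consumed (cf. chartorune).
 * Malformed input yields U+FFFD and consumes a single byte. */
int sk_chartorune(sk_rune *rp, const char *cp)
{
    const unsigned char *p = (const unsigned char *)cp;
    int n, i;
    sk_rune r, min;

    if (p[0] < 0x80) { *rp = p[0]; return 1; }
    else if ((p[0] & 0xE0) == 0xC0) { n = 2; r = p[0] & 0x1F; min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; r = p[0] & 0x0F; min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; r = p[0] & 0x07; min = 0x10000; }
    else goto bad;

    for (i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80)
            goto bad;          /* also stops safely at a NUL terminator */
        r = (r << 6) | (p[i] & 0x3F);
    }
    if (r < min || r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF))
        goto bad;              /* overlong, out of range, or a surrogate */
    *rp = r;
    return n;
bad:
    *rp = SK_RUNE_ERROR;
    return 1;
}

/* Nonzero if the first n bytes of cp hold a complete encoding (cf. fullrune). */
int sk_fullrune(const char *cp, int n)
{
    if (n <= 0)
        return 0;
    unsigned char c = (unsigned char)cp[0];
    if (c < 0xC0) return 1;    /* ASCII, or a stray byte decoded on its own */
    if (c < 0xE0) return n >= 2;
    if (c < 0xF0) return n >= 3;
    if (c < 0xF8) return n >= 4;
    return 1;                  /* invalid lead byte: decoded on its own */
}
```

       For example, encoding U+20AC (EURO SIGN) with sk_runetochar produces
       the three bytes 0xE2 0x82 0xAC, and sk_chartorune turns those bytes
       back into 0x20AC.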

       utflen returns the number of runes in a UTF string, and utfbytes
       returns the number of bytes in it. utfrune returns a pointer to the
       first occurrence of the rune r in the UTF string cp, and utfrrune
       returns a pointer to the last. utfutf searches for the first
       occurrence of the UTF string little in the UTF string big.
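
       The length and search routines can be sketched in a few lines,
       assuming well-formed, null-terminated UTF-8 input. The sk_ names are
       hypothetical, and the sketch assumes utfbytes excludes the terminator,
       as strlen does. Because UTF-8 is self-synchronising, a byte search for
       the complete encoding of a rune can only match at a rune boundary, so
       strstr suffices for the searches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketches (sk_ prefix); input is assumed valid UTF-8. */

/* Is b the first byte of a rune, rather than a continuation byte? */
static int sk_is_lead(unsigned char b) { return (b & 0xC0) != 0x80; }

/* Encode r (assumed 0 < r <= 0x10FFFF) into buf as a C string. */
static void sk_encode(long r, char buf[5])
{
    unsigned char *p = (unsigned char *)buf;

    if (r < 0x80) {
        *p++ = (unsigned char)r;
    } else if (r < 0x800) {
        *p++ = (unsigned char)(0xC0 | (r >> 6));
        *p++ = (unsigned char)(0x80 | (r & 0x3F));
    } else if (r < 0x10000) {
        *p++ = (unsigned char)(0xE0 | (r >> 12));
        *p++ = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
        *p++ = (unsigned char)(0x80 | (r & 0x3F));
    } else {
        *p++ = (unsigned char)(0xF0 | (r >> 18));
        *p++ = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
        *p++ = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
        *p++ = (unsigned char)(0x80 | (r & 0x3F));
    }
    *p = '\0';
}

/* Number of runes in s (cf. utflen): count the non-continuation bytes. */
int sk_utflen(const char *s)
{
    int n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (sk_is_lead((unsigned char)*s))
            n++;
    return n;
}

/* Bytes in s, assumed to exclude the terminator (cf. utfbytes). */
int sk_utfbytes(const char *s)
{
    return (int)strlen(s);
}

/* First occurrence of rune r in cp, or NULL (cf. utfrune). */
char *sk_utfrune(const char *cp, long r)
{
    char enc[5];

    assert(cp != NULL);
    if (r == 0)
        return (char *)cp + strlen(cp);   /* like strchr(cp, '\0') */
    sk_encode(r, enc);
    return strstr(cp, enc);               /* matches only at rune starts */
}

/* Last occurrence of rune r in cp, or NULL (cf. utfrrune). */
char *sk_utfrrune(const char *cp, long r)
{
    char enc[5];
    const char *hit, *last = NULL;

    assert(cp != NULL);
    if (r == 0)
        return (char *)cp + strlen(cp);
    sk_encode(r, enc);
    for (hit = strstr(cp, enc); hit != NULL; hit = strstr(hit + 1, enc))
        last = hit;
    return (char *)last;
}

/* First occurrence of the UTF string little in big (cf. utfutf). */
char *sk_utfutf(const char *big, const char *little)
{
    return strstr(big, little);
}
```

       For the string "naïve café", sk_utflen reports 10 runes while
       sk_utfbytes reports 12 bytes, since ï and é each take two bytes.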

       utf_snprintf is a particularly dumb implementation of snprintf for
       UTF strings: it interprets only the %%, %s and %d sequences in the
       format string, and does no field width calculation on them.
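
       The restricted format language is easy to picture with a small
       sketch. The hypothetical sk_utf_snprintf below handles only %%, %s
       and %d, with no field widths, as described above. Its return value
       (the number of bytes stored, excluding the terminator) and its
       dropping of a rune cut in half by truncation are assumptions, not
       documented behaviour of utf_snprintf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Output state for the hypothetical sk_utf_snprintf sketch. */
struct sk_out {
    char *buf;
    size_t size;      /* capacity, including the terminator */
    size_t len;       /* bytes stored so far */
    int truncated;
};

/* Append up to n bytes, keeping one byte free for the terminator. */
static void sk_put(struct sk_out *o, const char *s, size_t n)
{
    size_t room = o->size - 1 - o->len;

    if (n > room) {
        n = room;
        o->truncated = 1;
    }
    memcpy(o->buf + o->len, s, n);
    o->len += n;
}

/* Understands only %%, %s and %d: no widths, precision or flags. */
int sk_utf_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    struct sk_out o = { buf, size, 0, 0 };
    char num[32];
    const char *s;
    va_list ap;

    assert(buf != NULL && size > 0 && fmt != NULL);
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%' || fmt[1] == '\0') {  /* literal byte or trailing % */
            sk_put(&o, fmt, 1);
            continue;
        }
        switch (*++fmt) {
        case '%':
            sk_put(&o, "%", 1);
            break;
        case 's':
            s = va_arg(ap, const char *);
            sk_put(&o, s, strlen(s));
            break;
        case 'd':
            snprintf(num, sizeof num, "%d", va_arg(ap, int));
            sk_put(&o, num, strlen(num));
            break;
        default:                              /* unknown: copy verbatim */
            sk_put(&o, fmt - 1, 2);
            break;
        }
    }
    va_end(ap);

    if (o.truncated) {
        /* Find the start of the last rune and drop it if it is incomplete. */
        size_t start = o.len;
        while (start > 0 && ((unsigned char)buf[start - 1] & 0xC0) == 0x80)
            start--;
        if (start > 0) {
            unsigned char lead = (unsigned char)buf[start - 1];
            size_t need = lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3
                        : lead >= 0xC0 ? 2 : 1;
            if (o.len - (start - 1) < need)
                o.len = start - 1;
        }
    }
    buf[o.len] = '\0';
    return (int)o.len;
}
```

       With a 5-byte buffer, formatting "café!" stores only "caf", because
       the two bytes of é no longer fit whole after the terminator is
       reserved.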

       utfcmp compares two strings lexicographically, Rune by Rune, and
       returns a value greater than zero, equal to zero, or less than zero,
       depending on whether the first UTF string is greater than, the same
       as, or less than the second. utfncmp performs the same comparison as
       utfcmp, but compares at most rc Runes.
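
       A Rune-by-Rune comparison can be sketched as follows, assuming
       well-formed UTF-8 input; sk_utfcmp and sk_utfncmp are hypothetical
       names, and this decoder performs no error checking. Note that rc
       counts Runes rather than bytes. For valid UTF-8 the resulting order
       agrees with a plain strcmp, because UTF-8 byte sequences sort in code
       point order under unsigned byte comparison.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Decode one rune of well-formed UTF-8 at s into *r; returns its length. */
static int sk_decode(const char *s, long *r)
{
    const unsigned char *p = (const unsigned char *)s;

    if (p[0] < 0x80) {
        *r = p[0];
        return 1;
    }
    if (p[0] < 0xE0) {
        *r = ((long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        return 2;
    }
    if (p[0] < 0xF0) {
        *r = ((long)(p[0] & 0x0F) << 12) | ((long)(p[1] & 0x3F) << 6)
           | (p[2] & 0x3F);
        return 3;
    }
    *r = ((long)(p[0] & 0x07) << 18) | ((long)(p[1] & 0x3F) << 12)
       | ((long)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
    return 4;
}

/* Compare at most rc runes of s1 and s2 (cf. utfncmp). */
int sk_utfncmp(const char *s1, const char *s2, int rc)
{
    long r1, r2;

    assert(s1 != NULL && s2 != NULL);
    for (; rc > 0; rc--) {
        s1 += sk_decode(s1, &r1);
        s2 += sk_decode(s2, &r2);
        if (r1 != r2)
            return r1 < r2 ? -1 : 1;
        if (r1 == 0)
            break;             /* both strings ended at the same point */
    }
    return 0;
}

/* Unbounded comparison (cf. utfcmp). */
int sk_utfcmp(const char *s1, const char *s2)
{
    return sk_utfncmp(s1, s2, INT_MAX);
}
```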

       utfcpy copies the UTF string src to dst, Rune by Rune, and returns
       dst. No bounds checking is done on the number of Runes copied, or on
       their individual sizes.

       utfncpy copies at most nbytes bytes from src to dst, stopping when a
       null Rune is found in src. If fewer than nbytes bytes are copied, the
       rest of dst is padded with null (0) bytes; if exactly nbytes bytes are
       copied, no null byte is added, so dst is not terminated. The dst
       argument is returned.

       utfcat appends the UTF string append onto the UTF string src. utfncat
       appends the UTF string append onto the UTF string src, bearing in mind
       that the buffer src is only nbytes long.
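
       The bounded copy and append routines can be sketched as below,
       assuming well-formed UTF-8. The sk_ names are hypothetical, and two
       details are assumptions beyond what this page states: both sketches
       copy only whole Runes (stopping before a Rune that would not fit),
       and sk_utfncat treats nbytes as the total size of the src buffer and
       always leaves the result null-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the rune whose lead byte is b (valid UTF-8 assumed). */
static size_t sk_len(unsigned char b)
{
    return b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
}

/* Copy whole runes of src while they fit in nbytes, then pad dst with
 * null bytes (cf. utfncpy). As with strncpy, dst is left unterminated
 * when exactly nbytes bytes are copied. */
char *sk_utfncpy(char *dst, const char *src, size_t nbytes)
{
    size_t i = 0, n;

    assert(dst != NULL && src != NULL);
    while (src[i] != '\0') {
        n = sk_len((unsigned char)src[i]);
        if (i + n > nbytes)
            break;                     /* next rune would not fit whole */
        memcpy(dst + i, src + i, n);
        i += n;
    }
    memset(dst + i, 0, nbytes - i);    /* pad; no-op when i == nbytes */
    return dst;
}

/* Append whole runes of append to src, whose buffer is assumed to be
 * nbytes long in total; the result is always terminated (cf. utfncat). */
char *sk_utfncat(char *src, const char *append, size_t nbytes)
{
    size_t used = strlen(src), i = 0, n;

    assert(used < nbytes);
    while (append[i] != '\0') {
        n = sk_len((unsigned char)append[i]);
        if (used + n + 1 > nbytes)
            break;                     /* keep room for the terminator */
        memcpy(src + used, append + i, n);
        used += n;
        i += n;
    }
    src[used] = '\0';
    return src;
}
```

       Copying "ab€" into 4 bytes stores "ab" followed by two null bytes,
       since € needs three bytes and only two remain.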

IMPLEMENTATION
       This implementation of UTF, nominally UTF-8, can encode a null Unicode
       character using either a one-byte or a two-byte encoding. Typically,
       Plan 9 uses a one-byte encoding, whilst Java uses a two-byte encoding.
       The Plan 9 style of encoding makes backwards compatibility much easier
       and loses nothing: all the Java functionality is there, no null byte
       can appear inside an encoded character, because the second and third
       bytes of a multibyte sequence always have the high bit set, and
       ordinary C strings are recognised as well, which is not the case in
       Java. By default, the one-byte encoding of the null character is used.
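
       The difference between the two null encodings can be seen in a short
       sketch. The hypothetical sk_encode_small and sk_encode_str below
       handle only runes below U+0800, to stay short; the two_byte_null flag
       selects the overlong 0xC0 0x80 form that Java's modified UTF-8 uses
       for U+0000. With the one-byte form, strlen stops at the encoded null;
       with the two-byte form, the text contains no 0x00 byte at all. The
       sketch does not show how this library chooses between the two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode r (assumed 0 <= r < 0x800) into out; returns the byte count.
 * two_byte_null selects the 0xC0 0x80 form of U+0000. */
int sk_encode_small(long r, unsigned char *out, int two_byte_null)
{
    assert(r >= 0 && r < 0x800);
    if (r == 0 && two_byte_null) {
        out[0] = 0xC0;             /* overlong, Java-style null */
        out[1] = 0x80;
        return 2;
    }
    if (r < 0x80) {
        out[0] = (unsigned char)r; /* one-byte form, including 0x00 */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (r >> 6));
    out[1] = (unsigned char)(0x80 | (r & 0x3F)); /* high bit always set */
    return 2;
}

/* Encode n runes into out and terminate it; returns bytes written,
 * excluding the terminator. */
size_t sk_encode_str(const long *runes, size_t n, unsigned char *out,
                     int two_byte_null)
{
    size_t len = 0, i;

    for (i = 0; i < n; i++)
        len += (size_t)sk_encode_small(runes[i], out + len, two_byte_null);
    out[len] = '\0';
    return len;
}

/* Decode a two-byte sequence; 0xC0 0x80 yields the value 0. */
long sk_decode2(const unsigned char *p)
{
    return ((long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
}
```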

       UTF-8 is defined in X/Open Company Ltd., "File System Safe UCS
       Transformation Format (FSS_UTF)", X/Open Preliminary Specification,
       Document Number: P316, which also appears in ISO/IEC 10646, Annex P.

BUGS
       Undoubtedly, these are many, and legion.

AUTHOR
       Written by Alistair Crooks (agc@amdahl.com, or
       agc@westley.demon.co.uk), from a draft document written by Rob Pike
       and Ken Thompson, detailing the implementation of UTF in the Plan 9
       operating system.

									UTF(3)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=utf&sektion=3&manpath=FreeBSD+Ports+15.0>
