UTF2(5)                     BSD File Formats Manual                    UTF2(5)

     utf2 -- Universal character set Transformation Format encoding of wide
     characters

     The UTF2 encoding has been deprecated in favour of UTF-8.  New
     applications should not use UTF2.

     The UTF2 encoding is based on a proposed X/Open multibyte FSS-UCS-TF
     (File System Safe Universal Character Set Transformation Format) encoding
     as used in Plan 9 from Bell Labs.  Although it is capable of representing
     more than 16 bits, the current implementation is limited to 16 bits as
     defined by the Unicode Standard.

     UTF2 representation is backwards compatible with ASCII, so 0x00-0x7f
     refer to the ASCII character set.  The multibyte encodings of wide
     characters between 0x0080 and 0xffff consist entirely of bytes whose high
     order bit is set.  The actual encoding is represented by the following
     table:

     [0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
     [0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
     [0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
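
     The table above can be read as a small encoding routine.  The following
     Python sketch is illustrative only and is not the libc implementation;
     the function name utf2_encode_char is hypothetical.  It assumes the
     16-bit limit described earlier and applies no further validity checks
     to the value being encoded.

```python
def utf2_encode_char(code: int) -> bytes:
    """Encode one 16-bit value using the three-row table (illustrative sketch)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("UTF2 as described here is limited to 16 bits")
    if code <= 0x7F:
        # 0bbbbbbb: ASCII values pass through unchanged
        return bytes([code])
    if code <= 0x7FF:
        # 110bbbbb, 10bbbbbb: 5 high bits, then 6 low bits
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # 1110bbbb, 10bbbbbb, 10bbbbbb: 4 high bits, then two groups of 6 bits
    return bytes([
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])
```

     For instance, under these assumptions utf2_encode_char(0x20AC) yields
     the three bytes 0xE2 0x82 0xAC.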

     If more than a single representation of a value exists (for example,
     0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always
     used when encoding (but the longer ones will be correctly decoded).
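
     The lenient decoding behaviour described above can be sketched as
     follows.  This is a minimal Python illustration, not the libc code; the
     name utf2_decode is hypothetical, and the choice to raise ValueError on
     malformed input is an assumption made for the example.  Note that the
     non-shortest forms are accepted on purpose, mirroring the paragraph
     above.

```python
def utf2_decode(data: bytes) -> list:
    """Decode 1- to 3-byte sequences, accepting non-shortest forms (sketch)."""
    values = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            value, extra = lead, 0            # 0bbbbbbb
        elif (lead & 0xE0) == 0xC0:
            value, extra = lead & 0x1F, 1     # 110bbbbb
        elif (lead & 0xF0) == 0xE0:
            value, extra = lead & 0x0F, 2     # 1110bbbb
        else:
            # Stray continuation byte or a lead byte of a longer form
            raise ValueError(f"unsupported lead byte 0x{lead:02x} at offset {i}")
        tail = data[i + 1 : i + 1 + extra]
        if len(tail) != extra or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError(f"bad continuation bytes at offset {i}")
        for b in tail:
            # Each continuation byte (10bbbbbb) contributes 6 bits
            value = (value << 6) | (b & 0x3F)
        values.append(value)
        i += 1 + extra
    return values
```

     With this sketch, the three byte sequences 0x00, 0xC0 0x80 and
     0xE0 0x80 0x80 all decode to the single value 0.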

     The final three encodings provided by X/Open:

     [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
             11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

     [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
             111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

     [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
             1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

     which provide for the entire proposed ISO-10646 31-bit standard, are
     currently not implemented.
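
     For illustration only, the bit layouts above generalise the shorter
     forms in a regular way: each extra byte adds six payload bits and one
     more leading 1 bit in the first byte.  The Python sketch below shows
     that pattern for values up to 31 bits.  It describes the proposed
     format, not the UTF2 implementation (which stops at three bytes); the
     names _FORMS and fss_utf_encode_char are hypothetical.

```python
# (largest value for this length, lead-byte marker) for 1- to 6-byte forms.
# The limits follow from the payload bits: 7, 11, 16, 21, 26 and 31.
_FORMS = [
    (0x7F, 0x00),        # 0bbbbbbb
    (0x7FF, 0xC0),       # 110bbbbb + 1 continuation byte
    (0xFFFF, 0xE0),      # 1110bbbb + 2 continuation bytes
    (0x1FFFFF, 0xF0),    # 11110bbb + 3 continuation bytes
    (0x3FFFFFF, 0xF8),   # 111110bb + 4 continuation bytes
    (0x7FFFFFFF, 0xFC),  # 1111110b + 5 continuation bytes
]


def fss_utf_encode_char(code: int) -> bytes:
    """Encode a value of up to 31 bits in the shortest form (sketch)."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    # The index of the first form that fits equals the continuation count.
    for extra, (limit, marker) in enumerate(_FORMS):
        if code <= limit:
            break
    tail = []
    for _ in range(extra):
        # Peel off 6 bits at a time, lowest first
        tail.append(0x80 | (code & 0x3F))
        code >>= 6
    # What remains of code fits in the lead byte's payload bits
    return bytes([marker | code] + tail[::-1])
```

     Under these assumptions, a value such as 0x10000 needs the four-byte
     form and encodes as 0xF0 0x90 0x80 0x80.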

     mklocale(1), setlocale(3), utf8(5)

BSD                            October 11, 2002                            BSD

