std::codecvt_utf8(3)	      C++ Standard Library	  std::codecvt_utf8(3)

NAME
       std::codecvt_utf8 - converts between UTF-8 and UCS2/UTF-32

Synopsis
	  Defined in header <codecvt>
	  template<
	      class Elem,
	      unsigned long Maxcode = 0x10ffff,
	      std::codecvt_mode Mode = (std::codecvt_mode)0
	  > class codecvt_utf8
	      : public std::codecvt<Elem, char, std::mbstate_t>;

	  (since C++11) (deprecated in C++17)

	  std::codecvt_utf8 is a std::codecvt facet which encapsulates
	  conversion between a UTF-8 encoded byte string and a UCS2 or UTF-32
	  character string (depending on the type of Elem). This codecvt facet
	  can be used to read and write UTF-8 files, both text and binary.
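
	  As a minimal sketch of the conversion described above, the facet can
	  be driven through std::wstring_convert. The helper names to_utf8 and
	  from_utf8 and the alias utf8_utf32 are hypothetical and chosen only
	  for illustration. Elem is assumed to be char32_t, so the wide side
	  is UTF-32. Byte escapes are used instead of u8 literals so that the
	  sketch does not depend on the type of u8 literals.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical alias: a converter built on the UTF-8 <-> UTF-32 facet.
using utf8_utf32 =
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;

// Hypothetical helper: encodes a UTF-32 string as UTF-8 bytes.
// Throws std::range_error if the conversion fails.
std::string to_utf8(const std::u32string& text)
{
    utf8_utf32 conv;
    return conv.to_bytes(text);
}

// Hypothetical helper: decodes UTF-8 bytes into a UTF-32 string.
// Throws std::range_error if the input is not valid UTF-8.
std::u32string from_utf8(const std::string& bytes)
{
    utf8_utf32 conv;
    return conv.from_bytes(bytes);
}
```

	  With these helpers, to_utf8(U"z\u6c34\U0001d10b") is expected to
	  yield the eight bytes 7a e6 b0 b4 f0 9d 84 8b, and from_utf8 applied
	  to those bytes should give back the three original characters.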

Template Parameters
	  Elem	  - either char16_t, char32_t, or wchar_t
	  Maxcode - the largest value of Elem that this facet will read or
		    write without error
	  Mode	  - a constant of type std::codecvt_mode
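
	  The effect of Maxcode and Mode can be sketched as follows. The
	  helper names decodes_as_ascii and decode_skipping_bom are
	  hypothetical. The sketch assumes that std::wstring_convert is
	  constructed without an error string, in which case a failed
	  conversion is reported by throwing std::range_error.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if every character decoded from `bytes`
// is at most 0x7f. The limit is enforced by the Maxcode parameter.
bool decodes_as_ascii(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8<char32_t, 0x7f>, char32_t> conv;
    try {
        conv.from_bytes(bytes);
        return true;
    } catch (const std::range_error&) {
        return false;  // a character above Maxcode, or malformed input
    }
}

// Hypothetical helper: decodes UTF-8 and skips a leading byte order
// mark, because Mode is std::consume_header.
std::u32string decode_skipping_bom(const std::string& bytes)
{
    std::wstring_convert<
        std::codecvt_utf8<char32_t, 0x10ffff, std::consume_header>,
        char32_t> conv;
    return conv.from_bytes(bytes);
}
```

	  For instance, decodes_as_ascii("abc") should be true, while the
	  two-byte sequence c3 a9 (U+00E9) should be rejected. Passing the
	  bytes ef bb bf 7a to decode_skipping_bom should produce the single
	  character 'z'.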

Member functions
	  constructor	constructs a new codecvt_utf8 facet
			(public	member function)
	  destructor	destroys a codecvt_utf8	facet
			(public	member function)

       std::codecvt_utf8::codecvt_utf8

	  explicit codecvt_utf8( std::size_t refs = 0 );

	  Constructs a new std::codecvt_utf8 facet and passes the initial
	  reference counter refs to the base class.

Parameters
	  refs - the number of references that link to the facet

       std::codecvt_utf8::~codecvt_utf8

	  ~codecvt_utf8();

	  Destroys the facet. Unlike the locale-managed facets, this facet's
	  destructor is public.
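
	  A short sketch of what the public destructor permits: the facet can
	  be an ordinary local object, and it can also be handed to a
	  std::locale, which then manages its lifetime when refs is left at
	  0. The function names local_facet_max_length and
	  locale_facet_max_length are hypothetical.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>

// The public destructor allows a facet with automatic storage duration.
int local_facet_max_length()
{
    std::codecvt_utf8<char32_t> facet;
    return facet.max_length();
}

// A facet allocated with new can be installed in a locale. With the
// default refs == 0 the locale owns the facet and deletes it.
int locale_facet_max_length()
{
    std::locale loc(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
    using base = std::codecvt<wchar_t, char, std::mbstate_t>;
    // The installed facet replaces the locale's wchar_t/char codecvt.
    return std::use_facet<base>(loc).max_length();
}
```

	  Both functions report the facet's own max_length(), which reflects
	  the multi-byte nature of UTF-8 rather than the value returned by
	  the classic locale's codecvt facet.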

       Inherited from std::codecvt

Member types
	  Member type Definition
	  intern_type internT
	  extern_type externT
	  state_type  stateT

Member objects
	  Member name Type
	  id (static) std::locale::id

Member functions
	  out		invokes do_out
			(public member function of std::codecvt<InternT,ExternT,State>)
	  in		invokes do_in
			(public member function of std::codecvt<InternT,ExternT,State>)
	  unshift	invokes do_unshift
			(public member function of std::codecvt<InternT,ExternT,State>)
	  encoding	invokes do_encoding
			(public member function of std::codecvt<InternT,ExternT,State>)
	  always_noconv	invokes do_always_noconv
			(public member function of std::codecvt<InternT,ExternT,State>)
	  length	invokes do_length
			(public member function of std::codecvt<InternT,ExternT,State>)
	  max_length	invokes do_max_length
			(public member function of std::codecvt<InternT,ExternT,State>)
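
	  The inherited public members can also be called directly, without
	  std::wstring_convert. The following is a sketch only: encode_utf8
	  is a hypothetical helper, Elem is assumed to be char32_t, and the
	  output buffer is sized from max_length() so that a single call to
	  out() is expected to suffice.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: encodes UTF-32 as UTF-8 by calling out() on the
// facet. Returns an empty string if the conversion does not succeed.
std::string encode_utf8(const std::u32string& in)
{
    std::codecvt_utf8<char32_t> facet;
    std::mbstate_t state{};  // zero-initialized conversion state

    // max_length() bounds the number of bytes needed per character.
    std::string out(in.size() * facet.max_length(), '\0');

    const char32_t* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result res = facet.out(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (res != std::codecvt_base::ok)
        return std::string();

    out.resize(to_next - &out[0]);  // keep only the bytes written
    return out;
}
```

	  Sizing the buffer up front keeps the sketch short. Code that
	  converts in fixed-size chunks would instead loop on from_next and
	  to_next while the result is partial.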

Protected member functions
	  do_out	   converts a string from internT to externT, such as
	  [virtual]	   when writing to file
			   (virtual protected member function of
			   std::codecvt<InternT,ExternT,State>)
	  do_in		   converts a string from externT to internT, such as
	  [virtual]	   when reading from file
			   (virtual protected member function of
			   std::codecvt<InternT,ExternT,State>)
	  do_unshift	   generates the termination character sequence of
	  [virtual]	   externT characters for incomplete conversion
			   (virtual protected member function of
			   std::codecvt<InternT,ExternT,State>)
	  do_encoding	   returns the number of externT characters necessary
	  [virtual]	   to produce one internT character, if constant
			   (virtual protected member function of
			   std::codecvt<InternT,ExternT,State>)
	  do_always_noconv tests if the facet encodes an identity conversion
	  [virtual]	   for all valid argument values
			   (virtual protected member function of
			   std::codecvt<InternT,ExternT,State>)
	  do_length	   calculates the length of the externT string that
	  [virtual]	   would be consumed by conversion into given internT
			   buffer
			   (virtual protected member function of
			   std::codecvt<InternT,ExternT,State>)
	  do_max_length	   returns the maximum number of externT characters
	  [virtual]	   that could be converted into a single internT
			   character
			   (virtual protected member function of
			   std::codecvt<InternT,ExternT,State>)

       Inherited from std::codecvt_base

	  Member type				      Definition
	  enum result { ok, partial, error, noconv }; Unscoped enumeration type

	  Enumeration constant Definition
	  ok		       conversion was completed with no error
	  partial	       not all source characters were converted
	  error		       encountered an invalid character
	  noconv	       no conversion required, input and output types
			       are the same
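
	  How these result values surface in practice can be sketched with a
	  direct call to in(). The helper decode_status is hypothetical. It
	  assumes Elem is char32_t and gives the output buffer one more
	  element than the input has bytes, so the result reflects the input
	  rather than a lack of output space.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: decodes `bytes` as UTF-8 and reports only the
// status returned by the facet's in() member.
std::codecvt_base::result decode_status(const std::string& bytes)
{
    std::codecvt_utf8<char32_t> facet;
    std::mbstate_t state{};  // zero-initialized conversion state

    // One UTF-8 byte never yields more than one character.
    std::u32string out(bytes.size() + 1, U'\0');

    const char* from_next = nullptr;
    char32_t* to_next = nullptr;
    return facet.in(
        state,
        bytes.data(), bytes.data() + bytes.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);
}
```

	  A complete sequence such as 7a e6 b0 b4 is expected to report ok,
	  the truncated sequence e6 b0 to report partial, and the byte ff,
	  which cannot start a UTF-8 sequence, to report error.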

Notes
	  Although the standard requires that this facet works with UCS2 when
	  the size of Elem is 16 bits, some implementations use UTF-16
	  instead. The term "UCS2" was deprecated and removed from the Unicode
	  standard.

Example
	  The following example demonstrates the difference between UCS2/UTF-8
	  and UTF-16/UTF-8 conversions: the third character in the string is
	  not a valid UCS2 character.


	#include <iostream>
	#include <string>
	#include <locale>
	#include <codecvt>

	int main()
	{
	    // UTF-8 data. The character U+1d10b, musical sign segno,
	    // does not fit in UCS2
	    std::string utf8 = u8"z\u6c34\U0001d10b";

	    // the UTF-8 / UTF-16 standard conversion facet
	    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> utf16conv;
	    std::u16string utf16 = utf16conv.from_bytes(utf8);
	    std::cout << "UTF16 conversion produced " << utf16.size()
		      << " code units:\n";
	    for (char16_t c : utf16)
		std::cout << std::hex << std::showbase << c << '\n';

	    // the UTF-8 / UCS2 standard conversion facet
	    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> ucs2conv;
	    try {
		std::u16string ucs2 = ucs2conv.from_bytes(utf8);
	    } catch(const std::range_error& e) {
		// convert only the bytes that were consumed before the failure
		std::u16string ucs2 =
		    ucs2conv.from_bytes(utf8.substr(0, ucs2conv.converted()));
		std::cout << "UCS2 failed after producing " << std::dec
			  << ucs2.size() << " characters:\n";
		for (char16_t c : ucs2)
		    std::cout << std::hex << std::showbase << c << '\n';
	    }
	}

Output:
	UTF16 conversion produced 4 code units:
	0x7a
	0x6c34
	0xd834
	0xdd0b
	UCS2 failed after producing 2 characters:
	0x7a
	0x6c34

See also
	  Character conversions, listed by wide encoding. Each entry gives
	  the facilities that convert to and from a locale-defined multibyte
	  encoding (UTF-8, GB18030), UTF-8, and UTF-16.

	  UTF-16
	      locale-defined multibyte: mbrtoc16 / c16rtomb (with C11's DR488)
	      UTF-8:			codecvt<char16_t, char, mbstate_t>
					codecvt_utf8_utf16<char16_t>
					codecvt_utf8_utf16<char32_t>
					codecvt_utf8_utf16<wchar_t>
	      UTF-16:			N/A

	  UCS2
	      locale-defined multibyte: c16rtomb (without C11's DR488)
	      UTF-8:			codecvt_utf8<char16_t>
					codecvt_utf8<wchar_t> (Windows)
	      UTF-16:			codecvt_utf16<char16_t>
					codecvt_utf16<wchar_t> (Windows)

	  UTF-32
	      locale-defined multibyte: mbrtoc32 / c32rtomb
	      UTF-8:			codecvt<char32_t, char, mbstate_t>
					codecvt_utf8<char32_t>
					codecvt_utf8<wchar_t> (non-Windows)
	      UTF-16:			codecvt_utf16<char32_t>
					codecvt_utf16<wchar_t> (non-Windows)

	  system wide: UTF-32 (non-Windows), UCS2 (Windows)
	      locale-defined multibyte: mbsrtowcs / wcsrtombs
					use_facet<codecvt<wchar_t, char,
					mbstate_t>>(locale)
	      UTF-8:			No
	      UTF-16:			No

	  codecvt		       converts between character encodings,
				       including UTF-8, UTF-16, UTF-32
				       (class template)
	  codecvt_mode		       tags to alter behavior of the standard
	  (C++11)(deprecated in C++17) codecvt facets
				       (enum)
	  codecvt_utf16		       converts between UTF-16 and UCS2/UCS4
	  (C++11)(deprecated in C++17) (class template)
	  codecvt_utf8_utf16	       converts between UTF-8 and UTF-16
	  (C++11)(deprecated in C++17) (class template)

http://cppreference.com		  2022.07.31		  std::codecvt_utf8(3)
