iconv(3)		   Library Functions Manual		      iconv(3)

NAME
       iconv - charset conversion function

SYNOPSIS
       #include	<iconv.h>

       size_t iconv(iconv_t cd,	const char **inbuf,
	    size_t *inbytesleft, char **outbuf,
	    size_t *outbytesleft);

DESCRIPTION
       The  iconv()  function  converts	 the  sequence	of characters from one
       charset,	in the array specified by inbuf, into  a  sequence  of	corre-
       sponding	 characters in another charset,	in the array specified by out-
       buf.  The charsets are those specified in the  iconv_open()  call  that
       returned	the conversion descriptor, cd.	The inbuf argument points to a
       variable	that points to the first character in the input	buffer and in-
       bytesleft  indicates the	number of bytes	to the end of the buffer to be
       converted.  The outbuf argument points to a variable that points	to the
       first available byte in the output buffer  and  outbytesleft  indicates
       the number of available bytes to the end of the buffer.
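
       As a minimal sketch of the calling convention described above, the
       following converts a buffer from ISO-8859-1 to UTF-8.  The helper name
       latin1_to_utf8() and the choice of charset names are assumptions made
       for illustration; whether a given charset name is accepted depends on
       the installed converters.  The (void *) cast is only a portability
       convenience, since some implementations declare inbuf as char **
       rather than const char **.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert srclen bytes of ISO-8859-1 text in src
 * to UTF-8 in dst (capacity dstcap bytes).
 * Returns the number of bytes written, or (size_t)-1 on failure.
 */
size_t latin1_to_utf8(const char *src, size_t srclen,
                      char *dst, size_t dstcap)
{
    /* iconv_open() takes the target charset first, then the source. */
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    const char *in = src;      /* first unconverted input byte */
    size_t inleft = srclen;    /* input bytes still to convert */
    char *out = dst;           /* first free output byte */
    size_t outleft = dstcap;   /* output bytes still available */

    /* The cast accepts both the const char ** and char ** prototypes. */
    size_t rc = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;

    /* On success inleft is 0 and out has advanced past the output. */
    return (size_t)(out - dst);
}
```

       The byte count is computed from the updated output pointer, which is
       why iconv() takes the address of the pointer rather than the pointer
       itself.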

       For  state-dependent  encodings,	the conversion descriptor cd is	placed
       into its	initial	shift state by a  call	for  which  inbuf  is  a  null
       pointer,	 or for	which inbuf points to a	null pointer.  When iconv() is
       called in this way, and if outbuf is not	a null pointer or a pointer to
       a null pointer, and outbytesleft	points to a  positive  value,  iconv()
       will  place,  into  the	output buffer, the byte	sequence to change the
       output buffer to	its initial shift state.  If the output	buffer is  not
       large  enough  to hold the entire reset sequence, iconv() will fail and
       set errno to E2BIG.  Subsequent calls with inbuf	as other than  a  null
       pointer	or  a  pointer	to a null pointer cause	the conversion to take
       place from the current state of the conversion descriptor.
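
       The reset call described above can be sketched as follows.  The helper
       name finish_conversion() is hypothetical.  For a stateless target
       encoding the call is expected to write nothing; for a state-dependent
       one it may append a reset sequence, so the caller should leave room in
       the output buffer.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: return cd to its initial shift state and let
 * iconv() append any reset sequence to the output buffer.
 * Returns 0 on success, -1 on failure (errno is E2BIG when the reset
 * sequence does not fit in the remaining output space).
 */
int finish_conversion(iconv_t cd, char **out, size_t *outleft)
{
    /* A null inbuf requests the reset; inbytesleft is not used. */
    if (iconv(cd, NULL, NULL, out, outleft) == (size_t)-1)
        return -1;
    return 0;
}
```

       A typical place to call such a helper is after the last chunk of input
       has been converted and before the output is written out.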

       If a sequence of	input bytes does not form a  valid  character  in  the
       specified  charset,  conversion	stops  after the previous successfully
       converted character.  If	the input buffer ends with an incomplete char-
       acter or	shift sequence,	conversion stops after the  previous  success-
       fully  converted	 bytes.	  If  the output buffer	is not large enough to
       hold the	entire converted input,	conversion stops just prior to the in-
       put bytes that would cause the output buffer to overflow.  The variable
       pointed to by inbuf is updated to point to the byte following the  last
       byte  successfully used in the conversion.  The value pointed to	by in-
       bytesleft is decremented	to reflect the number of bytes still not  con-
       verted  in  the input buffer.  The variable pointed to by outbuf	is up-
       dated to	point to the byte following the	last byte of converted	output
       data.   The  value pointed to by	outbytesleft is	decremented to reflect
       the number of bytes still available in the output buffer.   For	state-
       dependent  encodings,  the  conversion descriptor is updated to reflect
       the shift state in effect at the	end of the last	successfully converted
       byte sequence.
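
       When input arrives in chunks, a chunk may end in the middle of a
       multibyte character.  The sketch below, built around a hypothetical
       helper convert_chunk(), treats that case as non-fatal and reports how
       many trailing input bytes were left unconverted, so that the caller
       can prepend them to the next chunk.  It relies only on the pointer and
       counter updates described above.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert one chunk of input through cd.
 * On return, *leftover holds the number of bytes at the end of src
 * that were not consumed because they form an incomplete character.
 * Returns 0 on success or on an incomplete trailing character
 * (EINVAL), and -1 on any other error.
 */
int convert_chunk(iconv_t cd, const char *src, size_t srclen,
                  char **out, size_t *outleft, size_t *leftover)
{
    const char *in = src;
    size_t inleft = srclen;

    /* The cast accepts both the const char ** and char ** prototypes. */
    size_t rc = iconv(cd, (void *)&in, &inleft, out, outleft);
    if (rc == (size_t)-1 && errno != EINVAL)
        return -1;

    /* The unconsumed bytes are src[srclen - inleft] .. src[srclen - 1]. */
    *leftover = inleft;
    return 0;
}
```

       The caller is expected to copy the leftover bytes to the front of its
       next input buffer before reading more data.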

       If iconv() encounters a character in the	input buffer  that  is	legal,
       but  for	 which	an  identical  character  does not exist in the	target
       charset,	iconv()	performs an implementation-defined conversion on  this
       character.

RETURN VALUES
       The  iconv() function updates the variables pointed to by the arguments
       to reflect the extent of	the conversion and returns the number of  non-
       identical  conversions  performed.   If	the entire string in the input
       buffer is converted, the	value pointed to by inbytesleft	will be	0.  If
       the input conversion is stopped due to any of the conditions described
       above, the value pointed to by inbytesleft will be non-zero and errno
       is set to indicate the condition.  If an error occurs, iconv() returns
       (size_t)-1 and sets errno to indicate the error.

ERRORS
       The iconv() function will fail if:

       EILSEQ	      Input conversion stopped due to an input byte that  does
		      not belong to the	input charset.

       E2BIG	      Input  conversion	 stopped  due  to lack of space	in the
		      output buffer.

       EINVAL	      Input conversion stopped due to an incomplete  character
		      or shift sequence	at the end of the input	buffer.

       The iconv() function may	fail if:

       EBADF	      The  cd argument is not a	valid open conversion descrip-
		      tor.
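
       One way to act on these error codes is to treat E2BIG as a request for
       more output space and every other error as fatal.  The following is a
       sketch under that assumption; the helper name convert_alloc() and its
       buffer growth policy are illustrative only.  It does not emit a final
       reset sequence, which a state-dependent target encoding would need.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert srclen bytes of src from charset `from`
 * to charset `to`, growing the output buffer whenever iconv() reports
 * E2BIG.  Returns a malloc()ed buffer and stores its length in *outlen,
 * or returns NULL on EILSEQ, EINVAL, or any other failure.
 */
char *convert_alloc(const char *to, const char *from,
                    const char *src, size_t srclen, size_t *outlen)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = srclen + 16;          /* initial guess, grown on demand */
    char *buf = malloc(cap);
    if (buf == NULL) {
        iconv_close(cd);
        return NULL;
    }

    const char *in = src;
    size_t inleft = srclen;
    char *out = buf;
    size_t outleft = cap;

    while (inleft > 0) {
        /* The cast accepts both the const char ** and char ** prototypes. */
        if (iconv(cd, (void *)&in, &inleft, &out, &outleft) != (size_t)-1)
            break;                     /* all input converted */

        if (errno != E2BIG) {          /* EILSEQ, EINVAL, EBADF, ... */
            free(buf);
            iconv_close(cd);
            return NULL;
        }

        /* E2BIG: keep what was converted so far and double the buffer. */
        size_t used = (size_t)(out - buf);
        cap *= 2;
        char *bigger = realloc(buf, cap);
        if (bigger == NULL) {
            free(buf);
            iconv_close(cd);
            return NULL;
        }
        buf = bigger;
        out = buf + used;
        outleft = cap - used;
    }

    iconv_close(cd);
    *outlen = (size_t)(out - buf);
    return buf;
}
```

       Because iconv() advances inbuf and outbuf even when it fails with
       E2BIG, the loop can resume exactly where the previous call stopped
       instead of restarting the conversion.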

APPLICATION USAGE
       The inbuf argument indirectly points to the memory area which  contains
       the conversion input data. The outbuf argument indirectly points	to the
       memory  area  which is to contain the result of the conversion. The ob-
       jects indirectly	pointed	to by inbuf and	outbuf are not	restricted  to
       containing  data	 that  is directly representable in the	ISO C language
       char data type. The type	of inbuf and outbuf, char **, does  not	 imply
       that  the  objects  pointed  to	are  interpreted  as null-terminated C
       strings or arrays of characters.	Any interpretation of a	byte  sequence
       that represents a character in a	given character	set encoding scheme is
       done  internally	 within	the codeset converters.	 For example, the area
       pointed to indirectly by	inbuf  and/or  outbuf  can  contain  all  zero
       octets  that  are  not  interpreted  as string terminators but as coded
       character data according	to the respective codeset encoding scheme. The
       type of the data	 (char,	short int, long	int, and so on)	read or	stored
       in the objects is not specified,	but may	be inferred for	both the input
       and output data by the converters determined by	the  from_charset  and
       to_charset arguments of iconv_open().

       Regardless of the data type inferred by the converter, the size of the
       remaining space in both input and output objects (the inbytesleft and
       outbytesleft arguments) is always measured in bytes.
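
       To illustrate that the buffers hold coded character data rather than C
       strings, the sketch below converts UTF-8 to big-endian UCS-4.  The
       helper name utf8_to_ucs4be() is hypothetical, and the charset name
       "UCS-4BE" is an assumption that may be spelled differently on some
       implementations.  The output contains zero octets that are part of the
       encoded characters, and its size is still counted in bytes.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert srclen bytes of UTF-8 to big-endian
 * UCS-4 (four bytes per character) in dst, whose capacity dstbytes is
 * given in bytes, not in characters.
 * Returns the number of bytes written, or (size_t)-1 on failure.
 */
size_t utf8_to_ucs4be(const char *src, size_t srclen,
                      unsigned char *dst, size_t dstbytes)
{
    iconv_t cd = iconv_open("UCS-4BE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    const char *in = src;
    size_t inleft = srclen;
    char *out = (char *)dst;   /* iconv() addresses the output as bytes */
    size_t outleft = dstbytes;

    /* The cast accepts both the const char ** and char ** prototypes. */
    size_t rc = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return (size_t)(out - (char *)dst);
}
```

       The result must be handled with its explicit length, since functions
       such as strlen() would stop at the first zero octet.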

IMPLEMENTATION DETAILS
       Conversions between different charsets are done via the UCS-4 universal
       character set.  Conversions within the same charset (e.g., when two
       different aliases of the same charset are used) are done by direct
       copying from the input buffer to the output buffer.  The libiconv
       library itself usually contains only a small set of built-in charsets.
       Tables for conversion between UCS-4 and particular charsets are mapped
       into memory from binary table files, or C methods are loaded
       dynamically from shared modules:

       Coded character sets (CCS)
              Each CCS file contains tables for conversion between exactly one
              character of a corresponding charset and one UCS-4 character,
              and vice versa, from a UCS-4 character to the character of the
              CCS charset.  About 200 character sets are supported (only those
              used in the FreeBSD distribution are provided in this package),
              including ASCII and the following standards: ISO-8859, KOI8,
              Windows, IBM-DOS, Macintosh, CJK national charsets and EBCDIC.
              CCS files are accessed via memory mapping.

       Character encoding schemes (CES)
               Each CES module contains functions converting a byte sequence
               of a corresponding encoding scheme to exactly one 32-bit UCS-4
               character, and vice versa, a UCS-4 character to a byte sequence
               of the CES.  The following CES groups are supported in
               iconv-1.0: ISO-10646 (UCS-4 and UCS-2, each in both
               architecture-independent (network) and architecture-dependent
               (internal) byte order versions), Unicode (UTF-16, UTF-8 and
               UTF-7), ISO-2022 and Extended Unix Code (EUC) (for Chinese (CN
               and TW), Japanese and Korean).  A special table-driven CES
               module providing conversion for all CCS tables is always built
               into the library.  ISO-2022, EUC and table-driven modules use
               one or more memory-mapped CCS tables.

       Any CCS table or CES module can be built into the library at
       compilation time.

       A CCS or CES charset can have zero or more aliases (alternative names),
       which are listed in the charset.aliases file located in the same
       directory as the CCS tables.  The library maps the aliases file into
       memory to find canonical charset names.

       If iconv() encounters a character in the	input buffer  that  is	legal,
       but  for	 which	an  identical  character  does not exist in the	target
       charset,	iconv()	replaces the source character with the	  '_'  (under-
       score)  character  and tries to convert it into the target charset.  If
       there is	no underscore character	in the target charset,	no  bytes  are
       written	to  the	 target	 buffer	for the	source character. In any case,
       iconv() increments the number of	 non-identical	conversions  performed
       (the value being	returned as the	function result).

FILES
       /usr/local/share/iconv/charset.aliases
				Charset	aliases	file
       /usr/local/share/iconv/*.cct
				CCS conversion tables
       /usr/local/libexec/iconv/*.so
				CES conversion modules

SEE ALSO
       iconv(1), iconv_close(3), iconv_open(3)

				  7 Sep	2000			      iconv(3)
