FreeBSD Manual Pages

home | help
enca(1)								       enca(1)

NAME
       enca -- detect and convert encoding of text files

SYNOPSIS
       enca [-L	LANGUAGE] [OPTION]... [FILE]...
       enconv [-L LANGUAGE] [OPTION]...	[FILE]...

INTRODUCTION AND EXAMPLES
       If you are lucky	enough,	the only two things you	will ever need to know
       are: command

	      enca FILE

       will tell you which encoding file FILE uses (without changing it), and

	      enconv FILE

       will  convert file FILE to your locale native encoding.	To convert the
       file to some other encoding use the -x option (see -x entry in  section
       OPTIONS and sections CONVERSION and ENCODINGS for details).

       Both work with multiple files and standard input	(output) too.  E.g.

	      enca -x latin2 <sometext | lpr

       assures file `sometext' is in ISO Latin 2 when it's sent	to printer.

       The  main  reason  why these command will fail and turn your files into
       garbage is that Enca needs to know their	language to detect the	encod-
       ing.   It  tries	 to determine your language and	preferred charset from
       locale settings,	which might not	be what	you want.

       You can (or have	to) use	-L option to tell it the right language.  Sup-
       pose, you downloaded some Russian HTML file, `file.htm',	it claims it's
       windows-1251 but	it isn't.  So you run

	      enca -L ru file.htm

       and find	out it's KOI8-R	(for example).	Be warned, currently there are
       not many	supported languages (see section LANGUAGES).

       Another warning concerns	the fact several Enca's	features,  namely  its
       charset	conversion  capabilities,  strongly depend on what other tools
       are installed on	your system (see section CONVERSION)--run

	      enca --version

       to get list of features (see section FEATURES).	Also try

	      enca --help

       to get description of all other Enca options (and to find the  rest  of
       this manual page	redundant).

DESCRIPTION
       Enca reads given	text files, or standard	input when none	are given, and
       uses  knowledge	about  their language (must be supported by you) and a
       mixture of parsing, statistical analysis, guessing and black  magic  to
       determine  their	encodings, which it then prints	to standard output (or
       it confesses it doesn't have any	idea what the encoding could be).   By
       default,	 Enca  presents	results	as a multiline human-readable descrip-
       tions, several other formats are	available--see Output  type  selectors
       below.

       Enca can	also convert files to some other encoding ENC when you ask for
       it--either  using  a built-in converter,	some conversion	library, or by
       calling an external converter.

       Enca's primary goal is to be usable unattended, as an automatic conver-
       sion tool, though it perhaps have not reached this  point  yet  (please
       see section SECURITY).

       Please  note  except rare cases Enca really has to know the language of
       input files to give you a reliable answer.  On the other	hand,  it  can
       then cope quite well with files that are	not purely textual or even de-
       tect charset of text strings inside some	binary file; of	course,	it de-
       pends on	the character of the non-text component.

       Enca  doesn't  care  about structure of input files, it views them as a
       uniform piece of	text/data.  In case of	multipart  files  (e.g.	 mail-
       boxes),	you have to use	some tool knowing the structure	to extract the
       individual parts	first.	It's the cost of ability to  detect  encodings
       of any damaged, incomplete or otherwise incorrect files.

OPTIONS
       There are several categories of options:	operation mode options,	output
       type selectors, guessing	parameters, conversion parameters, general op-
       tions and listings.

       All  long  options  can be abbreviated as long as they are unambiguous,
       mandatory parameters of long options are	mandatory  for	short  options
       too.

   Operation modes
       are following:

       -c, --auto-convert
	      Equivalent to calling Enca as enconv.

	      If  no output type selector is specified,	detect file encodings,
	      guess your preferred charset from	locales, and convert files  to
	      it (only available with +target-charset-auto feature).

       -g, --guess
	      Equivalent to calling Enca as enca.

	      If  no  output type selector is specified, detect	file encodings
	      and report them.

   Output type selectors
       select what action Enca will take when it determines the	encoding; most
       of them just choose between different names,  formats  and  conventions
       how encodings can be printed, but one of	them (-x) is special: it tells
       Enca to recode files to some other encoding ENC.	 These options are mu-
       tually exclusive; if you	specify	more than one output type selector the
       last one	takes precedence.

       Several output types represent charset name used	by some	other program,
       but not all these programs know all the charsets	which Enca recognises.
       Be  warned,  Enca  makes	no difference between unrecognised charset and
       charset having no name in given namespace in such situations.

       -d, --details
	      It used to print a few  pages  of	 details  about	 the  guessing
	      process,	but  since  Enca is just a program linked against Enca
	      library, this is not possible and	this option is roughly equiva-
	      lent to --human-readable,	except it reports failure reason  when
	      Enca doesn't recognize the encoding.

       -e, --enca-name
	      Prints  Enca's  nice name	of the charset,	i.e., perhaps the most
	      generally	accepted and more or less human-readable charset iden-
	      tifier, with surfaces appended.

	      This name	is used	when calling an	external converter, too.

       -f, --human-readable
	      Prints verbal description	 of  the  detected  charset  and  sur-
	      faces--something	a human	understands best.  This	is the default
	      behaviour.

	      The precise format is following: the first line contains charset
	      name alone, and it's followed by zero  or	 more  indented	 lines
	      containing names of detected surfaces.  This format is not, how-
	      ever,  suitable  or intended for further machine-processing, and
	      the verbal charset descriptions are like to change  in  the  fu-
	      ture.

       -i, --iconv-name
	      Prints   how  iconv(3)  (and/or  iconv(1))  calls	 the  detected
	      charset.	More precisely,	it prints one, more or less  arbitrar-
	      ily chosen, alias	accepted by iconv.  A charset unknown to iconv
	      counts as	unknown.

	      This  output  type  makes	 sense only when Enca is compiled with
	      iconv support (feature +iconv-interface).

       -r, --rfc1345-name
	      Prints RFC 1345 charset name.  When such a  name	doesn't	 exist
	      because  RFC  1345  doesn't  define a given encoding, some other
	      name defined in some other RFC or	just  the  name	 which	author
	      considers	`the most canonical', is printed.

	      Since  RFC  1345 doesn't define surfaces,	no surface info	is ap-
	      pended.

       -m, --mime-name
	      Prints preferred MIME name of detected  charset.	 This  is  the
	      name you should normally use when	fixing e-mails or web pages.

	      A	charset	not present in http://www.iana.org/assignments/charac-
	      ter-sets counts as unknown.

       -s, --cstocs-name
	      Prints  how cstocs(1) calls the detected charset.	 A charset un-
	      known to cstocs counts as	unknown.

       -n, --name=WORD
	      Prints charset (encoding)	name selected by WORD (can be abbrevi-
	      ated as long  as	is  unambiguous).   For	 names	listed	above,
	      --name=WORD is equivalent	to --WORD.

	      Using  aliases  as  the output type causes Enca to print list of
	      all accepted aliases of detected charset.

       -x, --convert-to=[..]ENC
	      Converts file to encoding	ENC.

	      The optional `..'	before encoding	name has no  special  meaning,
	      except  you  can	use  it	to remind yourself that, unlike	in re-
	      code(1), you should specify desired encoding,  instead  of  cur-
	      rent.

	      You  can	use  recode(1)	recoding  chains  or any other kind of
	      braindead	recoding specification for ENC,	provided that you tell
	      Enca to use some tool understanding it for conversion (see  sec-
	      tion CONVERSION).

	      When  Enca  fails	to determine the encoding, it prints a warning
	      and leaves the the file as is; when it is	run  as	 a  filter  it
	      tries  to	 do its	best to	copy standard input to standard	output
	      unchanged.  Nevertheless,	you should  not	 rely  on  it  and  do
	      backup.

   Guessing parameters
       There's	only  one:  -L setting language	of input files.	This option is
       mandatory (but see below).

       -L, --language=LANG
	      Sets language of input files to LANG.

	      More precisely, LANG can be any valid locale name	(or alias with
	      +locale-alias feature) of	some supported language.  You can also
	      specify `none' as	language name, only  multibyte	encodings  are
	      recognised then.	Run

	      enca --list languages

	      to  get list of supported	languages.  When you don't specify any
	      language Enca tries to guess your	language from locale  settings
	      and  assumes  input  files  use this language.  See section LAN-
	      GUAGES for details.

   Conversion parameters
       give you	finer control of how charset  conversion  will	be  performed.
       They  don't  affect  anything  when -x is not specified as output type.
       Please see section CONVERSION for the gory conversion details.

       -C, --try-converters=LIST
	      Appends comma separated LIST to the list of converters that will
	      be tried when you	ask for	conversion.  Their names can be	abbre-
	      viated as	long as	they are unambiguous.  Run

	      enca --list converters

	      to get list of all valid converter names (and see	 section  CON-
	      VERSION for their	description).

	      The default list depends on how Enca has been compiled, run

	      enca --help

	      to find out default converter list.

	      Note  the	default	list is	used only when you don't specify -C at
	      all.  Otherwise, the list	is built as if it were initially empty
	      and every	-C adds	new converter(s) to it.	 Moreover,  specifying
	      none as converter	name causes clearing the converter list.

       -E, --external-converter-program=PATH
	      Sets  external converter program name to PATH.  Default external
	      converter	depends	on how enca has	been complied, and the	possi-
	      bility  to  use external converters may not be available at all.
	      Run

	      enca --help

	      to find out default converter program in your enca build.

   General options
       don't fit to other option categories...

       -p, --with-filename
	      Forces Enca to prefix each result	with corresponding file	 name.
	      By  default,  Enca  prefixes  results with filenames when	run on
	      multiple files.

	      Standard input is	printed	as STDIN and standard output as	STDOUT
	      (the latter can be probably seen in error	messages only).

       -P, --no-filename
	      Forces Enca to not prefix	results	with file names.  By  default,
	      Enca  doesn't  prefix result with	file name when run on a	single
	      file (including standard input).

       -V, --verbose
	      Increases	verbosity level	(each use increases it by one).

	      Currently	this option in not very	useful because different parts
	      of Enca respond differently to the same verbosity	level,	mostly
	      not at all.

   Listings
       are  all	terminal, i.e. when Enca encounters some of them it prints the
       required	listing	and terminates without processing  any	following  op-
       tions.

       -h, --help
	      Prints brief usage help.

       -G, --license
	      Prints full Enca license (through	a pager, if possible).

       -l, --list=WORD
	      Prints  list specified by	WORD (can be abbreviated as long as it
	      is unambiguous).	Available lists	include:

	      built-in-charsets.  All encodings	convertible by	built-in  con-
	      verter,  by  group  (both	input and output encoding must be from
	      this list	and belong to the same group for internal conversion).

	      built-in-encodings.  Equivalent to built-in-charsets,  but  con-
	      sidered obsolete;	will be	accepted with a	warning, for a while.

	      converters.  All valid converter names (to be used with -C).

	      charsets.	  All encodings	(charsets).  You can select what names
	      will be printed with --name or any name output type selector (of
	      course, only encodings having a name in given namespace will  be
	      printed then), the selector must be specified before --list.

	      encodings.   Equivalent  to  charsets,  but considered obsolete;
	      will be accepted with a warning, for a while.

	      languages.  All supported	languages together with	 charsets  be-
	      longing  to them.	 Note output type selects language name	style,
	      not charset name style here.

	      names.  All possible values of --name option.

	      lists.  All possible values of this option.  (Crazy?)

	      surfaces.	 All surfaces Enca recognises.

       -v, --version
	      Prints program version and list of features  (see	 section  FEA-
	      TURES).

CONVERSION
       Though  Enca has	been originally	designed as a tool for guessing	encod-
       ing only, it now	features several methods of charset  conversion.   You
       can control which of them will be used with -C.

       Enca  sequentially tries	converters from	the list specified by -C until
       it finds	some that is able to perform required conversion or  until  it
       exhausts	the list.  You should specify preferred	converters first, less
       preferred  later.   External converter (extern) should be always	speci-
       fied last, only as last resort, since it's usually not possible to  re-
       cover when it fails.  The default list of converters always starts with
       built-in	 and  then continues with the first one	available from:	libre-
       code, iconv, nothing.

       It should be noted when Enca says it is not able	to perform the conver-
       sion it only means none of the converters is able to  perform  it.   It
       can  be	still  possible	 to perform the	required conversion in several
       steps, using several converters,	but to figure out how, human  intelli-
       gence is	probably needed.

   Built-in converter
       is  the	simplest  and  far  the	fastest	of all,	can perform only a few
       byte-to-byte conversions	and modifies files directly in place  (may  be
       considered  dangerous,  but  is pretty efficient).  You can get list of
       all encodings it	can convert with

	      enca --list built-in

       Beside speed, its main advantage	(and also  disadvantage)  is  that  it
       doesn't	care: it simply	converts characters having a representation in
       target encoding,	doesn't	touch anything else and	never prints any error
       message.

       This converter can be specified as built-in with	-C.

   Librecode converter
       is an interface to GNU recode library, that does	 the  actual  recoding
       job.  It	may or may not be compiled in; run

	      enca --version

       to find out its availability in your enca build (feature	+librecode-in-
       terface).

       You  should be familiar with recode(1) before using it, since recode is
       a quite sophisticated and powerful charset conversion  tool.   You  may
       run  into  problems  using  it  together	with Enca particularly because
       Enca's support for surfaces not 100% compatible,	because	 recode	 tries
       too  hard  to  make the transformation reversible, because it sometimes
       silently	ignores	I/O errors, and	because	it's incredibly	buggy.	Please
       see GNU recode info pages for details about recode library.

       This converter can be specified as librecode with -C.

   Iconv converter
       is an interface to the UNIX98 iconv(3) conversion  functions,  that  do
       the actual recoding job.	 It may	or may not be compiled in; run

	      enca --version

       to  find	out its	availability in	your enca build	(feature +iconv-inter-
       face).

       While iconv is present on most today systems it only rarely offer  some
       useful  set  of available conversions, the only notable exception being
       iconv from GNU libc.  It	is usually quite  picky	 about	surfaces,  too
       (while,	at  the	 same  time, not implementing surface conversion).  It
       however probably	represents the only standard(ized) tool	able  to  per-
       form  conversion	from/to	Unicode.  Please see iconv documentation about
       for details about its capabilities on your particular system.

       This converter can be specified as iconv	with -C.

   External converter
       is an arbitrary external	conversion tool	that can be specified with  -E
       option  (at  most  one  can be defined simultaneously).	There are some
       standard, provided together with	enca: cstocs, recode, map,  umap,  and
       piconv.	 All  are  wrapper  scripts: for cstocs(1), recode(1), map(1),
       umap(1),	and piconv(1).

       Please note enca	has little control what	the external converter	really
       does.   If you set it to	/bin/rm	you are	fully responsible for the con-
       sequences.

       If you want to make your	own converter to use  with  enca,  you	should
       know it is always called

	      CONVERTER	ENC_CURRENT ENC	FILE [-]

       where CONVERTER is what has been	set by -E, ENC_CURRENT is detected en-
       coding, ENC is what has been specified with -x, and FILE	is the file to
       convert,	 i.e.  it  is  called  for each	file separately.  The optional
       fourth parameter, -, should cause (when present)	sending	result of con-
       version to standard output instead of overwriting the file  FILE.   The
       converter  should  also take care of not	changing file permissions, re-
       turning error code 1 when it fails and cleaning	its  temporary	files.
       Please see the standard external	converters for examples.

       This converter can be specified as extern with -C.

   Default target charset
       The  straightforward way	of specifying target charset is	the -x option,
       which overrides any defaults.  When Enca	is called as  enconv,  default
       target charset is selected exactly the same way as recode(1) does it.

       If  the	DEFAULT_CHARSET	 environment variable is set, it's used	as the
       target charset.

       Otherwise, if you system	provides the nl_langinfo(3) function,  current
       locale's	native charset is used as the target charset.

       When both methods fail, Enca complains and terminates.

   Reversibility notes
       If  reversibility  is  crucial  for you,	you shouldn't use enca as con-
       verter at all (or maybe you can,	with very  specifically	 designed  re-
       code(1)	wrapper).   Otherwise you should at least know that there four
       basic means of handling inconvertible character entities:

       fail--this is a possibility, too, and incidentally  it's	 exactly  what
       current	GNU libc iconv implementation does (recode can be also told to
       do it)

       don't touch them--this is what enca internal converter always does  and
       recode  can  do;	 though	it is not reversible, a	human being is usually
       able to reconstruct the original	(at least in principle)

       approximate them--this is what cstocs can do, and  recode  too,	though
       differently;  and the best choice if you	just want to make the accursed
       text readable

       drop them out--this is what both	recode and cstocs can do  (cstocs  can
       also  replace  these characters by some fixed character instead of mere
       ignoring); useful when the to-be-omitted	characters contain only	noise.

       Please consult your favourite converter manual for details of this  is-
       sue.   Generally,  if  you are not lucky	enough to have all convertible
       characters in you file, manual intervention is needed anyway.

   Performance notes
       Poor performance	of available converters	has been one of	 main  reasons
       for  including built-in converter in enca.  Try to use it whenever pos-
       sible, i.e. when	files in consideration	are  charset-clean  enough  or
       charset-messy  enough  so  that	its zero built-in intelligence doesn't
       matter.	It requires no extra disk space	nor extra memory and can  out-
       perform	recode(1)  more	 than 10 times on large	files and Perl version
       (i.e. the faster	one) of	cstocs(1) more than 400	times on  small	 files
       (in fact	it's almost as fast as mere cp(1)).

       Try  to	avoid  external	 converters when it's not absolutely necessary
       since all the forking and moving	stuff around is	incredibly slow.

ENCODINGS
       You can get list	of recognised character	sets with

	      enca --list charsets

       and using --name	parameter you can select any name you want to be  used
       in the listing.	You can	also list all surfaces with

	      enca --list surfaces

       Encoding	 and  surface  names are case insensitive and non-alphanumeric
       characters are not taken	into account.  However,	non-alphanumeric char-
       acters are mostly not allowed at	all.  The only allowed are: `-',  `_',
       `.',  `:',  and	`/'  (as  charset/surface separator).  So `ibm852' and
       `IBM-852' are the same, while `IBM 852' is not accepted.

   Charsets
       Following list of recognised charsets uses Enca's names (-e) and	verbal
       descriptions as reported	by Enca	(-f):

       ASCII	     7bit ASCII	characters
       ISO-8859-2    ISO 8859-2	standard; ISO Latin 2
       ISO-8859-4    ISO 8859-4	standard; Latin	4
       ISO-8859-5    ISO 8859-5	standard; ISO Cyrillic
       ISO-8859-13   ISO 8859-13 standard; ISO Baltic; Latin 7
       ISO-8859-16   ISO 8859-16 standard
       CP1125	     MS-Windows	code page 1125
       CP1250	     MS-Windows	code page 1250
       CP1251	     MS-Windows	code page 1251
       CP1257	     MS-Windows	code page 1257;	WinBaltRim
       IBM852	     IBM/MS code page 852; PC (DOS) Latin 2
       IBM855	     IBM/MS code page 855
       IBM775	     IBM/MS code page 775
       IBM866	     IBM/MS code page 866
       baltic	     ISO-IR-179; Baltic
       KEYBCS2	     Kamenicky encoding; KEYBCS2
       macce	     Macintosh Central European
       maccyr	     Macintosh Cyrillic
       ECMA-113	     Ecma Cyrillic; ECMA-113
       KOI-8_CS_2    KOI8-CS2 code (`T602')
       KOI8-R	     KOI8-R Cyrillic
       KOI8-U	     KOI8-U Cyrillic
       KOI8-UNI	     KOI8-Unified Cyrillic
       TeX	     (La)TeX control sequences
       UCS-2	     Universal character set 2 bytes; UCS-2; BMP
       UCS-4	     Universal character set 4 bytes; UCS-4; ISO-10646
       UTF-7	     Universal transformation format 7 bits; UTF-7
       UTF-8	     Universal transformation format 8 bits; UTF-8
       CORK	     Cork encoding; T1
       GBK	     Simplified	Chinese	National Standard; GB2312
       BIG5	     Traditional Chinese Industrial Standard; Big5
       HZ	     HZ	encoded	GB2312
       unknown	     Unrecognized encoding

       where unknown is	not any	real encoding, it's reported when Enca is  not
       able to give a reliable answer.

   Surfaces
       Enca  has some experimental support for so-called surfaces (see below).
       It detects following surfaces (not all can be applied to	all charsets):

       /CR     CR line terminators
       /LF     LF line terminators
       /CRLF   CRLF line terminators
       N.A.    Mixed line terminators
       N.A.    Surrounded by/intermixed	with non-text data
       /21     Byte order reversed in pairs (1,2 -> 2,1)
       /4321   Byte order reversed in quadruples (1,2,3,4 -> 4,3,2,1)
       N.A.    Both little and big endian chunks, concatenated
       /qp     Quoted-printable	encoded

       Note some surfaces have N.A. in place  of  identifier--they  cannot  be
       specified  on command line, they	can only be reported by	Enca.  This is
       intentional because they	only inform you	why the	file cannot be consid-
       ered surface-consistent instead of representing a real surface.

       Each charset has	its natural surface (called `implied' in recode) which
       is not reported,	e.g., for IBM 852 charset  it's	 `CRLF	line  termina-
       tors'.  For UCS encodings, big endian is	considered as natural surface;
       unusual byte orders are constructed from	21 and 4321 permutations: 2143
       is reported simply as 21, while 3412 is reported	as combination of 4321
       and 21.

       Doubly-encoded  UTF-8  is  neither  charset  nor	surface, it's just re-
       ported.

   About charsets, encodings and surfaces
       Charset is a set	of character entities while encoding is	its  represen-
       tation  in  the	terms  of  bytes and bits.  In Enca, the word encoding
       means the same as `representation of text', i.e.	the  relation  between
       sequence	 of  character	entities constituting the text and sequence of
       bytes (bits) constituting the file.

       So, encoding is both character set and so-called	surface	(line termina-
       tors, byte order, combining, Base64 transformation,  etc.).   Neverthe-
       less, it	proves convenient to work with some {charset,surface} pairs as
       with  genuine  charsets.	 So, as	in recode(1), all UCS- and UTF-	encod-
       ings of Universal character set are called charsets.  Please see	recode
       documentation for more details of this issue.

       The only	good thing about surfaces is: when  you	 don't	start  playing
       with  them,  neither Enca won't start and it will try to	behave as much
       as possible as a	surface-unaware	program, even when talking to recode.

LANGUAGES
       Enca needs to know the language of input	files  to  work	 reliably,  at
       least  in case of regular 8bit encoding.	 Multibyte encodings should be
       recognised for any Latin, Cyrillic or Greek language.

       You can (or have	to) use	-L option to tell Enca	the  language.	 Since
       people  most  often work	with files in the same language	for which they
       have configured locales,	Enca tries tries to guess the language by  ex-
       amining	value  of LC_CTYPE and other locale categories (please see lo-
       cale(7))	and using it for the language when you don't specify any.   Of
       course,	it  may	be completely wrong and	will give you nonsense answers
       and damage your files, so please	don't forget to	 use  the  -L  option.
       You can also use	ENCAOPT	environment variable to	set a default language
       (see section ENVIRONMENT).

       Following  languages are	supported by Enca (each	language is listed to-
       gether with supported 8bit encodings).

       Belarusian    CP1251 IBM866 ISO-8859-5 KOI8-UNI maccyr IBM855
       Bulgarian     CP1251 ISO-8859-5 IBM855 maccyr ECMA-113
       Czech	     ISO-8859-2	CP1250 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
       Estonian	     ISO-8859-4	CP1257 IBM775 ISO-8859-13 macce	baltic
       Croatian	     CP1250 ISO-8859-2 IBM852 macce CORK
       Hungarian     ISO-8859-2	CP1250 IBM852 macce CORK
       Lithuanian    CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce	baltic
       Latvian	     CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce	baltic
       Polish	     ISO-8859-2	CP1250 IBM852 macce ISO-8859-13	ISO-8859-16 baltic CORK
       Russian	     KOI8-R CP1251 ISO-8859-5 IBM866 maccyr
       Slovak	     CP1250 ISO-8859-2 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
       Slovene	     ISO-8859-2	CP1250 IBM852 macce CORK
       Ukrainian     CP1251 IBM855 ISO-8859-5 CP1125 KOI8-U maccyr
       Chinese	     GBK BIG5 HZ
       none

       The special language none can be	shortened to __, it contains  no  8bit
       encodings, so only multibyte encodings are detected.

       You can also use	locale names instead of	languages:
       Belarusian      be
       Bulgarian       bg
       Czech	       cs
       Estonian	       et
       Croatian	       hr
       Hungarian       hu
       Lithuanian      lt
       Latvian	       lv
       Polish	       pl
       Russian	       ru
       Slovak	       sk
       Slovene	       sl
       Ukrainian       uk
       Chinese	       zh

FEATURES
       Several	Enca's features	depend on what is available on your system and
       how it was compiled.  You can get their list with

	      enca --version

       Plus sign before	a feature name means it's available, minus sign	 means
       this build lacks	the particular feature.

       librecode-interface.   Enca has interface to GNU	recode library charset
       conversion functions.

       iconv-interface.	 Enca has interface to UNIX98 iconv charset conversion
       functions.

       external-converter.  Enca can use external conversion programs (if  you
       have some suitable installed).

       language-detection.   Enca  tries  to guess language (-L) from locales.
       You don't need the --language option, at	least in principle.

       locale-alias.  Enca is able to decrypt locale aliases used for language
       names.

       target-charset-auto.  Enca tries	to detect your preferred charset  from
       locales.	  Option  --auto-convert  and calling Enca as enconv works, at
       least in	principle.

       ENCAOPT.	 Enca is able to correctly parse this environment variable be-
       fore command line parameters.  Simple stuff like	ENCAOPT="-L  uk"  will
       work even without this feature.

ENVIRONMENT
       The variable ENCAOPT can	hold set of default Enca options.  Its content
       is  interpreted	before	command	 line  arguments.  Unfortunately, this
       doesn't work everywhere (must have +ENCAOPT feature).

       LC_CTYPE, LC_COLLATE, LC_MESSAGES (possibly inherited  from  LC_ALL  or
       LANG) is	used for guessing your language	(must have +language-detection
       feature).

       The  variable DEFAULT_CHARSET can be used by enconv as the default tar-
       get charset.

DIAGNOSTICS
       Enca returns exit code 0	when all input files  were  successfully  pro-
       ceeded  (i.e.  all encodings were detected and all files	were converted
       to required encoding, if	conversion was asked for).  Exit code 1	is re-
       turned when Enca	wasn't able to either guess encoding or	 perform  con-
       version	on any input file because it's not clever enough.  Exit	code 2
       is returned in case of serious (e.g. I/O) troubles.

SECURITY
       It should be possible to	let Enca work unattended, it's its goal.  How-
       ever:

       There's	no warranty the	detection works	100%. Don't bet	on it, you can
       easily lose valuable data.

       Don't use enca (the program), link to libenca instead if	you want  any-
       thing  resembling security. You have to perform the eventual conversion
       yourself	then.

       Don't use external converters. Ideally, disable them compile-time.

       Be aware	of ENCAOPT and all the	built-in  automagic  guessing  various
       things from environment,	namely locales.

SEE ALSO
       autoconvert(1), cstocs(1), file(1), iconv(1), iconv(3), nl_langinfo(3),
       map(1),	piconv(1),  recode(1),	locale(5), locale(7), ltt(1), umap(1),
       unicode(7), utf-8(7), xcode(1)

KNOWN BUGS
       It has too many unknown bugs.

       The idea	of using LC_* value for	language is certainly braindead.  How-
       ever I like it.

       It can't	backup files before mangling them.

       In certain situations, it may behave incorrectly	on >31bit file systems
       and/or over NFS (both untested but shouldn't cause  problems  in	 prac-
       tice).

       Built-in	 converter  does not convert character `ch' from KOI8-CS2, and
       possibly	some other characters you've probably never heard  about  any-
       way.

       EOL  type  recognition  works poorly on Quoted-printable	encoded	files.
       This should be fixed someday.

       There are no command line options to tune libenca parameters.  This  is
       intentional (Enca should	DWIM) but sometimes this is a nuisance.

       The  manual  page  is  too long,	especially this	section.  This doesn't
       matter since nobody does	read it.

       Send bug	reports	to <https://github.com/nijel/enca/issues>.

TRIVIA
       Enca is Extremely Naive	Charset	 Analyser.   Nevertheless,  the	 `enc'
       originally  comes  from `encoding' so the leading `e' should be read as
       in `encoding' not as in `extreme'.

AUTHORS
       David Necas (Yeti) <yeti@physics.muni.cz>

       Michal Cihar <michal@cihar.com>

       Unicode data has	been generated from various (free)  on-line  resources
       or  using GNU recode.  Statistical data has been	generated from various
       texts on	the Net, I hope	 character  counting  doesn't  break  anyone's
       copyright.

ACKNOWLEDGEMENTS
       Please see the file THANKS in distribution.

COPYRIGHT
       Copyright (C) 2000-2003 David Necas (Yeti).

       Copyright (C) 2009 Michal Cihar <michal@cihar.com>.

       Enca  is	 free software;	you can	redistribute it	and/or modify it under
       the terms of version 2 of the GNU General Public	License	 as  published
       by the Free Software Foundation.

       Enca is distributed in the hope that it will be useful, but WITHOUT ANY
       WARRANTY;  without even the implied warranty of MERCHANTABILITY or FIT-
       NESS FOR	A PARTICULAR PURPOSE.  See the GNU General Public License  for
       more details.

       You should have received	a copy of the GNU General Public License along
       with  Enca;  if	not,  write to the Free	Software Foundation, Inc., 675
       Mass Ave, Cambridge, MA 02139, USA.

enca 1.11			   Sep 2009			       enca(1)
Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=enconv&sektion=1&manpath=FreeBSD+Ports+14.3.quarterly>
home | help
Header And Logo

Peripheral Links

Site Navigation

FreeBSD Manual Pages

Header And Logo

Peripheral Links

Search

Site Navigation

FreeBSD Manual Pages