FreeBSD Manual Pages

CONVMV(1)							     CONVMV(1)

NAME
       convmv - converts filenames from one encoding to another

SYNOPSIS
       convmv [options] FILE(S) ... DIRECTORY(S)

OPTIONS
       -f ENCODING
	   specify the current encoding of the filename(s), i.e. the encoding
	   from which they should be converted

       -t ENCODING
	   specify the encoding to which the filename(s) should be converted

       -i  interactive mode (ask y/n for each action)

       -r  recursively go through directories

       --nfc
	   target filenames will be in normalization form C for UTF-8 (Linux etc.)

       --nfd
	   target filenames will be in normalization form D for UTF-8 (OS X etc.)
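	   As an illustration of what the two forms mean at the byte level,
	   here is a minimal Python sketch using the standard unicodedata
	   module (it does not call convmv): the same visible name gets a
	   different code point sequence, and therefore different UTF-8
	   bytes, in NFC and in NFD.

```python
import unicodedata

name = "Caf\u00e9"  # "Café" spelled with U+00E9 LATIN SMALL LETTER E WITH ACUTE

nfc = unicodedata.normalize("NFC", name)  # precomposed: one code point for the accented e
nfd = unicodedata.normalize("NFD", name)  # decomposed: "e" + U+0301 COMBINING ACUTE ACCENT

print(len(nfc), len(nfd))   # 4 5
print(nfc.encode("utf-8"))  # b'Caf\xc3\xa9'
print(nfd.encode("utf-8"))  # b'Cafe\xcc\x81'
print(nfc == nfd)           # False, although both render the same
```

	   Names that differ only in normalization therefore compare unequal
	   byte by byte, which is the mismatch --nfc and --nfd are meant to
	   fix.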

       --qfrom , --qto
	   be quieter about the "from" or "to" name of a rename (e.g. if
	   printing the names screws up your terminal). These options do
	   nothing more than replace each non-ASCII byte with ? and each
	   control character with * in the printout; the rename operation
	   itself is not affected.
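	   The printout rule described above can be sketched in Python as
	   follows. mask_for_display() is a hypothetical name, not part of
	   convmv, and treating DEL (0x7F) as a control character is an
	   assumption of this sketch.

```python
def mask_for_display(name: bytes) -> str:
    """Render a raw filename the way --qfrom/--qto are described to print it."""
    out = []
    for byte in name:
        if byte >= 0x80:                   # any non-ASCII byte becomes "?"
            out.append("?")
        elif byte < 0x20 or byte == 0x7F:  # control characters become "*" (DEL: assumption)
            out.append("*")
        else:
            out.append(chr(byte))
    return "".join(out)


# "Café" in UTF-8 followed by an ESC byte: each UTF-8 byte of the "é" gets its own "?"
print(mask_for_display(b"Caf\xc3\xa9\x1b.txt"))  # Caf??*.txt
```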

       --exec command
	   execute the given command. You have to quote the command; #1 will
	   be substituted by the old filename and #2 by the new one. With this
	   option, symlink targets stay untouched. Keep in mind that convmv
	   already quotes #1 and #2, so you must not add extra quotation marks
	   around them.

	   Example:

	   convmv -f latin1 -t utf-8 -r --exec "echo #1 should be renamed to #2" path/to/files
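	   To see why extra quotation marks are harmful, here is a small
	   Python sketch of the substitution. expand_exec() is hypothetical,
	   and Python's shlex.quote() merely stands in for whatever quoting
	   convmv actually performs; the point is only that an already quoted
	   name wrapped in another pair of quotes ends up with literal quote
	   characters.

```python
import re
import shlex


def expand_exec(template: str, old: str, new: str) -> str:
    """Replace #1/#2 with shell-quoted names (hypothetical stand-in for convmv)."""
    names = {"1": old, "2": new}
    # Substitute both placeholders in a single pass, so that a "#2" occurring
    # inside the old filename is not expanded a second time.
    return re.sub(r"#([12])", lambda m: shlex.quote(names[m.group(1)]), template)


good = expand_exec("echo #1 should be renamed to #2", "my file", "new file")
print(good)               # echo 'my file' should be renamed to 'new file'
print(shlex.split(good))  # ['echo', 'my file', 'should', 'be', 'renamed', 'to', 'new file']

# Extra quotes around a placeholder turn the inserted quotes into literal characters:
bad = expand_exec('echo "#1"', "my file", "x")
print(shlex.split(bad))   # ['echo', "'my file'"]
```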

       --list
	   list all available encodings. To get support for more Chinese or
	   Japanese encodings, install the Perl Encode::HanExtra or
	   Encode::JIS2K packages.

       --lowmem
	   keep the memory footprint low by not creating a hash of all files.
	   This disables checking whether symlink targets are in the subtree;
	   symlink target pointers will be converted regardless. If you
	   convert hundreds of thousands or millions of files, the memory
	   usage of convmv might grow quite high. This option helps you out in
	   that case.

       --nosmart
	   by default convmv detects whether a filename is already UTF-8
	   encoded and skips that file if a conversion from some charset to
	   UTF-8 is requested. "--nosmart" forces the conversion to UTF-8 for
	   such files as well, which might result in "double encoded UTF-8"
	   (see the section below).
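	   A minimal sketch of such a check, assuming it amounts to "the raw
	   name decodes cleanly as UTF-8" (convmv's real heuristic may be more
	   elaborate), together with what forcing the conversion would do to
	   a name that is already UTF-8. looks_like_utf8() is a hypothetical
	   helper.

```python
def looks_like_utf8(name: bytes) -> bool:
    """True if the raw filename bytes decode cleanly as UTF-8 (assumed heuristic)."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


latin1_name = "Caf\u00e9".encode("latin-1")  # b'Caf\xe9'
utf8_name = "Caf\u00e9".encode("utf-8")      # b'Caf\xc3\xa9'
print(looks_like_utf8(latin1_name), looks_like_utf8(utf8_name))  # False True

# What --nosmart risks: reading the UTF-8 bytes as Latin-1 and converting them again
double = utf8_name.decode("latin-1").encode("utf-8")
print(double)  # b'Caf\xc3\x83\xc2\xa9', which displays as "CafÃ©"
```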

       --fixdouble
	   with the "--fixdouble" option convmv only converts files which will
	   still be UTF-8 encoded after the conversion. This is useful for
	   fixing double-encoded UTF-8 files: all files which are not UTF-8,
	   or which would not result in UTF-8 after the conversion, are left
	   untouched. Also see the section "How to undo double UTF-8 ..."
	   below.
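	   The rule can be sketched in Python for the typical repair case,
	   roughly what "convmv -f utf-8 -t latin1 --fixdouble" is meant to
	   do. undo_double() is a hypothetical helper, and Latin-1 as the
	   accidentally used charset is an assumption of the example.

```python
from typing import Optional


def undo_double(name: bytes, wrong_charset: str = "latin-1") -> Optional[bytes]:
    """Return the repaired name, or None if the file would be left untouched."""
    try:
        text = name.decode("utf-8")         # the current name must be UTF-8
        fixed = text.encode(wrong_charset)  # reverse the accidental conversion
        fixed.decode("utf-8")               # the result must still be UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None                         # skip: not a double-encoded name
    return fixed


print(undo_double(b"Caf\xc3\x83\xc2\xa9"))  # b'Caf\xc3\xa9'  (double-encoded, repaired)
print(undo_double(b"Caf\xc3\xa9"))          # None  (plain UTF-8, would not stay UTF-8)
print(undo_double(b"Caf\xe9"))              # None  (not UTF-8 at all)
```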

       --notest
	   Needed to actually rename the files. By default convmv will just
	   print what it wants to do.

       --parsable
	   This is an advanced option that people who want to write a GUI
	   front end will find useful (some others maybe, too). It makes
	   convmv print what it would do in an easily parsable way. The first
	   column contains the action or some kind of information, the second
	   column mostly contains the file that is to be modified and, if
	   appropriate, the third column contains the modified value. Columns
	   are separated by \0\n (nullbyte newline). Rows (one action each)
	   are separated by \0\0\n (nullbyte nullbyte newline).
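	   A consumer of this format only has to split on the two separators.
	   The following Python sketch assumes exactly the layout described
	   above; parse_parsable() and the sample row (including the "rename"
	   action name) are hypothetical, since the real action names are not
	   listed here.

```python
from typing import List

COL_SEP = b"\0\n"    # column separator, as described above
ROW_SEP = b"\0\0\n"  # row (action) separator, as described above


def parse_parsable(data: bytes) -> List[List[bytes]]:
    """Split --parsable output into rows of raw byte columns."""
    rows = []
    for raw_row in data.split(ROW_SEP):
        if not raw_row:
            continue              # nothing after the final separator
        cols = raw_row.split(COL_SEP)
        if cols[-1] == b"":
            cols.pop()            # tolerate a trailing column separator
        rows.append(cols)
    return rows


# Hypothetical sample row: action, old name (Latin-1), new name (UTF-8)
sample = b"rename\0\n./a\xe9\0\n./a\xc3\xa9\0\0\n"
print(parse_parsable(sample))  # [[b'rename', b'./a\xe9', b'./a\xc3\xa9']]
```

	   The columns are kept as bytes because filenames need not be valid
	   in any single encoding. In practice the input would be the captured
	   output of a --parsable run.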

       --run-parsable
	   This option can be used to blindly execute the output of a previous
	   --parsable run. This way it's possible to rename a huge number of
	   files in a minimum of time.

       --no-preserve-mtimes
	   modifying filenames usually causes the parent directory's mtime to
	   be updated. Since version 2, convmv by default resets the mtime to
	   the old value. If your filesystem supports sub-second resolution,
	   the sub-second part of the atime and mtime will be lost, as Perl
	   does not yet support that. With this option you can disable the
	   preservation of the mtimes.
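	   The idea behind the default behaviour can be sketched in Python:
	   remember the parent directory's timestamps, rename, then restore
	   them. rename_keeping_parent_times() is a hypothetical helper that
	   assumes the old and new name live in the same directory; unlike
	   the Perl limitation mentioned above, Python's os.utime() accepts
	   nanosecond timestamps, so the sub-second part survives where the
	   filesystem stores it.

```python
import os
import tempfile


def rename_keeping_parent_times(old: str, new: str) -> None:
    """Rename within one directory, then restore that directory's atime/mtime."""
    parent = os.path.dirname(os.path.abspath(old))
    before = os.stat(parent)
    os.rename(old, new)
    # The *_ns fields keep sub-second precision where the filesystem has it.
    os.utime(parent, ns=(before.st_atime_ns, before.st_mtime_ns))


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "old-name")
    open(src, "w").close()
    os.utime(d, ns=(10**18, 10**18))  # give the directory a known mtime
    rename_keeping_parent_times(src, os.path.join(d, "new-name"))
    parent_mtime_ns = os.stat(d).st_mtime_ns
    renamed = os.listdir(d)

print(parent_mtime_ns == 10**18, renamed)  # True ['new-name']
```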

       --replace
	   if a file with the new name already exists, it will be overwritten
	   if its content is equal to that of the file being renamed.
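	   A minimal Python sketch of this rule follows. rename_with_replace()
	   is hypothetical, and a byte-by-byte comparison with filecmp is only
	   one way to decide that the contents are equal; how convmv itself
	   compares them is not specified here.

```python
import filecmp
import os
import tempfile


def rename_with_replace(old: str, new: str) -> bool:
    """Rename old to new; an existing new is replaced only if contents match."""
    if os.path.exists(new) and not filecmp.cmp(old, new, shallow=False):
        return False        # different content: leave both files alone
    os.replace(old, new)    # plain rename, or overwrite of an identical file
    return True


with tempfile.TemporaryDirectory() as d:
    paths = {n: os.path.join(d, n) for n in ("a", "b", "c")}
    for name, text in (("a", "same"), ("b", "same"), ("c", "other")):
        with open(paths[name], "w") as f:
            f.write(text)
    first = rename_with_replace(paths["a"], paths["b"])   # identical: replaced
    second = rename_with_replace(paths["b"], paths["c"])  # different: refused
    left = sorted(os.listdir(d))

print(first, second, left)  # True False ['b', 'c']
```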

       --unescape
	   this option removes those ugly % hex sequences from filenames and
	   turns them into (hopefully) nicer 8-bit characters. After
	   --unescape you might want to do a charset conversion. Sequences
	   like %20 are sometimes produced when downloading via HTTP or FTP.
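	   The two steps can be sketched in Python with the standard
	   urllib.parse module. The sample name, and the assumption that the
	   unescaped bytes are ISO-8859-1, are illustrative only; convmv's
	   handling of malformed % sequences may differ.

```python
from urllib.parse import unquote_to_bytes

escaped = "Caf%E9%20menu.txt"    # %E9 is "é" in ISO-8859-1, %20 is a space
raw = unquote_to_bytes(escaped)  # step 1: turn % sequences into raw 8-bit bytes
print(raw)                       # b'Caf\xe9 menu.txt'

# step 2: the follow-up charset conversion, here ISO-8859-1 -> UTF-8
utf8 = raw.decode("iso-8859-1").encode("utf-8")
print(utf8)                      # b'Caf\xc3\xa9 menu.txt'
```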

       --upper , --lower
	   turn filenames into all upper or all lower case. When the filename
	   is not ASCII-encoded, convmv expects a charset to be given via the
	   -f switch.

       --map=some-extra-mapping
	   apply some custom character mappings, currently supported are:

	   ntfs-sfm(-undo), ntfs-sfu(-undo) for the mapping of characters that
	   are illegal in NTFS filenames, for Linux or Macintosh CIFS clients
	   (see MS KB 117258, and also the mapchars mount option of mount.cifs
	   on Linux).

	   ntfs-pretty(-undo) for the mapping of illegal NTFS characters to
	   pretty, legal Japanese versions of them.

	   See the map_get_newname() function for how to easily add your own
	   mappings if needed. Let me know if you think convmv is missing some
	   useful mapping here.

       --dotlessi
	   take the dotless i/I issue into account: a lowercase version of "I"
	   will also be dotless, while an uppercase version of "i" will also
	   be dotted. This is an issue for Turkish and Azeri.

	   By the way: the superscript dot of the letter i was added in the
	   Middle Ages to distinguish the letter (in manuscripts) from
	   adjacent vertical strokes in such letters as u, m, and n. J is a
	   variant form of i which emerged at this time and subsequently
	   became a separate letter.
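	   The dotless i rule can be sketched in Python, whose built-in
	   str.lower() and str.upper() use the default (non-Turkish) Unicode
	   mappings. lower_dotlessi() and upper_dotlessi() are hypothetical
	   helpers, not convmv functions; ascii() is used only to keep the
	   printed output plain ASCII.

```python
# Turkish/Azeri pairs: I <-> U+0131 (dotless i) and U+0130 (dotted capital I) <-> i
TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})
TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})


def lower_dotlessi(name: str) -> str:
    # Map the two special letters first, then lower-case everything else.
    return name.translate(TR_LOWER).lower()


def upper_dotlessi(name: str) -> str:
    # Map the two special letters first, then upper-case everything else.
    return name.translate(TR_UPPER).upper()


print(ascii(lower_dotlessi("D\u0130YARBAKIR")))  # 'diyarbak\u0131r'
print(ascii(upper_dotlessi("istanbul")))         # '\u0130STANBUL'
print(ascii("I".lower()))                        # 'i' (the default, non-Turkish rule)
```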

       --caseful-sz
	   let convmv convert the sz ligature (U+00DF) to its uppercase
	   version (U+1E9E) and vice versa. As of 2017, most filesystem case
	   mapping tables don't treat those two code points as case
	   equivalents. Thus, for now, convmv also treats them as caseless by
	   default (unless this option is used).
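	   A sketch of the two behaviours in Python, assuming that "caseless"
	   means the letter is simply left unchanged. Note that Python's own
	   str.upper() follows the full Unicode mapping and turns U+00DF into
	   "SS". upper_name() and lower_name() are hypothetical helpers.

```python
SMALL_SZ = "\u00df"    # sz ligature (sharp s)
CAPITAL_SZ = "\u1e9e"  # its uppercase version


def upper_name(name: str, caseful_sz: bool = False) -> str:
    # Upper-case around every U+00DF so that str.upper() cannot expand it to "SS".
    joiner = CAPITAL_SZ if caseful_sz else SMALL_SZ
    return joiner.join(part.upper() for part in name.split(SMALL_SZ))


def lower_name(name: str, caseful_sz: bool = False) -> str:
    # Lower-case around every U+1E9E; map it to U+00DF only when caseful.
    joiner = SMALL_SZ if caseful_sz else CAPITAL_SZ
    return joiner.join(part.lower() for part in name.split(CAPITAL_SZ))


print(ascii("stra\u00dfe".upper()))            # 'STRASSE' (plain Python)
print(ascii(upper_name("stra\u00dfe")))        # 'STRA\xdfE' (left unchanged)
print(ascii(upper_name("stra\u00dfe", True)))  # 'STRA\u1e9eE' (caseful)
```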

       --help
	   print a short summary of available options

       --dump-options
	   print a list	of all available options

DESCRIPTION
       convmv is meant to help convert a single filename, a directory tree
       and the contained files, or a whole filesystem into a different
       encoding. It only converts the filenames, not the content of the
       files. A special feature of convmv is that it also takes care of
       symlinks: it converts the symlink target pointer as well if the
       symlink target itself is being converted.

       All this comes in very handy when one wants to switch over from old
       8-bit locales to UTF-8 locales. It is also possible to convert
       directories which are already partly UTF-8 encoded to UTF-8. convmv
       is able to detect if certain files are UTF-8 encoded and will skip
       them by default. To turn this smartness off, use the "--nosmart"
       switch.

   Filesystem issues
       Almost all POSIX filesystems do not care about how filenames are
       encoded; here are some exceptions:

       HFS+ on OS X / Darwin

       Linux and (most?) other Unix-like operating systems use the
       so-called normalization form C (NFC) for their UTF-8 encoding by
       default, but do not enforce this. HFS+ on the Macintosh OS enforces
       normalization form D (NFD), where a few characters are encoded in a
       different way. On OS X it's not possible to create NFC UTF-8
       filenames because this is prevented at the filesystem layer. On
       HFS+, filenames are internally stored in UTF-16, and when they are
       converted back to UTF-8 (because the Unix-based OS can't deal with
       UTF-16 directly), NFD is created for whatever reason. See
       http://developer.apple.com/qa/qa2001/qa1173.html for details. I
       think it was a very bad idea, and it breaks many things under OS X
       which expect a normal POSIX-conforming system. On any other
       filesystem, convmv is able to convert files from NFC to NFD or vice
       versa, which makes interoperability with such systems a lot easier.

       APFS on macOS

       Apple, with the introduction of APFS in macOS 10.13, stopped
       imposing NFD on user space. But once you have enforced NFD, there is
       no easy way back without breaking existing applications. So they had
       to make APFS normalization-insensitive, which means a file can be
       created in NFC or NFD in the filesystem and it can also be accessed
       with both forms. Under the hood they store hashes of the normalized
       form of the filename to provide normalization insensitivity. Sounds
       like a great idea? Let's see: if you readdir a directory, you will
       get back the files in the normalization form that was used when
       those files were created. If you stat a file in NFC or in NFD form,
       you will get back whatever normalization form you used in the stat
       call. So user space applications can't expect that a file which can
       be stat'ed and accessed successfully also appears under that name in
       directory listings, because the returned normalization form is faked
       to match what the user asked for. In theory, user space would also
       have to normalize strings all the time. This is the same problem as
       with the case insensitivity of filenames before, which still breaks
       many user space applications; the difference is that the latter was
       much more obvious to spot and to implement than this thing. So long,
       and thanks for all the fish.

       JFS

       If people mount JFS partitions with iocharset=utf8, there is a
       similar problem, because JFS is designed to store filenames
       internally in UTF-16, too; that is because Linux's JFS is really
       JFS2, which was a rewrite of JFS for OS/2. JFS partitions should
       always be mounted with iocharset=iso8859-1, which is also the
       default with recent kernels (2.6.6 and later). If this is not done,
       JFS does not behave like a POSIX filesystem, and it might happen
       that certain files cannot be created at all, for example filenames
       in ISO-8859-1 encoding. Only when interoperation with OS/2 is needed
       should iocharset be set according to the charmap of your locale.

       NFS4

       Unlike other POSIX filesystems, NFS 4 (RFC 3530) mandates UTF-8, but
       the RFC also says: "The nfs4_cs_prep profile does not specify a
       normalization form. A later revision of this specification may
       specify a particular normalization form." In other words, if you
       want to use NFS4 you might find the conversion and normalization
       features of convmv quite useful.

       FAT/VFAT	and NTFS

       NTFS and VFAT (for long filenames) use UTF-16 internally to store
       filenames. You should not need to convert filenames if you mount one
       of those filesystems. Use appropriate mount options instead!

   How to undo double UTF-8 (or	other) encoded filenames
       Sometimes it might happen that you "double-encoded" certain
       filenames, for example when the filenames were already UTF-8 encoded
       and you accidentally did another conversion from some charset to
       UTF-8. You can simply undo that by converting the other way round:
       the from-charset has to be UTF-8 and the to-charset has to be the
       from-charset you accidentally used before. If you use the
       "--fixdouble" option, convmv will make sure that only files which
       will still be UTF-8 encoded after the conversion are processed, and
       it will leave non-UTF-8 files untouched. You should first check that
       you get the correct results by running the conversion without
       "--notest". The "--qfrom" option might also be helpful, because the
       double UTF-8 filenames might screw up your terminal if they are
       printed; they often contain control sequences which do funny things
       with your terminal window. If you are not sure which charset was
       accidentally converted from, using "--qfrom" is a good way to figure
       out the required encoding without destroying the filenames for good.
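       The "figuring out" step can also be sketched in Python: undo the
       accidental conversion with each candidate charset and keep those
       that yield valid UTF-8. candidate_charsets() and the list of
       candidates are hypothetical. Several charsets can pass (Latin-1 and
       CP1252, for instance, often both do), so the results still need to
       be checked by eye as described above.

```python
from typing import List


def candidate_charsets(name: bytes, charsets: List[str]) -> List[str]:
    """Charsets whose reverse conversion turns name back into valid UTF-8."""
    hits = []
    for charset in charsets:
        try:
            name.decode("utf-8").encode(charset).decode("utf-8")
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # this charset cannot be the one that was used
        hits.append(charset)
    return hits


original = "Caf\u00e9".encode("utf-8")
# Simulate an accidental cp850 -> UTF-8 conversion of an already UTF-8 name:
double = original.decode("cp850").encode("utf-8")
print(candidate_charsets(double, ["latin-1", "cp1252", "cp850"]))  # ['cp850']
```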

   How to repair Samba files
       If no correct "character set" variable has been set in smb.conf (of
       Samba 2.x), files created from Win* clients are created in the
       client's codepage, e.g. cp850 for Western European languages. As a
       result, files whose names contain non-ASCII characters are screwed
       up if you "ls" them on the Unix server. If you change the
       "character set" variable to iso8859-1 afterwards, newly created
       files are okay, but the old files are still screwed up in the
       Windows encoding. In this case convmv can also be used to convert
       the old Samba-shared files from cp850 to iso8859-1.
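       For a single name, this conversion amounts to decoding the bytes as
       cp850 and re-encoding them as ISO-8859-1, as in the following Python
       sketch with a hypothetical filename. On the share itself you would
       let convmv do this, e.g. with -f cp850 -t iso8859-1 -r, first
       without --notest.

```python
name = "Caf\u00e9.doc"            # the name as typed on the Windows client
stored = name.encode("cp850")     # how Samba 2.x stored it on disk
fixed = stored.decode("cp850").encode("iso8859-1")  # cp850 -> ISO-8859-1
print(stored, fixed)              # b'Caf\x82.doc' b'Caf\xe9.doc'
```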

       By the way: Samba 3.x finally maps to UTF-8 filenames by default, so
       when you migrate from Samba 2 to Samba 3 you might also have to
       convert your file names.

   Netatalk interoperability issues
       When Netatalk is switched to UTF-8, which is supported in version 2,
       it is NOT sufficient to rename the files; more needs to be done. See
       http://netatalk.sourceforge.net/2.0/htmldocs/upgrade.html#volumes-and-filenames
       and the uniconv utility of Netatalk for details.

SEE ALSO
       locale(1) utf-8(7) charsets(7)

BUGS
       no bugs or fleas	known

DONATE
       You can support convmv by making a donation; see
       <https://www.j3e.de/donate.html>

AUTHOR
       Bjoern JACKE

       Send mail to bjoern [at] j3e.de for bug reports and suggestions.

perl v5.36.3			  2025-04-18			     CONVMV(1)