JDUPES(1)		    General Commands Manual		     JDUPES(1)

NAME
       jdupes -	finds and performs actions upon	duplicate files

SYNOPSIS
       jdupes [	options	] DIRECTORIES ...

DESCRIPTION
       Searches	the given path(s) for duplicate	files. Such files are found by
       comparing  file sizes, then partial and full file hashes, followed by a
       byte-by-byte comparison.	The default behavior with no other "action op-
       tions" specified	(delete, summarize, link, dedupe, etc.)	 is  to	 print
       sets of matching	files.

OPTIONS
       -@ --loud
	      output annoying low-level	debug info while running

       -0 --print-null
	      when  printing  matches,	use null bytes instead of CR/LF	bytes,
	      just like	'find -print0' does. This has no effect	with  any  ac-
	      tion  mode other than the	default	"print matches"	(delete, link,
	      etc. will	still print normal line	endings	in the output.)
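	      A common pairing is xargs -0, which consumes null-delimited
	      input. The sketch below is illustrative only: the jdupes
	      invocation is shown as a comment (the directory name is made
	      up), and the live command demonstrates just the null-delimited
	      convention itself.

```shell
# Hypothetical pipeline ("photos/" is a made-up path): feed matches safely
# to a downstream command even when filenames contain spaces or newlines.
#   jdupes -r -0 photos/ | xargs -0 -n1 echo

# The null-delimited convention itself, independent of jdupes: two records,
# one containing a space and one an embedded newline; each record becomes
# exactly one argument.
printf 'file one\0file\ntwo\0' | xargs -0 -n1 printf '<%s>\n'
```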

       -1 --one-file-system
	      do not match files that are on different filesystems or devices

       -A --no-hidden
	      exclude hidden files from	consideration

       -B --dedupe
	      call same-extents	ioctl or clonefile() to	trigger	a  filesystem-
	      level  data  deduplication on disk (known	as copy-on-write, CoW,
	      cloning, or  reflink);  only  a  few  filesystems	 support  this
	      (BTRFS;  XFS when	mkfs.xfs was used with -m crc=1,reflink=1; Ap-
	      ple APFS)

       -C --chunk-size=number-of-KiB
	      set the I/O chunk size manually; larger values may improve
	      performance on rotating media by reducing the number of head
	      seeks required, but also increase memory usage and can reduce
	      performance in some cases

       -D --debug
	      if this feature is compiled in, show  debugging  statistics  and
	      info at the end of program execution

       -d --delete
	      prompt  user  for	 files	to  preserve, deleting all others (see
	      CAVEATS below)

       -e --error-on-dupe
	      exit on any duplicate found with status code 255

       -f --omit-first
	      omit the first file in each set of matches

       -H --hard-links
	      normally,	when two or more files point to	 the  same  disk  area
	      they are treated as non-duplicates; this option will change this
	      behavior

       -h --help
	      displays help

       -i --reverse
	      reverse (invert) the sort	order of matches

       -I --isolate
	      isolate each command-line	parameter from one another; only match
	      if the files are under different parameter specifications

       -j --json
	      produce JSON (machine-readable) output

       -L --link-hard
	      replace  all duplicate files with	hardlinks to the first file in
	      each set of duplicates

       -m --summarize
	      summarize	duplicate file information

       -M --print-summarize
	      print matches and	summarize the duplicate	 file  information  at
	      the end

       -N --no-prompt
	      when  used  together  with  --delete, preserve the first file in
	      each set of duplicates and delete	the others  without  prompting
	      the user

       -O --param-order
	      parameter	 order	preservation is	more important than the	chosen
	      sort; this is particularly useful	with the -N option  to	ensure
	      that automatic deletion behaves in a controllable	way

       -o --order=WORD
	      order files according to WORD: time - sort by modification
	      time; name - sort by filename (default)

       -p --permissions
	      don't consider files with	different  owner/group	or  permission
	      bits as duplicates

       -P --print=type
	      print extra information to stdout; valid options are: early -
	      matches that pass early size/permission/link/etc. checks;
	      partial - files whose partial hashes match; fullhash - files
	      whose full hashes match

       -Q --quick
	      [WARNING:	 RISK  OF  DATA	 LOSS, SEE CAVEATS] skip byte-for-byte
	      verification of duplicate	pairs (use hashes only)

       -q --quiet
	      hide progress indicator

       -R --recurse:
	      for each directory given after this option follow	subdirectories
	      encountered within (note the ':' at the end of option;  see  the
	      Examples section below for further explanation)

       -r --recurse
	      for  every  directory  given  follow  subdirectories encountered
	      within

       -l --link-soft
	      replace all duplicate files with symlinks	to the first  file  in
	      each set of duplicates

       -S --size
	      show size	of duplicate files

       -s --symlinks
	      follow symlinked directories

       -T --partial-only
	      [WARNING:	EXTREME	RISK OF	DATA LOSS, SEE CAVEATS]	match based on
	      hash of first block of file data,	ignoring the rest

       -U --no-trav-check
	      disable double-traversal safety check (BE	VERY CAREFUL)

       -u --print-unique
	      print only a list	of unique (non-duplicate, unmatched) files

       -v --version
	      display jdupes version and compilation feature flags

       -y --hash-db=file
	      create/use  a hash database text file to speed up	future runs by
	      caching file hash	data

       -X --ext-filter=spec:info
	      exclude/filter files based on specified criteria;	 general  for-
	      mat:

	      jdupes -X	filter[:value][size_suffix]

	      Some  filters take no value or multiple values. Filters that can
	      take a numeric option generally  support	the  size  multipliers
	      K/M/G/T/P/E  with	 or  without an	added iB or B. Multipliers are
	      binary-style unless the -B suffix	is used, which will use	 deci-
	      mal  multipliers.	 For  example,	16k  or	 16kib = 16384;	16kb =
	      16000. Multipliers are case-insensitive.
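	      The multiplier arithmetic can be sketched as follows. This is
	      an illustration of the rules just described (K suffixes only),
	      not jdupes source code; the function name to_bytes is invented
	      here.

```shell
# Sketch of the size-multiplier rules, K suffixes only: bare "k" and "kib"
# select the binary multiplier (1024), while "kb" selects the decimal
# multiplier (1000). Matching is case-insensitive.
to_bytes() {
    case "$1" in
        *[Kk][Ii][Bb]) echo $(( ${1%???} * 1024 )) ;;
        *[Kk][Bb])     echo $(( ${1%??}  * 1000 )) ;;
        *[Kk])         echo $(( ${1%?}   * 1024 )) ;;
        *)             echo "$1" ;;
    esac
}
to_bytes 16k     # 16384
to_bytes 16kib   # 16384
to_bytes 16kb    # 16000
```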

	      Filters have cumulative effects: jdupes -X size+:99 -X size-:101
	      will cause only files of exactly 100 bytes in  size  to  be  in-
	      cluded.
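	      The cumulative-AND behavior can be modeled in a few lines;
	      passes_filters below is a hypothetical pure-shell stand-in for
	      the two size filters in the example above, not jdupes code.

```shell
# Model of cumulative filters: a file is included only if it satisfies
# every -X filter, so size+:99 AND size-:101 keeps only files of exactly
# 100 bytes. (passes_filters is an invented illustration.)
passes_filters() {
    [ "$1" -gt 99 ] && [ "$1" -lt 101 ]
}
passes_filters 100 && echo included   # prints "included"
passes_filters 99  || echo excluded   # prints "excluded"
```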

	      Extension	matching is case-insensitive.  Path substring matching
	      is case-sensitive.

	      Supported	filters	are:

	      `size[+-=]:number[suffix]'
		     match  only  if  size  is	greater	(+), less than (-), or
		     equal to (=) the specified	number.	The +/-	and  =	speci-
		     fiers  can	 be combined, i.e.  "size+=:4K"	will only con-
		     sider files with a	size greater than  or  equal  to  four
		     kilobytes (4096 bytes).

	      `noext:ext1[,ext2,...]'
		     exclude  files  with certain extension(s),	specified as a
		     comma-separated list. Do not use a	leading	dot.

	      `onlyext:ext1[,ext2,...]'
		     only include files	with certain  extension(s),  specified
		     as	a comma-separated list.	Do not use a leading dot.

	      `nostr:text_string'
		     exclude  all  paths containing the	substring text_string.
		     This scans	the full file path, so it can be used to match
		     directories: -X nostr:dir_name/

	      `onlystr:text_string'
		     require all paths to contain the  substring  text_string.
		     This scans	the full file path, so it can be used to match
		     directories: -X onlystr:dir_name/

	      `newer:datetime'
		     only  include files newer than specified date.  Date/time
		     format: "YYYY-MM-DD HH:MM:SS" (time is optional).

	      `older:datetime'
		     only include files	older than specified date.   Date/time
		     format: "YYYY-MM-DD HH:MM:SS" (time is optional).

       -z --zero-match
	      consider	zero-length  files to be duplicates; this replaces the
	      old default behavior when	-n was not specified

       -Z --soft-abort
	      if the user aborts the program  (as  with	 CTRL-C)  act  on  the
	      matches that were	found before the abort was received. For exam-
	      ple,  if -L and -Z are specified,	all matches found prior	to the
	      abort will be hard linked. The default behavior without -Z is to
	      abort without taking any actions.

NOTES
       A set of arrows is used in hard linking to show what action was taken
       on each link candidate. These arrows are as follows:

       ---->  This  file was successfully hard linked to the first file	in the
	      duplicate	chain

       -@@->  This file	was successfully symlinked to the first	 file  in  the
	      chain

       -##->  This  file  was  successfully  cloned from the first file	in the
	      chain

       -==->  This file	was already a hard link	to the first file in the chain

       -//->  Linking this file	failed due to  an  error  during  the  linking
	      process

       Duplicate  files	are listed together in groups with each	file displayed
       on a separate line. The groups are then separated from  each  other  by
       blank lines.
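       This grouping makes the default output straightforward to
       post-process. The sketch below uses awk's paragraph mode on
       fabricated stand-in output (the filenames are not real jdupes
       results).

```shell
# Count files per duplicate group. awk's paragraph mode (RS="") reads one
# blank-line-separated group per record; with FS="\n", NF is the number of
# files in the group. The printf input is fabricated sample output.
printf 'a.txt\nb.txt\n\nc.txt\nd.txt\ne.txt\n' |
    awk 'BEGIN { RS = ""; FS = "\n" } { print "group " NR ": " NF " files" }'
# prints:
# group 1: 2 files
# group 2: 3 files
```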

EXAMPLES
       jdupes a	--recurse: b
	      will follow subdirectories under b, but not those	under a.

       jdupes a	--recurse b
	      will follow subdirectories under both a and b.

       jdupes -O dir1 dir3 dir2
	      will  always  place 'dir1' results first in any match set	(where
	      relevant)

CAVEATS
       Using -1	or --one-file-system prevents matches that cross  filesystems,
       but  a more relaxed form	of this	option may be added that allows	cross-
       matching	for all	filesystems that each parameter	is present on.

       When using -d or --delete, care should be taken to ensure against
       accidental data loss.

       -Z or --soft-abort used to be --hardabort in jdupes prior to  v1.5  and
       had  the	 opposite  behavior.   Defaulting to taking action on abort is
       probably	not what most users  would  expect.  The  decision  to	invert
       rather  than  reassign to a different option was	made because this fea-
       ture was	still fairly new at the	time of	the change.

       The -O or --param-order option allows the  user	greater	 control  over
       what  appears  in  the  first position of a match set, specifically for
       keeping the -N option from deleting all but one file  in	 a  set	 in  a
       seemingly  random  way.	All  directories specified on the command line
       will be used as the sorting order of result sets	first, followed	by the
       sorting algorithm set by	the -o or --order option. This means that  the
       order  of all match pairs for a single directory	specification will re-
       tain the	old sorting behavior even if this option is specified.

       When used together with options -s or --symlink,	a user could  acciden-
       tally preserve a	symlink	while deleting the file	it points to.

       The -Q or --quick option	only reads each	file once, hashes it, and per-
       forms comparisons based solely on the hashes. There is a	small but sig-
       nificant	 risk of a hash	collision which	is the purpose of the failsafe
       byte-for-byte comparison	that this option explicitly bypasses.  Do  not
       use  it	on ANY data set	for which any amount of	data loss is unaccept-
       able. This option is not	included in the	help text for the program  due
       to its risky nature.  You have been warned!

       The -T or --partial-only	option produces	results	based on a hash	of the
       first  block of file data in each file, ignoring	everything else	in the
       file. Partial hash checks have always been an important exclusion  step
       in  the	jdupes algorithm, usually hashing the first 4096 bytes of data
       and allowing files that are different  at  the  start  to  be  rejected
       early.  In certain scenarios it may be a	useful heuristic for a user to
       see that	a set of files has the same size and the same  starting	 data,
       even if the remaining data does not match; one example of this would be
       comparing files with data blocks	that are damaged or missing such as an
       incomplete file transfer	or checking a data recovery against known-good
       copies  to  see	what damaged data can be deleted in favor of restoring
       the known-good copy. This option	is meant to be used with informational
       actions and can result in EXTREME DATA LOSS if used with	 options  that
       delete  files,  create hard links, or perform other destructive actions
       on data based on	the matching output. Because of	the potential for mas-
       sive data destruction, this option MUST BE SPECIFIED TWICE to take  ef-
       fect and	will error out if it is	only specified once.

       Using  the -C or	--chunk-size option to override	I/O chunk size can in-
       crease performance on rotating storage media by reducing	"head  thrash-
       ing,"  reading larger amounts of	data sequentially from each file. This
       tunable size can	have bad side effects; the default size	maximizes  al-
       gorithmic  performance without regard to	the I/O	characteristics	of any
       given device and	uses a modest amount of	memory,	but other  values  may
       greatly increase	memory usage or	incur a	lot more system	call overhead.
       Try  several  different	values	to see how they	affect performance for
       your hardware and data set. This	option does not	affect	match  results
       in  any way, so even if it slows	down the file matching process it will
       not hurt	anything.

       The -y or --hash-db feature creates and maintains a text	 file  with  a
       list  of	 file paths, hashes, and other metadata	that enables jdupes to
       "remember" file data across runs. Specifying a period '.' as the	 data-
       base  file  name	 will  use a name of "jdupes_hashdb.txt" instead; this
       alias makes it easy to use the hash database feature without  typing  a
       descriptive name	each time. THIS	FEATURE	IS CURRENTLY UNDER DEVELOPMENT
       AND HAS MANY QUIRKS. USE	IT AT YOUR OWN RISK. In	particular, one	of the
       biggest problems	with this feature is that it stores every path exactly
       as  specified  on the command line; if any paths	are passed into	jdupes
       on a subsequent run with	a different prefix then	they will not be  rec-
       ognized	and they will be treated as totally different files. For exam-
       ple, running jdupes -y .	foo/ is	not the	same as	jdupes -y . ./foo  nor
       the  same  as (from a sibling directory)	jdupes -y ../foo. You must run
       jdupes from the same working directory and with the same	path  specifi-
       cations	to take	advantage of the hash database feature.	When used cor-
       rectly, a fully populated hash database can reduce subsequent runs with
       hundreds	of thousands of	files that normally take a very	long  time  to
       run  down  to  the directory scanning time plus a couple	of seconds. If
       the directory data is already in	the OS disk cache, this	can make  sub-
       sequent runs with over 100K files finish	in under one second.
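       The literal-path behavior amounts to a plain string comparison on the
       stored paths. The commands in the comments below are hypothetical
       invocations illustrating the discipline described above.

```shell
# Hypothetical invocations (directory layout invented for illustration):
#   cd /data && jdupes -y . foo/    # first run populates jdupes_hashdb.txt
#   cd /data && jdupes -y . foo/    # identical spelling: cache hits
#   cd /data && jdupes -y . ./foo   # different spelling: cache misses
# The database compares path strings literally, so equivalent spellings of
# the same file are distinct keys:
[ "foo/file.txt" = "./foo/file.txt" ] || echo "different database keys"
# prints "different database keys"
```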

REPORTING BUGS
       Send  bug  reports and feature requests to jody@jodybruchon.com,	or for
       general information and help, visit www.jdupes.com

SUPPORTING DEVELOPMENT
       If you find this	software useful, please	consider financially  support-
       ing its development through the author's	home page:

       https://www.jodybruchon.com/

AUTHOR
       jdupes is created and maintained	by Jody	Bruchon	<jody@jodybruchon.com>
       and was forked from fdupes 1.51 by Adrian Lopez <adrian2@caribe.net>

LICENSE
       MIT License

       Copyright (c) 2015-2023 Jody Lee	Bruchon	<jody@jodybruchon.com>

       Permission is hereby granted, free of charge, to	any person obtaining a
       copy  of	 this  software	and associated documentation files (the	"Soft-
       ware"), to deal in the Software without restriction, including  without
       limitation the rights to	use, copy, modify, merge, publish, distribute,
       sublicense,  and/or  sell copies	of the Software, and to	permit persons
       to whom the Software is furnished to do so, subject  to	the  following
       conditions:

       The above copyright notice and this permission notice shall be included
       in all copies or	substantial portions of	the Software.

       THE SOFTWARE IS PROVIDED	"AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
       OR  IMPLIED,  INCLUDING	BUT  NOT  LIMITED  TO  THE  WARRANTIES OF MER-
       CHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN
       NO  EVENT  SHALL	 THE  AUTHORS  OR  COPYRIGHT HOLDERS BE	LIABLE FOR ANY
       CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN	 ACTION	 OF  CONTRACT,
       TORT OR OTHERWISE, ARISING FROM,	OUT OF OR IN CONNECTION	WITH THE SOFT-
       WARE OR THE USE OR OTHER	DEALINGS IN THE	SOFTWARE.
