dupd(1)                     General Commands Manual                    dupd(1)

NAME
       dupd - find duplicate files

SYNOPSIS
       dupd COMMAND [OPTIONS]

DESCRIPTION
       dupd scans all the files in the given path(s) to find files with
       duplicate content.  The sets of duplicate files are not displayed
       during a scan.  Instead, the duplicate info is saved into a database
       which can be queried with subsequent commands without having to scan
       all the files again.

       Even though dupd can be used as a simple duplicate reporting tool,
       similar to how other duplicate finders work (by running dupd scan ;
       dupd report), the real power of dupd comes from interactively
       exploring the filesystem for duplicates after the scan has
       completed.  See the file, ls, dups, uniques and refresh commands.

       Additional documentation and examples are available under the docs
       directory in the source tree.  If you don't have the source tree
       available, see
       https://github.com/jvirkki/dupd/blob/master/docs/index.md

COMMANDS
       As noted in the synopsis, the first argument to dupd must be the
       command to run.  The command is one of:

       scan     - scan files looking for duplicates
       report   - show duplicate report from last scan
       file     - check for duplicates of one file
       ls       - list info about every file
       dups     - list all duplicate files
       uniques  - list all unique files
       refresh  - remove deleted files from the database
       validate - revalidate all duplicates in database
       rmsh     - create shell script to delete all duplicates (use with
                  care!)
       help     - show brief usage info
       usage    - show this documentation
       man      - show this documentation
       license  - show license info
       version  - show version and exit

OPTIONS
   scan - Perform the filesystem scan for duplicates.
       -p, --path PATH
              Recursively scan the directory tree starting at this path.
              The path option can be given multiple times to specify
              multiple directory trees to scan.  If no path option is
              given, the default is to start scanning from the current
              directory.

       -m, --minsize SIZE
              Minimum size (in bytes) to include in the scan.  By default
              all files of 1 byte or more are scanned.  In practice,
              duplicates in files that small are rarely interesting, so
              you can speed up the scan by ignoring smaller files.

       --buflimit LIMIT
              Limit read buffer size.  LIMIT may be an integer in bytes,
              or include a suffix of M for megabytes or G for gigabytes.
              The scan animation shows the percentage of buffer space in
              use (%b).  Unless that value goes up to 100% or beyond
              during a scan, there is no need to adjust this limit.
              Setting this limit to a low value will constrain dupd memory
              usage, but possibly at a cost to performance (depending on
              the data set).

       -X, --one-file-system
              For each path scanned, do not cross over to a different
              filesystem.  This is helpful, for example, if you want to
              scan / but want to avoid any other mounted filesystems such
              as NFS mounts or external drives.

       --hidden
              Include hidden files (and hidden directories) in the scan.
              By default these are not included.

       --db PATH
              Override the default database file location.  The default is
              $HOME/.dupd_sqlite.  If you override the path during scan,
              remember to provide this argument and the path for
              subsequent operations so the database can be found.

       -I, --hardlink-is-unique
              Consider hard links to the same file content as unique.  By
              default hard links are listed as duplicates.  See the HARD
              LINKS section below.  Note that if this option is given
              during scan, it cannot be given during interactive
              operations.

       --stats-file FILE
              On completion, create (or append to) FILE and save some
              stats from the run.  These are the same stats as are
              displayed in verbose mode, but are more suitable for
              programmatic consumption.
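       As an illustration, these scan options combine naturally.  The
       following sketch (the paths are hypothetical) scans two directory
       trees, skips files smaller than 4096 bytes, and stays within each
       starting filesystem:

              % dupd scan --path /home --path /data --minsize 4096 -X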
   report - Display the list of duplicates.
       --cut PATHSEG
              Remove prefix PATHSEG from the file paths in the report
              output.  This can reduce clutter in the output text if all
              the files scanned share a long identical prefix.

       --minsize SIZE
              Report only duplicate sets which consume at least this much
              disk space.  Note this is the total size occupied by all the
              duplicates in a set, not their individual file size.

       --format NAME
              Produce the report in this output format.  NAME is one of
              text, csv, json.  The default is text.

       Note: The database format generated by scan is not guaranteed to be
       compatible with future versions.  You should run report (and all
       the other commands below which access the database) using the same
       version of dupd that was used to generate the database.

   file - Report the duplicate status of one file.
       To check whether one given file still has known duplicates, use the
       file operation.  Note that this does not do a new scan, so it will
       not find new duplicates.  It checks whether the duplicates
       identified during the previous scan still exist and verifies (by
       hash) whether they are still duplicates.

       --file PATH
              Required: the file to check.

       --cut PATHSEG
              Remove prefix PATHSEG from the file paths in the report
              output.

       --exclude PATH
              Ignore any duplicates under PATH when reporting duplicates.
              This is useful if you intend to delete the entire tree under
              PATH, to make sure you don't delete all copies of the file.

       --hardlink-is-unique
              Ignore the existence of hard links to the file for the
              purpose of deciding whether the file is unique.

   ls, uniques, dups - List matching files.
       While the file command checks the duplicate status of a single
       file, these commands do the same for all the files in a given
       directory tree.

       ls       - List all files and show whether they have duplicates.
       uniques  - List all unique files.
       dups     - List all files which have known duplicates.

       --path PATH
              Start from this directory (default is the current
              directory).

       --cut PATHSEG
              Remove prefix PATHSEG from the file paths in the output.

       --exclude PATH
              Ignore any duplicates under PATH when reporting duplicates.

       --hardlink-is-unique
              Ignore the existence of hard links to the file for the
              purpose of deciding whether the file is unique.

   refresh - Refresh the database.
       As you remove duplicate files, they remain listed in the dupd
       database.  Ideally you'd run the scan again to rebuild the
       database.  Re-running the scan after deleting some duplicates can
       be very fast because the files are in the cache, so that is the
       best option.  However, when dealing with a set of files large
       enough that they don't fit in the cache, re-running the scan may
       take a long time.  For those cases the refresh command offers a
       much faster alternative.  The refresh command checks whether all
       the files in the dupd database still exist and removes those which
       do not.

       Be sure to consider the limitations of this approach.  The refresh
       command does not re-verify whether all files listed as duplicates
       are still duplicates.  It also, of course, does not detect any new
       duplicates which may have appeared since the last scan.

       In summary, if you have only been deleting duplicates since the
       previous scan, run the refresh command.  It will prune all the
       deleted files from the database and will be much faster than a
       scan.  However, if you have been adding and/or modifying files
       since the last scan, it is best to run a new scan.
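       As a sketch of how refresh fits into a typical cleanup cycle (the
       directory names here are hypothetical): scan once, list the
       duplicates in one subtree, delete the unwanted copies by hand, then
       prune the database:

              % dupd scan --path /data
              % dupd dups --path /data/photos
              (delete the unwanted copies by hand)
              % dupd refresh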
   validate - Validate the database.
       The validate operation is primarily for testing, but is documented
       here as it may be useful if you want to reconfirm that all
       duplicates in the database are still truly duplicates.

       In most cases you will be better off re-running the scan operation
       instead of using validate.

       Validate is fairly slow, as it will fully hash every file in the
       database.

   rmsh - Create a shell script to remove duplicate files.
       As a policy, dupd never modifies the filesystem!  As a convenience
       for those times when it is desirable to automatically remove files,
       this operation can create a shell script to do so.  The output is a
       shell script (to stdout) which you can run to delete your files (if
       you're feeling lucky).  Review the generated script carefully to
       confirm it truly does what you want!

       Automated deletion is generally not very useful because it takes
       human intervention to decide which of the duplicates is the best
       one to keep in each case.  While the content is the same, one of
       them may have a better file name and/or location.

       Optionally, the shell script can create either soft or hard links
       from each removed file to the copy being kept.  The options are
       mutually exclusive.

       --link Create symlinks for deleted files.

       --hardlink
              Create hard links for deleted files.

   Additional global options
       -q     Quiet; suppress all output.

       -v     Verbose mode.  Can be repeated multiple times for ever
              increasing verbosity.

       -V, --verbose-level N
              Set the logging verbosity level directly to N.

       -h     Show a brief help summary.

       --db PATH
              Override the default database file location.

       -F, --hash NAME
              Specify a different hash function.  This applies to any
              command which uses content hashing.  NAME is one of: md5,
              sha1, sha512, xxhash.

HARD LINKS
       Are hard links duplicates or not?  The answer depends on "what do
       you mean by duplicates?" and "what are you trying to do?"

       If your primary goal in removing duplicates is to save disk space,
       then it makes sense to ignore hard links.  If, on the other hand,
       your primary goal is to reduce filesystem clutter, then it makes
       more sense to think of hard links as duplicates.

       By default dupd considers hard links duplicates.  You can switch
       this around with the --hardlink-is-unique option.  This option can
       be given either during scan or to the interactive reporting
       commands (file, ls, uniques, dups).

EXAMPLES
       Scan all files in your home directory and then show the sets of
       duplicates found:

              % dupd scan --path $HOME
              % dupd report

       Show the duplicate status (duplicate or unique) of all files in the
       docs subdirectory:

              % dupd ls --path docs

       I'm about to delete docs/old.doc but want to check one last time
       that it is a duplicate, and I want to review where those duplicates
       are:

              % dupd file --file docs/old.doc -v

       Read the documentation in the dupd 'docs' directory or the online
       documentation for more usage examples.

EXIT
       dupd exits with status code 0 on success, non-zero on error.

SEE ALSO
       sqlite3(1)

       https://github.com/jvirkki/dupd/blob/master/docs/index.md
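       The default database file name ($HOME/.dupd_sqlite) and the
       sqlite3(1) cross-reference indicate that the scan database is a
       regular SQLite file, so it can also be inspected directly.  The
       schema is not documented here (and the database format may change
       between versions), so the sketch below only lists what is present
       rather than assuming any table names:

              % sqlite3 $HOME/.dupd_sqlite .tables
              % sqlite3 $HOME/.dupd_sqlite .schema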