
2. Operation of the deduper script

To find large duplicate files in a specific directory structure, use a command of this form:

deduper [OPTION] ... [DIR]

Command line options include:

--size, -s

Specifies the minimum size in bytes for a file to be considered “large”. The default is 100,000 bytes. The value must be a number, optionally followed by a units specifier: “k” for thousands, “m” for millions, or “g” for billions. The units specifier is case-insensitive.

For example, the option “--size=2.5M” would consider only files of 2,500,000 bytes or larger.
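
The size syntax described above can be illustrated with a short sketch. This is not the script's actual code: the function name parse_size, the regular expression, and the use of Decimal (to avoid floating-point rounding when scaling values such as “2.5”) are all assumptions made for illustration.

```python
import re
from decimal import Decimal

# Hypothetical helper, not taken from the deduper source.
# Decimal multipliers: k = thousands, m = millions, g = billions.
UNIT_MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000, "g": 1_000_000_000}

# A number (with optional fractional part), then an optional units letter.
SIZE_PATTERN = re.compile(r"^(\d+(?:\.\d*)?|\.\d+)([kmg]?)$", re.IGNORECASE)


def parse_size(text):
    """Convert a size string such as '2.5M' into a whole number of bytes."""
    match = SIZE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError("invalid size: %r" % text)
    number, unit = match.groups()
    # The units specifier is case-insensitive, so normalize before lookup.
    return int(Decimal(number) * UNIT_MULTIPLIERS[unit.lower()])
```

Under these assumptions, parse_size("2.5M") and parse_size("2.5m") both yield 2500000, and a bare number such as "100000" is returned unchanged.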

The optional DIR argument specifies a root directory to be searched for duplicates. The default value is “.”, the current working directory.
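
As a rough illustration of the command line form, the following sketch accepts the same option and argument using Python's standard argparse module. It is an assumption that the script could be structured this way; the function name build_parser and the attribute names size and dir are hypothetical, and the size is left as an unparsed string here.

```python
import argparse


def build_parser():
    """Build a parser for: deduper [OPTION] ... [DIR] (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="deduper",
        description="Report sets of large duplicate files under DIR.")
    # Minimum size for a file to count as large; the default is 100,000 bytes.
    parser.add_argument("-s", "--size", default="100000", metavar="SIZE",
                        help="minimum file size, e.g. 100000 or 2.5M")
    # Optional root directory; defaults to the current working directory.
    parser.add_argument("dir", nargs="?", default=".", metavar="DIR",
                        help="root directory to search for duplicates")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.size, args.dir)
```

With no arguments, this sketch yields a size of "100000" and a directory of "."; given “--size=2.5M /tmp”, it yields "2.5M" and "/tmp".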

The standard output of the script is a report displaying zero or more sets of duplicate files. Each set starts with a separator line “----”, followed by one line per duplicate giving its size and path name. Here is some sample output showing a group of two duplicates and a group of three:

----
      109682 /u/shipman/www/soft/hueyims/good-huey
      109682 /u/shipman/www/soft/hueyims/huey.py
----
      169923 /u/shipman/www/soft/isoents/isoents.ent
      169923 /u/shipman/www/soft/rnc/isoents.ent
      169923 /u/shipman/www/soft/tracecase/isoents.ent
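
Because each set begins with a “----” separator line, the report is easy to post-process. The sketch below reads a report in the format shown above and groups it into sets. The function name parse_report is hypothetical, and the sketch assumes that each entry line holds a size, whitespace, and then a path name that may itself contain spaces.

```python
SEPARATOR = "----"


def parse_report(lines):
    """Group report lines into a list of sets of (size, path) tuples."""
    groups = []
    for raw in lines:
        line = raw.strip()
        if line == SEPARATOR:
            # A separator line starts a new set of duplicates.
            groups.append([])
            continue
        if not line:
            continue  # tolerate blank lines
        if not groups:
            raise ValueError("entry found before the first separator line")
        # Split only once so that spaces inside the path are preserved.
        size_text, path = line.split(None, 1)
        groups[-1].append((int(size_text), path))
    return groups


SAMPLE = """\
----
      109682 /u/shipman/www/soft/hueyims/good-huey
      109682 /u/shipman/www/soft/hueyims/huey.py
----
      169923 /u/shipman/www/soft/isoents/isoents.ent
      169923 /u/shipman/www/soft/rnc/isoents.ent
      169923 /u/shipman/www/soft/tracecase/isoents.ent
"""

if __name__ == "__main__":
    for group in parse_report(SAMPLE.splitlines()):
        print(len(group), "copies of", group[0][0], "bytes each")
```

Applied to the sample output, this produces two sets, the first with two entries and the second with three.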