Dupln. This command-line application scans a specified directory for duplicate files and replaces duplicates with hard links to a single copy of the file. By doing so, it conserves storage space while preserving the file structure and accessibility.
If you find this project helpful, consider supporting me:
- Fast duplicate detection using file sizes, inodes, and MD5 hashes
- Space optimization by replacing duplicates with links
- Multiple operations:
- Statistics (
stat
) - Linking (
link
) - Listing unique files (
uniques
) - Listing duplicates (
duplicates
)
- Statistics (
> pip install dupln
# Show statistics about duplicates
dupln stat /path/to/directory
# Link duplicates using hardlinks (default)
dupln link /path/to/directory
# List unique files
dupln uniques /path/to/directory
# List duplicate files
dupln duplicates /path/to/directory
> dupln link '/tmp/dupln'
INFO: Scanning: '/tmp/dupln'
INFO: ++ '/tmp/dupln/as/ci/i_letters' [2]
INFO: - '/tmp/dupln/as/cii_letters' - '/tmp/dupln/as/tmp7uwq0r4l' [1]
INFO: - '/tmp/dupln/as/ci/i_/letters' - '/tmp/dupln/as/ci/i_/tmp0beeaxht' [0]
INFO: ++ '/tmp/dupln/as/ci/i_uppercase' [2]
INFO: - '/tmp/dupln/as/ci/i_/uppercase' - '/tmp/dupln/as/ci/i_/tmpcsykrlv5' [1]
INFO: - '/tmp/dupln/as/cii_uppercase' - '/tmp/dupln/as/tmp5knmbazf' [0]
INFO: ++ '/tmp/dupln/as/ci/i_/lowercase' [2]
INFO: - '/tmp/dupln/as/ci/i_lowercase' - '/tmp/dupln/as/ci/tmpxeegm9eu' [1]
INFO: - '/tmp/dupln/as/cii_lowercase' - '/tmp/dupln/as/tmp8ra1cf6z' [0]
INFO: ++ '/tmp/dupln/di/gits' [1]
INFO: - '/tmp/dupln/di/gi/ts' - '/tmp/dupln/di/gi/tmp80gznyej' [0]
INFO: ++ '/tmp/dupln/he/xd/ig/its' [2]
INFO: - '/tmp/dupln/he/xd/igits' - '/tmp/dupln/he/xd/tmpg3jm_ttb' [1]
INFO: - '/tmp/dupln/he/xdigits' - '/tmp/dupln/he/tmp2nqxy47g' [0]
INFO: ++ '/tmp/dupln/oc/td/igits' [2]
INFO: - '/tmp/dupln/oc/tdigits' - '/tmp/dupln/oc/tmpodvxqodo' [1]
INFO: - '/tmp/dupln/oc/td/ig/its' - '/tmp/dupln/oc/td/ig/tmp1um7nupk' [0]
INFO: ++ '/tmp/dupln/pr/intable' [2]
INFO: - '/tmp/dupln/pr/in/ta/ble' - '/tmp/dupln/pr/in/ta/tmploz2qhry' [1]
INFO: - '/tmp/dupln/pr/in/table' - '/tmp/dupln/pr/in/tmptf8egynt' [0]
INFO: ++ '/tmp/dupln/pu/nc/tu/ation' [2]
INFO: - '/tmp/dupln/pu/nctuation' - '/tmp/dupln/pu/tmp4yjomdni' [1]
INFO: - '/tmp/dupln/pu/nc/tuation' - '/tmp/dupln/pu/nc/tmpp0hsusw1' [0]
INFO: ++ '/tmp/dupln/wh/it/es/pace' [2]
INFO: - '/tmp/dupln/wh/it/espace' - '/tmp/dupln/wh/it/tmpd2plpkm7' [1]
INFO: - '/tmp/dupln/wh/itespace' - '/tmp/dupln/wh/tmpg7bw47b1' [0]
INFO: Total disk_size 564b; files 35; inodes 35; linked 17; same_hash 9; same_size 8; size 1.1k; uniq_hash 9;
> dupln uniques '/tmp/dupln'
INFO: Scanning: '/tmp/dupln'
/tmp/dupln/as/ci/i_/letters
/tmp/dupln/ascii_letters
/tmp/dupln/as/cii_uppercase
/tmp/dupln/as/cii_lowercase
/tmp/dupln/ascii_lowercase
/tmp/dupln/ascii_uppercase
/tmp/dupln/di/gi/ts
/tmp/dupln/digits
/tmp/dupln/he/xd/igits
/tmp/dupln/hexdigits
/tmp/dupln/oc/td/igits
/tmp/dupln/octdigits
/tmp/dupln/pr/in/table
/tmp/dupln/printable
/tmp/dupln/pu/nctuation
/tmp/dupln/punctuation
/tmp/dupln/wh/itespace
/tmp/dupln/whitespace
INFO: Total devices 1; disk_size 564b; files 35; inodes 18; same_ino 9; size 1.1k; unique_size 8;
> dupln stat '/tmp/dupln'
INFO: Scanning: '/tmp/dupln'
INFO: Total disk_size 564b; files 35; inodes 18; same_ino 9; same_size 8; size 1.1k;
- disk_size - total size excluding duplicate files
- size - total size including duplicate files
- files - total files found
- inodes - total unique i-nodes found
- same_ino - total unique i-nodes found at least twice
- same_size - total unique size found at least twice
- same_hash - total unique hash found at least twice
- unique_size - total unique size found
- unique_hash - total unique file hash found
# Use symbolic links instead of hardlinks
dupln link --linker os.symlink /path/to/dir
# Filter by size range (1MB to 10MB)
dupln duplicates --sizes 1M..10M /path/to/dir
# Continue on errors
dupln link --carry-on /path/to/dir
- Dry-run mode: Use
stat
command first to preview - Atomic operations: Temporary files used during linking
- Error recovery:
--carry-on
continues after errors
Option | Description |
---|---|
os.link |
Python hardlinks (default) |
os.symlink |
Python symbolic links |
ln |
System hardlinks (ln command) |
lns |
System symlinks (ln -s ) |
- Python 3.7+
- Optional:
PyYAML
for YAML output support
Note: Always back up important data before running file operations.