Hardlink duplicate files: fast, scriptable, and Unix-friendly.
lndup finds duplicate files and replaces duplicates with hard links (so they share the same inode), saving disk space without changing file contents. Neat, right?
Most dedup tools are either:

- fast but rigid (fixed rules), or
- flexible but clunky (external scripts, awkward integration).

lndup is different: you can inject tiny JavaScript functions as filters and keys, as naturally as writing Node.js callbacks. One-liners welcome.
- Asynchronous I/O (fast traversal on big trees)
- Minimal metadata overhead: one `stat()` per file
- Hash only what's necessary (the "least files" strategy)
- JavaScript injection for filters & keys (standard Node.js `fs.Stats` API)
- Does not follow symbolic links
Install:

```sh
npm i -g lndup
```

```
Usage: lndup [OPTION]... [PATH]...
Hardlink duplicate files.

  -n, --dry-run    don't link
  -v, --verbose    explain what is being done
  -q, --quiet      don't output extra information
  -i, --stdin      read more paths from stdin
  -f, --file       add a file filter
                     (stats: fs.Stats, path: string): boolean
  -d, --dir        add a directory filter
                     (stats: fs.Stats, path: string, files: string[]): boolean
  -k, --key        add a key to differentiate files
                     (stats: fs.Stats, path: string): any
  -H, --hash       select a digest algorithm, default: sha1
                     run 'openssl list -digest-algorithms' for available algorithms.
  -h, --help       display this help and exit
  -V, --version    output version information and exit
```
```sh
lndup -n -v /path/to/scan
lndup -v /path/to/scan
find /data -type f -print0 | lndup -i -n -v
```

lndup prints executable Unix-shell commands to stdout. Extra information is carried in `#` comments.

That means it's:

- easy to review / audit,
- safe to start with `--dry-run`,
- easy to pipe into other tools.
Example:

```
$ lndup -v .
#Stat: probe: readdir 204B 3
#Stat: probe: stat 144.02MiB 23
#Stat: probe: select 144.02MiB 19
#Time: probe: 7.351ms
#Stat: verify: internal 0B 0
#Stat: verify: external 144.00MiB 9
#Stat: verify: total 144.00MiB 9
#Time: verify: 183.209ms
#Stat: solve: current 112.00MiB 7
#Time: solve: 0.110ms
ln -f -- '16M/null_2' '16M/null_3'
ln -f -- '16M/null_2' '16M/null_1'
ln -f -- '16M/ran1_1' '16M/ran1_2'
ln -f -- 'root/ran4_1' 'root/ran4_2'
ln -f -- 'root/ran4_1' 'root/ran4_2' #Error: EACCES: permission denied, rename 'root/ran4_2' -> 'root/ran4_2.e8c70ebe0635ab41'
#Stat: execute: todo 64.00MiB 3 4
#Stat: execute: done 48.00MiB 2 3
#Stat: execute: fail 16.00MiB 1 1
#Time: execute: 8.331ms
```

Skip files smaller than 1024 bytes:
```sh
lndup /path -f 's => s.size >= 1024'
```

Only process files owned by a specific uid (example: 1001):

```sh
lndup /path -f '(s, p) => s.uid === 1001'
```

Ignore directories with more than 100 files:

```sh
lndup /path -d '(s, p, files) => files.length <= 100'
```

Combine filters like LEGO:

```sh
lndup /path \
  -f 's => s.size >= 1024' \
  -d '(s, p, files) => files.length <= 100'
```

By default, duplicates are decided by content (hash). If you want to avoid hardlinking across different metadata constraints, add keys.

Example: separate by uid / gid / mode:

```sh
lndup /path -k 's => s.uid' -k 's => s.gid' -k 's => s.mode'
```

Filters and keys can also be loaded from files:

```sh
lndup /path \
  -k 'require("/path/to/keyfunc.js")' \
  -f 'require("/path/to/filter.js")'
```

Tip: keep your filter/key functions pure and cheap (no I/O) for best performance.
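For the `require(...)` form, a filter module is just a file exporting one function. A hypothetical `filter.js` (the size threshold and extension rule below are made-up examples, not defaults):

```javascript
// filter.js — hypothetical module for `lndup /path -f 'require("/path/to/filter.js")'`.
// Called with (stats: fs.Stats, path: string); return true to keep the file.
// Pure and synchronous: no I/O, no shared state.
const filter = (stats, path) =>
  stats.size >= 1024 &&      // skip tiny files (example threshold)
  !path.endsWith('.tmp');    // skip temp files (example rule)

module.exports = filter;
```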
Hardlinking is fast, but it's not always "atomic": permissions, read-only dirs, races… stuff happens.

So lndup executes each link as a 3-step mini-transaction:

1. Rename the target file to a temporary name with a random hex suffix (e.g. `file` → `file.<hex>`)
2. Create the hardlink (`ln`) from the chosen source inode to the target path
3. Delete the temporary file (`rm file.<hex>`) after the link succeeds

If something fails mid-way, lndup prints the failing command to stderr and may also emit recovery / rollback commands so you can restore consistency:

- `mv` to put the original file back (undo step 1)
- `rm` to remove a newly-created link if needed (undo step 2)

Example:

```
ln -f -- 'root/ran4_1' 'root/ran4_2' #Error: EACCES: permission denied, rename 'root/ran4_2' -> 'root/ran4_2.e8c70ebe0635ab41'
```

Recovery commands are designed to be the inverse of just-completed steps, so failures are rare. But if a recovery command fails, you'll need to run it manually.
When in doubt, lndup prefers leaving extra files behind over risking data loss:
space saving is optional, your data isn't.
Requirements:

- Node.js >= 9
```
// nested maps
devMap                               // Map
sizeMap    = devMap[stat.dev]        // Map
exkeyMap   = sizeMap[stat.size]      // Map
contentMap = exkeyMap[extraKeyValue] // Map
inoMap     = contentMap[digest]      // Map
paths      = inoMap[stat.ino]        // Array<string>
```

The pipeline:

```
probe(paths).then(verify).then(solve).then(execute)
```

- probe: async traverse inputs, `stat()` each file, group candidates
- verify: hash the least amount of data needed, group by digest
- solve: choose a "majority inode" as the link target
- execute: run (or dry-run) the generated link plan
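The nested grouping above can be sketched with real `Map`s (the helper names `getOrCreate` and `groupInsert` are mine, not lndup's internals):

```javascript
// Fetch map[key], creating it with make() on first access.
function getOrCreate(map, key, make) {
  let v = map.get(key);
  if (v === undefined) map.set(key, v = make());
  return v;
}

// dev -> size -> extra key -> digest -> inode -> paths
const devMap = new Map();

function groupInsert(stat, extraKey, digest, path) {
  const sizeMap = getOrCreate(devMap, stat.dev, () => new Map());
  const exkeyMap = getOrCreate(sizeMap, stat.size, () => new Map());
  const contentMap = getOrCreate(exkeyMap, extraKey, () => new Map());
  const inoMap = getOrCreate(contentMap, digest, () => new Map());
  getOrCreate(inoMap, stat.ino, () => []).push(path);
}
```

Files land in the same bucket only when device, size, every extra key, and digest all agree; multiple paths under one inode are already hardlinked.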
MIT © Chinory