
lndup

Hardlink duplicate files: fast, scriptable, and Unix-friendly.

lndup finds duplicate files and replaces each duplicate with a hard link (so they share the same inode), saving disk space without changing file contents. Neat, right?
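You can see the effect with plain shell tools; here is an illustrative demo of hard links sharing an inode (GNU coreutils stat shown, not lndup output):

```shell
# Hard links in a nutshell: two directory entries, one inode.
dir=$(mktemp -d)
echo 'hello' > "$dir/a"
ln "$dir/a" "$dir/b"           # create a second name for the same inode

stat -c '%i' "$dir/a" "$dir/b" # prints the same inode number twice
stat -c '%h' "$dir/a"          # link count is now 2

rm -r "$dir"
```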

Why lndup?

Most dedup tools are either:

  • fast but rigid (fixed rules), or
  • flexible but clunky (external scripts, awkward integration).

lndup is different: you can inject tiny JavaScript functions as filters and keys, as naturally as writing Node.js callbacks. One-liners welcome.

Highlights

  • Asynchronous I/O (fast traversal of big trees)
  • Minimal metadata overhead: one stat() per file
  • Hash only what's necessary (the "least files" strategy)
  • JavaScript injection for filters & keys (standard Node.js fs.Stats API)

Note: lndup does not follow symbolic links.

Installation

npm i -g lndup

Usage

Usage: lndup [OPTION]... [PATH]...
Hardlink duplicate files.

  -n, --dry-run  don't link
  -v, --verbose  explain what is being done
  -q, --quiet    don't output extra information
  -i, --stdin    read more paths from stdin

  -f, --file     add a file filter
                 (stats: fs.Stats, path: string): boolean
  -d, --dir      add a directory filter
                 (stats: fs.Stats, path: string, files: string[]): boolean
  -k, --key      add a key to differentiate files
                 (stats: fs.Stats, path: string): any
  -H, --hash     select a digest algorithm, default: sha1
                 run 'openssl list -digest-algorithms' for available algorithms.

  -h, --help     display this help and exit
  -V, --version  output version information and exit

Quick start

1) Dry-run first (recommended)

lndup -n -v /path/to/scan

2) Actually hardlink duplicates

lndup -v /path/to/scan

3) Feed paths via stdin

find /data -type f -print0 | lndup -i -n -v

Output model (important!)

lndup prints executable Unix-shell commands to stdout. Extra information is carried in # comments.

That means it's:

  • easy to review / audit,
  • safe to start with --dry-run,
  • easy to pipe into other tools.
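For instance, a dry run can be captured, audited, and executed later. A sample workflow (the heredoc stands in for the stdout of `lndup -n -v .`):

```shell
# Capture a dry run for review; this sample uses lines from the example below.
plan=$(mktemp)
cat > "$plan" <<'EOF'
#Stat: solve: current  112.00MiB  7
ln -f -- '16M/null_2' '16M/null_3'
ln -f -- '16M/null_2' '16M/null_1'
EOF

grep -v '^#' "$plan"   # show only the executable commands for review

# Once you're happy with the plan, it could be run with: sh "$plan"
# (the #-lines are ordinary shell comments, so they are harmless)
```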

Example:

$ lndup -v .
#Stat: probe: readdir       204B   3
#Stat: probe: stat     144.02MiB  23
#Stat: probe: select   144.02MiB  19
#Time: probe: 7.351ms
#Stat: verify: internal         0B  0
#Stat: verify: external  144.00MiB  9
#Stat: verify: total     144.00MiB  9
#Time: verify: 183.209ms
#Stat: solve: current  112.00MiB  7
#Time: solve: 0.110ms
ln -f -- '16M/null_2' '16M/null_3'
ln -f -- '16M/null_2' '16M/null_1'
ln -f -- '16M/ran1_1' '16M/ran1_2'
ln -f -- 'root/ran4_1' 'root/ran4_2'
ln -f -- 'root/ran4_1' 'root/ran4_2' #Error: EACCES: permission denied, rename 'root/ran4_2' -> 'root/ran4_2.e8c70ebe0635ab41'
#Stat: execute: todo  64.00MiB  3  4
#Stat: execute: done  48.00MiB  2  3
#Stat: execute: fail  16.00MiB  1  1
#Time: execute: 8.331ms

Scriptable filters & keys (core feature)

File filter (-f)

Skip files smaller than 1024 bytes:

lndup /path -f 's => s.size >= 1024'

Only process files owned by a specific uid (example: 1001):

lndup /path -f '(s, p) => s.uid === 1001'

Directory filter (-d)

Ignore directories with more than 100 files:

lndup /path -d '(s, p, files) => files.length <= 100'

Combine filters like LEGO:

lndup /path \
  -f 's => s.size >= 1024' \
  -d '(s, p, files) => files.length <= 100'

Extra keys (-k)

By default, duplicates are decided by content (hash). If you want to avoid hardlinking files whose metadata must stay distinct (owner, group, mode, and so on), add keys.

Example: separate by uid / gid / mode:

lndup /path -k 's => s.uid' -k 's => s.gid' -k 's => s.mode'

Require your own JS modules

lndup /path \
  -k 'require("/path/to/keyfunc.js")' \
  -f 'require("/path/to/filter.js")'

Tip: keep your filter/key functions pure and cheap (no I/O) for best performance.

Safety & recovery (aka "what if something fails?")

Hardlinking is fast, but it's not always atomic: permissions, read-only directories, races... things happen.

So lndup executes each link as a 3-step mini-transaction:

  1. ๐Ÿท๏ธ Rename the target file to a temporary name with a random hex suffix (e.g. file โ†’ file.<hex>)
  2. ๐Ÿงท Create the hardlink (ln) from the chosen source inode to the target path
  3. ๐Ÿงน Delete the temporary file (rm file.<hex>) after the link succeeds
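In shell terms, one transaction is roughly the following (a sketch: lndup performs these steps via Node.js fs calls, and the suffix is random):

```shell
# Sketch of one link transaction: src keeps its inode, target is replaced.
dir=$(mktemp -d)
echo data > "$dir/src"
echo data > "$dir/target"
suffix=e8c70ebe0635ab41                     # lndup picks a random hex suffix

mv -- "$dir/target" "$dir/target.$suffix"   # 1) move target to a temp name
ln -- "$dir/src" "$dir/target"              # 2) hardlink src onto target path
rm -- "$dir/target.$suffix"                 # 3) drop the temp file

# Had step 2 failed: mv -- "$dir/target.$suffix" "$dir/target" undoes step 1.
rm -r "$dir"
```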

Recovery commands

If something fails mid-way, lndup prints the failing command to stderr and may also emit recovery / rollback commands so you can restore consistency:

  • mv to put the original file back (undo step 1)
  • rm to remove a newly-created link if needed (undo step 2)

Example:

ln -f -- 'root/ran4_1' 'root/ran4_2' #Error: EACCES: permission denied, rename 'root/ran4_2' -> 'root/ran4_2.e8c70ebe0635ab41'

Safety-first rule

Recovery commands are designed to be the inverse of just-completed steps, so failures are rare. But if a recovery command fails, you'll need to run it manually.

When in doubt, lndup prefers leaving extra files behind over risking data loss: space saving is optional, your data isn't.

Requirements

  • Node.js >= 9

How it works (internals)

Data structure

// nested maps (conceptual; Map values are fetched with .get)
devMap                                    // Map: dev -> sizeMap
sizeMap    = devMap.get(stat.dev)         // Map: size -> exkeyMap
exkeyMap   = sizeMap.get(stat.size)       // Map: extra key -> contentMap
contentMap = exkeyMap.get(extraKeyValue)  // Map: digest -> inoMap
inoMap     = contentMap.get(digest)       // Map: ino -> paths
paths      = inoMap.get(stat.ino)         // Array<string>
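Insertion into that structure can be sketched like this (illustrative JavaScript, not lndup's actual source; getOrSet is a hypothetical helper):

```javascript
// Group a file into nested Maps: dev -> size -> extra key -> digest -> ino -> paths
function getOrSet(map, key, make) {
  let value = map.get(key);
  if (value === undefined) map.set(key, value = make());
  return value;
}

const devMap = new Map();

function group(stat, extraKeyValue, digest, path) {
  const sizeMap    = getOrSet(devMap, stat.dev, () => new Map());
  const exkeyMap   = getOrSet(sizeMap, stat.size, () => new Map());
  const contentMap = getOrSet(exkeyMap, extraKeyValue, () => new Map());
  const inoMap     = getOrSet(contentMap, digest, () => new Map());
  getOrSet(inoMap, stat.ino, () => []).push(path); // paths sharing one inode
}
```

Paths that land in the same inoMap entry already share an inode; paths under the same digest but different inodes are the duplicates left to link.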

Pipeline

probe(paths).then(verify).then(solve).then(execute)

  • probe: asynchronously traverse inputs, stat() each file, group candidates
  • verify: hash the least amount of data needed, group by digest
  • solve: choose a "majority inode" as the link target
  • execute: run (or dry-run) the generated link plan

License

MIT © Chinory
