Skip to content

cmd: add --dedup-stats flag to ls; show per-file dedup stats#102

Merged
S4tvara merged 5 commits intoS4tvara:mainfrom
Janmesh23:add-ls-dedup-stats
Oct 6, 2025
Merged

cmd: add --dedup-stats flag to ls; show per-file dedup stats#102
S4tvara merged 5 commits intoS4tvara:mainfrom
Janmesh23:add-ls-dedup-stats

Conversation

@Janmesh23
Copy link
Copy Markdown
Contributor

This PR adds a --dedup-stats (-d) flag to sietch ls. When enabled, ls will:

  • Show number of shared chunks per file (shared_chunks).
  • Show estimated bytes avoided per file due to dedup (saved) using per-chunk EncryptedSize (fallbacks to Size, then 4MiB default).
  • List other files that share those chunks (shared_with) — truncated to first 10 entries.

Notes:

  • The saved bytes reflect how much duplicate data the file avoided storing (per-file view).
  • For global reclaimed space metrics we could show (refs-1)*chunkSize aggregated per chunk; can be implemented in a follow-up PR.
  • Includes unit test for computeDedupStatsForFile.
Screenshot 2025-10-05 at 9 03 25 PM
  • The above image shows the implementation of the cmd sietch ls --dedup-stats
image
  • The above image shows the implementation of the cmd sietch ls -l -d

Fixes: #96

@codecov
Copy link
Copy Markdown

codecov Bot commented Oct 5, 2025

Codecov Report

❌ Patch coverage is 20.17544% with 91 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/deduplication/util.go 0.00% 41 Missing ⚠️
internal/ls/ui.go 0.00% 31 Missing ⚠️
cmd/ls.go 54.76% 15 Missing and 4 partials ⚠️

📢 Thoughts on this report? Let us know!

@S4tvara
Copy link
Copy Markdown
Owner

S4tvara commented Oct 5, 2025

Hi @Janmesh23,
Thanks for your contribution. I’ve suggested some changes—please review them when you get a chance. Also, the target test code coverage is 11.64%, but your current PR is at 2.94%.

I was also able to reproduce the results:
image

@Janmesh23
Copy link
Copy Markdown
Contributor Author

Janmesh23 commented Oct 5, 2025

Hi @SubstantialCattle5 , thank you for reviewing , i would appreciate if you can tell me more about from where can i see the suggestions/changes (i'm a first time contributor hence need some help navigating )
again , thanks for your cooperation

@S4tvara
Copy link
Copy Markdown
Owner

S4tvara commented Oct 5, 2025

image

there should be 3-4 comments tagged with review.

@S4tvara
Copy link
Copy Markdown
Owner

S4tvara commented Oct 5, 2025

The Codecov report shows that the patch coverage for this PR is only 2.94%, with 99 lines in cmd/ls.go missing test coverage.
This means that most of the new or modified code in this PR isn’t covered by automated tests.

@Janmesh23
Copy link
Copy Markdown
Contributor Author

The Codecov report shows that the patch coverage for this PR is only 2.94%, with 99 lines in cmd/ls.go missing test coverage. This means that most of the new or modified code in this PR isn’t covered by automated tests.

sure , i'll get on it

@Janmesh23
Copy link
Copy Markdown
Contributor Author

image there should be 3-4 comments tagged with review.

hi @SubstantialCattle5 , I could not see the comments you added in file changes
please let me know where I can see them
sorry for the inconvenience

@S4tvara
Copy link
Copy Markdown
Owner

S4tvara commented Oct 6, 2025

image there should be 3-4 comments tagged with review.

hi @SubstantialCattle5 , I could not see the comments you added in file changes please let me know where I can see them sorry for the inconvenience

@Janmesh23 ,
hi sure no worries I'll put them here itself.

image image image

…tion to internal/deduplication; update tests
@Janmesh23
Copy link
Copy Markdown
Contributor Author

Refactored dedup logic for better modularity and testability.
Exported ComputeDedupStatsForFile, FormatSharedWith, and DisplayShortFormat so cmd and tests can reuse them.
Moved UI helpers under internal/ls and kept dedup logic in internal/deduplication.

@S4tvara
Copy link
Copy Markdown
Owner

S4tvara commented Oct 6, 2025

@Janmesh23,

you’ve committed your test vaults and bak files as well. Please remove them.

Cheers,
Nilay

@Janmesh23
Copy link
Copy Markdown
Contributor Author

Janmesh23 commented Oct 6, 2025

@Janmesh23,

you’ve committed your test vaults and bak files as well. Please remove them.

Cheers, Nilay

yeah my bad , doing it right away :)

edit - done 👍

@S4tvara
Copy link
Copy Markdown
Owner

S4tvara commented Oct 6, 2025

image @Janmesh23 please follow the commit message guidelines explained in the `CONTRIBUTING.md`. Also, squash all your commits.

@S4tvara S4tvara merged commit 6e5b3a5 into S4tvara:main Oct 6, 2025
28 checks passed
@S4tvara
Copy link
Copy Markdown
Owner

S4tvara commented Oct 6, 2025

@Janmesh23,
I've squashed your commits from my side and also updated the commit message to follow the guidelines.
Please make sure to follow the standards defined for subsequent PRs.

Cheers,
Nilay

@Janmesh23
Copy link
Copy Markdown
Contributor Author

@Janmesh23, I've squashed your commits from my side and also updated the commit message to follow the guidelines. Please make sure to follow the standards defined for subsequent PRs.

Cheers, Nilay

yess sure !
thankyou for the opportunity @SubstantialCattle5 , i would like to work on more such issues in sietch and learn more .
looking forward to do so :)

@S4tvara
Copy link
Copy Markdown
Owner

S4tvara commented Oct 6, 2025

Sure just drop me a message whichever one you're interested in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Show Deduplication Statistics in sietch ls

2 participants