Skip to content

Improve Zstd binary provided#93

Merged
KillingSpark merged 11 commits intoKillingSpark:masterfrom
zleyyij:clap
Oct 15, 2025
Merged

Improve Zstd binary provided#93
KillingSpark merged 11 commits intoKillingSpark:masterfrom
zleyyij:clap

Conversation

@zleyyij
Copy link
Contributor

@zleyyij zleyyij commented Sep 30, 2025

Closes #38

This PR introduces the addition of several crates to improve the ergonomics of the CLI.

The new CLI has been designed to be simple, ergonomic, and scale well as more features are added. It has a real time progress bar, along with an ETA. Once processing is complete, it presents basic stats about what happened.

If demand is present, backwards compatibility could be added under a compat subcommand.

Future work building on this PR might involve the addition of a bench subcommand, and/or a gen-dict subcommand.

Attached is a recording of the decompress command in action

asciicast

@KillingSpark
Copy link
Owner

KillingSpark commented Oct 1, 2025

That progress bar is slick!

I think the result page should show the ratio inverted

0.93GiB ——> 406.89MiB (2.34x) makes we wonder why it would come out 2.34x time bigger, when it is actually 0.43x smaller. Maybe that's just me though :)

I was also wondering if it might make sense to make this repo a cargo workspace and put the zstd binary into it's onw crate? This might make dependency management and crate features less messy

@zleyyij
Copy link
Contributor Author

zleyyij commented Oct 1, 2025

I agree that the ratio feels inverted, but it's how Facebook represents it (and I think most entities). It could be replaced with y% of original, or z% smaller if you feel it would be more intuitive.

https://facebook.github.io/zstd/

Switching to a workspace makes sense, I'd be down for that and can implement that change if needed.

@fintelia
Copy link
Contributor

fintelia commented Oct 2, 2025

The C-based zstd utility outputs the compression ratio as a percentage:

$ zstd foo.txt 
foo.txt              : 33.91%   (  4.43 KiB =>   1.50 KiB, foo.txt.zst)

@KillingSpark
Copy link
Owner

Switching to a workspace makes sense, I'd be down for that and can implement that change if needed.

That would be awesome!

I agree that the ratio feels inverted, but it's how Facebook represents it (and I think most entities).

I think that's just for benchmarking purposes (or rather for purposes of cool graphs). "Line go up" is visually more impressive than "Line go down". Since the C impl seem to show this as a precentage then that's the way I'd go too :)

@zleyyij
Copy link
Contributor Author

zleyyij commented Oct 4, 2025

Okay, switching to a workspace has been done, and the ratio was switched to a percentage (as well as a more granular time display)

Some future discussion topics:

  • Can we make the test that decodes every file in local_corpus_files skipped by default, or removed entirely? IMO those are more integration tests, and I find myself renaming the local_corpus_files directory temporarily just so I can avoid processing gigabytes of data for every feature flag combo. Alternatively, updating the .gitignore to ignore any file with the local_ prefix
  • What do you think about adding a task runner to the repository? (i.e just or cargo xtask). This would be done to make running CI locally easier, you could have something like just nightly-ci and just check. While you can run CI locally with act, or by copy pasting the commands, act is slower than bare metal, repeats a lot of work, and creates a 30+gb docker cache for this repo alone. Copy pasting the commands is slow and easy to miss.

@KillingSpark
Copy link
Owner

Since we now have a workspace, maybe the local_corpus_files test can go into a separate crate too? Something like ruzstd_verification.

What do you think about adding a task runner to the repository?

I think at the scale of the CI we have right now we can just add a few scripts that run the commands of the CI locally instead of depending on yet another tool. Do you think that's reasonable or would a taskrunner bring any benefits I'm overlooking right now?

(also sorry for the late response, didn't see that this PR moved from draft to open, I'll merge this soon)

@KillingSpark KillingSpark merged commit eff39ae into KillingSpark:master Oct 15, 2025
2 checks passed
@KillingSpark
Copy link
Owner

I created #95 to track the issue of the local_ tests

@zleyyij
Copy link
Contributor Author

zleyyij commented Oct 17, 2025

I think a separate crate could be beneficial, but it could possibly be broader than that, and be a dedicated crate for benchmarking, testing, and improving.

Your feedback on a task runner is valuable and I see where you're coming from. I don't love the ecosystem around pretty much any script right now, but maybe when cargo script lands this can be re-evaluated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve the zstd binary provided

3 participants