rdedup v2.0.0 Release Notes
I'm happy to announce rdedup v2.0.0!
There is no single milestone or feature responsible for the new major release. It mostly follows the mantra "release early, release often": it's been more than a year since rdedup v1.0.0 was released.
rdedup v1.0.0 was mostly focused on my personal use case:
- being written in Rust (yay!)
- public key cryptography
- synchronization over Dropbox/Syncthing
Since v1.0.0, rdedup has attracted a user base and has improved considerably over time:
- `rdedup store` performance has been greatly improved, to the point where I'd like to think of it as the `ripgrep` of dedup[*]:
  - the `store` pipeline is zero-copy and extremely multi-threaded
  - new, faster algorithms are implemented:
    - the default CDC ("chunking") algorithm is now FastCDC; FastCDC is state of the art, and `rdedup` is one of the first (maybe even the only, at the time of writing) Open Source data deduplication tools to have it
    - blake2s is now the default hashing algorithm
    - zstd is now the default compression algorithm
- almost all parts of `rdedup` are now configurable, with many algorithms to choose from
- testing has been improved, particularly with end-to-end tests, giving greater confidence in rdedup's reliability
- a `-t` flag has been introduced for timing different parts of the pipeline, to help find performance bottlenecks
- an asynchronous IO architecture has been added, in preparation for over-the-network backends
[*] Take with a grain of salt. Use https://github.com/gilbertchen/benchmarking to draw your own conclusions.
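To illustrate what content-defined chunking buys a deduplicating store, here is a minimal, self-contained Rust sketch of a FastCDC-style chunker: a gear rolling hash, skipping the first `MIN_SIZE` bytes of every chunk, and "normalized chunking" with a stricter mask before the average chunk size and a looser one after it. This is not rdedup's implementation. The sizes, masks, gear table and all names (`gear_table`, `next_cut`, `chunk`, `pseudo_random_bytes`) are hypothetical choices made only for this example.

```rust
// Simplified FastCDC-style chunker (gear hash + normalized chunking).
// All sizes, masks and the gear table are illustrative assumptions, not rdedup's.
use std::collections::HashSet;

const MIN_SIZE: usize = 2 * 1024; // no cut point is searched before this offset
const AVG_SIZE: usize = 8 * 1024; // switch from the strict to the loose mask here
const MAX_SIZE: usize = 64 * 1024; // forced cut if no boundary was found earlier

/// Fills a 256-entry gear table with pseudo-random values (splitmix64).
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Mask selecting the `bits` most significant bits of the rolling hash.
fn high_bits_mask(bits: u32) -> u64 {
    !0u64 << (64 - bits)
}

/// Returns the length of the chunk that starts at the beginning of `data`.
fn next_cut(data: &[u8], gear: &[u64; 256]) -> usize {
    if data.len() <= MIN_SIZE {
        return data.len(); // too short to split: the remainder is one chunk
    }
    let max = data.len().min(MAX_SIZE);
    let normal = data.len().min(AVG_SIZE);
    let mask_strict = high_bits_mask(15); // match probability 2^-15 before AVG_SIZE
    let mask_loose = high_bits_mask(11); // match probability 2^-11 after AVG_SIZE
    let mut hash: u64 = 0;
    for (i, &byte) in data.iter().enumerate().take(max).skip(MIN_SIZE) {
        // Gear rolling hash: shifting left ages out each byte after 64 steps.
        hash = (hash << 1).wrapping_add(gear[byte as usize]);
        let mask = if i < normal { mask_strict } else { mask_loose };
        if hash & mask == 0 {
            return i + 1; // cut right after this byte
        }
    }
    max
}

/// Splits `data` into content-defined chunks.
fn chunk(data: &[u8]) -> Vec<&[u8]> {
    let gear = gear_table();
    let mut chunks = Vec::new();
    let mut rest = data;
    while !rest.is_empty() {
        let (head, tail) = rest.split_at(next_cut(rest, &gear));
        chunks.push(head);
        rest = tail;
    }
    chunks
}

/// Deterministic xorshift64 byte generator, used only to build sample input.
fn pseudo_random_bytes(len: usize, seed: u64) -> Vec<u8> {
    let mut x = seed;
    (0..len)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            (x >> 56) as u8
        })
        .collect()
}

fn main() {
    let original = pseudo_random_bytes(1 << 20, 42);
    // Insert one byte at the front: fixed-size blocks would all shift,
    // but content-defined boundaries re-synchronize shortly after the edit.
    let mut edited = vec![0xAB];
    edited.extend_from_slice(&original);

    let a = chunk(&original);
    let b = chunk(&edited);
    let seen: HashSet<&[u8]> = a.iter().copied().collect();
    let shared = b.iter().filter(|c| seen.contains(*c)).count();
    println!("{} chunks, {} shared after a 1-byte insert", a.len(), shared);
}
```

Because boundaries are picked from the content rather than from fixed offsets, a one-byte insert typically changes only the first chunk or so, and the remaining chunks stay byte-identical and would be deduplicated. In a real store each chunk would then be hashed and compressed (blake2s and zstd being rdedup v2's defaults) before being written.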
I'd like to thank all the users who provided feedback and, most of all, all the contributors: knowing that people are using rdedup really helps my motivation.
Having said that, rdedup is still mostly a one-man, spare-time project and should be treated as such. Since v1.0.0 there have been no reports of data loss or corruption, but it's hard to tell whether that's due to rdedup's reliability or just its small user base. :)
I'm very aware of the project's pain points:
- the current GC model is not very scalable and may be too slow for datasets of a terabyte or more. A new `rdedup` GC approach is on the roadmap for v3.0.0 and will feature incremental, scalable and efficient GC without compromising anything.
- network backends are still not implemented.
The codebase is not as neat as it could be, and testing is not as comprehensive as it should be for a "production ready" product.
I am planning to continue development toward rdedup v3.0.0 in the master branch. v3 will have a different repository format to enable more efficient GC and other features. I'll continue to add fixes and smaller-scope enhancements to v2, which now lives in the 2.0.0 branch.
If rdedup seems like an interesting project to you, feel free to reach out! I'd be happy to mentor and help.