This repository was archived by the owner on Jun 2, 2024. It is now read-only.
Pipelined parallel extract #407
Closed
One attempt to fix zip-rs/zip2#165.
Upsides
Lots and lots faster when extracting zips with many separate entries, or with large highly compressed individual entries:
The `*_pipelined_*` benchmarks use the new pipelined parallel extraction method. The `*_compressible_big` benchmarks demonstrate almost a 5x speedup, while the `*_random` benchmarks demonstrate a 1.4x speedup. Note that the `*_compressible_small` benchmark is slower in the pipelined case, but this is such a small input that we actually lose very little.
Downsides
This brings in `rayon` and a few other dependencies, which we would probably want to gate behind a feature flag. As @NobodyXu mentioned in https://github.com/zip-rs/zip/issues/403#issuecomment-1712451398, this also imposes a `Clone` requirement on the reader:
TODO
As mentioned above, this also loses performance against small inputs. I think a fully async approach with the `async-executor` crate might be much cleaner than trying to scale our rayon threadpools up and down according to the size of the input.
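For reviewers wondering why parallel extraction needs a `Clone` bound on the reader (see Downsides), here is a minimal sketch of the underlying constraint. It is not the code in this PR: it uses `std::thread::scope` instead of `rayon`, and `EntryRange` and `read_entries_parallel` are hypothetical names that treat an "entry" as a plain byte range with no decompression. The point is only that each worker needs its own independent cursor into the archive.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::thread;

/// Hypothetical stand-in for a zip entry: a byte range inside the archive.
#[derive(Clone, Copy, Debug)]
pub struct EntryRange {
    pub offset: u64,
    pub len: usize,
}

/// Read every entry on its own thread and return the contents in entry order.
///
/// Each worker seeks and reads independently, so it needs a private copy of
/// the reader. That is where the `Clone` bound comes from.
pub fn read_entries_parallel<R>(reader: &R, entries: &[EntryRange]) -> io::Result<Vec<Vec<u8>>>
where
    R: Read + Seek + Clone + Send,
{
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = entries
            .iter()
            .map(|entry| {
                // Clone on the calling thread; the clone is moved into the worker.
                let mut local = reader.clone();
                let entry = *entry;
                scope.spawn(move || -> io::Result<Vec<u8>> {
                    local.seek(SeekFrom::Start(entry.offset))?;
                    let mut buf = vec![0u8; entry.len];
                    local.read_exact(&mut buf)?;
                    Ok(buf)
                })
            })
            .collect();

        // Join in spawn order; the first I/O error (if any) is returned.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling this with a `std::io::Cursor<Vec<u8>>` works because `Cursor` is `Clone`, although each clone copies the whole buffer. `std::fs::File` does not implement `Clone` (it only offers `try_clone`), so callers holding a plain `File` would need some wrapper, which is the practical cost of this bound.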