Skip to content
This repository was archived by the owner on Jun 2, 2024. It is now read-only.

Conversation

cosmicexplorer
Copy link

@cosmicexplorer cosmicexplorer commented Sep 21, 2023

One attempt to fix zip-rs/zip2#165.

Upsides

Lots and lots faster when extracting zips with many separate entries, or with large highly compressed individual entries:

> cargo bench -- extract
# ...
running 6 tests
test extract_pipelined_compressible_big   ... bench:  81,470,730 ns/iter (+/- 6,342,681) = 449 MB/s
test extract_pipelined_compressible_small ... bench:     716,842 ns/iter (+/- 256,469) = 8 MB/s
test extract_pipelined_random             ... bench:  23,833,197 ns/iter (+/- 15,791,334) = 424 MB/s
test extract_sync_compressible_big        ... bench: 335,936,236 ns/iter (+/- 6,129,233) = 109 MB/s
test extract_sync_compressible_small      ... bench:     168,939 ns/iter (+/- 7,269) = 37 MB/s
test extract_sync_random                  ... bench:  33,435,824 ns/iter (+/- 1,222,875) = 302 MB/s

The *_pipelined_* benchmarks use the new pipelined parallel extraction method, and the *_compressible_big benchmarks demonstrate almost a 5x speedup, while the *_random benchmarks demonstrate a 1.4x speedup. Note that the *_compressible_small benchmark is slower in the pipelined case, but this is such a small input that we actually lose very little.

Downsides

This brings in rayon and a few other dependencies which we would probably want to assign to a flag. As @NobodyXu mentioned in https://github.com/zip-rs/zip/issues/403#issuecomment-1712451398, this also imposes a Clone requirement on the reader:

or requires the reader to implement Clone by storing File to be stored inside an Arc and keep track of the curent location of cursor in the reader

TODO

As mentioned above, this also loses performance against small inputs. I think a fully async approach with the async-executor crate might be a much cleaner approach than trying to scale our rayon threadpools up and down according to the size of the input.

@Pr0methean
Copy link
Member

Replaced with zip-rs/zip2#72.

@Pr0methean Pr0methean closed this May 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parallel extraction

2 participants