
zstdfile: ensure we do not read more than size / IO_BLOCK_SIZE #217

Merged

Khatskevich merged 1 commit into main from joelynch/memory-usage-zstd on Dec 23, 2025

Conversation

@joelynch
Contributor

In the previous implementation, _ZtsdFileReader.read could produce output of arbitrary size. This can cause memory spikes while decompressing a file. Instead, we should use a ZstdDecompressor.stream_reader, which decompresses incrementally into a fixed-size output buffer.
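For illustration, here is a minimal sketch of the difference, written against the python-zstandard API directly; the data, sizes, and variable names are illustrative and are not rohmu's internals:

```python
import io
import zstandard

# Highly compressible input: a few hundred bytes compressed,
# 16 MiB once decompressed.
data = b"\x00" * (16 * 1024 * 1024)
compressed = zstandard.ZstdCompressor().compress(data)

# Old pattern (decompressobj): one call can materialize an arbitrarily
# large output buffer, since decompress() returns everything available.
dobj = zstandard.ZstdDecompressor().decompressobj()
huge = dobj.decompress(compressed)  # ~16 MiB returned in a single call

# New pattern (stream_reader): decompression is incremental, and each
# read() returns at most the requested number of bytes.
reader = zstandard.ZstdDecompressor().stream_reader(io.BytesIO(compressed))
chunk = reader.read(1024 * 1024)  # never more than 1 MiB per call
assert len(chunk) <= 1024 * 1024
```

With stream_reader(), compressed input is pulled on demand, so peak memory is proportional to the requested read size rather than to the compression ratio of the input.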

About this change - What it does

Resolves: #xxxxx

Why this way


Copilot AI left a comment


Pull request overview

This PR addresses memory spikes during zstd decompression by refactoring the _ZtsdFileReader class to use ZstdDecompressor.stream_reader() instead of decompressobj(). The new approach enables incremental decompression with a bounded output buffer, ensuring that read() operations respect the requested size parameter and do not produce arbitrarily large outputs.

  • Replaced decompressobj() with stream_reader() for controlled, incremental decompression
  • Modified the read() method to honor the size parameter and limit output to the requested number of bytes (see the sketch after this list)
  • Added comprehensive test coverage for compression and decompression with size-bounded reads
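The sketch below shows what a size-honoring read() can look like. It is a simplified stand-in, not rohmu's actual _ZtsdFileReader; the class name, the IO_BLOCK_SIZE value, and the choice to cap unbounded reads at one block are assumptions:

```python
import io
import zstandard

IO_BLOCK_SIZE = 1024 * 1024  # assumed value; the real constant lives in rohmu

class BoundedZstdReader(io.RawIOBase):
    """Simplified stand-in for the reader: delegate to stream_reader()
    so that read(size) never returns more than `size` bytes."""

    def __init__(self, source):
        self._reader = zstandard.ZstdDecompressor().stream_reader(source)

    def readable(self) -> bool:
        return True

    def read(self, size: int = -1) -> bytes:
        if size is None or size < 0:
            # Cap unbounded reads at one block rather than decompressing
            # the whole stream into memory in a single call.
            size = IO_BLOCK_SIZE
        return self._reader.read(size)
```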

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| rohmu/zstdfile.py | Refactored _ZtsdFileReader to use stream_reader() for bounded decompression and updated the read() method to respect the size parameter |
| test/test_zstdfile.py | Added a test case validating compression/decompression with size-limited reads to verify the fix |
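A sketch of what such a size-bounded round-trip test could look like, written against python-zstandard directly rather than rohmu's actual test helpers (the test name, data, and sizes are illustrative):

```python
import io
import zstandard

def test_read_does_not_exceed_requested_size():
    # Repetitive data compresses extremely well, which is exactly the
    # case where the old code could balloon in memory.
    data = b"a" * (8 * 1024 * 1024)
    compressed = zstandard.ZstdCompressor().compress(data)

    reader = zstandard.ZstdDecompressor().stream_reader(io.BytesIO(compressed))
    out = bytearray()
    while chunk := reader.read(64 * 1024):
        assert len(chunk) <= 64 * 1024  # every read is size-bounded
        out.extend(chunk)
    assert bytes(out) == data  # the round-trip is lossless
```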


@joelynch joelynch force-pushed the joelynch/memory-usage-zstd branch from c8f5835 to b80c669 on December 22, 2025 11:25
In the previous implementation, _ZtsdFileReader.read could produce output of arbitrary size. This can cause memory spikes while decompressing a file. Instead, we should use a ZstdDecompressor.stream_reader, which decompresses incrementally into a fixed-size output buffer.
@joelynch joelynch force-pushed the joelynch/memory-usage-zstd branch from b80c669 to 5406382 on December 22, 2025 11:27
@joelynch joelynch requested a review from a team December 22, 2025 11:41
@Khatskevich Khatskevich merged commit 0d87620 into main Dec 23, 2025
7 checks passed
@Khatskevich Khatskevich deleted the joelynch/memory-usage-zstd branch December 23, 2025 10:24