Optimize LocalZipStorageHandler to avoid reopening ZIP file for each WARC record lookup #25

@leewesleyv

Description

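Currently, LocalZipStorageHandler reopens the ZIP file for every WARC record lookup and again when fetching the index. Each reopen repeats the work of opening the file and reading the archive's directory, so the overhead grows with the number of lookups. The handler should open the archive once and reuse it.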

Proposed Changes

Use a context manager or an initialization process to open the ZIP file once and keep it open for subsequent operations.
Ensure the file is properly closed when the LocalZipStorageHandler instance is no longer needed (e.g., implement __enter__ and __exit__ methods).
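
A minimal sketch of what this could look like, assuming the handler is built on Python's standard `zipfile` module. The method names `fetch_index` and `fetch_record`, the `index_name` parameter, and the default member name `index.cdxj` are hypothetical and only illustrate the shape of the change; the real handler's API may differ.

```python
import zipfile
from typing import Optional


class LocalZipStorageHandler:
    """Hypothetical sketch: keeps one zipfile.ZipFile open for the handler's lifetime."""

    def __init__(self, path: str, index_name: str = "index.cdxj") -> None:
        self._path = path
        self._index_name = index_name  # assumed name of the index member
        self._zip: Optional[zipfile.ZipFile] = None

    def open(self) -> None:
        # Open the archive only if it is not already open.
        if self._zip is None:
            self._zip = zipfile.ZipFile(self._path, "r")

    def close(self) -> None:
        # Safe to call more than once.
        if self._zip is not None:
            self._zip.close()
            self._zip = None

    def __enter__(self) -> "LocalZipStorageHandler":
        self.open()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()

    def _archive(self) -> zipfile.ZipFile:
        if self._zip is None:
            raise RuntimeError("handler is closed; use 'with' or call open() first")
        return self._zip

    def fetch_index(self) -> bytes:
        # Reads the index through the persistent handle instead of reopening the ZIP.
        return self._archive().read(self._index_name)

    def fetch_record(self, member: str, offset: int, length: int) -> bytes:
        # Seeking inside a member works on Python 3.7+; for compressed members
        # the seek has to decompress up to the offset, so it is not free.
        with self._archive().open(member) as stream:
            stream.seek(offset)
            return stream.read(length)
```

With this shape, callers would write `with LocalZipStorageHandler(path) as handler:` and perform all lookups inside the block, so the archive is closed even if a lookup raises.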

Tasks

  • Modify LocalZipStorageHandler to keep the ZIP file open during its lifetime.
  • Implement __enter__ and __exit__ methods to support proper resource cleanup.
  • Update existing methods to use the persistent file handle.
  • Write tests to verify that the file handle is reused and closed properly.
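
The last task could be covered by a test along these lines. This is a sketch under assumptions: it uses a small stand-in `LocalZipStorageHandler` (with a hypothetical `read_member` method) so that it runs on its own, and it assumes the real handler opens the archive through `zipfile.ZipFile`. It wraps `zipfile.ZipFile` with `unittest.mock.patch` to count how often the archive is opened, then checks that the handle rejects reads after the `with` block.

```python
import os
import tempfile
import zipfile
from unittest import mock


class LocalZipStorageHandler:
    """Minimal stand-in (hypothetical) so the test below is self-contained."""

    def __init__(self, path):
        self._zip = zipfile.ZipFile(path, "r")

    def read_member(self, name):
        return self._zip.read(name)

    def close(self):
        self._zip.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


def make_archive(directory):
    # Build a tiny ZIP with an index and one record file (names are made up).
    path = os.path.join(directory, "sample.zip")
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("index.cdxj", b"index")
        archive.writestr("records.warc", b"record")
    return path


def test_handle_is_reused_and_closed():
    with tempfile.TemporaryDirectory() as tmp:
        path = make_archive(tmp)
        # wraps= lets calls pass through to the real class while counting them.
        with mock.patch("zipfile.ZipFile", wraps=zipfile.ZipFile) as spy:
            with LocalZipStorageHandler(path) as handler:
                assert handler.read_member("index.cdxj") == b"index"
                assert handler.read_member("records.warc") == b"record"
                archive = handler._zip
            # Two lookups, but the archive was opened only once.
            assert spy.call_count == 1
        # Reading from a closed ZipFile raises ValueError.
        try:
            archive.read("index.cdxj")
        except ValueError:
            closed = True
        else:
            closed = False
        assert closed
```

Counting constructor calls checks reuse without depending on timing, and the `ValueError` check relies only on documented `zipfile` behaviour rather than on private attributes of `ZipFile`.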
