enhance: optimize Storage File Format #26

@tinswzy

Description

Currently, Woodpecker writes logs sequentially to files to ensure high throughput and persistence guarantees. However, the current file format either stores data uncompressed or, when compression is enabled, can no longer serve partial reads efficiently. This limits performance tuning and restricts future extensibility (e.g., range-based recovery, fast seeking).

The goal of this issue is to design and implement a new storage file format that supports both compression and efficient partial reads.

Goals:

  • Support compression: Reduce storage footprint and improve I/O efficiency.
  • Enable partial reads: Allow reading specific data blocks without decompressing the entire file.
  • Preserve sequential write performance: Writing must remain high-throughput and append-friendly.
  • Ensure extensibility: Format should be designed to support future metadata like checksums, versioning, and block indices.

Technical Suggestions:

  • Use block-based compression (e.g., compress every N records as a block).
  • Record metadata for each block: offset, length, compression type.
  • Add a lightweight block index section to enable fast seeks.
  • Consider compression algorithms like Snappy or Zstd for a balance of speed and ratio.
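
To make the suggestions above concrete, here is a minimal Python sketch of one possible layout: records are grouped into blocks of N, each block is compressed independently, and a small index (first record id, record count, offset, length, codec) is appended after the data, followed by a fixed-size footer. Everything named here is hypothetical and not part of Woodpecker: the `WPBK` magic, the struct layouts, and the function names are illustration only, and `zlib` from the standard library stands in for Snappy or Zstd.

```python
import io
import struct
import zlib

MAGIC = b"WPBK"  # hypothetical footer magic, not a real Woodpecker constant
# Index entry: first record id, record count, block offset, compressed length, codec
INDEX_ENTRY = struct.Struct("<QIQIB")
# Footer: offset of the index section, number of index entries, magic
FOOTER = struct.Struct("<QI4s")
RECORD_LEN = struct.Struct("<I")  # length prefix for each record inside a block
CODEC_NONE, CODEC_ZLIB = 0, 1     # zlib is a stand-in for Snappy/Zstd


def write_log(out, records, block_size=4):
    """Append records as compressed blocks, then write the block index and footer."""
    index = []
    for start in range(0, len(records), block_size):
        chunk = records[start:start + block_size]
        raw = b"".join(RECORD_LEN.pack(len(r)) + r for r in chunk)
        payload = zlib.compress(raw)
        index.append((start, len(chunk), out.tell(), len(payload), CODEC_ZLIB))
        out.write(payload)  # data blocks are strictly appended
    index_offset = out.tell()
    for entry in index:
        out.write(INDEX_ENTRY.pack(*entry))
    out.write(FOOTER.pack(index_offset, len(index), MAGIC))


def read_index(f):
    """Load the block index by reading the fixed-size footer at the end of the file."""
    f.seek(-FOOTER.size, io.SEEK_END)
    index_offset, count, magic = FOOTER.unpack(f.read(FOOTER.size))
    if magic != MAGIC:
        raise ValueError("bad footer magic")
    f.seek(index_offset)
    data = f.read(count * INDEX_ENTRY.size)
    return [INDEX_ENTRY.unpack_from(data, i * INDEX_ENTRY.size) for i in range(count)]


def read_record(f, index, record_id):
    """Return one record, decompressing only the block that contains it."""
    for first, n, offset, length, codec in index:
        if first <= record_id < first + n:
            f.seek(offset)
            payload = f.read(length)
            raw = zlib.decompress(payload) if codec == CODEC_ZLIB else payload
            pos = 0
            for _ in range(record_id - first):  # skip earlier records in the block
                (size,) = RECORD_LEN.unpack_from(raw, pos)
                pos += RECORD_LEN.size + size
            (size,) = RECORD_LEN.unpack_from(raw, pos)
            body = pos + RECORD_LEN.size
            return raw[body:body + size]
    raise KeyError(record_id)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_log(buf, [f"entry-{i}".encode() for i in range(10)], block_size=4)
    idx = read_index(buf)
    print(read_record(buf, idx, 9))
```

In this sketch the index is only written when the file is closed, which keeps the write path append-only but means an unfinished file has no index. A real format would probably also need a small per-block header (length, codec, checksum) so that blocks can be rediscovered by scanning after a crash; that part is left out here.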

Any better solution is also welcome.
