Skip to content

feat(storage): add compression primitives: FrameTable, V4 header#2246

Closed
levb wants to merge 7 commits intomainfrom
lev-compression-primitives
Closed

feat(storage): add compression primitives: FrameTable, V4 header#2246
levb wants to merge 7 commits intomainfrom
lev-compression-primitives

Conversation

@levb
Copy link
Copy Markdown
Contributor

@levb levb commented Mar 27, 2026

Summary

Foundational types and utilities for compressed template storage, extracted as an independent PR ahead of the full compression read/write path.

Plugged in (affects runtime now)

  • MergeMappings now returns error (propagates FrameTable.Subset errors through toDiffMappingToDiffHeader)
  • NewMultipartUploaderWithRetryConfig accepts optional metadata param (existing caller passes nil)
  • BuildMap.FrameTable field added (nil for all current V3 headers — no behavior change)
  • header.Metadata gains MetadataVersionCompressed constant
  • show-build-diff CLI updated for new MergeMappings signature

Preparation only (no callers yet)

  • FrameTable: tracks per-frame compressed/uncompressed sizes, supports Subset, Range, FrameFor, GetFetchRange for offset translation
  • Codec pools: LZ4/zstd compressor pools with CompressLZ4/DecompressLZ4 helpers
  • Compressed upload pipeline: compressStream (read → parallel compress → ordered upload), partUploader interface, CompressBytes
  • CompressConfig: env-var and LaunchDarkly-based configuration with ResolveCompressConfig
  • V4 header format: SerializeHeader/DeserializeBytes with FrameTable serialization, BuildFileInfo, ValidateHeader
  • Template path helpers: CompressedDataPath, CompressedDataName, BaseFileName
  • Feature flags: CompressConfigFlag, CompressFileTypeKind/CompressUseCaseKind context kinds
  • Multipart upload: uploadPartSlices (zero-copy multi-slice upload), Start/UploadPart/Complete methods on MultipartUploader

levb and others added 4 commits March 27, 2026 10:54
…LZ4/zstd wrappers

Foundation types for frame-based compression: CompressionType enum,
FrameTable (U-space↔C-space offset mapping), FrameOffset/FrameSize,
and encoder/decoder pooling for zstd and LZ4 block codecs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rtUploader interface

CompressConfig with LaunchDarkly feature flag support, compressStream
pipeline (parallel frame compression → ordered emit → concurrent upload),
and GCS multipart partUploader implementation with zero-copy slice uploads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…leInfo, and validation

V4 headers store per-mapping FrameTables and per-build file metadata
(size + SHA-256 checksum). The mappings block is LZ4-compressed with a
uint32 size prefix for exact-size decompression.

Adds: SerializeHeader, DeserializeBytes (auto-detecting V3/V4),
LoadHeader, StoreHeader, ValidateHeader, CloneForUpload, AddFrames,
mergeFrameTables, and compressed path helpers on TemplateFiles.

Existing Serialize/Deserialize/DeserializeBytes APIs preserved for
backward compatibility — signature changes deferred to read-path PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused OverrideJSONFlag, propagate Subset errors through
MergeMappings, align newCompressorPool signature with final, and
trim redundant/trivial tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c6962a76e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

levb and others added 2 commits March 29, 2026 09:32
… cleanup, deterministic serialization

- Switch back to LZ4 streaming API: handles incompressible data
  automatically, adds per-block xxHash32 checksums, pool encoders
  and decoders symmetrically with zstd
- Move header LZ4 compress/decompress into header package (V4 wire format)
- Always wait on uploadEG before returning (errors.Join)
- Sort BuildFiles by UUID for deterministic V4 serialization
- Remove dead decoderConcurrency key from CompressConfigFlag
- Fix CompressConfigFromLDValue ctx parameter order

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +13 to +21
type CompressConfig struct {
Enabled bool `env:"COMPRESS_ENABLED" envDefault:"false"`
Type string `env:"COMPRESS_TYPE" envDefault:"zstd"`
Level int `env:"COMPRESS_LEVEL" envDefault:"2"`
FrameSizeKB int `env:"COMPRESS_FRAME_SIZE_KB" envDefault:"2048"`
TargetPartSizeMB int `env:"COMPRESS_TARGET_PART_SIZE_MB" envDefault:"50"`
FrameEncodeWorkers int `env:"COMPRESS_FRAME_ENCODE_WORKERS" envDefault:"4"`
EncoderConcurrency int `env:"COMPRESS_ENCODER_CONCURRENCY" envDefault:"1"`
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this defined by the feature flags already?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to override the FF settings for the integration tests, and also to configure the service in the future without FF

Length uint64
BuildId uuid.UUID
BuildStorageOffset uint64
FrameTable *storage.FrameTable
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have the frametable as pointer?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is not always present (not in V3), one 64 bit word instead of 4.

Copy link
Copy Markdown
Contributor

@dobrac dobrac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the code paths and functions that are not called anywhere. Examples, not exhaustive: DataPath, CloneForUpload.

levb added a commit that referenced this pull request Mar 31, 2026
Addresses dobrac's review comments on the compression primitives PR:

  Serialization:
  - Split serialization.go into serialization_v3.go and serialization_v4.go
  - Each version's wire format is self-contained in one file
  - Move Metadata types/constants to metadata.go
  - Remove unused exported Serialize() wrapper

  V4 wire format:
  - Replace packed CompressionTypeNumFrames uint64 with separate
    CompressionType uint32 + NumFrames uint32 (no bit-shifting)
  - Remove MaxCompressedHeaderSize limit (uint32 prefix + LZ4 frame
    boundary are sufficient)

  Naming and reuse:
  - Rename AddFrames → SetFrames (replaces, not appends)
  - Use SetFrames in MergeMappings instead of inline Subset calls
  - Remove unnecessary maps.Clone in ToDiffHeader

  Formatting:
  - Use decimal (%d) instead of hex (%#x) in error messages and
    String() methods for consistency with the rest of the codebase

  Tests:
  - Add test for uploadPartSlices (MD5 hashing, body concatenation)

  Documentation:
  - Flag BuildFiles incompleteness: only contains builds from current
    upload session, not upstream dependencies

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@levb
Copy link
Copy Markdown
Contributor Author

levb commented Mar 31, 2026

Closed in favor of combined #2034, comments addressed there.

@levb levb closed this Mar 31, 2026
levb added a commit that referenced this pull request Mar 31, 2026
- Remove ValidateHeader from NewHeader — avoids rejecting zero-size
  templates on deserialization and removes O(n log n) sort from the
  hot read path. ValidateHeader remains exported for CLI/diagnostic use.
- Fix O(N²) in applyToHeader: add SubsetFrom/SetFramesFrom with cursor
  so consecutive sorted mappings walk the FrameTable once instead of
  rescanning from the start for each mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants