feat(storage): add compression primitives: FrameTable, V4 header#2246
feat(storage): add compression primitives: FrameTable, V4 header#2246
Conversation
…LZ4/zstd wrappers Foundation types for frame-based compression: CompressionType enum, FrameTable (U-space↔C-space offset mapping), FrameOffset/FrameSize, and encoder/decoder pooling for zstd and LZ4 block codecs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rtUploader interface CompressConfig with LaunchDarkly feature flag support, compressStream pipeline (parallel frame compression → ordered emit → concurrent upload), and GCS multipart partUploader implementation with zero-copy slice uploads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…leInfo, and validation V4 headers store per-mapping FrameTables and per-build file metadata (size + SHA-256 checksum). The mappings block is LZ4-compressed with a uint32 size prefix for exact-size decompression. Adds: SerializeHeader, DeserializeBytes (auto-detecting V3/V4), LoadHeader, StoreHeader, ValidateHeader, CloneForUpload, AddFrames, mergeFrameTables, and compressed path helpers on TemplateFiles. Existing Serialize/Deserialize/DeserializeBytes APIs preserved for backward compatibility — signature changes deferred to read-path PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused OverrideJSONFlag, propagate Subset errors through MergeMappings, align newCompressorPool signature with final, and trim redundant/trivial tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c6962a76e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
… cleanup, deterministic serialization - Switch back to LZ4 streaming API: handles incompressible data automatically, adds per-block xxHash32 checksums, pool encoders and decoders symmetrically with zstd - Move header LZ4 compress/decompress into header package (V4 wire format) - Always wait on uploadEG before returning (errors.Join) - Sort BuildFiles by UUID for deterministic V4 serialization - Remove dead decoderConcurrency key from CompressConfigFlag - Fix CompressConfigFromLDValue ctx parameter order Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… into lev-compression-primitives
| type CompressConfig struct { | ||
| Enabled bool `env:"COMPRESS_ENABLED" envDefault:"false"` | ||
| Type string `env:"COMPRESS_TYPE" envDefault:"zstd"` | ||
| Level int `env:"COMPRESS_LEVEL" envDefault:"2"` | ||
| FrameSizeKB int `env:"COMPRESS_FRAME_SIZE_KB" envDefault:"2048"` | ||
| TargetPartSizeMB int `env:"COMPRESS_TARGET_PART_SIZE_MB" envDefault:"50"` | ||
| FrameEncodeWorkers int `env:"COMPRESS_FRAME_ENCODE_WORKERS" envDefault:"4"` | ||
| EncoderConcurrency int `env:"COMPRESS_ENCODER_CONCURRENCY" envDefault:"1"` | ||
| } |
There was a problem hiding this comment.
isn't this defined by the feature flags already?
There was a problem hiding this comment.
we need to override the FF settings for the integration tests, and also to configure the service in the future without FF
| Length uint64 | ||
| BuildId uuid.UUID | ||
| BuildStorageOffset uint64 | ||
| FrameTable *storage.FrameTable |
There was a problem hiding this comment.
why do we have the frametable as pointer?
There was a problem hiding this comment.
Because it is not always present (not in V3), one 64 bit word instead of 4.
dobrac
left a comment
There was a problem hiding this comment.
I would remove the code paths and functions that are not called anywhere. Examples, not exhaustive: DataPath, CloneForUpload.
Addresses dobrac's review comments on the compression primitives PR:
Serialization:
- Split serialization.go into serialization_v3.go and serialization_v4.go
- Each version's wire format is self-contained in one file
- Move Metadata types/constants to metadata.go
- Remove unused exported Serialize() wrapper
V4 wire format:
- Replace packed CompressionTypeNumFrames uint64 with separate
CompressionType uint32 + NumFrames uint32 (no bit-shifting)
- Remove MaxCompressedHeaderSize limit (uint32 prefix + LZ4 frame
boundary are sufficient)
Naming and reuse:
- Rename AddFrames → SetFrames (replaces, not appends)
- Use SetFrames in MergeMappings instead of inline Subset calls
- Remove unnecessary maps.Clone in ToDiffHeader
Formatting:
- Use decimal (%d) instead of hex (%#x) in error messages and
String() methods for consistency with the rest of the codebase
Tests:
- Add test for uploadPartSlices (MD5 hashing, body concatenation)
Documentation:
- Flag BuildFiles incompleteness: only contains builds from current
upload session, not upstream dependencies
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Closed in favor of combined #2034, comments addressed there. |
- Remove ValidateHeader from NewHeader — avoids rejecting zero-size templates on deserialization and removes O(n log n) sort from the hot read path. ValidateHeader remains exported for CLI/diagnostic use. - Fix O(N²) in applyToHeader: add SubsetFrom/SetFramesFrom with cursor so consecutive sorted mappings walk the FrameTable once instead of rescanning from the start for each mapping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Foundational types and utilities for compressed template storage, extracted as an independent PR ahead of the full compression read/write path.
Plugged in (affects runtime now)
MergeMappingsnow returnserror(propagatesFrameTable.Subseterrors throughtoDiffMapping→ToDiffHeader)NewMultipartUploaderWithRetryConfigaccepts optionalmetadataparam (existing caller passesnil)BuildMap.FrameTablefield added (nil for all current V3 headers — no behavior change)header.MetadatagainsMetadataVersionCompressedconstantshow-build-diffCLI updated for newMergeMappingssignaturePreparation only (no callers yet)
Subset,Range,FrameFor,GetFetchRangefor offset translationCompressLZ4/DecompressLZ4helperscompressStream(read → parallel compress → ordered upload),partUploaderinterface,CompressBytesResolveCompressConfigSerializeHeader/DeserializeByteswith FrameTable serialization,BuildFileInfo,ValidateHeaderCompressedDataPath,CompressedDataName,BaseFileNameCompressConfigFlag,CompressFileTypeKind/CompressUseCaseKindcontext kindsuploadPartSlices(zero-copy multi-slice upload),Start/UploadPart/Completemethods onMultipartUploader