feat(tsdb): pipeline-based numeric codec with self-describing format and per-field encoding #141353
salvatore-campagna wants to merge 46 commits into elastic:main from salvatore-campagna:feature/tsdb-pipeline-poc
Conversation
This is a proof of concept exploring a different approach to encoding numeric fields in TSDB indices. Instead of the monolithic encoder used in ES819, this introduces a pipeline architecture where encoding stages can be composed together. The pipeline currently chains: delta -> offset -> gcd -> bitPack. Each stage transforms the data and passes it to the next, with metadata recorded so decoding can reverse the process. Additional stages exist for specific cases: PatchedPFor handles blocks with outliers, Zigzag handles signed values, and Zstd provides optional compression.

What makes this a POC rather than production-ready:

- Only numeric doc values use the new encoding. Sorted, binary, and sorted set fields are unchanged from ES819; they are copied as-is.
- The pipeline configuration is static. Every numeric field gets the same delta-offset-gcd-bitPack chain regardless of data patterns. Production would need heuristics to pick the right stages.
- The format is incompatible with ES819. The codec name and wire format differ, so this only works on fresh indices. No migration path exists.
- Limited real-world validation. The tests verify correctness, but the approach has not been validated against production workloads.

The pipeline abstraction includes FieldDescriptor and PipelineDescriptor classes that are not currently used. These document the evolution path toward self-describing formats where pipeline configuration is written to metadata, enabling per-field pipelines and backward compatibility.
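To make the stage-composition idea concrete, here is a minimal sketch, assuming hypothetical `EncodeStage` and `Pipeline` types (the PR's actual interfaces are not shown in this excerpt):

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical sketch of the composable-stage idea; names are illustrative.
interface EncodeStage {
    // Transforms `values` in place and records whatever the inverse needs.
    void encode(long[] values, int count, ByteArrayOutputStream metadata);
}

final class DeltaStage implements EncodeStage {
    @Override
    public void encode(long[] values, int count, ByteArrayOutputStream metadata) {
        // Replace each value with its difference from the predecessor;
        // values[0] stays as-is so the transform is reversible.
        for (int i = count - 1; i > 0; i--) {
            values[i] -= values[i - 1];
        }
    }
}

final class Pipeline {
    private final List<EncodeStage> stages;

    Pipeline(List<EncodeStage> stages) {
        this.stages = stages;
    }

    void encode(long[] values, int count, ByteArrayOutputStream metadata) {
        // Each stage transforms the block and hands it to the next,
        // e.g. delta -> offset -> gcd -> bitPack.
        for (EncodeStage stage : stages) {
            stage.encode(values, count, metadata);
        }
    }
}
```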
Pad values to the block size before compression so decoding always sees full blocks and add a partial-block round-trip test.
Short-circuit encoding when negatives are present and add a skip test for signed inputs.
Reuse a single decoder per producer and close the codec when the producer is closed, to avoid per-call allocation.
Add a shared helper to assert a stage is not applied and exercise skip paths for delta, offset, and gcd.
…agna/elasticsearch into feature/tsdb-pipeline-poc
es94 doc values format with pipeline-based numeric encoding
```java
this.numericBlockShift = blockShift;
this.numericBlockSize = 1 << blockShift;
this.numericBlockMask = numericBlockSize - 1;
this.numericCodec = ES94TSDBDocValuesFormat.createNumericCodec(numericBlockSize);
```
Encoder/decoder are created separately. Ideally we should inject a single NumericCodec from ES94TSDBDocValuesFormat to both the producer and consumer to guarantee symmetric pipeline construction and avoid future drift. Then the consumer would only create the encoder (numericCodec.newEncoder()) and the producer would only create the decoder (numericCodec.newDecoder()).
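A rough sketch of the suggested wiring (a fragment; `NumericCodec`, `newEncoder()`, and `newDecoder()` are the names used in the comment above):

```java
// One shared codec instance guarantees the encoder and decoder are built
// from the same pipeline configuration and cannot drift apart.
NumericCodec numericCodec = ES94TSDBDocValuesFormat.createNumericCodec(numericBlockSize);

NumericEncoder encoder = numericCodec.newEncoder(); // consumer (write path)
NumericDecoder decoder = numericCodec.newDecoder(); // producer (read path)
```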
```java
this.numericBlockShift = numericBlockShift;
this.numericBlockSize = 1 << numericBlockShift;
this.numericCodec = ES94TSDBDocValuesFormat.createNumericCodec(numericBlockSize);
}
```
```java
private int findOptimalBitWidth(long[] values, int valueCount, int maxBits, int maxExceptions) {
    int[] histogram = new int[65];
```
Will remove this allocation from the hot path.
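One plausible shape for that fix, sketched here as a reusable per-instance buffer; the width-selection loop is a guess at the intent, not the PR's actual logic:

```java
final class BitWidthSketch {
    private final int[] histogram = new int[65]; // reused across blocks; not thread-safe

    int findOptimalBitWidth(long[] values, int valueCount, int maxBits, int maxExceptions) {
        java.util.Arrays.fill(histogram, 0);
        for (int i = 0; i < valueCount; i++) {
            // Bucket by bit width: 64 - nlz(v); v == 0 lands in bucket 0.
            histogram[64 - Long.numberOfLeadingZeros(values[i])]++;
        }
        // Walk down from maxBits, shrinking the width while the number of
        // values that would overflow it stays within the exception budget.
        int exceptions = 0;
        int width = maxBits;
        while (width > 1 && exceptions + histogram[width] <= maxExceptions) {
            exceptions += histogram[width];
            width--;
        }
        return width;
    }
}
```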
…mprovements

- Add ALP Float codec stages (transform, payload, RD variants) with AlpFloatUtils for 32-bit float ALP math
- Add DeltaDelta codec stage for second-order differencing and route @timestamp fields through it via optimize_for in DateFieldMapper
- Add FPC transform stage for floating-point compression
- Fuse quantization into all ALP encode stages using fast rounding, eliminating the separate QuantizeDouble stage transition for ALP combos
- Derive ALP maxExponent from quantize maxError for tighter encoding
- Right-size the metadata buffer based on per-stage budgets
- Add equals, hashCode, and toString to all pipeline stages
- Add trace/debug logging to pipeline resolution and codec paths
- Fix FieldDescriptor/skip-index write order in addNumericField
- Clean up tests: inline variables, remove unnecessary comments, apply spotless formatting
Buildkite benchmark this with tsdb-metricsgen-270m please
es94 doc values format with pipeline-based numeric encoding…

ALP utils: bestE is always non-negative (the ALP exponent is in range [0, maxExponent]), so Math.abs() was unnecessary. Removes the two forbidden API violations.
Buildkite benchmark this with tsdb-metricsgen-270m please
…equiresExplicitClose

Move Zstd native buffers to a thread-local ZstdBuffers holder with a Cleaner-based lifecycle, eliminating per-instance buffer allocation. Remove requiresExplicitClose() from the PayloadEncoder/PayloadDecoder interfaces and all implementations since it is no longer needed.
Add LZ4 as a new payload stage (StageId 0xA5) alongside Zstd and BitPack. Supports both fast compression (ESLZ4Compressor) and high compression (LZ4HCJavaSafeCompressor) via a boolean parameter. Thread-local byte[] buffers avoid per-instance allocation. Wire into StageFactory, PipelineConfig builders, and OptimalPipelines.
Pre-allocate at INITIAL_BLOCK_SIZE=512 to cover production block sizes (128 and 512) without any growth. Buffers grow via ensureCapacity() only for larger test block sizes.
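A minimal sketch of the described allocation policy; `INITIAL_BLOCK_SIZE` is from the commit, while the field and method shown are illustrative:

```java
final class BufferSketch {
    // Start at 512 so production block sizes (128 and 512) never trigger a
    // resize; only larger test block sizes take the growth path.
    static final int INITIAL_BLOCK_SIZE = 512;

    private long[] buffer = new long[INITIAL_BLOCK_SIZE];

    void ensureCapacity(int requiredSize) {
        if (requiredSize > buffer.length) {
            // Only hit by tests that use block sizes above 512.
            buffer = java.util.Arrays.copyOf(buffer, requiredSize);
        }
    }
}
```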
Thread-local buffers make per-instance resource management unnecessary. Remove Closeable from PayloadEncoder, PayloadDecoder, pipelines, NumericEncoder, NumericDecoder, and the perFieldEncoders/Decoders tracking lists in ES94 consumer/producer.
Buildkite benchmark this with tsdb-metricsgen-270m please
Switch the default (no optimize_for hint) pipeline for TSDB double gauges, float gauges, and double counters from ALP/Gorilla to FPC with offset + gcd + bitpack. Timestamp fields use the default codec.
Buildkite benchmark this with tsdb-metricsgen-270m please
Default pipelines when no optimize_for hint is set in TSDB:

- Double gauges: ALP with 1e-6 quantization + offset + gcd + bitpack
- Float gauges: ALP float + offset + gcd + bitpack
- Double counters: FPC + offset + gcd + bitpack
- Float counters: FPC + offset + gcd + bitpack

Timestamp fields use the default codec.
Buildkite benchmark this with tsdb-metricsgen-270m please
Refactor DefaultPipelineResolver into explicit per-type methods:

- Double gauges: ALP (1e-6) for storage/default, ALP (1e-12) for balanced
- Double counters: Gorilla for storage, FPC for default
- Float gauges: ALP float; float counters: FPC
- @timestamp: delta-of-delta + offset + gcd + bitpack (TSDB and LogsDB)

Set optimize_for to storage in the OTel metrics template for double gauges and double counters.
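Condensed into code, the routing above might look like the following sketch (stage chains shown as strings; `OptimizeFor` is a hypothetical stand-in for the hint, and the real resolver also considers index mode):

```java
// Illustrative sketch of per-type routing; not the PR's actual API.
enum OptimizeFor { STORAGE, BALANCED, DEFAULT }

final class DefaultPipelineResolverSketch {

    String resolveDoubleGauge(OptimizeFor hint) {
        return switch (hint) {
            case BALANCED -> "alp(maxError=1e-12) -> offset -> gcd -> bitpack";
            case STORAGE, DEFAULT -> "alp(maxError=1e-6) -> offset -> gcd -> bitpack";
        };
    }

    String resolveDoubleCounter(OptimizeFor hint) {
        return hint == OptimizeFor.STORAGE ? "gorilla" : "fpc";
    }

    String resolveTimestamp() {
        // Same chain for TSDB and LogsDB, per the commit above.
        return "deltaDelta -> offset -> gcd -> bitpack";
    }
}
```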
Buildkite benchmark this with tsdb-metricsgen-270m please
Remove PRECISION_MARGIN and MAX_F_CANDIDATES constraints that artificially limited the ALP exponent/factor search space. The original ALP paper (SIGMOD 24) searches all (e,f) combinations where f <= e <= maxExponent. Our implementation restricted e to estimatedPrecision+2 and f to 0..3, preventing discovery of high-exponent solutions (e.g. e=14, f=12) that achieve zero roundtrip failures for division-based decimal values like N/100.0. Also rename scratch buffers to positions/exceptions across all ALP encode stages and remove pointless local aliases.
Buildkite benchmark this with tsdb-metricsgen-270m please
Move quantize() from AlpDoubleUtils and quantizeInPlace() from AlpFloatUtils into a new QuantizeUtils class. All ALP, Chimp, and QuantizeDouble stages now call QuantizeUtils.quantizeDoubles() or QuantizeUtils.quantizeFloats() instead.
Add quantizeStep field to FpcTransformEncodeStage, following the same pattern as Chimp stages. Wire maxError through StageSpec.FpcStage, StageFactory, and PipelineConfig builders for both FPC and Chimp.
…PC decode

Enable QUANTIZE_STORAGE for chimp double/float stages in balanced pipeline mode. Simplify the FPC decode loop by replacing the split full-byte + remainder iteration with a single loop using bit-indexed selector access. Pre-allocate the selector buffer in the constructor to avoid per-block allocation.
…Delta dates

Switch balanced double/float gauges from chimp with quantization to lossless ALP. Fix the date pipeline from delta-rle to deltaDelta to match the storage experiment that achieved -12.7% overall. Double counters keep FPC for balanced (with skip logic to prevent inflation).
Buildkite benchmark this with tsdb-metricsgen-270m please
… size resolvers

Introduce infrastructure for deferring pipeline resolution to encode time:

- Widen PipelineResolver to accept data samples, rename to StaticPipelineResolver
- Add FieldDescriptorWriter for deferred descriptor writing after first encode
- Add FieldContextResolver and BlockSizeResolver with static implementations
- Wire deferred resolution into ES94TSDBDocValuesConsumer and format classes
Add data-driven pipeline selection that profiles each block at encode time:

- BlockProfiler computes per-block statistics (range, GCD, runs, monotonicity, bit widths)
- PipelineSelector maps block profiles to optimal pipeline configurations
- AdaptivePipelineResolver orchestrates profiling and selection per field
- Add an index setting to toggle the adaptive profiler (index.codec.tsdb.adaptive_encoding_profiler)
- Wire the adaptive resolver into PerFieldFormatSupplier with full test coverage
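A rough sketch of what per-block profiling could look like (all names hypothetical; per the commit, the real BlockProfiler also tracks runs and bit widths):

```java
// Illustrative per-block profiling: compute cheap statistics in one pass,
// then map them to a pipeline choice. Assumes count >= 1.
final class BlockProfileSketch {
    long min, max, gcd;
    boolean monotonic;

    static BlockProfileSketch profile(long[] values, int count) {
        BlockProfileSketch p = new BlockProfileSketch();
        p.min = p.max = values[0];
        p.monotonic = true;
        for (int i = 1; i < count; i++) {
            p.min = Math.min(p.min, values[i]);
            p.max = Math.max(p.max, values[i]);
            if (values[i] < values[i - 1]) p.monotonic = false;
        }
        for (int i = 0; i < count; i++) {
            p.gcd = gcd(p.gcd, values[i] - p.min); // gcd(0, x) == x
        }
        return p;
    }

    private static long gcd(long a, long b) {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }
}
```

A selector could then, for example, prefer delta stages for monotonic blocks and apply GCD factoring only when the computed divisor exceeds 1.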
Add human-readable display names to StageId enum for log readability, describeStages() to PipelineConfig, and update PipelineDescriptor toString.
…bit-width reduction
Buildkite benchmark this with tsdb-metricsgen-270m please
⏳ Build in-progress
This build runs two tsdb-metricsgen-270m benchmarks to evaluate the performance impact of this PR. To estimate benchmark completion time, inspect previous nightly runs.
This PR introduces a pipeline-based numeric codec for TSDB indices (`es94`) that replaces the monolithic encoding approach used by `es87` and `es819`. Instead of a fixed encoder that applies a single strategy to all numeric fields, encoding happens through a configurable chain of small, focused stages, each doing one transformation well. The pipeline is self-describing: each field writes a compact descriptor (a `FieldDescriptor` containing a `PipelineDescriptor`), so the decoder reconstructs the correct pipeline dynamically, without implicit format knowledge.

The pipeline currently handles `NUMERIC` and `SORTED_NUMERIC` doc values for three data types: `LONG`, `DOUBLE`, and `FLOAT`. A `DefaultPipelineResolver` selects the pipeline per field based on index mode (TSDB, LogsDB, standard), field type, metric type, and an `optimize_for` hint. Non-numeric doc values (sorted, binary, sorted set) pass through unchanged using the existing `es819` code paths.
Advantages over es819

`es819` uses the same encoding chain for every numeric field. `es94` selects a pipeline per field: timestamps get delta-of-delta with patched PFor bit-packing, gauge doubles get ALP-based floating-point compression, and counters get Gorilla XOR encoding. The resolver considers index mode, field type, metric type, and an `optimize_for` hint to pick the right chain for each field.
The format is also self-describing. `es819` decoders implicitly know the encoding format, which couples format evolution to code changes. `es94` writes a `FieldDescriptor` per field containing the full pipeline spec (stage IDs, block size, data type), so adding new stages or changing pipelines does not require format version bumps; it only requires new `StageId` entries. Once a stage ID is assigned, it never changes.
The biggest compression win comes from dedicated floating-point encoding. `es819` stores doubles as sortable longs and compresses them with the integer pipeline, ignoring IEEE 754 structure entirely. `es94` adds ALP (Adaptive Lossless floating-Point), ALP-RD (real-valued dictionary variant), Gorilla (XOR-based), and FPC (predictor-based) for both `DOUBLE` and `FLOAT`. For metric data where values share decimal structure, ALP in particular compresses dramatically better than treating doubles as opaque 64-bit integers.

`es94` also supports lossy compression via a configurable `maxError` parameter. ALP stages fuse quantization directly into the encoding path, rounding values to the nearest `2 * maxError` step before the `(e, f)` search. This trades precision for smaller encoded output when exact values are not needed. The `optimize_for` hint controls precision: `STORAGE` quantizes to 6 decimal digits (`maxError = 1e-6`), `BALANCED` quantizes to 12 decimal digits (`maxError = 1e-12`), and the default (no hint) uses lossless ALP encoding with no quantization.
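As a concrete illustration of the quantization step (a minimal sketch assuming plain round-to-nearest; the actual stages fuse a fast-rounding variant into the ALP encode pass):

```java
final class QuantizeSketch {
    // Snap each value to the nearest 2 * maxError step, so the round-trip
    // error never exceeds maxError. Shown standalone here for clarity.
    static double quantize(double value, double maxError) {
        double step = 2 * maxError;
        return Math.round(value / step) * step;
    }
}
// With maxError = 1e-6 (STORAGE), QuantizeSketch.quantize(3.14159265358979, 1e-6)
// yields approximately 3.141592: roughly six decimal digits survive.
```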
The block size is configurable per pipeline, unlike `es819`, which hardcodes a fixed block size for all fields. `es94` uses 512-value blocks for TSDB (optimized for temporal locality in time-series data) and 128-value blocks for LogsDB (better suited for variable-rate log ingestion). The block size is recorded in the `PipelineDescriptor`, so different fields and index modes can use different sizes without ambiguity.
Finally, each block carries a bitmap indicating which stages actually ran. Stages can opt out dynamically: `Delta` skips non-monotonic blocks, and `ALP` falls back to raw mode when exceptions exceed a threshold. This means `es94` adapts per block without changing the pipeline shape, whereas `es819` always applies every stage regardless of what the data looks like.
The design is particularly relevant for synthetic source performance. Synthetic `_source` reconstructs documents by reading doc values across many fields per document, so the decode path is exercised heavily. The stateless singleton decoders, zero-allocation decode loops, and the deliberate choice to favor decode speed over encode speed all directly reduce the per-field cost of source reconstruction. Because decode stages carry no mutable state, they can be shared across fields and segments with no reset logic or concurrency concerns, which matters when synthetic source is decoding dozens of fields for every document.

Architecture and encoding stages
A pipeline is an ordered list of stages. Transform stages modify a shared `long[]` in-place (`Delta`, `Offset`, `GCD`, `ALP` transform, and others). The final payload stage serializes the result to bytes (`BitPack`, `Gorilla`, `Zstd`). Each block writes a bitmap, the payload bytes, and then stage metadata in reverse order. The reverse ordering is deliberate: it matches the decoder's traversal direction, so metadata reads happen as a single forward sequential pass with no seeking.
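A schematic sketch of the block layout and the reverse-order metadata write (hypothetical types throughout; not the actual wire format):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative block layout: bitmap, payload, then per-stage metadata in
// reverse order so the decoder reads it as one forward pass.
interface StageSketch {
    byte[] encode(long[] block, int count); // returns metadata, or null if skipped
}

final class BlockWriterSketch {
    final List<byte[]> output = new ArrayList<>();

    void encodeBlock(long[] block, int count, List<StageSketch> stages) {
        List<byte[]> metadata = new ArrayList<>();
        long appliedBitmap = 0;
        for (int i = 0; i < stages.size(); i++) {
            byte[] meta = stages.get(i).encode(block, count); // transforms in place
            if (meta != null) { // stage actually ran for this block
                appliedBitmap |= 1L << i;
                metadata.add(meta);
            }
        }
        output.add(new byte[] { (byte) appliedBitmap }); // bitmap (shortened here)
        output.add(serialize(block, count));             // payload bytes
        // Reverse order: the last stage applied is the first the decoder
        // undoes, so its metadata comes first in the stream.
        for (int i = metadata.size() - 1; i >= 0; i--) {
            output.add(metadata.get(i));
        }
    }

    private static byte[] serialize(long[] block, int count) {
        java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocate(count * Long.BYTES);
        for (int i = 0; i < count; i++) buf.putLong(block[i]);
        return buf.array();
    }
}
```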
Each field stores a `FieldDescriptor` (format version + `PipelineDescriptor`) in `.dvm` metadata. The descriptor contains the ordered `StageId` entries, the block shift (log2 of block size), and the data type. Decoders reconstruct the pipeline from the descriptor alone, with no external schema or implicit knowledge required.

`DefaultPipelineResolver` maps the combination of index mode, field type, metric type, and `optimize_for` hint to a `PipelineConfig`. The key routing rules are:

- Timestamps (`@timestamp`): delta-of-delta, offset removal, patched PFor, bit-packing (block size 512 in TSDB, 128 in LogsDB)
- Doubles (`optimize_for: storage`): ALP with 6-digit quantization (`maxError = 1e-6`), offset removal, GCD factoring, bit-packing
- Doubles (`optimize_for: balanced`): ALP with 12-digit quantization (`maxError = 1e-12`), offset removal, GCD factoring, bit-packing
- Doubles (`optimize_for: speed`): XOR differencing, patched PFor, bit-packing
- Integers: delta, offset removal, GCD factoring, bit-packing (matching `es819`)

There are 21 encoding stages in total, organized into three categories:
- Integer transforms: `Delta`, `DeltaDelta`, `Offset`, `GCD`, `XOR`, `PatchedPFor`, `RLE`
- Floating-point transforms: `ALP Double/Float`, `ALP-RD Double/Float`, `FPC`, `QuantizeDouble`
- Payload stages: `BitPack`, `Zstd`, `Gorilla`, `RLE Payload`, `ALP Double/Float`, `ALP-RD Double/Float`
es819's encoding, so compression ratios forLONGfields are comparable. The float pipelines are new and represent the primary compression improvement. New algorithms slot in by adding aStageId, aStageSpec, and the stage class, with no changes to the pipeline framework itself.Design philosophy
Design philosophy

The hot path allocates nothing. Transform stages operate on a shared `long[]` in-place with no intermediate arrays. `MetadataBuffer` is a reusable growable byte buffer that avoids per-block allocation. `EncodingContext` and `DecodingContext` are allocated once per field and reused across all blocks. The only stage that touches native memory is `Zstd`; everything else is allocation-free in steady state.
A naive pipeline loop calling `stages[pos].encode()` sees 13+ concrete `TransformEncoder` types at that call site, which makes it permanently megamorphic: the JIT gives up inlining and falls back to vtable lookup. To avoid this, the encode pipeline dispatches through a `switch` on `StageId` values with static method calls, giving the JIT monomorphic call sites it can actually inline.
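To illustrate the difference (hypothetical names; the real dispatch covers all 21 stages):

```java
// Contrast: a virtual call site that sees many receiver types cannot be
// inlined, while a switch on an enum yields inlinable static calls.
enum Stage { DELTA, OFFSET, GCD }

final class DispatchSketch {

    interface TransformEncoderSketch {
        void encode(long[] block, int count);
    }

    // Megamorphic: stages[i].encode() sees many receiver types at one site.
    static void encodeVirtual(TransformEncoderSketch[] stages, long[] block, int count) {
        for (TransformEncoderSketch stage : stages) {
            stage.encode(block, count);
        }
    }

    // Monomorphic: each case is a static call the JIT can inline.
    static void encodeSwitch(Stage[] stages, long[] block, int count) {
        for (Stage stage : stages) {
            switch (stage) {
                case DELTA -> deltaEncode(block, count);
                case OFFSET -> offsetEncode(block, count);
                case GCD -> gcdEncode(block, count);
            }
        }
    }

    static void deltaEncode(long[] block, int count) { /* transform in place */ }
    static void offsetEncode(long[] block, int count) { /* transform in place */ }
    static void gcdEncode(long[] block, int count) { /* transform in place */ }
}
```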
Decode stages carry no mutable state. `DeltaCodecStage.INSTANCE`, `OffsetCodecStage.INSTANCE`, and others are shared singletons, reusable across fields and segments without reset logic and with no concurrency concerns on the read path.
The design consistently favors decode speed over encode speed. Metadata is written in reverse stage order so the decoder reads it sequentially. Block offsets use `DirectMonotonicWriter` for O(1) random access. `BitPack` decodes through Lucene's `ForUtil` with SIMD-friendly loops. `ALP` fuses exception collection into the encoding pass so that the encoder does the extra work and the decoder stays simple. The general principle is that encoding happens once at index time, but decoding happens on every query, so any complexity budget should be spent on the write path.
Per-block bitmaps let stages opt out dynamically without reconfiguring the pipeline. A `Delta` stage encountering non-monotonic data writes a 0-bit and passes values through unchanged; `ALP` exceeding its exception threshold falls back to raw mode. The pipeline shape is fixed per field, but each block adapts to its own data independently.
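A minimal sketch of the opt-out mechanic for a delta stage (illustrative; the real bitmap bookkeeping lives in the pipeline, not the stage):

```java
final class DeltaOptOutSketch {
    // The stage checks its own precondition and reports one bit; the
    // pipeline shape never changes.
    static boolean deltaEncodeIfMonotonic(long[] block, int count) {
        for (int i = 1; i < count; i++) {
            if (block[i] < block[i - 1]) {
                return false; // 0-bit: values pass through unchanged
            }
        }
        for (int i = count - 1; i > 0; i--) {
            block[i] -= block[i - 1]; // safe: deltas are non-negative
        }
        return true; // 1-bit: decoder must undo the delta
    }
}
```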
Current limitations

The pipeline handles `NUMERIC` and `SORTED_NUMERIC` fields only. Sorted, binary, and sorted set doc values use the existing `es819` code paths unchanged.

There is no backward compatibility gate yet. Once a segment is written with `es94`, it requires the pipeline decoder to read it back. Production deployment will need a codec version gate so that older nodes do not attempt to open `es94` segments.

Block sizes (128 for LogsDB, 512 for TSDB), ALP `maxError` thresholds, and pipeline assignments are reasonable starting points but have not been tuned against production workloads.

Backward compatibility
Although there is no BWC gate in this PR yet, the format is designed to make backward compatibility straightforward. The key property is that every field carries the information the decoder needs to interpret it. The `FieldDescriptor` written per field contains the exact list of `StageId` entries, the block size, and the data type. The per-block bitmap records which stages were actually applied to each block. Together, these two pieces of metadata mean the decoder never has to guess or infer anything about the encoding; it reads the descriptor, walks the stage list, and applies the inverse transforms in order.
Because `StageId` values are immutable, stable byte constants, a newer decoder always knows a superset of all stage IDs that any older encoder could have written. This means segments written by older versions of `es94` will always be readable by newer code without any special handling. The encoder and decoder evolve independently: the encoder can adopt new stages or rearrange pipelines over time, and any decoder that knows those `StageId` values can decode the result. Adding a new compression algorithm is a purely additive change: assign a new byte ID, implement the encode and decode sides, and all existing segments remain untouched.
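In code, the read side reduces to a registry lookup over stable byte IDs, sketched here with invented ID values:

```java
final class StageRegistrySketch {
    enum KnownStage { DELTA, OFFSET, GCD }

    // Any byte an older encoder could have written resolves here, because
    // IDs are stable and only ever added. Hitting the default arm can only
    // mean the segment was written by a NEWER version, which is exactly
    // what a codec version gate would prevent.
    static KnownStage stageForId(byte id) {
        return switch (id) {
            case 0x01 -> KnownStage.DELTA;  // hypothetical ID assignments
            case 0x02 -> KnownStage.OFFSET;
            case 0x03 -> KnownStage.GCD;
            default -> throw new IllegalStateException("unknown stage id: " + id);
        };
    }
}
```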
This also handles mapping changes gracefully. In Lucene, each flush produces a new immutable segment, and the pipeline is selected at flush time based on the current mapping. If a field's mapping changes between flushes (for example, a metric type changes from `gauge` to `counter`, or the `optimize_for` hint is updated), the next segment will simply be written with the new pipeline. Older segments keep whatever pipeline was recorded in their `FieldDescriptor`. At read time, each segment decodes independently using its own descriptor, so different segments can use entirely different pipelines for the same field without conflict. During merges, segments are decoded with their original pipelines and re-encoded with whatever pipeline the current mapping dictates.
In previous codecs like `es87` and `es819`, the decoder implicitly knows the encoding format because it is hardcoded in the codec version. Any change to the encoding requires a new codec version, and both sides must agree on what that version means. With a self-describing format, the wire data itself is the contract. This means that future pipeline changes within `es94`, such as new stages or different per-field routing, will not require format version bumps; only new `StageId` registrations are needed.

Testing
Each stage has isolated round-trip tests that verify correctness across data patterns: constant, monotonic, random, special values (`NaN`, `Infinity`, subnormals), and edge cases like exceptions at block boundaries or high exception rates. Full codec round-trips run through `ES94TSDBDocValuesFormatTests`. Pipeline-level integration tests in `PipelineStageIntegrationTests` compose multi-stage pipelines and verify end-to-end encoding and decoding. `StageEqualsHashCodeToStringTests` covers contract compliance for all stage implementations.

Summary of changes