feat(tsdb): pipeline-based numeric codec with self-describing format and per-field encoding #141353
salvatore-campagna wants to merge 46 commits into elastic:main from salvatore-campagna:feature/tsdb-pipeline-poc
Conversation
This is a proof of concept exploring a different approach to encoding numeric fields in TSDB indices. Instead of the monolithic encoder used in ES819, this introduces a pipeline architecture where encoding stages can be composed together. The pipeline currently chains: delta -> offset -> gcd -> bitPack. Each stage transforms the data and passes it to the next, with metadata recorded so decoding can reverse the process. Additional stages exist for specific cases: PatchedPFor handles blocks with outliers, Zigzag handles signed values, and Zstd provides optional compression.

What makes this a POC rather than production-ready:

- Only numeric doc values use the new encoding. Sorted, binary, and sorted set fields are unchanged from ES819; they are copied as-is.
- The pipeline configuration is static. Every numeric field gets the same delta-offset-gcd-bitPack chain regardless of data patterns. Production would need heuristics to pick the right stages.
- The format is incompatible with ES819. The codec name and wire format differ, so this only works on fresh indices. No migration path exists.
- Limited real-world validation. The tests verify correctness, but the approach has not been validated against production workloads.

The pipeline abstraction includes FieldDescriptor and PipelineDescriptor classes that are not currently used. These document the evolution path toward self-describing formats where pipeline configuration is written to metadata, enabling per-field pipelines and backward compatibility.
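To make the stage-composition idea concrete, here is a minimal sketch, assuming hypothetical `EncodeStage` and `Pipeline` types (the PR's actual interfaces are not shown in this excerpt):

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical sketch of the composable-stage idea; names are illustrative.
interface EncodeStage {
    // Transforms `values` in place and records whatever the inverse needs.
    void encode(long[] values, int count, ByteArrayOutputStream metadata);
}

final class DeltaStage implements EncodeStage {
    @Override
    public void encode(long[] values, int count, ByteArrayOutputStream metadata) {
        // Replace each value with its difference from the predecessor;
        // values[0] stays as-is so the transform is reversible.
        for (int i = count - 1; i > 0; i--) {
            values[i] -= values[i - 1];
        }
    }
}

final class Pipeline {
    private final List<EncodeStage> stages;

    Pipeline(List<EncodeStage> stages) {
        this.stages = stages;
    }

    void encode(long[] values, int count, ByteArrayOutputStream metadata) {
        // Each stage transforms the block and hands it to the next,
        // e.g. delta -> offset -> gcd -> bitPack.
        for (EncodeStage stage : stages) {
            stage.encode(values, count, metadata);
        }
    }
}
```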
Pad values to the block size before compression so decoding always sees full blocks and add a partial-block round-trip test.
Short-circuit encoding when negatives are present and add a skip test for signed inputs.
Reuse a single decoder per producer and close the codec when the producer is closed, to avoid per-call allocation.
Add a shared helper to assert a stage is not applied and exercise skip paths for delta, offset, and gcd.
…agna/elasticsearch into feature/tsdb-pipeline-poc
es94 doc values format with pipeline-based numeric encoding
```java
this.numericBlockShift = blockShift;
this.numericBlockSize = 1 << blockShift;
this.numericBlockMask = numericBlockSize - 1;
this.numericCodec = ES94TSDBDocValuesFormat.createNumericCodec(numericBlockSize);
```
Encoder/decoder are created separately. Ideally we should inject a single NumericCodec from ES94TSDBDocValuesFormat to both the producer and consumer to guarantee symmetric pipeline construction and avoid future drift. Then the consumer would only create the encoder (numericCodec.newEncoder()) and the producer would only create the decoder (numericCodec.newDecoder()).
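A rough sketch of the suggested wiring (a fragment; `NumericCodec`, `newEncoder()`, and `newDecoder()` are the names used in the comment above):

```java
// One shared codec instance guarantees the encoder and decoder are built
// from the same pipeline configuration and cannot drift apart.
NumericCodec numericCodec = ES94TSDBDocValuesFormat.createNumericCodec(numericBlockSize);

NumericEncoder encoder = numericCodec.newEncoder(); // consumer (write path)
NumericDecoder decoder = numericCodec.newDecoder(); // producer (read path)
```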
```java
this.numericBlockShift = numericBlockShift;
this.numericBlockSize = 1 << numericBlockShift;
this.numericCodec = ES94TSDBDocValuesFormat.createNumericCodec(numericBlockSize);
}
```
```java
private int findOptimalBitWidth(long[] values, int valueCount, int maxBits, int maxExceptions) {
    int[] histogram = new int[65];
```
Will remove this allocation from the hot path.
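One plausible shape for that fix, sketched here as a reusable per-instance buffer; the width-selection loop is a guess at the intent, not the PR's actual logic:

```java
final class BitWidthSketch {
    private final int[] histogram = new int[65]; // reused across blocks; not thread-safe

    int findOptimalBitWidth(long[] values, int valueCount, int maxBits, int maxExceptions) {
        java.util.Arrays.fill(histogram, 0);
        for (int i = 0; i < valueCount; i++) {
            // Bucket by bit width: 64 - nlz(v); v == 0 lands in bucket 0.
            histogram[64 - Long.numberOfLeadingZeros(values[i])]++;
        }
        // Walk down from maxBits, shrinking the width while the number of
        // values that would overflow it stays within the exception budget.
        int exceptions = 0;
        int width = maxBits;
        while (width > 1 && exceptions + histogram[width] <= maxExceptions) {
            exceptions += histogram[width];
            width--;
        }
        return width;
    }
}
```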
…mprovements

- Add ALP Float codec stages (transform, payload, RD variants) with AlpFloatUtils for 32-bit float ALP math
- Add DeltaDelta codec stage for second-order differencing and route @timestamp fields through it via optimize_for in DateFieldMapper
- Add FPC transform stage for floating-point compression
- Fuse quantization into all ALP encode stages using fast rounding, eliminating the separate QuantizeDouble stage transition for ALP combos
- Derive ALP maxExponent from quantize maxError for tighter encoding
- Right-size the metadata buffer based on per-stage budgets
- Add equals, hashCode, and toString to all pipeline stages
- Add trace/debug logging to pipeline resolution and codec paths
- Fix FieldDescriptor/skip-index write order in addNumericField
- Clean up tests: inline variables, remove unnecessary comments, apply spotless formatting
Buildkite benchmark this with tsdb-metricsgen-270m please
es94 doc values format with pipeline-based numeric encoding…

ALP utils: bestE is always non-negative (the ALP exponent is in range [0, maxExponent]), so Math.abs() was unnecessary. Removes the two forbidden API violations.
Buildkite benchmark this with tsdb-metricsgen-270m please
…equiresExplicitClose

Move Zstd native buffers to a thread-local ZstdBuffers holder with a Cleaner-based lifecycle, eliminating per-instance buffer allocation. Remove requiresExplicitClose() from the PayloadEncoder/PayloadDecoder interfaces and all implementations since it is no longer needed.
Add LZ4 as a new payload stage (StageId 0xA5) alongside Zstd and BitPack. Supports both fast compression (ESLZ4Compressor) and high compression (LZ4HCJavaSafeCompressor) via a boolean parameter. Thread-local byte[] buffers avoid per-instance allocation. Wire into StageFactory, PipelineConfig builders, and OptimalPipelines.
Pre-allocate at INITIAL_BLOCK_SIZE=512 to cover production block sizes (128 and 512) without any growth. Buffers grow via ensureCapacity() only for larger test block sizes.
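A minimal sketch of the described allocation policy; `INITIAL_BLOCK_SIZE` is from the commit, while the field and method shown are illustrative:

```java
final class BufferSketch {
    // Start at 512 so production block sizes (128 and 512) never trigger a
    // resize; only larger test block sizes take the growth path.
    static final int INITIAL_BLOCK_SIZE = 512;

    private long[] buffer = new long[INITIAL_BLOCK_SIZE];

    void ensureCapacity(int requiredSize) {
        if (requiredSize > buffer.length) {
            // Only hit by tests that use block sizes above 512.
            buffer = java.util.Arrays.copyOf(buffer, requiredSize);
        }
    }
}
```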
Thread-local buffers make per-instance resource management unnecessary. Remove Closeable from PayloadEncoder, PayloadDecoder, pipelines, NumericEncoder, NumericDecoder, and the perFieldEncoders/Decoders tracking lists in ES94 consumer/producer.
Buildkite benchmark this with tsdb-metricsgen-270m please
Switch the default (no optimize_for hint) pipeline for TSDB double gauges, float gauges, and double counters from ALP/Gorilla to FPC with offset + gcd + bitpack. Timestamp fields use the default codec.
Buildkite benchmark this with tsdb-metricsgen-270m please
Default pipelines when no optimize_for hint is set in TSDB:

- Double gauges: ALP with 1e-6 quantization + offset + gcd + bitpack
- Float gauges: ALP float + offset + gcd + bitpack
- Double counters: FPC + offset + gcd + bitpack
- Float counters: FPC + offset + gcd + bitpack

Timestamp fields use the default codec.
Buildkite benchmark this with tsdb-metricsgen-270m please
Refactor DefaultPipelineResolver into explicit per-type methods:

- Double gauges: ALP (1e-6) for storage/default, ALP (1e-12) for balanced
- Double counters: Gorilla for storage, FPC for default
- Float gauges: ALP float; float counters: FPC
- @timestamp: delta-of-delta + offset + gcd + bitpack (TSDB and LogsDB)

Set optimize_for to storage in the OTel metrics template for double gauges and double counters.
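Condensed into code, the routing above might look like the following sketch (stage chains shown as strings; `OptimizeFor` is a hypothetical stand-in for the hint, and the real resolver also considers index mode):

```java
// Illustrative sketch of per-type routing; not the PR's actual API.
enum OptimizeFor { STORAGE, BALANCED, DEFAULT }

final class DefaultPipelineResolverSketch {

    String resolveDoubleGauge(OptimizeFor hint) {
        return switch (hint) {
            case BALANCED -> "alp(maxError=1e-12) -> offset -> gcd -> bitpack";
            case STORAGE, DEFAULT -> "alp(maxError=1e-6) -> offset -> gcd -> bitpack";
        };
    }

    String resolveDoubleCounter(OptimizeFor hint) {
        return hint == OptimizeFor.STORAGE ? "gorilla" : "fpc";
    }

    String resolveTimestamp() {
        // Same chain for TSDB and LogsDB, per the commit above.
        return "deltaDelta -> offset -> gcd -> bitpack";
    }
}
```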
Buildkite benchmark this with tsdb-metricsgen-270m please
Remove PRECISION_MARGIN and MAX_F_CANDIDATES constraints that artificially limited the ALP exponent/factor search space. The original ALP paper (SIGMOD 24) searches all (e,f) combinations where f <= e <= maxExponent. Our implementation restricted e to estimatedPrecision+2 and f to 0..3, preventing discovery of high-exponent solutions (e.g. e=14, f=12) that achieve zero roundtrip failures for division-based decimal values like N/100.0. Also rename scratch buffers to positions/exceptions across all ALP encode stages and remove pointless local aliases.
Buildkite benchmark this with tsdb-metricsgen-270m please
Move quantize() from AlpDoubleUtils and quantizeInPlace() from AlpFloatUtils into a new QuantizeUtils class. All ALP, Chimp, and QuantizeDouble stages now call QuantizeUtils.quantizeDoubles() or QuantizeUtils.quantizeFloats() instead.
Add quantizeStep field to FpcTransformEncodeStage, following the same pattern as Chimp stages. Wire maxError through StageSpec.FpcStage, StageFactory, and PipelineConfig builders for both FPC and Chimp.
…PC decode

Enable QUANTIZE_STORAGE for chimp double/float stages in balanced pipeline mode. Simplify the FPC decode loop by replacing the split full-byte + remainder iteration with a single loop using bit-indexed selector access. Pre-allocate the selector buffer in the constructor to avoid per-block allocation.
…Delta dates

Switch balanced double/float gauges from chimp with quantization to lossless ALP. Fix the date pipeline from delta-rle to deltaDelta to match the storage experiment that achieved -12.7% overall. Double counters keep FPC for balanced (with skip logic to prevent inflation).
Buildkite benchmark this with tsdb-metricsgen-270m please
… size resolvers

Introduce infrastructure for deferring pipeline resolution to encode time:

- Widen PipelineResolver to accept data samples, rename to StaticPipelineResolver
- Add FieldDescriptorWriter for deferred descriptor writing after first encode
- Add FieldContextResolver and BlockSizeResolver with static implementations
- Wire deferred resolution into ES94TSDBDocValuesConsumer and format classes
Add data-driven pipeline selection that profiles each block at encode time:

- BlockProfiler computes per-block statistics (range, GCD, runs, monotonicity, bit widths)
- PipelineSelector maps block profiles to optimal pipeline configurations
- AdaptivePipelineResolver orchestrates profiling and selection per field
- Add an index setting to toggle the adaptive profiler (index.codec.tsdb.adaptive_encoding_profiler)
- Wire the adaptive resolver into PerFieldFormatSupplier with full test coverage
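A rough sketch of what per-block profiling could look like (all names hypothetical; per the commit, the real BlockProfiler also tracks runs and bit widths):

```java
// Illustrative per-block profiling: compute cheap statistics in one pass,
// then map them to a pipeline choice. Assumes count >= 1.
final class BlockProfileSketch {
    long min, max, gcd;
    boolean monotonic;

    static BlockProfileSketch profile(long[] values, int count) {
        BlockProfileSketch p = new BlockProfileSketch();
        p.min = p.max = values[0];
        p.monotonic = true;
        for (int i = 1; i < count; i++) {
            p.min = Math.min(p.min, values[i]);
            p.max = Math.max(p.max, values[i]);
            if (values[i] < values[i - 1]) p.monotonic = false;
        }
        for (int i = 0; i < count; i++) {
            p.gcd = gcd(p.gcd, values[i] - p.min); // gcd(0, x) == x
        }
        return p;
    }

    private static long gcd(long a, long b) {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }
}
```

A selector could then, for example, prefer delta stages for monotonic blocks and apply GCD factoring only when the computed divisor exceeds 1.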
Add human-readable display names to StageId enum for log readability, describeStages() to PipelineConfig, and update PipelineDescriptor toString.
…bit-width reduction
Buildkite benchmark this with tsdb-metricsgen-270m please
⏳ Build in-progress
This build runs two tsdb-metricsgen-270m benchmarks to evaluate the performance impact of this PR. To estimate benchmark completion time, inspect previous nightly runs.
This PR introduces a pipeline-based numeric codec for TSDB indices (`es94`) that replaces the monolithic encoding approach used by `es87` and `es819`. Instead of a fixed encoder that applies a single strategy to all numeric fields, encoding happens through a configurable chain of small, focused stages, each doing one transformation well. The pipeline is self-describing: each field writes a compact descriptor (a `FieldDescriptor` containing a `PipelineDescriptor`), so the decoder reconstructs the correct pipeline dynamically, without implicit format knowledge.

The pipeline currently handles `NUMERIC` and `SORTED_NUMERIC` doc values for three data types: `LONG`, `DOUBLE`, and `FLOAT`. A `DefaultPipelineResolver` selects the pipeline per field based on index mode (TSDB, LogsDB, standard), field type, metric type, and an `optimize_for` hint. Non-numeric doc values (sorted, binary, sorted set) pass through unchanged using the existing `es819` code paths.
Advantages over es819

`es819` uses the same encoding chain for every numeric field. `es94` selects a pipeline per field: timestamps get delta-of-delta with patched PFor bit-packing, gauge doubles get ALP-based floating-point compression, and counters get Gorilla XOR encoding. The resolver considers index mode, field type, metric type, and an `optimize_for` hint to pick the right chain for each field.
The format is also self-describing. `es819` decoders implicitly know the encoding format, which couples format evolution to code changes. `es94` writes a `FieldDescriptor` per field containing the full pipeline spec (stage IDs, block size, data type), so adding new stages or changing pipelines does not require format version bumps; it only requires new `StageId` entries. Once a stage ID is assigned, it never changes.
The biggest compression win comes from dedicated floating-point encoding. `es819` stores doubles as sortable longs and compresses them with the integer pipeline, ignoring IEEE 754 structure entirely. `es94` adds ALP (Adaptive Lossless floating-Point), ALP-RD (real-valued dictionary variant), Gorilla (XOR-based), and FPC (predictor-based) for both `DOUBLE` and `FLOAT`. For metric data where values share decimal structure, ALP in particular compresses dramatically better than treating doubles as opaque 64-bit integers.

`es94` also supports lossy compression via a configurable `maxError` parameter. ALP stages fuse quantization directly into the encoding path, rounding values to the nearest `2 * maxError` step before the `(e, f)` search. This trades precision for smaller encoded output when exact values are not needed. The `optimize_for` hint controls precision: `STORAGE` quantizes to 6 decimal digits (`maxError = 1e-6`), `BALANCED` quantizes to 12 decimal digits (`maxError = 1e-12`), and the default (no hint) uses lossless ALP encoding with no quantization.
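As a concrete illustration of the quantization step (a minimal sketch assuming plain round-to-nearest; the actual stages fuse a fast-rounding variant into the ALP encode pass):

```java
final class QuantizeSketch {
    // Snap each value to the nearest 2 * maxError step, so the round-trip
    // error never exceeds maxError. Shown standalone here for clarity.
    static double quantize(double value, double maxError) {
        double step = 2 * maxError;
        return Math.round(value / step) * step;
    }
}
// With maxError = 1e-6 (STORAGE), QuantizeSketch.quantize(3.14159265358979, 1e-6)
// yields approximately 3.141592: roughly six decimal digits survive.
```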
The block size is configurable per pipeline, unlike `es819`, which hardcodes a fixed block size for all fields. `es94` uses 512-value blocks for TSDB (optimized for temporal locality in time-series data) and 128-value blocks for LogsDB (better suited for variable-rate log ingestion). The block size is recorded in the `PipelineDescriptor`, so different fields and index modes can use different sizes without ambiguity.
Finally, each block carries a bitmap indicating which stages actually ran. Stages can opt out dynamically: `Delta` skips non-monotonic blocks, and `ALP` falls back to raw mode when exceptions exceed a threshold. This means `es94` adapts per block without changing the pipeline shape, whereas `es819` always applies every stage regardless of what the data looks like.
The design is particularly relevant for synthetic source performance. Synthetic `_source` reconstructs documents by reading doc values across many fields per document, so the decode path is exercised heavily. The stateless singleton decoders, zero-allocation decode loops, and the deliberate choice to favor decode speed over encode speed all directly reduce the per-field cost of source reconstruction. Because decode stages carry no mutable state, they can be shared across fields and segments with no reset logic or concurrency concerns, which matters when synthetic source is decoding dozens of fields for every document.

Architecture and encoding stages
A pipeline is an ordered list of stages. Transform stages modify a shared `long[]` in-place (`Delta`, `Offset`, `GCD`, `ALP` transform, and others). The final payload stage serializes the result to bytes (`BitPack`, `Gorilla`, `Zstd`). Each block writes a bitmap, the payload bytes, and then stage metadata in reverse order. The reverse ordering is deliberate: it matches the decoder's traversal direction, so metadata reads happen as a single forward sequential pass with no seeking.
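A schematic sketch of the block layout and the reverse-order metadata write (hypothetical types throughout; not the actual wire format):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative block layout: bitmap, payload, then per-stage metadata in
// reverse order so the decoder reads it as one forward pass.
interface StageSketch {
    byte[] encode(long[] block, int count); // returns metadata, or null if skipped
}

final class BlockWriterSketch {
    final List<byte[]> output = new ArrayList<>();

    void encodeBlock(long[] block, int count, List<StageSketch> stages) {
        List<byte[]> metadata = new ArrayList<>();
        long appliedBitmap = 0;
        for (int i = 0; i < stages.size(); i++) {
            byte[] meta = stages.get(i).encode(block, count); // transforms in place
            if (meta != null) { // stage actually ran for this block
                appliedBitmap |= 1L << i;
                metadata.add(meta);
            }
        }
        output.add(new byte[] { (byte) appliedBitmap }); // bitmap (shortened here)
        output.add(serialize(block, count));             // payload bytes
        // Reverse order: the last stage applied is the first the decoder
        // undoes, so its metadata comes first in the stream.
        for (int i = metadata.size() - 1; i >= 0; i--) {
            output.add(metadata.get(i));
        }
    }

    private static byte[] serialize(long[] block, int count) {
        java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocate(count * Long.BYTES);
        for (int i = 0; i < count; i++) buf.putLong(block[i]);
        return buf.array();
    }
}
```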
Each field stores a `FieldDescriptor` (format version + `PipelineDescriptor`) in `.dvm` metadata. The descriptor contains the ordered `StageId` entries, the block shift (log2 of block size), and the data type. Decoders reconstruct the pipeline from the descriptor alone, with no external schema or implicit knowledge required.

`DefaultPipelineResolver` maps the combination of index mode, field type, metric type, and `optimize_for` hint to a `PipelineConfig`. The key routing rules are:

- Timestamps (`@timestamp`): delta-of-delta, offset removal, patched PFor, bit-packing (block size 512 in TSDB, 128 in LogsDB)
- Doubles (`optimize_for: storage`): ALP with 6-digit quantization (`maxError = 1e-6`), offset removal, GCD factoring, bit-packing
- Doubles (`optimize_for: balanced`): ALP with 12-digit quantization (`maxError = 1e-12`), offset removal, GCD factoring, bit-packing
- Doubles (`optimize_for: speed`): XOR differencing, patched PFor, bit-packing
- Integers: delta, offset removal, GCD factoring, bit-packing (matching `es819`)

There are 21 encoding stages in total, organized into three categories:
- Integer transforms: `Delta`, `DeltaDelta`, `Offset`, `GCD`, `XOR`, `PatchedPFor`, `RLE`
- Floating-point transforms: `ALP Double/Float`, `ALP-RD Double/Float`, `FPC`, `QuantizeDouble`
- Payload stages: `BitPack`, `Zstd`, `Gorilla`, `RLE Payload`, `ALP Double/Float`, `ALP-RD Double/Float`
es819's encoding, so compression ratios forLONGfields are comparable. The float pipelines are new and represent the primary compression improvement. New algorithms slot in by adding aStageId, aStageSpec, and the stage class, with no changes to the pipeline framework itself.Design philosophy
Design philosophy

The hot path allocates nothing. Transform stages operate on a shared `long[]` in-place with no intermediate arrays. `MetadataBuffer` is a reusable growable byte buffer that avoids per-block allocation. `EncodingContext` and `DecodingContext` are allocated once per field and reused across all blocks. The only stage that touches native memory is `Zstd`; everything else is allocation-free in steady state.
A naive pipeline loop calling `stages[pos].encode()` sees 13+ concrete `TransformEncoder` types at that call site, which makes it permanently megamorphic: the JIT gives up inlining and falls back to vtable lookup. To avoid this, the encode pipeline dispatches through a `switch` on `StageId` values with static method calls, giving the JIT monomorphic call sites it can actually inline.
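To illustrate the difference (hypothetical names; the real dispatch covers all 21 stages):

```java
// Contrast: a virtual call site that sees many receiver types cannot be
// inlined, while a switch on an enum yields inlinable static calls.
enum Stage { DELTA, OFFSET, GCD }

final class DispatchSketch {

    interface TransformEncoderSketch {
        void encode(long[] block, int count);
    }

    // Megamorphic: stages[i].encode() sees many receiver types at one site.
    static void encodeVirtual(TransformEncoderSketch[] stages, long[] block, int count) {
        for (TransformEncoderSketch stage : stages) {
            stage.encode(block, count);
        }
    }

    // Monomorphic: each case is a static call the JIT can inline.
    static void encodeSwitch(Stage[] stages, long[] block, int count) {
        for (Stage stage : stages) {
            switch (stage) {
                case DELTA -> deltaEncode(block, count);
                case OFFSET -> offsetEncode(block, count);
                case GCD -> gcdEncode(block, count);
            }
        }
    }

    static void deltaEncode(long[] block, int count) { /* transform in place */ }
    static void offsetEncode(long[] block, int count) { /* transform in place */ }
    static void gcdEncode(long[] block, int count) { /* transform in place */ }
}
```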
Decode stages carry no mutable state. `DeltaCodecStage.INSTANCE`, `OffsetCodecStage.INSTANCE`, and others are shared singletons, reusable across fields and segments without reset logic and with no concurrency concerns on the read path.
The design consistently favors decode speed over encode speed. Metadata is written in reverse stage order so the decoder reads it sequentially. Block offsets use `DirectMonotonicWriter` for O(1) random access. `BitPack` decodes through Lucene's `ForUtil` with SIMD-friendly loops. `ALP` fuses exception collection into the encoding pass so that the encoder does the extra work and the decoder stays simple. The general principle is that encoding happens once at index time, but decoding happens on every query, so any complexity budget should be spent on the write path.
Per-block bitmaps let stages opt out dynamically without reconfiguring the pipeline. A `Delta` stage encountering non-monotonic data writes a 0-bit and passes values through unchanged; `ALP` exceeding its exception threshold falls back to raw mode. The pipeline shape is fixed per field, but each block adapts to its own data independently.
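A minimal sketch of the opt-out mechanic for a delta stage (illustrative; the real bitmap bookkeeping lives in the pipeline, not the stage):

```java
final class DeltaOptOutSketch {
    // The stage checks its own precondition and reports one bit; the
    // pipeline shape never changes.
    static boolean deltaEncodeIfMonotonic(long[] block, int count) {
        for (int i = 1; i < count; i++) {
            if (block[i] < block[i - 1]) {
                return false; // 0-bit: values pass through unchanged
            }
        }
        for (int i = count - 1; i > 0; i--) {
            block[i] -= block[i - 1]; // safe: deltas are non-negative
        }
        return true; // 1-bit: decoder must undo the delta
    }
}
```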
Current limitations

The pipeline handles `NUMERIC` and `SORTED_NUMERIC` fields only. Sorted, binary, and sorted set doc values use the existing `es819` code paths unchanged.

There is no backward compatibility gate yet. Once a segment is written with `es94`, it requires the pipeline decoder to read it back. Production deployment will need a codec version gate so that older nodes do not attempt to open `es94` segments.

Block sizes (128 for LogsDB, 512 for TSDB), ALP `maxError` thresholds, and pipeline assignments are reasonable starting points but have not been tuned against production workloads.

Backward compatibility
Although there is no BWC gate in this PR yet, the format is designed to make backward compatibility straightforward. The key property is that every field carries the information the decoder needs to interpret it. The `FieldDescriptor` written per field contains the exact list of `StageId` entries, the block size, and the data type. The per-block bitmap records which stages were actually applied to each block. Together, these two pieces of metadata mean the decoder never has to guess or infer anything about the encoding; it reads the descriptor, walks the stage list, and applies the inverse transforms in order.
Because `StageId` values are immutable, stable byte constants, a newer decoder always knows a superset of all stage IDs that any older encoder could have written. This means segments written by older versions of `es94` will always be readable by newer code without any special handling. The encoder and decoder evolve independently: the encoder can adopt new stages or rearrange pipelines over time, and any decoder that knows those `StageId` values can decode the result. Adding a new compression algorithm is a purely additive change: assign a new byte ID, implement the encode and decode sides, and all existing segments remain untouched.
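In code, the read side reduces to a registry lookup over stable byte IDs, sketched here with invented ID values:

```java
final class StageRegistrySketch {
    enum KnownStage { DELTA, OFFSET, GCD }

    // Any byte an older encoder could have written resolves here, because
    // IDs are stable and only ever added. Hitting the default arm can only
    // mean the segment was written by a NEWER version, which is exactly
    // what a codec version gate would prevent.
    static KnownStage stageForId(byte id) {
        return switch (id) {
            case 0x01 -> KnownStage.DELTA;  // hypothetical ID assignments
            case 0x02 -> KnownStage.OFFSET;
            case 0x03 -> KnownStage.GCD;
            default -> throw new IllegalStateException("unknown stage id: " + id);
        };
    }
}
```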
This also handles mapping changes gracefully. In Lucene, each flush produces a new immutable segment, and the pipeline is selected at flush time based on the current mapping. If a field's mapping changes between flushes (for example, a metric type changes from `gauge` to `counter`, or the `optimize_for` hint is updated), the next segment will simply be written with the new pipeline. Older segments keep whatever pipeline was recorded in their `FieldDescriptor`. At read time, each segment decodes independently using its own descriptor, so different segments can use entirely different pipelines for the same field without conflict. During merges, segments are decoded with their original pipelines and re-encoded with whatever pipeline the current mapping dictates.
In previous codecs like `es87` and `es819`, the decoder implicitly knows the encoding format because it is hardcoded in the codec version. Any change to the encoding requires a new codec version, and both sides must agree on what that version means. With a self-describing format, the wire data itself is the contract. This means that future pipeline changes within `es94`, such as new stages or different per-field routing, will not require format version bumps; only new `StageId` registrations are needed.

Testing
Each stage has isolated round-trip tests that verify correctness across data patterns: constant, monotonic, random, special values (`NaN`, `Infinity`, subnormals), and edge cases like exceptions at block boundaries or high exception rates. Full codec round-trips run through `ES94TSDBDocValuesFormatTests`. Pipeline-level integration tests in `PipelineStageIntegrationTests` compose multi-stage pipelines and verify end-to-end encoding and decoding. `StageEqualsHashCodeToStringTests` covers contract compliance for all stage implementations.

Summary of changes