feat(tsdb): pipeline-based numeric codec with self-describing format and per-field encoding#141353

Draft
salvatore-campagna wants to merge 46 commits into elastic:main from
salvatore-campagna:feature/tsdb-pipeline-poc

Conversation

@salvatore-campagna
Contributor

@salvatore-campagna salvatore-campagna commented Jan 27, 2026

This PR introduces a pipeline-based numeric codec for TSDB indices (es94) that replaces the monolithic encoding approach used by es87 and es819. Instead of a fixed encoder that applies a single strategy to all numeric fields, encoding happens through a configurable chain of small, focused stages, each doing one transformation well. The pipeline is self-describing: each field writes a compact descriptor (FieldDescriptor containing a PipelineDescriptor) so the decoder reconstructs the correct pipeline dynamically, without implicit format knowledge.

The pipeline currently handles NUMERIC and SORTED_NUMERIC doc values for three data types: LONG, DOUBLE, and FLOAT. A DefaultPipelineResolver selects the pipeline per field based on index mode (TSDB, LogsDB, standard), field type, metric type, and an optimize_for hint. Non-numeric doc values (sorted, binary, sorted set) pass through unchanged using the existing es819 code paths.

Advantages over es819

es819 uses the same encoding chain for every numeric field. es94 selects a pipeline per field: timestamps get delta-of-delta with patched PFor bit-packing, gauge doubles get ALP-based floating-point compression, counters get Gorilla XOR encoding. The resolver considers index mode, field type, metric type, and an optimize_for hint to pick the right chain for each field.

The format is also self-describing. es819 decoders implicitly know the encoding format, which couples format evolution to code changes. es94 writes a FieldDescriptor per field containing the full pipeline spec (stage IDs, block size, data type), so adding new stages or changing pipelines does not require format version bumps; it only requires new StageId entries. Once a stage ID is assigned, it never changes.

The biggest compression win comes from dedicated floating-point encoding. es819 stores doubles as sortable longs and compresses them with the integer pipeline, ignoring IEEE 754 structure entirely. es94 adds ALP (Adaptive Lossless floating-Point), ALP-RD (real-valued dictionary variant), Gorilla (XOR-based), and FPC (predictor-based) for both DOUBLE and FLOAT. For metric data where values share decimal structure, ALP in particular compresses dramatically better than treating doubles as opaque 64-bit integers.

es94 also supports lossy compression via a configurable maxError parameter. ALP stages fuse quantization directly into the encoding path, rounding values to the nearest 2 * maxError step before the (e,f) search. This trades precision for smaller encoded output when exact values are not needed. The optimize_for hint controls precision: STORAGE quantizes to 6 decimal digits (maxError = 1e-6), BALANCED quantizes to 12 decimal digits (maxError = 1e-12), and the default (no hint) uses lossless ALP encoding with no quantization.
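
As an illustration of that rounding step, here is a minimal sketch; the helper name is hypothetical and this is not the PR's QuantizeUtils API:

```java
// Sketch only: snap each value to the nearest multiple of 2 * maxError before
// the ALP (e, f) search, as described above. Not the PR's actual QuantizeUtils.
static void quantizeInPlace(double[] values, int count, double maxError) {
    final double step = 2.0 * maxError;              // e.g. 2e-6 for optimize_for: storage
    for (int i = 0; i < count; i++) {
        values[i] = Math.round(values[i] / step) * step;
    }
}
```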

The block size is configurable per pipeline, unlike es819 which hardcodes a fixed block size for all fields. es94 uses 512-value blocks for TSDB (optimized for temporal locality in time-series data) and 128-value blocks for LogsDB (better suited for variable-rate log ingestion). The block size is recorded in the PipelineDescriptor, so different fields and index modes can use different sizes without ambiguity.

Finally, each block carries a bitmap indicating which stages actually ran. Stages can opt out dynamically: Delta skips non-monotonic blocks, ALP falls back to raw mode when exceptions exceed a threshold. This means es94 adapts per block without changing the pipeline shape. es819 always applies every stage regardless of what the data looks like.

The design is particularly relevant for synthetic source performance. Synthetic _source reconstructs documents by reading doc values across many fields per document, so the decode path is exercised heavily. The stateless singleton decoders, zero-allocation decode loops, and the deliberate choice to favor decode speed over encode speed all directly reduce the per-field cost of source reconstruction. Because decode stages carry no mutable state, they can be shared across fields and segments with no reset logic or concurrency concerns, which matters when synthetic source is decoding dozens of fields for every document.

Architecture and encoding stages

A pipeline is an ordered list of stages. Transform stages modify a shared long[] in-place (Delta, Offset, GCD, ALP transform, and others). The final payload stage serializes the result to bytes (BitPack, Gorilla, Zstd). Each block writes a bitmap, the payload bytes, and then stage metadata in reverse order. The reverse ordering is deliberate: it matches the decoder's traversal direction, so metadata reads happen as a single forward sequential pass with no seeking.
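
A minimal sketch of the stage shape this describes; the interface and class names here are simplified, not the PR's exact TransformEncoder types:

```java
// Simplified stage shape: a transform rewrites the shared long[] in place and
// reports whether it applied, so the per-block bitmap can record it.
interface TransformStage {
    boolean encode(long[] values, int count);
}

final class DeltaStage implements TransformStage {
    @Override
    public boolean encode(long[] values, int count) {
        // Walk backwards so each value becomes the difference to its predecessor.
        for (int i = count - 1; i > 0; i--) {
            values[i] -= values[i - 1];
        }
        return true;
    }
}
```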

Each field stores a FieldDescriptor (format version + PipelineDescriptor) in .dvm metadata. The descriptor contains the ordered StageId entries, the block shift (log2 of block size), and the data type. Decoders reconstruct the pipeline from the descriptor alone, with no external schema or implicit knowledge required.
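
For illustration, a descriptor carrying this information could be serialized roughly as sketched below; the byte layout is illustrative, since the real FieldDescriptor is written through Lucene's IndexOutput and may differ:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative only: ordered stage ids, block shift, and data type are enough
// for the decoder to rebuild the pipeline with no external schema.
record PipelineDescriptorSketch(byte[] stageIds, int blockShift, byte dataType) {
    void write(OutputStream out) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeByte(stageIds.length);   // number of stages in pipeline order
        dos.write(stageIds);              // one stable byte per StageId
        dos.writeByte(blockShift);        // log2 of the block size, e.g. 9 for 512
        dos.writeByte(dataType);          // LONG, DOUBLE or FLOAT tag
    }
}
```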

DefaultPipelineResolver maps the combination of index mode, field type, metric type, and optimize_for hint to a PipelineConfig. The key routing rules are listed below (a simplified resolver sketch follows the list):

  • Timestamps (@timestamp): delta-of-delta, offset removal, patched PFor, bit-packing (block size 512 in TSDB, 128 in LogsDB)
  • Double gauges (default): lossless ALP encoding, offset removal, GCD factoring, bit-packing
  • Double gauges (optimize_for: storage): ALP with 6-digit quantization (maxError = 1e-6), offset removal, GCD factoring, bit-packing
  • Double gauges (optimize_for: balanced): ALP with 12-digit quantization (maxError = 1e-12), offset removal, GCD factoring, bit-packing
  • Double gauges (optimize_for: speed): XOR differencing, patched PFor, bit-packing
  • Float gauges: lossless ALP float encoding, offset removal, GCD factoring, bit-packing
  • Double counters: Gorilla XOR-based encoding
  • Long fields: delta, offset removal, GCD factoring, bit-packing (matches es819)
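
For double gauges, that routing could be expressed roughly as follows; the enum and the returned stage chains are illustrative shorthand, not the PR's PipelineConfig API:

```java
// Illustrative routing only; the real DefaultPipelineResolver returns
// PipelineConfig instances rather than strings.
enum OptimizeFor { DEFAULT, STORAGE, BALANCED, SPEED }

static String doubleGaugePipeline(OptimizeFor hint) {
    return switch (hint) {
        case STORAGE  -> "alp(maxError=1e-6) -> offset -> gcd -> bitpack";
        case BALANCED -> "alp(maxError=1e-12) -> offset -> gcd -> bitpack";
        case SPEED    -> "xor -> patchedPFor -> bitpack";
        case DEFAULT  -> "alp(lossless) -> offset -> gcd -> bitpack";
    };
}
```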

There are 21 encoding stages in total, organized into three categories:

Category             Stages
-------------------  --------------------------------------------------------------------------
Integer transforms   Delta, DeltaDelta, Offset, GCD, XOR, PatchedPFor, RLE
Float transforms     ALP Double/Float, ALP-RD Double/Float, FPC, QuantizeDouble
Payload codecs       BitPack, Zstd, Gorilla, RLE Payload, ALP Double/Float, ALP-RD Double/Float

The integer pipeline (delta, offset removal, GCD factoring, bit-packing) matches es819's encoding, so compression ratios for LONG fields are comparable. The float pipelines are new and represent the primary compression improvement. New algorithms slot in by adding a StageId, a StageSpec, and the stage class, with no changes to the pipeline framework itself.
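
What "adding a stage" amounts to in practice is sketched below; the byte values are invented for illustration, since the actual StageId assignments live in the PR:

```java
// Illustrative StageId registry: ids are stable one-byte constants, and a new
// algorithm just claims an unused byte. Values here are invented, not the PR's.
enum StageIdSketch {
    DELTA((byte) 0x01),
    OFFSET((byte) 0x02),
    GCD((byte) 0x03),
    BIT_PACK((byte) 0x10),
    MY_NEW_CODEC((byte) 0x42);   // purely additive; once assigned, never reused

    final byte id;
    StageIdSketch(byte id) { this.id = id; }
}
```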

Design philosophy

The hot path allocates nothing. Transform stages operate on a shared long[] in-place with no intermediate arrays. MetadataBuffer is a reusable growable byte buffer that avoids per-block allocation. EncodingContext and DecodingContext are allocated once per field and reused across all blocks. The only stage that touches native memory is Zstd; everything else is allocation-free in steady state.
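
A minimal sketch of the reusable-buffer pattern described here, not the actual MetadataBuffer implementation:

```java
// Reusable growable buffer in the spirit of MetadataBuffer: reset between
// blocks instead of reallocating, and grow only when a block needs more room.
final class ReusableByteBuffer {
    private byte[] bytes = new byte[128];
    private int length;

    void reset() { length = 0; }

    void writeByte(byte b) {
        if (length == bytes.length) {
            bytes = java.util.Arrays.copyOf(bytes, bytes.length << 1);
        }
        bytes[length++] = b;
    }

    int length() { return length; }
}
```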

A naive pipeline loop calling stages[pos].encode() sees 13+ concrete TransformEncoder types at that call site, which makes it permanently megamorphic: the JIT gives up inlining and falls back to vtable lookup. To avoid this, the encode pipeline dispatches through a switch on StageId values with static method calls, giving the JIT monomorphic call sites it can actually inline.
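
The dispatch idea looks roughly like this; the stage ids and method bodies are illustrative:

```java
// Switch on the stage id with static calls: every call site is monomorphic,
// so the JIT can inline the stage bodies.
final class StageDispatchSketch {
    static boolean encode(byte stageId, long[] values, int count) {
        return switch (stageId) {
            case 0x01 -> encodeDelta(values, count);
            case 0x02 -> encodeOffset(values, count);
            default -> throw new IllegalStateException("unknown stage id: " + stageId);
        };
    }

    private static boolean encodeDelta(long[] values, int count) {
        for (int i = count - 1; i > 0; i--) values[i] -= values[i - 1];
        return true;
    }

    private static boolean encodeOffset(long[] values, int count) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < count; i++) min = Math.min(min, values[i]);
        if (min == 0) return false;   // nothing to subtract, stage opts out for this block
        for (int i = 0; i < count; i++) values[i] -= min;
        return true;
    }
}
```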

Decode stages carry no mutable state. DeltaCodecStage.INSTANCE, OffsetCodecStage.INSTANCE, and others are shared singletons, reusable across fields and segments without reset logic and with no concurrency concerns on the read path.

The design consistently favors decode speed over encode speed. Metadata is written in reverse stage order so the decoder reads it sequentially. Block offsets use DirectMonotonicWriter for O(1) random access. BitPack decodes through Lucene's ForUtil with SIMD-friendly loops. ALP fuses exception collection into the encoding pass so that the encoder does the extra work and the decoder stays simple. The general principle is that encoding happens once at index time, but decoding happens on every query, so any complexity budget should be spent on the write path.

Per-block bitmaps let stages opt out dynamically without reconfiguring the pipeline. A Delta stage encountering non-monotonic data writes a 0-bit and passes values through unchanged; ALP exceeding its exception threshold falls back to raw mode. The pipeline shape is fixed per field, but each block adapts to its own data independently.
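
Building on the stage sketch above, the per-block bitmap is essentially the record of which stages returned true for a given block:

```java
// Each stage sets its bit only if it actually transformed this block; the
// bitmap is written with the block so the decoder knows what to undo.
static int encodeBlock(long[] values, int count, TransformStage[] stages) {
    int appliedBitmap = 0;
    for (int s = 0; s < stages.length; s++) {
        if (stages[s].encode(values, count)) {   // e.g. Delta declines on non-monotonic data
            appliedBitmap |= 1 << s;
        }
    }
    return appliedBitmap;
}
```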

Current limitations

The pipeline handles NUMERIC and SORTED_NUMERIC fields only. Sorted, binary, and sorted set doc values use the existing es819 code paths unchanged.

There is no backward compatibility gate yet. Once a segment is written with es94, it requires the pipeline decoder to read it back. Production deployment will need a codec version gate so that older nodes do not attempt to open es94 segments.

Block sizes (128 for LogsDB, 512 for TSDB), ALP maxError thresholds, and pipeline assignments are reasonable starting points but have not been tuned against production workloads.

Backward compatibility

Although there is no BWC gate in this PR yet, the format is designed to make backward compatibility straightforward. The key property is that every field carries the information the decoder needs to interpret it. The FieldDescriptor written per field contains the exact list of StageId entries, the block size, and the data type. The per-block bitmap records which stages were actually applied to each block. Together, these two pieces of metadata mean the decoder never has to guess or infer anything about the encoding; it reads the descriptor, walks the stage list, and applies the inverse transforms in order.
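
Concretely, the decode loop only needs the descriptor's stage list and the block bitmap. A simplified sketch, showing just two inverse transforms and omitting per-stage metadata handling:

```java
// Undo the applied transforms, walking the descriptor's stage list from the
// last stage back to the first.
static void decodeBlock(long[] values, int count, byte[] stageIds, int appliedBitmap, long offset) {
    for (int s = stageIds.length - 1; s >= 0; s--) {
        if ((appliedBitmap & (1 << s)) == 0) {
            continue;                            // stage opted out for this block
        }
        switch (stageIds[s]) {
            case 0x02 -> { for (int i = 0; i < count; i++) values[i] += offset; }        // undo offset removal
            case 0x01 -> { for (int i = 1; i < count; i++) values[i] += values[i - 1]; }  // undo delta (prefix sum)
            default -> throw new IllegalStateException("unknown stage id: " + stageIds[s]);
        }
    }
}
```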

Because StageId values are immutable stable byte constants, a newer decoder always knows a superset of all stage IDs that any older encoder could have written. This means segments written by older versions of es94 will always be readable by newer code without any special handling. The encoder and decoder evolve independently: the encoder can adopt new stages or rearrange pipelines over time, and any decoder that knows those StageId values can decode the result. Adding a new compression algorithm is a purely additive change: assign a new byte ID, implement the encode and decode sides, and all existing segments remain untouched.

This also handles mapping changes gracefully. In Lucene, each flush produces a new immutable segment, and the pipeline is selected at flush time based on the current mapping. If a field's mapping changes between flushes (for example, a metric type changes from gauge to counter, or the optimize_for hint is updated), the next segment will simply be written with the new pipeline. Older segments keep whatever pipeline was recorded in their FieldDescriptor. At read time, each segment decodes independently using its own descriptor, so different segments can use entirely different pipelines for the same field without conflict. During merges, segments are decoded with their original pipelines and re-encoded with whatever pipeline the current mapping dictates.

In previous codecs like es87 and es819, the decoder implicitly knows the encoding format because it is hardcoded in the codec version. Any change to the encoding requires a new codec version, and both sides must agree on what that version means. With a self-describing format, the wire data itself is the contract. This means that future pipeline changes within es94, such as new stages or different per-field routing, will not require format version bumps. Only new StageId registrations are needed.

Testing

Each stage has isolated round-trip tests that verify correctness across data patterns: constant, monotonic, random, special values (NaN, Infinity, subnormals), and edge cases like exceptions at block boundaries or high exception rates. Full codec round-trips run through ES94TSDBDocValuesFormatTests. Pipeline-level integration tests in PipelineStageIntegrationTests compose multi-stage pipelines and verify end-to-end encoding and decoding. StageEqualsHashCodeToStringTests covers contract compliance for all stage implementations.

Summary of changes

- server/src/main/java/.../tsdb/es94/              - ES94 codec (consumer, producer, format)
- server/src/main/java/.../tsdb/pipeline/          - Pipeline abstraction and encoding stages
- server/src/main/java/.../index/codec/            - PerFieldFormatSupplier wiring
- server/src/main/java/.../index/mapper/           - DateFieldMapper (optimize_for), NumberFieldMapper
- server/src/test/java/.../tsdb/es94/              - ES94 doc values format tests
- server/src/test/java/.../tsdb/pipeline/          - Pipeline, stage, and integration tests
- benchmarks/src/main/java/.../tsdb/               - JMH encode/decode benchmarks
- x-pack/.../otel-data/.../metrics-otel@mappings   - OTel metrics mapping update

This is a proof of concept exploring a different approach to encoding
numeric fields in TSDB indices. Instead of the monolithic encoder used
in ES819, this introduces a pipeline architecture where encoding stages
can be composed together.

The pipeline currently chains: delta -> offset -> gcd -> bitPack. Each
stage transforms the data and passes it to the next, with metadata
recorded so decoding can reverse the process. Additional stages exist
for specific cases: PatchedPFor handles blocks with outliers, Zigzag
handles signed values, and Zstd provides optional compression.
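
As a toy walk-through of that chain on a handful of values (the base-value and per-stage metadata bookkeeping are simplified for illustration):

```java
import java.math.BigInteger;
import java.util.Arrays;

// delta -> offset -> gcd -> bitPack on four values; illustrative only.
public final class IntegerChainDemo {
    public static void main(String[] args) {
        long[] values = {1_000L, 1_010L, 1_020L, 1_040L};

        // Delta: keep the first value as the block base, store only differences.
        long base = values[0];
        long[] deltas = new long[values.length - 1];
        for (int i = 1; i < values.length; i++) deltas[i - 1] = values[i] - values[i - 1];
        // deltas = {10, 10, 20}

        // Offset: subtract the minimum delta, recorded as stage metadata.
        long min = Arrays.stream(deltas).min().getAsLong();
        for (int i = 0; i < deltas.length; i++) deltas[i] -= min;
        // deltas = {0, 0, 10}

        // GCD: divide by the block's greatest common divisor, also metadata.
        long gcd = 0;
        for (long d : deltas) gcd = BigInteger.valueOf(gcd).gcd(BigInteger.valueOf(d)).longValue();
        if (gcd > 1) for (int i = 0; i < deltas.length; i++) deltas[i] /= gcd;
        // deltas = {0, 0, 1}: bitPack now needs one bit per value.

        System.out.println("base=" + base + " min=" + min + " gcd=" + gcd + " packed=" + Arrays.toString(deltas));
    }
}
```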

What makes this a POC rather than production-ready:

- Only numeric doc values use the new encoding. Sorted, binary, and
  sorted set fields are unchanged from ES819 -- they are copied as-is.

- The pipeline configuration is static. Every numeric field gets the
  same delta-offset-gcd-bitPack chain regardless of data patterns.
  Production would need heuristics to pick the right stages.

- Format is incompatible with ES819. The codec name and wire format
  differ, so this only works on fresh indices. No migration path exists.

- Limited real-world validation. The tests verify correctness but the
  approach has not been validated against production workloads.

The pipeline abstraction includes FieldDescriptor and PipelineDescriptor
classes that are not currently used. These document the evolution path
toward self-describing formats where pipeline configuration is written
to metadata, enabling per-field pipelines and backward compatibility.
@salvatore-campagna salvatore-campagna self-assigned this Jan 27, 2026
@salvatore-campagna salvatore-campagna changed the title Add ES94 TSDB doc values format with pipeline-based numeric encoding feat(tsdb): add ES94 doc values format with pipeline-based numeric encoding Jan 27, 2026
salvatore-campagna and others added 5 commits January 27, 2026 17:26
Pad values to the block size before compression so decoding
always sees full blocks and add a partial-block round-trip test.
Short-circuit encoding when negatives are present and add a
skip test for signed inputs.
Reuse a single decoder per producer and close the codec on
close to avoid per-call allocation.
Add a shared helper to assert a stage is not applied and
exercise skip paths for delta, offset, and gcd.
@salvatore-campagna salvatore-campagna changed the title feat(tsdb): add ES94 doc values format with pipeline-based numeric encoding feat(tsdb): es94 doc values format with pipeline-based numeric encoding Jan 27, 2026
@salvatore-campagna salvatore-campagna changed the title feat(tsdb): es94 doc values format with pipeline-based numeric encoding feat(tsdb): es94 doc values format with pipeline-based numeric encoding Jan 27, 2026
this.numericBlockShift = blockShift;
this.numericBlockSize = 1 << blockShift;
this.numericBlockMask = numericBlockSize - 1;
this.numericCodec = ES94TSDBDocValuesFormat.createNumericCodec(numericBlockSize);
Contributor Author

@salvatore-campagna salvatore-campagna Jan 28, 2026

Encoder/decoder are created separately. Ideally we should inject a single NumericCodec from ES94TSDBDocValuesFormat to both the producer and consumer to guarantee symmetric pipeline construction and avoid future drift. Then the consumer would only create the encoder (numericCodec.newEncoder()) and the producer would only create the decoder (numericCodec.newDecoder()).

this.numericBlockShift = numericBlockShift;
this.numericBlockSize = 1 << numericBlockShift;

this.numericCodec = ES94TSDBDocValuesFormat.createNumericCodec(numericBlockSize);
Contributor Author

@salvatore-campagna salvatore-campagna Jan 28, 2026

See comment: #141353 (comment)

}

private int findOptimalBitWidth(long[] values, int valueCount, int maxBits, int maxExceptions) {
int[] histogram = new int[65];
Contributor Author

Will remove this allocation from the hot path.

salvatore-campagna and others added 2 commits February 5, 2026 13:14
…mprovements

- Add ALP Float codec stages (transform, payload, RD variants) with
  AlpFloatUtils for 32-bit float ALP math
- Add DeltaDelta codec stage for second-order differencing and route
  @timestamp fields through it via optimize_for in DateFieldMapper
- Add FPC transform stage for floating-point compression
- Fuse quantization into all ALP encode stages using fast rounding,
  eliminating separate QuantizeDouble stage transition for ALP combos
- Derive ALP maxExponent from quantize maxError for tighter encoding
- Right-size metadata buffer based on per-stage budgets
- Add equals, hashCode, and toString to all pipeline stages
- Add trace/debug logging to pipeline resolution and codec paths
- Fix FieldDescriptor/skip-index write order in addNumericField
- Clean up tests: inline variables, remove unnecessary comments,
  apply spotless formatting
@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

@salvatore-campagna salvatore-campagna changed the title feat(tsdb): es94 doc values format with pipeline-based numeric encoding feat(tsdb): pipeline-based numeric codec with self-describing format and per-field encoding Feb 11, 2026
…ALP utils

bestE is always non-negative (ALP exponent in range [0, maxExponent]),
so Math.abs() was unnecessary. Removes the two forbidden API violations.
@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

…equiresExplicitClose

Move Zstd native buffers to a thread-local ZstdBuffers holder with
Cleaner-based lifecycle, eliminating per-instance buffer allocation.
Remove requiresExplicitClose() from PayloadEncoder/PayloadDecoder
interfaces and all implementations since it is no longer needed.
Add LZ4 as a new payload stage (StageId 0xA5) alongside Zstd and
BitPack. Supports both fast compression (ESLZ4Compressor) and high
compression (LZ4HCJavaSafeCompressor) via a boolean parameter.
Thread-local byte[] buffers avoid per-instance allocation.
Wire into StageFactory, PipelineConfig builders, and OptimalPipelines.
Pre-allocate at INITIAL_BLOCK_SIZE=512 to cover production block sizes
(128 and 512) without any growth. Buffers grow via ensureCapacity() only
for larger test block sizes.
Thread-local buffers make per-instance resource management unnecessary.
Remove Closeable from PayloadEncoder, PayloadDecoder, pipelines,
NumericEncoder, NumericDecoder, and the perFieldEncoders/Decoders
tracking lists in ES94 consumer/producer.
@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

Switch the default (no optimize_for hint) pipeline for TSDB double
gauges, float gauges, and double counters from ALP/Gorilla to FPC
with offset + gcd + bitpack. Timestamp fields use the default codec.
@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

Default pipelines when no optimize_for hint is set in TSDB:
- Double gauges: ALP with 1e-6 quantization + offset + gcd + bitpack
- Float gauges: ALP float + offset + gcd + bitpack
- Double counters: FPC + offset + gcd + bitpack
- Float counters: FPC + offset + gcd + bitpack
Timestamp fields use the default codec.
@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

Refactor DefaultPipelineResolver into explicit per-type methods:
- Double gauges: ALP (1e-6) for storage/default, ALP (1e-12) for balanced
- Double counters: Gorilla for storage, FPC for default
- Float gauges: ALP float, Float counters: FPC
- @timestamp: delta-of-delta + offset + gcd + bitpack (TSDB and LogsDB)

Set optimize_for to storage in OTel metrics template for double gauges
and double counters.
@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

Remove PRECISION_MARGIN and MAX_F_CANDIDATES constraints that
artificially limited the ALP exponent/factor search space. The original
ALP paper (SIGMOD 24) searches all (e,f) combinations where f <= e <= maxExponent.
Our implementation restricted e to estimatedPrecision+2 and f to 0..3,
preventing discovery of high-exponent solutions (e.g. e=14, f=12) that
achieve zero roundtrip failures for division-based decimal values like
N/100.0.

Also rename scratch buffers to positions/exceptions across all ALP
encode stages and remove pointless local aliases.
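
For reference, the exhaustive search this describes looks roughly like the sketch below, assuming the ALP convention digits = round(v * 10^e * 10^-f) with decode v' = digits * 10^f * 10^-e; exception counting, sampling, and the cost model from the paper are omitted, and the real implementation picks the pair with the best compression rather than the first exact one:

```java
// Illustrative (e, f) search over f <= e <= maxExponent; returns a pair that
// round-trips every value exactly, or null to signal a fallback (ALP-RD / raw).
static int[] findExponents(double[] values, int maxExponent) {
    for (int e = maxExponent; e >= 0; e--) {
        for (int f = e; f >= 0; f--) {
            boolean exact = true;
            for (double v : values) {
                long digits = Math.round(v * Math.pow(10, e) * Math.pow(10, -f));
                double roundTripped = digits * Math.pow(10, f) * Math.pow(10, -e);
                if (roundTripped != v) {
                    exact = false;
                    break;
                }
            }
            if (exact) return new int[] {e, f};
        }
    }
    return null;
}
```
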
@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

salvatore-campagna and others added 6 commits February 16, 2026 20:54
Move quantize() from AlpDoubleUtils and quantizeInPlace() from
AlpFloatUtils into a new QuantizeUtils class. All ALP, Chimp, and
QuantizeDouble stages now call QuantizeUtils.quantizeDoubles() or
QuantizeUtils.quantizeFloats() instead.
Add quantizeStep field to FpcTransformEncodeStage, following the same
pattern as Chimp stages. Wire maxError through StageSpec.FpcStage,
StageFactory, and PipelineConfig builders for both FPC and Chimp.
…PC decode

Enable QUANTIZE_STORAGE for chimp double/float stages in balanced pipeline
mode. Simplify FPC decode loop by replacing split full-byte + remainder
iteration with a single loop using bit-indexed selector access. Pre-allocate
the selector buffer in the constructor to avoid per-block allocation.
…Delta dates

Switch balanced double/float gauges from chimp with quantization to lossless
ALP. Fix date pipeline from delta-rle to deltaDelta to match the storage
experiment that achieved -12.7% overall. Double counters keep FPC for
balanced (with skip logic to prevent inflation).
@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

salvatore-campagna and others added 6 commits February 20, 2026 23:30
… size resolvers

Introduce infrastructure for deferring pipeline resolution to encode time:
- Widen PipelineResolver to accept data samples, rename to StaticPipelineResolver
- Add FieldDescriptorWriter for deferred descriptor writing after first encode
- Add FieldContextResolver and BlockSizeResolver with static implementations
- Wire deferred resolution into ES94TSDBDocValuesConsumer and format classes
Add data-driven pipeline selection that profiles each block at encode time:
- BlockProfiler computes per-block statistics (range, GCD, runs, monotonicity, bit widths)
- PipelineSelector maps block profiles to optimal pipeline configurations
- AdaptivePipelineResolver orchestrates profiling and selection per field
- Add index setting to toggle adaptive profiler (index.codec.tsdb.adaptive_encoding_profiler)
- Wire adaptive resolver into PerFieldFormatSupplier with full test coverage
Add human-readable display names to StageId enum for log readability,
describeStages() to PipelineConfig, and update PipelineDescriptor toString.
@salvatore-campagna
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-270m please

@elasticmachine
Collaborator

⏳ Build in-progress

This build attempts two tsdb-metricsgen-270m benchmarks to evaluate performance impact of this PR. To estimate benchmark completion time inspect previous nightly runs here.

History

cc @salvatore-campagna
