fix(esql): check per-shard DateFieldType for DocValuesSkipper #142752

Open
salvatore-campagna wants to merge 8 commits into elastic:main from salvatore-campagna:fix/per-shard-doc-values-skipper
salvatore-campagna commented Feb 20, 2026

Summary

SearchContextStats.min()/max() derived hasDocValuesSkipper once from the first shard's DateFieldType and applied it globally to all shards via doWithContexts. In mixed environments (TSDB shards using DocValuesSkipper + standard shards using PointValues), the wrong API gets called on some shards, causing sentinel values to leak into min/max results.

The two APIs have different sentinel behavior when a shard has no data for the field:

  • PointValues.getMinPackedValue() returns null when there are no points: callers can check for null and skip.
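  • DocValuesSkipper.globalMinValue() returns Long.MIN_VALUE when a leaf reader has the field but no skipper, and Long.MAX_VALUE when no segments have the field. globalMaxValue() mirrors this with the opposite sentinels: Long.MAX_VALUE when a leaf reader has no skipper, and Long.MIN_VALUE when no segments have the field.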

When hasDocValuesSkipper is determined from the first shard (e.g. a TSDB shard) and then applied to a standard shard that only has PointValues, globalMinValue/globalMaxValue are called on readers that have no skipper. This returns the sentinels Long.MIN_VALUE/Long.MAX_VALUE, which propagate as the min/max result.
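
To make the sentinel behavior concrete, here is a minimal sketch in plain Java. It does not call Lucene: `Leaf` and `skipperStyleGlobalMin` are hypothetical stand-ins that only model the contract described above, so treat it as an illustration rather than the actual implementation.

```java
import java.util.List;

class SentinelLeakSketch {
    /** Hypothetical stand-in for one leaf reader (segment); not a Lucene type. */
    record Leaf(boolean hasField, Long skipperMin) {}

    /**
     * Models the sentinel contract described above for a global minimum:
     * Long.MAX_VALUE when no leaf has the field, and Long.MIN_VALUE as soon
     * as a leaf has the field but no skipper.
     */
    static long skipperStyleGlobalMin(List<Leaf> leaves) {
        long min = Long.MAX_VALUE;
        for (Leaf leaf : leaves) {
            if (leaf.hasField() == false) {
                continue; // field absent in this leaf, nothing to contribute
            }
            if (leaf.skipperMin() == null) {
                return Long.MIN_VALUE; // field present but no skipper: minimum unknown
            }
            min = Math.min(min, leaf.skipperMin());
        }
        return min;
    }

    public static void main(String[] args) {
        List<Leaf> tsdbShard = List.of(new Leaf(true, 1_000L), new Leaf(true, 500L));
        List<Leaf> standardShard = List.of(new Leaf(true, null)); // points only, no skipper
        List<Leaf> emptyShard = List.of(new Leaf(false, null));   // field mapped, no data

        System.out.println(skipperStyleGlobalMin(tsdbShard));
        System.out.println(skipperStyleGlobalMin(standardShard) == Long.MIN_VALUE);
        System.out.println(skipperStyleGlobalMin(emptyShard) == Long.MAX_VALUE);
    }
}
```

In this model the skipper-style lookup only yields a real value for the TSDB-like shard. On the standard and empty shards it yields a sentinel, which a caller that does not filter sentinels would report as the minimum.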

This replaces the workaround in #142726, which filtered sentinels after the fact with hasMin/hasMax booleans. This fix addresses the root cause instead: min() and max() now choose the API from each shard's own DateFieldType.hasDocValuesSkipper(), using doc values skippers when available and BKD trees (point values) otherwise, rather than deriving the choice once from the first shard and applying it globally. This way sentinels can never leak in the first place.

What changed

  • min() and max() now iterate contexts directly instead of using doWithContexts, preserving the context-to-leaf-reader association needed to check hasDocValuesSkipper() per shard. Each shard always uses the correct API to retrieve min/max values: doc values skippers when available, BKD trees (point values) otherwise.
  • Simplified the early-return guard: removed the (hasDocValueSkipper == false && stat.config.indexed == false) check, which was incorrect for mixed TSDB/standard environments. A TSDB shard with indexed=false would make the global indexed flag false, so the method bailed out even when standard shards had points.
  • Extracted helper methods (docValuesSkipperMinValue/docValuesSkipperMaxValue and pointMinValue/pointMaxValue) that wrap the underlying APIs and convert sentinel values to null. This gives both code paths a uniform Long-or-null interface, and also filters the Long.MIN_VALUE sentinel from DocValuesSkipper.globalMinValue() that the previous code did not guard against. The per-leaf results are aggregated via nullableMin/nullableMax helpers.
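
The per-shard selection and the Long-or-null helpers can be sketched roughly as follows. The helper names are taken from the list above, but their bodies, the `Shard` record, and the `min` loop are assumptions written for illustration in plain Java. This is not the code in the PR, and no Elasticsearch or Lucene types are used.

```java
import java.util.List;

class PerShardMinSketch {
    /**
     * Hypothetical stand-in for one shard's search context.
     * skipperGlobalMin is what a skipper-style lookup would return (possibly a sentinel);
     * pointMin is what a point-values lookup would return (null when there are no points).
     */
    record Shard(boolean hasDocValuesSkipper, long skipperGlobalMin, Long pointMin) {}

    /** Maps both skipper sentinels to null so "no data" looks the same on every path. */
    static Long docValuesSkipperMinValue(Shard shard) {
        long min = shard.skipperGlobalMin();
        if (min == Long.MIN_VALUE || min == Long.MAX_VALUE) {
            return null;
        }
        return Long.valueOf(min);
    }

    /** The point-values path already reports "no points" as null. */
    static Long pointMinValue(Shard shard) {
        return shard.pointMin();
    }

    /** Minimum of two nullable values; null means "no value on that side". */
    static Long nullableMin(Long a, Long b) {
        if (a == null) {
            return b;
        }
        if (b == null) {
            return a;
        }
        return Math.min(a, b);
    }

    /** Chooses the lookup per shard instead of once from the first shard. */
    static Long min(List<Shard> shards) {
        Long result = null;
        for (Shard shard : shards) {
            Long shardMin = shard.hasDocValuesSkipper()
                ? docValuesSkipperMinValue(shard)
                : pointMinValue(shard);
            result = nullableMin(result, shardMin);
        }
        return result;
    }
}
```

With a TSDB-like shard (`hasDocValuesSkipper` true, minimum 500) and a standard shard (`pointMin` 200), `min` returns 200. With only empty shards it returns null rather than a sentinel.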

Tests

  • testPointValuesMinMaxDoesNotReturnSentinelValues: exercises the PointValues code path. Creates multiple standard (non-TSDB) contexts where the date field is mapped but has no actual date data. Asserts hasDocValuesSkipper() is false on each context and verifies that min() and max() return null as expected. This is the path that reproduces the original stack trace: without the fix, Long.MAX_VALUE leaks as min and Long.MIN_VALUE as max, causing Rounding.prepare(min, max) to throw IllegalArgumentException: [9223372036852975807] must be <= [-9223372036852975808].

  • testDocValuesSkipperMinMaxDoesNotReturnSentinelValues: exercises the DocValuesSkipper code path. Creates multiple TSDB contexts where @timestamp is mapped with hasDocValuesSkipper()=true but the Lucene index only contains keyword docs (no timestamp data written). Verifies that min() and max() return null instead of sentinel values. In practice, data streams always have @timestamp populated, but the test intentionally forces the empty-data corner case so that the sentinel handling is self-contained and does not rely on guarantees from upper layers.
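
Why leaked sentinels end in an exception can be illustrated with a small ordering guard. `requireOrdered` and `canPrepare` below are hypothetical helpers, not the Rounding code. They only show that an inverted min/max pair fails a min <= max precondition, and that a null result lets the caller skip the call instead.

```java
class InvertedRangeSketch {
    /** Hypothetical ordering guard; only illustrates a min <= max precondition. */
    static void requireOrdered(long min, long max) {
        if (min > max) {
            throw new IllegalArgumentException("[" + min + "] must be <= [" + max + "]");
        }
    }

    /** With "no data" reported as null, the caller can skip the guarded call entirely. */
    static boolean canPrepare(Long min, Long max) {
        if (min == null || max == null) {
            return false; // nothing to round, so no range is built
        }
        requireOrdered(min, max);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canPrepare(null, null)); // empty shards after the fix
        System.out.println(canPrepare(1L, 2L));     // ordinary range
        try {
            // leaked sentinels: min is Long.MAX_VALUE and max is Long.MIN_VALUE
            requireOrdered(Long.MAX_VALUE, Long.MIN_VALUE);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```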

Replaces #142726

Closes #142725

…chContextStats min/max

hasDocValuesSkipper was derived once from the first shard's
DateFieldType and applied globally via doWithContexts. In mixed
environments (TSDB shards with DocValuesSkipper + standard shards
with PointValues), calling the wrong API on some shards caused
sentinel values (Long.MIN_VALUE/Long.MAX_VALUE) to leak into min/max.

Replace doWithContexts with direct per-context iteration so each
shard's own DateFieldType determines whether to use DocValuesSkipper
or PointValues. This also simplifies the early-return guard by
removing the incorrect indexed check that could bail out prematurely
in mixed modes.
salvatore-campagna and others added 3 commits February 20, 2026 12:50
…null instead of sentinels

Extract helper methods that convert sentinel values to null, making
both code paths return Long (or null) uniformly. This simplifies the
min/max logic and also filters the Long.MIN_VALUE sentinel from
DocValuesSkipper.globalMinValue that the previous code did not guard
against.
salvatore-campagna added the backport pending, auto-backport, v9.3.2, and v9.2.7 labels Feb 20, 2026
salvatore-campagna marked this pull request as ready for review February 20, 2026 12:43
@elasticsearchmachine

Pinging @elastic/es-storage-engine (Team:StorageEngine)

romseygeek left a comment

LGTM. I wonder if we need to build a more general 'get min and max values of a field' API directly into lucene here?

continue;
}
final MappedFieldType ctxFieldType = context.getFieldType(field.string());
boolean ctxHasSkipper = ctxFieldType instanceof DateFieldType dft && dft.hasDocValuesSkipper();

Given that we know we're operating on a DateFieldType here (because of the instanceof check on line 236) we can use ctxFieldType.indexType().hasSkippers() directly here.

@salvatore-campagna

Right...no instanceof 💯

salvatore-campagna commented Feb 20, 2026

> LGTM. I wonder if we need to build a more general 'get min and max values of a field' API directly into lucene here?

Yeah I think it would be better to have something like that. Whatever the underlying data structure is, just give me min/max.

salvatore-campagna and others added 2 commits February 20, 2026 15:46
…anceof cast

Use MappedFieldType.indexType().hasDocValuesSkipper() to check for
doc values skipper support, avoiding the unnecessary instanceof
DateFieldType cast since the outer guard already ensures the field
type is a DateFieldType.
@salvatore-campagna

I opened this issue in Lucene too: apache/lucene#15740

Link to apache/lucene#15740 so we remember
to replace the wrapper helpers once a unified API is available.
Successfully merging this pull request may close these issues.

SearchContextStats.min()/max() leak sentinel values causing overflow in Rounding

3 participants