fix(esql): check per-shard DateFieldType for DocValuesSkipper#142752
fix(esql): check per-shard DateFieldType for DocValuesSkipper#142752salvatore-campagna wants to merge 8 commits intoelastic:mainfrom
Conversation
…chContextStats min/max hasDocValuesSkipper was derived once from the first shard's DateFieldType and applied globally via doWithContexts. In mixed environments (TSDB shards with DocValuesSkipper + standard shards with PointValues), calling the wrong API on some shards caused sentinel values (Long.MIN_VALUE/Long.MAX_VALUE) to leak into min/max. Replace doWithContexts with direct per-context iteration so each shard's own DateFieldType determines whether to use DocValuesSkipper or PointValues. This also simplifies the early-return guard by removing the incorrect indexed check that could bail out prematurely in mixed modes.
…null instead of sentinels Extract helper methods that convert sentinel values to null, making both code paths return Long (or null) uniformly. This simplifies the min/max logic and also filters the Long.MIN_VALUE sentinel from DocValuesSkipper.globalMinValue that the previous code did not guard against.
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
romseygeek
left a comment
There was a problem hiding this comment.
LGTM. I wonder if we need to build a more general 'get min and max values of a field' API directly into lucene here?
| continue; | ||
| } | ||
| final MappedFieldType ctxFieldType = context.getFieldType(field.string()); | ||
| boolean ctxHasSkipper = ctxFieldType instanceof DateFieldType dft && dft.hasDocValuesSkipper(); |
There was a problem hiding this comment.
Given that we know we're operating on a DateFieldType here (because of the instanceof check on line 236) we can use ctxFieldType.indexType().hasSkippers() directly here.
There was a problem hiding this comment.
Right...no instanceof 💯
Yeah I think it would be better to have something like that. Whatever the underlying data structure is, just give me min/max. |
…anceof cast Use MappedFieldType.indexType().hasDocValuesSkipper() to check for doc values skipper support, avoiding the unnecessary instanceof DateFieldType cast since the outer guard already ensures the field type is a DateFieldType.
|
I opened this issue in Lucene too: apache/lucene#15740 |
Link to apache/lucene#15740 so we remember to replace the wrapper helpers once a unified API is available.
Summary
SearchContextStats.min()/max()derivedhasDocValuesSkipperonce from the first shard'sDateFieldTypeand applied it globally to all shards viadoWithContexts. In mixed environments (TSDB shards usingDocValuesSkipper+ standard shards usingPointValues), the wrong API gets called on some shards, causing sentinel values to leak into min/max results.The two APIs have different sentinel behavior when a shard has no data for the field:
PointValues.getMinPackedValue()returnsnullwhen there are no points: callers can check for null and skip.DocValuesSkipper.globalMinValue()returnsLong.MIN_VALUEwhen a leaf reader has no skipper, andLong.MAX_VALUEwhen no segments have the field.globalMaxValue()returns the opposite sentinels.When
hasDocValuesSkipperis determined from the first shard (e.g. a TSDB shard) and then applied to a standard shard that only hasPointValues,globalMinValue/globalMaxValueare called on readers that have no skipper. This returns the sentinelsLong.MIN_VALUE/Long.MAX_VALUE, which propagate as the min/max result.This replaces the workaround in #142726 which filtered sentinels after the fact with
hasMin/hasMaxbooleans. Instead, this fix addresses the root cause:min()andmax()now always call the right API based on each shard's ownDateFieldType.hasDocValuesSkipper(), using doc values skippers when available and BKD trees (point values) otherwise, rather than deriving the choice once from the first shard and applying it globally. This way sentinels can never leak in the first place.What changed
min()andmax()now iterate contexts directly instead of usingdoWithContexts, preserving the context-to-leaf-reader association needed to checkhasDocValuesSkipper()per shard. Each shard always uses the correct API to retrieve min/max values: doc values skippers when available, BKD trees (point values) otherwise.(hasDocValueSkipper == false && stat.config.indexed == false)check which was incorrect for mixed TSDB/standard environments (a TSDB shard withindexed=falsewould cause the globalindexedto befalse, bailing out even when standard shards have points)docValuesSkipperMinValue/docValuesSkipperMaxValueandpointMinValue/pointMaxValue) that wrap the underlying APIs and convert sentinel values tonull. This gives both code paths a uniformLong-or-nullinterface, and also filters theLong.MIN_VALUEsentinel fromDocValuesSkipper.globalMinValue()that the previous code did not guard against. The per-leaf results are aggregated vianullableMin/nullableMaxhelpers.Tests
testPointValuesMinMaxDoesNotReturnSentinelValues: exercises thePointValuescode path. Creates multiple standard (non-TSDB) contexts where the date field is mapped but has no actual date data. AssertshasDocValuesSkipper()isfalseon each context and verifies thatmin()andmax()returnnullas expected. This is the path that reproduces the original stack trace: without the fix,Long.MAX_VALUEleaks as min andLong.MIN_VALUEas max, causingRounding.prepare(min, max)to throwIllegalArgumentException: [9223372036852975807] must be <= [-9223372036852975808].testDocValuesSkipperMinMaxDoesNotReturnSentinelValues: exercises theDocValuesSkippercode path. Creates multiple TSDB contexts where@timestampis mapped withhasDocValuesSkipper()=truebut the Lucene index only contains keyword docs (no timestamp data written). Verifies thatmin()andmax()returnnullinstead of sentinel values. In practice, data streams always have@timestamppopulated, but the test intentionally forces the empty-data corner case so that the sentinel handling is self-contained and does not rely on guarantees from upper layers.Replaces #142726
Closes #142725