@dnhatn dnhatn commented Aug 24, 2025

We have encountered the following error in serverless:

java.lang.NullPointerException: Cannot invoke "org.apache.lucene.search.BulkScorer.score(org.apache.lucene.search.LeafCollector, org.apache.lucene.util.Bits, int, int)" because "this.bulkScorer" is null
    at org.elasticsearch.compute.lucene.LuceneOperator$LuceneScorer.scoreNextRange(LuceneOperator.java:233)
    at org.elasticsearch.compute.lucene.LuceneSourceOperator.getCheckedOutput(LuceneSourceOperator.java:307)
    at org.elasticsearch.compute.lucene.LuceneOperator.getOutput(LuceneOperator.java:143)
    at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:272)
    at org.elasticsearch.compute.operator.Driver.run(Driver.java:186)
    at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:420)

I spent considerable time trying to reproduce this issue without success, although I understand how it could occur: a Weight should not be shared between threads. Most Weight implementations happen to be safe to share, but those for term-based queries (e.g., TermQuery, multi-term queries) are not, because they hold mutable TermStates.

This change proposes to stop sharing Weight between Drivers.
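To illustrate the idea of not sharing a Weight between Drivers, here is a minimal sketch in plain Java. It does not use the Lucene or ES|QL classes: `StubWeight`, `StubDriver`, and the `Supplier`-based factory are hypothetical stand-ins, and the mutable list only mimics the kind of per-instance state that TermStates represents. The point it shows is that each driver lazily builds its own instance instead of receiving a shared one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class PerDriverWeightSketch {

    /** Hypothetical stand-in for a Weight that keeps mutable per-term state. */
    static final class StubWeight {
        // Not thread-safe on purpose: mimics mutable state such as TermStates.
        private final List<String> loadedStates = new ArrayList<>();

        void loadState(String segment) {
            loadedStates.add(segment);
        }

        int loadedCount() {
            return loadedStates.size();
        }
    }

    /** Hypothetical stand-in for a Driver that owns its weight. */
    static final class StubDriver {
        private final Supplier<StubWeight> weightFactory;
        private StubWeight weight; // only ever touched by this driver

        StubDriver(Supplier<StubWeight> weightFactory) {
            this.weightFactory = weightFactory;
        }

        StubWeight weight() {
            if (weight == null) {
                weight = weightFactory.get(); // created per driver, never shared
            }
            return weight;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Supplier<StubWeight> factory = StubWeight::new;
        StubDriver a = new StubDriver(factory);
        StubDriver b = new StubDriver(factory);

        // Each thread mutates only the weight owned by its own driver.
        Thread ta = new Thread(() -> a.weight().loadState("segment-0"));
        Thread tb = new Thread(() -> b.weight().loadState("segment-1"));
        ta.start();
        tb.start();
        ta.join();
        tb.join();

        System.out.println("distinct instances: " + (a.weight() != b.weight()));
    }
}
```

The trade-off is the one raised in the description: every driver pays the cost of building its own instance.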

I am not sure if we should backport this to 8.19 and 9.1, since without the SliceQueue improvement from #132774, this fix may slow down queries due to the increased cost of creating more Weight instances.

@elasticsearchmachine

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn requested a review from nik9000 August 25, 2025 04:53
@dnhatn dnhatn marked this pull request as ready for review August 25, 2025 04:54
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label Aug 25, 2025
@elasticsearchmachine

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000 nik9000 left a comment

Small question.

}
}

private static class OwningWeight extends FilterWeight {

Could we put the Query into LuceneScorer instead of into the Weight? There's already a bunch of useful stuff. We could do:

+        private final Query query;
         private final Weight weight;

and check there.

Sure, I've updated the LuceneScorer to keep both the query and tags.
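As a rough sketch of the suggestion in this thread, the scorer can carry the query next to the weight and use it to decide whether the current scorer can be reused. Everything below is a hypothetical stand-in in plain Java (`StubQuery`, `StubWeight`, `StubScorer`, `StubOperator`), not the actual LuceneOperator code, and comparing queries by identity is an assumption made only for illustration. The tags mentioned in the reply are left out.

```java
import java.util.function.Function;

final class ScorerReuseSketch {

    /** Hypothetical stand-in for a Lucene Query. */
    static final class StubQuery {
        final String text;

        StubQuery(String text) {
            this.text = text;
        }
    }

    /** Hypothetical stand-in for a Weight built from a query. */
    static final class StubWeight {
        final StubQuery source;

        StubWeight(StubQuery source) {
            this.source = source;
        }
    }

    /** Hypothetical scorer that remembers the query alongside the weight. */
    static final class StubScorer {
        final StubQuery query;
        final StubWeight weight;

        StubScorer(StubQuery query, StubWeight weight) {
            this.query = query;
            this.weight = weight;
        }
    }

    /** Hypothetical operator that rebuilds the scorer only when the query changes. */
    static final class StubOperator {
        private final Function<StubQuery, StubWeight> weightFactory;
        private StubScorer current;
        private int weightsCreated;

        StubOperator(Function<StubQuery, StubWeight> weightFactory) {
            this.weightFactory = weightFactory;
        }

        StubScorer scorerFor(StubQuery query) {
            // Assumption: identity comparison on the query decides reuse.
            if (current == null || current.query != query) {
                current = new StubScorer(query, weightFactory.apply(query));
                weightsCreated++;
            }
            return current;
        }

        int weightsCreated() {
            return weightsCreated;
        }
    }

    public static void main(String[] args) {
        StubOperator op = new StubOperator(StubWeight::new);
        StubQuery q1 = new StubQuery("host:a");
        StubQuery q2 = new StubQuery("host:b");
        op.scorerFor(q1);
        op.scorerFor(q1); // same query: the existing scorer and weight are reused
        op.scorerFor(q2); // different query: a new weight is built
        System.out.println("weights created: " + op.weightsCreated());
    }
}
```

Keeping the check in the scorer avoids wrapping the weight just to remember which query it came from, which seems to be the motivation for the question.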

@dnhatn dnhatn requested a review from nik9000 September 3, 2025 21:55
nik9000 commented Sep 3, 2025

Makes sense to me.

@dnhatn dnhatn enabled auto-merge (squash) September 9, 2025 01:20
@dnhatn dnhatn merged commit 2f10065 into elastic:main Sep 9, 2025
33 of 34 checks passed
@dnhatn dnhatn deleted the fix-weight branch September 9, 2025 04:46
dnhatn commented Sep 9, 2025

Thanks Nik!

rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Sep 9, 2025
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Sep 10, 2025
This reverts commit 2f10065.

# Conflicts:
#	x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/TimeSeriesSourceOperator.java
dnhatn added a commit that referenced this pull request Sep 10, 2025
This reverts commit 2f10065.

We have seen a performance regression that may be caused by this fix in 
some queries. I am reverting the PR for now and will try to introduce a
fix with fewer side effects.

Relates #133446
Labels

:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0