Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Feb 21, 2025

The LuceneSourceOperator is supposed to terminate when it reaches the limit; unfortunately, we don't have a test to cover this. Due to this bug, we continue scanning all segments, even though we discard the results as the limit was reached. This can cause performance issues for simple queries like FROM .. | LIMIT 10, when Lucene indices are on the warm or cold tier. I will submit a follow-up PR to ensure we only collect up to the limit across multiple drivers.

@dnhatn dnhatn marked this pull request as ready for review February 22, 2025 00:02
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn added the auto-backport Automatically create backport pull requests when merged label Feb 22, 2025
@dnhatn dnhatn requested a review from costin February 22, 2025 00:03

public void testEarlyTermination() {
int size = between(1_000, 20_000);
int limit = between(10, size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, max in between is exclusive and limit is always strictly less than size?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limit can be larger than the size.

@dnhatn
Copy link
Member Author

dnhatn commented Feb 24, 2025

@idegtiarenko Thanks for reviewing.

@dnhatn dnhatn merged commit 4d2b8dc into elastic:main Feb 24, 2025
17 checks passed
@dnhatn dnhatn deleted the fix-limit-operator branch February 24, 2025 16:49
@elasticsearchmachine
Copy link
Collaborator

elasticsearchmachine commented Feb 24, 2025

Status Branch Result
8.18
8.x
9.0

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 123197

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Feb 24, 2025
The LuceneSourceOperator is supposed to terminate when it reaches the 
limit; unfortunately, we don't have a test to cover this. Due to this
bug, we continue scanning all segments, even though we discard the
results as the limit was reached. This can cause performance issues for
simple queries like FROM .. | LIMIT 10, when Lucene indices are on the
warm or cold tier. I will submit a follow-up PR to ensure we only
collect up to the limit across multiple drivers.
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Feb 24, 2025
The LuceneSourceOperator is supposed to terminate when it reaches the 
limit; unfortunately, we don't have a test to cover this. Due to this
bug, we continue scanning all segments, even though we discard the
results as the limit was reached. This can cause performance issues for
simple queries like FROM .. | LIMIT 10, when Lucene indices are on the
warm or cold tier. I will submit a follow-up PR to ensure we only
collect up to the limit across multiple drivers.
elasticsearchmachine pushed a commit that referenced this pull request Feb 24, 2025
The LuceneSourceOperator is supposed to terminate when it reaches the 
limit; unfortunately, we don't have a test to cover this. Due to this
bug, we continue scanning all segments, even though we discard the
results as the limit was reached. This can cause performance issues for
simple queries like FROM .. | LIMIT 10, when Lucene indices are on the
warm or cold tier. I will submit a follow-up PR to ensure we only
collect up to the limit across multiple drivers.
elasticsearchmachine pushed a commit that referenced this pull request Feb 24, 2025
The LuceneSourceOperator is supposed to terminate when it reaches the 
limit; unfortunately, we don't have a test to cover this. Due to this
bug, we continue scanning all segments, even though we discard the
results as the limit was reached. This can cause performance issues for
simple queries like FROM .. | LIMIT 10, when Lucene indices are on the
warm or cold tier. I will submit a follow-up PR to ensure we only
collect up to the limit across multiple drivers.
elasticsearchmachine pushed a commit that referenced this pull request Feb 24, 2025
The LuceneSourceOperator is supposed to terminate when it reaches the 
limit; unfortunately, we don't have a test to cover this. Due to this
bug, we continue scanning all segments, even though we discard the
results as the limit was reached. This can cause performance issues for
simple queries like FROM .. | LIMIT 10, when Lucene indices are on the
warm or cold tier. I will submit a follow-up PR to ensure we only
collect up to the limit across multiple drivers.
@dnhatn
Copy link
Member Author

dnhatn commented Mar 3, 2025

💚 All backports created successfully

Status Branch Result
8.17

Questions ?

Please refer to the Backport tool documentation

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Mar 3, 2025
The LuceneSourceOperator is supposed to terminate when it reaches the
limit; unfortunately, we don't have a test to cover this. Due to this
bug, we continue scanning all segments, even though we discard the
results as the limit was reached. This can cause performance issues for
simple queries like FROM .. | LIMIT 10, when Lucene indices are on the
warm or cold tier. I will submit a follow-up PR to ensure we only
collect up to the limit across multiple drivers.

(cherry picked from commit 4d2b8dc)
dnhatn added a commit that referenced this pull request Mar 3, 2025
The LuceneSourceOperator is supposed to terminate when it reaches the
limit; unfortunately, we don't have a test to cover this. Due to this
bug, we continue scanning all segments, even though we discard the
results as the limit was reached. This can cause performance issues for
simple queries like FROM .. | LIMIT 10, when Lucene indices are on the
warm or cold tier. I will submit a follow-up PR to ensure we only
collect up to the limit across multiple drivers.
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dnhatn !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.3 v8.18.1 v8.19.0 v9.0.1 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants