Skip to content

Conversation

@ivancea
Copy link
Contributor

@ivancea ivancea commented Jul 2, 2025

By default, the FilterByFilterAggregator (Used by the "filter" and "filters" aggs) was using the DefaultBulkScorer (From Lucene), which has no cancellation mechanism.

This PR wraps it into a CancellableBulkScorer, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.

@ivancea ivancea requested review from nik9000 and not-napoleon July 2, 2025 12:53
@ivancea ivancea added :Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged v8.18.4 v8.17.9 v9.2.0 v8.19.1 v8.20.0 labels Jul 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've created a changelog YAML for you.

@ivancea ivancea added >bug and removed >enhancement labels Jul 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've updated the changelog YAML for you.

* As CancellableBulkScorer does a minimum of 4096 docs per batch, this number must be low to avoid long test times.
* </p>
*/
private static final long SLEEP_SCRIPT_MS = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use a pair of Semaphors like so:

private static final Semaphore scriptRunPermits = new Semaphore(0);
private static final Semaphore cancelRunPermits = new Semaphore(0);
...

In the test:
cancelRunPermits.acquire();
client().cancelTheRequest();
scriptRunPermits.release(Integer.MAX_VALUE);
...

In the script:
cancelRunPermits.release(1);
scriptRunPermits.acquire();

I'm sure there's a simpler way to do it. But this'd block the cancel task until the the script starts. Then block the script until the cancel has finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! It looks far better, faster and consistent with a Repeat(1000)

@ivancea ivancea requested a review from nik9000 July 7, 2025 12:06
@ivancea ivancea added the v9.1.1 label Jul 7, 2025
@ivancea ivancea merged commit 05dff31 into elastic:main Jul 7, 2025
34 checks passed
@ivancea ivancea deleted the aggs-filters-cancellation branch July 7, 2025 15:16
@ivancea ivancea added the v9.0.4 label Jul 7, 2025
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jul 7, 2025
…30452)

By default, the `FilterByFilterAggregator` (Used by the `"filter"` and `"filters"` aggs) was using the `DefaultBulkScorer` (From Lucene), which has no cancellation mechanism.

This PR wraps it into a `CancellableBulkScorer`, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jul 7, 2025
…30452)

By default, the `FilterByFilterAggregator` (Used by the `"filter"` and `"filters"` aggs) was using the `DefaultBulkScorer` (From Lucene), which has no cancellation mechanism.

This PR wraps it into a `CancellableBulkScorer`, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jul 7, 2025
…30452)

By default, the `FilterByFilterAggregator` (Used by the `"filter"` and `"filters"` aggs) was using the `DefaultBulkScorer` (From Lucene), which has no cancellation mechanism.

This PR wraps it into a `CancellableBulkScorer`, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jul 7, 2025
…30452)

By default, the `FilterByFilterAggregator` (Used by the `"filter"` and `"filters"` aggs) was using the `DefaultBulkScorer` (From Lucene), which has no cancellation mechanism.

This PR wraps it into a `CancellableBulkScorer`, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.
elasticsearchmachine pushed a commit that referenced this pull request Jul 7, 2025
…130736)

By default, the `FilterByFilterAggregator` (Used by the `"filter"` and `"filters"` aggs) was using the `DefaultBulkScorer` (From Lucene), which has no cancellation mechanism.

This PR wraps it into a `CancellableBulkScorer`, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.
elasticsearchmachine pushed a commit that referenced this pull request Jul 7, 2025
…130745)

By default, the `FilterByFilterAggregator` (Used by the `"filter"` and `"filters"` aggs) was using the `DefaultBulkScorer` (From Lucene), which has no cancellation mechanism.

This PR wraps it into a `CancellableBulkScorer`, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.
elasticsearchmachine pushed a commit that referenced this pull request Jul 7, 2025
…130746)

By default, the `FilterByFilterAggregator` (Used by the `"filter"` and `"filters"` aggs) was using the `DefaultBulkScorer` (From Lucene), which has no cancellation mechanism.

This PR wraps it into a `CancellableBulkScorer`, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.
elasticsearchmachine pushed a commit that referenced this pull request Jul 7, 2025
…130744)

By default, the `FilterByFilterAggregator` (Used by the `"filter"` and `"filters"` aggs) was using the `DefaultBulkScorer` (From Lucene), which has no cancellation mechanism.

This PR wraps it into a `CancellableBulkScorer`, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.
elasticsearchmachine pushed a commit that referenced this pull request Jul 7, 2025
…130743)

By default, the `FilterByFilterAggregator` (Used by the `"filter"` and `"filters"` aggs) was using the `DefaultBulkScorer` (From Lucene), which has no cancellation mechanism.

This PR wraps it into a `CancellableBulkScorer`, which instead calls the inner scorer with ranges, and checks cancellation between them.

This should solve cases of long-running tasks using these aggregators not being cancelled, or greatly reduce the time they take after cancellation.
ivancea added a commit that referenced this pull request Jul 9, 2025
Fixes #130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in #130452
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jul 9, 2025
Fixes elastic#130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in elastic#130452
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jul 9, 2025
Fixes elastic#130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in elastic#130452
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jul 9, 2025
Fixes elastic#130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in elastic#130452
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jul 9, 2025
Fixes elastic#130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in elastic#130452
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jul 9, 2025
Fixes elastic#130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in elastic#130452
elasticsearchmachine pushed a commit that referenced this pull request Jul 9, 2025
Fixes #130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in #130452
elasticsearchmachine pushed a commit that referenced this pull request Jul 9, 2025
Fixes #130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in #130452
elasticsearchmachine pushed a commit that referenced this pull request Jul 9, 2025
Fixes #130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in #130452
elasticsearchmachine pushed a commit that referenced this pull request Jul 9, 2025
Fixes #130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in #130452
elasticsearchmachine pushed a commit that referenced this pull request Jul 9, 2025
Fixes #130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in #130452
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
Fixes elastic#130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in elastic#130452
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
Fixes elastic#130770

Both a bigger thread pool and not draining the semaphore permits were leading to failing tests sometimes because of blocked threads (Too many threads searching would end up draining all permits in parallel, and getting stuck).

The test was added in elastic#130452
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.9 v8.18.4 v8.19.1 v9.0.4 v9.1.1 v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants