Skip to content

Conversation

@javanna
Copy link
Member

@javanna javanna commented Feb 28, 2025

Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.

@javanna javanna added :Search Foundations/Search Catch all for Search Foundations v9.0.0 v8.18.1 v8.19.0 v8.17.4 auto-backport Automatically create backport pull requests when merged >bug labels Mar 5, 2025
@javanna javanna marked this pull request as ready for review March 5, 2025 23:02
@javanna javanna requested a review from a team as a code owner March 5, 2025 23:02
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Mar 5, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

- match: { aggregations.test.buckets.1.key: "second" }
- match: { aggregations.test.buckets.1.top_hits.hits.total: 2 }
- match: { aggregations.test.buckets.1.top_hits.hits.hits.0.sort.0: "d-1.0" }
- match: { aggregations.test.buckets.1.top_hits.hits.hits.0.sort.1: 1.0 }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it appears that we had no tests covering sort scripts relying on _score. I added them because making _score available in sort scripts was the whole reason why script sorting wasn't thread-safe when inter-segment concurrency is enabled.

}
}
);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was also missing and if present, would have surfaced concurrency issues when using script sorting as part as top_hits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to see this test fail without the changes to ScriptSortBuilder and the two field comparators but I could not get it to fail so far, even with several hundred of iterations and several re-runs with a different root seed. I see the index contains 50 docs, how are we sure its likely to run into using several segments / concurrent situations? I remember some tweaks to our integration tests to make this more likely for small indices but cannot remember, do you recall?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure that you reverted the changes to ScriptSortBuilder in the right places, still leaving concurrency enabled for script sorting? I get it to fail as I soon as I do that. There is maybe a little less likelihood for it to fail due to randomizing the execution hint, because we parallelize only on global_ordinals execution mode, based on field cardinality. That is for the terms aggs that holds the top_hits agg. But I find that the issue reproduces pretty quickly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I reverted all changes that are not test-related (tried also with and wothough changes to MockScriptEngine), and ran the test very often from IDE and from command line, no failures so far. I'll try running in a tight loop once again

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to keep overriding supportsParallelCollection in ScriptSortBuilder :) otherwise it does not run using inter-segment concurrency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doh

@javanna javanna requested a review from cbuescher March 5, 2025 23:06
@original-brownbear
Copy link
Contributor

Looks like this fails with a legitimate and related failure: https://buildkite.com/elastic/elasticsearch-pull-request/builds/60890#01956897-cd53-49cc-9f12-67c7ab7ceef1/93-5575

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing the leafScripts map in ScriptSortBuilder makes sense to me, but I have trouble reproducing the issue without your changes with the added tests. Left a comment around that for clarification.

}
}
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to see this test fail without the changes to ScriptSortBuilder and the two field comparators but I could not get it to fail so far, even with several hundred of iterations and several re-runs with a different root seed. I see the index contains 50 docs, how are we sure its likely to run into using several segments / concurrent situations? I remember some tweaks to our integration tests to make this more likely for small indices but cannot remember, do you recall?

@javanna
Copy link
Member Author

javanna commented Mar 7, 2025

Looks like this fails with a legitimate and related failure:

thanks, I had seen the failure, it's a test issue to do with MockScriptEngine, and the eager call to get_score that I added when using top_hits and top_metrics.

@javanna javanna merged commit def4c89 into elastic:main Mar 10, 2025
17 checks passed
@javanna javanna deleted the fix/script_sort_concurrency branch March 10, 2025 20:10
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 10, 2025
Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 10, 2025
Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 10, 2025
Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
9.0
8.18
8.x
8.17

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 10, 2025
Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.
elasticsearchmachine pushed a commit that referenced this pull request Mar 10, 2025
Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.
elasticsearchmachine pushed a commit that referenced this pull request Mar 10, 2025
* Fix concurrency issue in ScriptSortBuilder (#123757)

Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.

* iter
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Mar 11, 2025
Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.
elasticsearchmachine pushed a commit that referenced this pull request Mar 11, 2025
* Fix concurrency issue in ScriptSortBuilder (#123757)

Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.

* iter
elasticsearchmachine pushed a commit that referenced this pull request Mar 11, 2025
* Fix concurrency issue in ScriptSortBuilder (#123757)

Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.

* iter
albertzaharovits pushed a commit to albertzaharovits/elasticsearch that referenced this pull request Mar 13, 2025
Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Mar 13, 2025
Inter-segment concurrency is disabled whenever sort by field, included script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is build per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses such concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover for sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 31, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit that referenced this pull request Apr 1, 2025
With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 1, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 1, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 1, 2025
With elastic#123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26012)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26013)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26011)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
…26014)

With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits with sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits despite sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.17.4 v8.18.1 v8.19.0 v9.0.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants