Fix concurrency issue in ScriptSortBuilder #123757
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Hi @javanna, I've created a changelog YAML for you.
- match: { aggregations.test.buckets.1.key: "second" }
- match: { aggregations.test.buckets.1.top_hits.hits.total: 2 }
- match: { aggregations.test.buckets.1.top_hits.hits.hits.0.sort.0: "d-1.0" }
- match: { aggregations.test.buckets.1.top_hits.hits.hits.0.sort.1: 1.0 }
It appears that we had no tests covering sort scripts that rely on _score. I added them because making _score available in sort scripts was the whole reason why script sorting wasn't thread-safe when inter-segment concurrency is enabled.
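To make the thread-safety point concrete, here is a minimal sketch under stated assumptions. ScoreReadingScript and SharedScriptRace are hypothetical classes, not the Elasticsearch SortScript API; they only model a script that holds per-segment mutable state (a scorer backing _score). The interleaving of two slices is replayed sequentially so the outcome is deterministic.

```java
import java.util.function.DoubleSupplier;

// Hypothetical stand-in for a sort script that reads _score. This is NOT the
// Elasticsearch SortScript API; it only models mutable per-segment state.
class ScoreReadingScript {
    // Set once per segment, before that segment's documents are collected.
    private DoubleSupplier scorer;

    void setScorer(DoubleSupplier scorer) {
        this.scorer = scorer;
    }

    // The sort value depends on whichever scorer was set last.
    double sortValue() {
        return scorer.getAsDouble();
    }
}

class SharedScriptRace {
    // Replays one unlucky interleaving: slice A sets its scorer, slice B
    // overwrites it on the same shared instance, then slice A reads a value.
    static double sliceAValueWithSharedInstance() {
        ScoreReadingScript shared = new ScoreReadingScript();
        shared.setScorer(() -> 1.0); // slice A, segment 0
        shared.setScorer(() -> 2.0); // slice B, segment 1 (overwrites A's scorer)
        return shared.sortValue();   // slice A now sees slice B's score
    }

    // Same interleaving with one instance per segment: nothing is shared.
    static double sliceAValueWithPerLeafInstances() {
        ScoreReadingScript segment0 = new ScoreReadingScript();
        ScoreReadingScript segment1 = new ScoreReadingScript();
        segment0.setScorer(() -> 1.0); // slice A
        segment1.setScorer(() -> 2.0); // slice B
        return segment0.sortValue();
    }
}
```

With one shared instance, slice A reads 2.0 (slice B's score); with one instance per segment, it reads its own 1.0.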
This test was also missing and, if present, would have surfaced concurrency issues when using script sorting as part of top_hits.
I'm trying to see this test fail without the changes to ScriptSortBuilder and the two field comparators, but I could not get it to fail so far, even with several hundred iterations and several re-runs with a different root seed. I see the index contains 50 docs; how can we be sure it is likely to end up with several segments and run into concurrent execution? I remember some tweaks to our integration tests to make this more likely for small indices but cannot remember the details, do you recall?
Are you sure that you reverted the changes to ScriptSortBuilder in the right places, while still leaving concurrency enabled for script sorting? I get it to fail as soon as I do that. It may be a little less likely to fail because the execution hint is randomized: we parallelize only in global_ordinals execution mode, based on field cardinality. That applies to the terms aggregation that holds the top_hits aggregation. But I find that the issue reproduces pretty quickly.
Yes, I reverted all changes that are not test-related (I also tried with and without the changes to MockScriptEngine) and ran the test very often from the IDE and from the command line; no failures so far. I'll try running it in a tight loop once again.
You need to keep overriding supportsParallelCollection in ScriptSortBuilder :) otherwise it does not run using inter-segment concurrency.
doh
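A rough model of the opt-in discussed in this exchange, assuming (as the comments suggest) that sort clauses opt out of parallel collection by default and the script sort opts back in by overriding supportsParallelCollection. SortClause, FieldSort, ScriptSort and ConcurrencyDecision are hypothetical names for illustration, not the real Elasticsearch classes.

```java
import java.util.List;

// Hypothetical model of the opt-in: sort clauses default to "no parallel
// collection" and the script sort opts back in. Not the real ES classes.
interface SortClause {
    default boolean supportsParallelCollection() {
        return false;
    }
}

// Sort by field keeps the default and therefore opts out.
class FieldSort implements SortClause {}

class ScriptSort implements SortClause {
    // Dropping this override silently turns concurrency off again, which is
    // why a full revert could not reproduce the failure.
    @Override
    public boolean supportsParallelCollection() {
        return true;
    }
}

class ConcurrencyDecision {
    // Inter-segment concurrency is used only if every sort clause allows it.
    static boolean useInterSegmentConcurrency(List<? extends SortClause> sorts) {
        for (SortClause sort : sorts) {
            if (sort.supportsParallelCollection() == false) {
                return false;
            }
        }
        return true;
    }
}
```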
Looks like this fails with a legitimate and related failure: https://buildkite.com/elastic/elasticsearch-pull-request/builds/60890#01956897-cd53-49cc-9f12-67c7ab7ceef1/93-5575
cbuescher left a comment
Introducing the leafScripts map in ScriptSortBuilder makes sense to me, but I have trouble reproducing the issue with the added tests when your changes are reverted. Left a comment around that for clarification.
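One way to picture a leafScripts map is a lazily populated map from segment (leaf) to its own script instance, so concurrent slices never share per-segment state. This is a sketch only: the LeafScripts class, keying by leaf ordinal, and the use of ConcurrentHashMap are assumptions, not necessarily what the PR does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical per-leaf script cache. Keying by leaf ordinal and using a
// ConcurrentHashMap are assumptions made for this sketch.
class LeafScripts<S> {
    private final ConcurrentHashMap<Integer, S> leafScripts = new ConcurrentHashMap<>();
    private final IntFunction<S> factory;

    LeafScripts(IntFunction<S> factory) {
        this.factory = factory;
    }

    // Returns the instance bound to this leaf. ConcurrentHashMap applies the
    // factory at most once per key, even when several threads race here.
    S forLeaf(int leafOrd) {
        return leafScripts.computeIfAbsent(leafOrd, ord -> factory.apply(ord));
    }

    int size() {
        return leafScripts.size();
    }
}
```

Because computeIfAbsent on ConcurrentHashMap is atomic per key, two search threads asking for the same leaf get the same instance, while different leaves never share one.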
Thanks, I had seen the failure. It's a test issue related to MockScriptEngine and the eager call to get_score that I added when using
Inter-segment concurrency is disabled whenever sort by field, including script sorting, is used in a search request. Sort by field does not use concurrency because of performance implications: the hit queue in Lucene is built per slice, and the different search threads don't share information about the documents they have already visited. Script sort has concurrency disabled because the script sorting implementation is not thread-safe. This commit fixes that concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, it adds missing tests covering sort scripts that rely on _score being available and the top_hits aggregation with a scripted sort clause.
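To illustrate the performance point about per-slice hit queues, here is a simplified model, not Lucene's TopFieldCollector: each slice fills its own bounded top-k queue for a descending sort, and the partial results are merged at the end. PerSliceTopK is a hypothetical class, and slices run one after another here only to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Simplified model (not Lucene code): each slice keeps its own bounded top-k
// queue for a descending sort; slices never share their current bottom value.
class PerSliceTopK {
    int insertions = 0; // how many docs were admitted into some queue

    // Top-k largest values of one slice, using a min-heap of size k.
    List<Integer> collectSlice(List<Integer> slice, int k) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(); // head = weakest kept hit
        for (int value : slice) {
            if (queue.size() < k) {
                queue.add(value);
                insertions++;
            } else if (value > queue.peek()) {
                queue.poll();
                queue.add(value);
                insertions++;
            }
        }
        return new ArrayList<>(queue);
    }

    // Collects every slice independently, then merges the partial results.
    List<Integer> search(List<List<Integer>> slices, int k) {
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> slice : slices) {
            candidates.addAll(collectSlice(slice, k));
        }
        candidates.sort((a, b) -> Integer.compare(b, a)); // descending
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

With values 99 down to 0 and k = 3, a single queue admits 3 docs, while two slices ([99..50] and [49..0]) admit 6: the second slice cannot know that 97 is already the bottom of the global top 3, so its entries are wasted work. The final top 3 is the same either way.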
With elastic#123610 we disabled parallel collection for field- and script-sorted top hits, aligning their behaviour with that of top-level search. This was mainly to work around a bug in script sorting, which did not support inter-segment concurrency. The bug with script sort has been fixed with elastic#123757 and concurrency re-enabled for it. While sort by field is not optimized for search concurrency, top hits benefits from it, and disabling concurrency for sort by field in top hits has caused performance regressions in our nightly benchmarks. This commit re-enables concurrency for top hits when sort by field is used. This reintroduces a discrepancy between top-level search and top hits, in that concurrency is applied for top hits even though sort by field normally disables it. The key difference is the context in which sorting is applied, and the fact that concurrency is disabled on top-level searches only for performance reasons, not for functional ones.
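The resulting policy can be summarized as a small decision function. This is a hypothetical sketch of the behaviour described in the two commit messages, not Elasticsearch code; SortKind, SortContext and ConcurrencyPolicy are made-up names, and only the field and script sort cases discussed here are modelled.

```java
// Hypothetical summary of the concurrency policy described above; not ES code.
enum SortKind { FIELD, SCRIPT }

enum SortContext { TOP_LEVEL_SEARCH, TOP_HITS }

class ConcurrencyPolicy {
    static boolean useConcurrency(SortContext context, SortKind sort) {
        if (context == SortContext.TOP_HITS) {
            // top_hits benefits from concurrency for both sort kinds
            return true;
        }
        // Top-level search: field sort opts out for performance reasons only;
        // script sort is thread-safe again after elastic#123757.
        return sort == SortKind.SCRIPT;
    }
}
```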