Limit the number of tasks that a single search can submit #115668
Changes from 5 commits
f70498a
c20a03a
710063c
e215c78
9d8e64e
a6e4974
New changelog entry:

@@ -0,0 +1,5 @@
+pr: 115668
+summary: Limit the number of tasks that a single search can submit
+area: Search
+type: bug
+issues: []
DefaultSearchContext.java:
@@ -87,6 +87,7 @@
 import java.util.TreeSet;
 import java.util.concurrent.Executor;
 import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.atomic.AtomicInteger;
 import java.util.function.LongSupplier;
 import java.util.function.ToLongFunction;
@@ -202,7 +203,7 @@ final class DefaultSearchContext extends SearchContext {
             engineSearcher.getQueryCache(),
             engineSearcher.getQueryCachingPolicy(),
             lowLevelCancellation,
-            executor,
+            wrapExecutor(executor),
             maximumNumberOfSlices,
             minimumDocsPerSlice
         );
@@ -229,6 +230,32 @@ final class DefaultSearchContext extends SearchContext {
         }
     }

+    private static Executor wrapExecutor(Executor executor) {
+        if (executor instanceof ThreadPoolExecutor tpe) {
+            // Let this searcher fork only a limited maximum number of tasks, to protect against situations where Lucene
+            // may submit too many segment-level tasks. With enough parallel search requests and segments per shard, they
+            // may all see an empty queue and start parallelizing, filling up the queue very quickly and causing rejections
+            // due to the many small tasks in the queue that become no-ops because the active caller thread executes them
+            // instead. Note that although all tasks get completed, TaskExecutor#invokeAll leaves the leftover no-op tasks
+            // in the queue, hence they contribute to the queue size until they are removed from it.
+            AtomicInteger segmentLevelTasks = new AtomicInteger(0);
+            return command -> {
+                if (segmentLevelTasks.incrementAndGet() > tpe.getMaximumPoolSize()) {
Member (Author):
I am open to opinions on the threshold. It is quite conservative: for operations like knn query rewrite, which parallelize on the number of segments, we end up creating far fewer tasks on small nodes. Yet it is probably a good idea not to create more tasks than the available threads, and there could be multiple shard-level requests happening at the same time on the same node, deriving from the same search or from others, so the total number of tasks is still potentially higher than the max pool size anyway. We should probably improve this in Lucene as a follow-up, but this is some protection, mostly for 8.x, which is based on Lucene 9.12.
+                    command.run();
+                } else {
+                    executor.execute(() -> {
+                        try {
+                            command.run();
+                        } finally {
+                            segmentLevelTasks.decrementAndGet();
+                        }
+                    });
+                }
Contributor:
Should we decrement the counter when the task is done executing, in order to allow search parallelism again later on?

Member (Author):
Good point, I've been going back and forth on it. The current impl removes the need for a CAS once the budget has been used up. It is also much simpler to test. These are maybe minor points, though. The current solution may look too conservative about the number of tasks that get created: if they are executed fast enough, we could indeed create more tasks than the number of threads in total, although not all at the same time. I wonder how likely that is as a real scenario, given that TaskExecutor submits all tasks at once and not gradually. That is why I think this simple solution provides the most value, assuming that all tasks are submitted at once. I guess this impl may hurt latency a little in favor of throughput. Also, we already apply the same limit to the number of slices, statically. We would just effectively apply the same limit to knn query rewrite and to degenerate cases where we end up parallelizing from a segment-level task, which seems wrong anyway and which we should protect against. Additional thoughts?

Contributor:
I guess my main concern is more that if a query parallelizes at rewrite time (or createWeight time) and then again at collection time, and the query rewrite uses all the parallelism budget, then you won't get any parallelism at collection time.

Member (Author):
Yes, my comment above does not take that into account. I will look further into how we can make this a little better without over-complicating things.

Contributor:
I was trying the same thing at one point, and it's somewhat tricky to avoid a serious penalty for tiny tasks if you start increasing the amount of ref-counting that is done overall like this.

Member (Author):
I pushed a new commit that does the decrement. Testing has become much less predictable, as I expected. I do also worry about the CAS for each task. We should probably benchmark both solutions and see what the difference is: what do we lose by e.g. only parallelizing knn query rewrite and not collection when they both happen, compared to what we lose by performing the CAS every time?
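For reference, here is a minimal sketch of the non-decrementing variant considered in this thread (hypothetical class name, not code from this PR): the budget is consumed once and never returned, so after maxTasks forks every further command runs on the caller thread, and an exhausted budget can be detected with a plain read.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the "no decrement" alternative discussed above; not part of this PR.
final class OneShotBudgetExecutor implements Executor {
    private final Executor delegate;
    private final int maxTasks;
    private final AtomicInteger forked = new AtomicInteger();

    OneShotBudgetExecutor(Executor delegate, int maxTasks) {
        this.delegate = delegate;
        this.maxTasks = maxTasks;
    }

    @Override
    public void execute(Runnable command) {
        // Once the budget is gone it never comes back, so a cheap read short-circuits the atomic increment.
        if (forked.get() >= maxTasks || forked.incrementAndGet() > maxTasks) {
            command.run();               // over budget: run on the caller thread
        } else {
            delegate.execute(command);   // fork; the slot is never released
        }
    }
}
```

The trade-off raised in the thread applies directly to this shape: if query rewrite consumes the whole budget, collection later in the same request runs entirely on the caller thread.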
+                }
+            };
+        }
+        return executor;
+    }
+
     static long getFieldCardinality(String field, IndexService indexService, DirectoryReader directoryReader) {
         MappedFieldType mappedFieldType = indexService.mapperService().fieldType(field);
         if (mappedFieldType == null) {
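For illustration, a self-contained sketch of the capping pattern that wrapExecutor applies, using plain JDK types only (hypothetical names, not Elasticsearch code; unlike the snapshot above, it also releases the budget on the caller-thread path): at most getMaximumPoolSize() commands are forked at any one time, and everything beyond that runs on the calling thread.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CappedForkingDemo {

    // Cap the number of in-flight forked tasks at the pool's maximum size; excess work runs inline.
    static Executor capped(ThreadPoolExecutor tpe) {
        AtomicInteger inFlight = new AtomicInteger();
        return command -> {
            if (inFlight.incrementAndGet() > tpe.getMaximumPoolSize()) {
                inFlight.decrementAndGet(); // give the slot back and run on the caller thread
                command.run();
            } else {
                tpe.execute(() -> {
                    try {
                        command.run();
                    } finally {
                        inFlight.decrementAndGet(); // release the slot when the forked task completes
                    }
                });
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        Executor executor = capped(pool);
        for (int i = 0; i < 20; i++) {
            int task = i;
            executor.execute(() -> System.out.println("task " + task + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```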
@@ -290,6 +317,8 @@ static int determineMaximumNumberOfSlices(
         boolean enableQueryPhaseParallelCollection,
         ToLongFunction<String> fieldCardinality
     ) {
+        // Note: although this method refers to parallel collection, it affects any kind of parallelism, including query
+        // rewrite, given that when 1 is returned, no executor is provided to the searcher.
         return executor instanceof ThreadPoolExecutor tpe
             && tpe.getQueue().size() <= tpe.getMaximumPoolSize()
             && isParallelCollectionSupportedForResults(resultsType, request.source(), fieldCardinality, enableQueryPhaseParallelCollection)
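To make the new note concrete, here is a small standalone sketch (hypothetical names and simplified logic, not the actual Elasticsearch implementation) of the gating it describes: slices are only allowed when the pool's queue is not already longer than its maximum size, and a result of 1 means the searcher gets no executor at all, which also disables rewrite-time parallelism.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SliceGatingSketch {

    // Simplified stand-in for determineMaximumNumberOfSlices: parallelize only when the executor is a
    // ThreadPoolExecutor whose queue is not already longer than its maximum pool size.
    static int maxSlices(Executor executor, boolean parallelCollectionSupported) {
        if (executor instanceof ThreadPoolExecutor tpe
            && tpe.getQueue().size() <= tpe.getMaximumPoolSize()
            && parallelCollectionSupported) {
            return tpe.getMaximumPoolSize();
        }
        return 1;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        int slices = maxSlices(pool, true);
        // When slices == 1, no executor is handed to the searcher, so no kind of parallelism
        // (collection or query rewrite) can happen for that request.
        Executor searchExecutor = slices > 1 ? pool : null;
        System.out.println("slices=" + slices + ", parallel=" + (searchExecutor != null));
        pool.shutdown();
    }
}
```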