Stop passing search query local node fanout through transport layer #122669
Changes from all commits
b101dcc
03374cb
868da8e
ed12a21
2c012d9
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -603,7 +603,13 @@ private void loadOrExecuteQueryPhase(final ShardSearchRequest request, final Sea | |
| public void executeQueryPhase(ShardSearchRequest request, CancellableTask task, ActionListener<SearchPhaseResult> listener) { | ||
| assert request.canReturnNullResponseIfMatchNoDocs() == false || request.numberOfShards() > 1 | ||
| : "empty responses require more than one shard"; | ||
| final IndexShard shard = getShard(request); | ||
| final IndexShard shard; | ||
| try { | ||
| shard = getShard(request); | ||
| } catch (RuntimeException e) { | ||
| listener.onFailure(e); | ||
| return; | ||
|
Member
What is this additional catch fixing? Is it a bug that you observed?
Contributor
Author
Yup, concurrent shard deletion can cause a shard-not-found exception :)
Member
Should that be a separate change then? It is not a problem introduced by your change, is it?
Contributor
Author
It is introduced here: without this change, the transport layer catches the exception and passes it on to the listener. |
||
| } | ||
| rewriteAndFetchShardRequest( | ||
| shard, | ||
| request, | ||
|
|
||
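The catch discussed in the thread above can be illustrated with a minimal Java sketch. It is not the PR's code: `Listener`, `RecordingListener`, `resolveShard` and `DirectCallSketch` are hypothetical stand-ins, assuming only what the thread states, namely that a shard lookup can throw on concurrent deletion and that a direct caller no longer has the transport layer catching that exception and forwarding it to the listener.

```java
// Hypothetical stand-in for an asynchronous result callback (not the Elasticsearch ActionListener).
interface Listener<T> {
    void onResponse(T result);

    void onFailure(Exception e);
}

// Hypothetical listener that simply records whichever callback fired.
final class RecordingListener implements Listener<String> {
    String response;
    Exception failure;

    @Override
    public void onResponse(String result) {
        response = result;
    }

    @Override
    public void onFailure(Exception e) {
        failure = e;
    }
}

final class DirectCallSketch {
    // Hypothetical lookup that throws when the shard was deleted concurrently.
    static String resolveShard(boolean deleted) {
        if (deleted) {
            throw new IllegalStateException("shard not found");
        }
        return "shard-0";
    }

    // Called directly instead of via a transport handler, so a synchronous
    // failure has to be routed to the listener here rather than thrown.
    static void executeQueryPhase(boolean shardDeleted, Listener<String> listener) {
        final String shard;
        try {
            shard = resolveShard(shardDeleted);
        } catch (RuntimeException e) {
            listener.onFailure(e);
            return;
        }
        listener.onResponse("query result from " + shard);
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        executeQueryPhase(true, listener);
        System.out.println("failure delivered to listener: " + (listener.failure != null));
    }
}
```

Without the try/catch, the exception in this sketch would propagate to the caller of `executeQueryPhase` and the listener would never be notified.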
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -342,7 +342,7 @@ public void testRequestCacheWithTemplateRoleQuery() { | |
| // Since the DLS for the alias uses a stored script, this should cause the request cached to be disabled | ||
| assertSearchResponse(client1.prepareSearch(DLS_TEMPLATE_ROLE_QUERY_ALIAS).setRequestCache(true), Set.of("1"), Set.of("username")); | ||
| // No cache should be used | ||
| assertCacheState(DLS_TEMPLATE_ROLE_QUERY_INDEX, 2, 2); | ||
| assertCacheState(DLS_TEMPLATE_ROLE_QUERY_INDEX, 3, 2); | ||
|
Member
Could you expand on why these assertions needed to be adapted?
Contributor
Author
Not quite, to be honest. It seems that going through the transport layer somehow updates the thread-local context variable, and then authz works differently. But I think the security folks need to look at this. |
||
| } | ||
|
|
||
| private void prepareIndices() { | ||
|
|
||
Out of curiosity: if we do this, why do it only for the query phase? Also, couldn't this conditional be added to the sendExecuteQuery method instead? What kind of overhead does this save? I can imagine that this is a pretty common pattern, or is search the edge case?
Hmm, good question. I only noticed this in benchmarks for search, and for the query phase specifically. For fetch it's not very visible, since you mostly don't hit that many shards, and for bulk indexing you still have per-shard bulks, so the cost isn't there.
I first noticed this with batched execution, where the overhead becomes very visible, but it's equally visible without it on large data nodes that already do coordination work (or if queries are heavy, like a large terms query or some geo stuff).
The overhead saved is all the lookups in the transport layer, lots of listener wrapping, child-task registration and, most importantly, security.
But :) that's why I think I need a review from security here. Functionally, I think security still works the same way, if not more efficiently. All tests pass because we auth the top-level search request. DLS/FLS are applied as well, but somehow those cache assertions needed adjustment; seemingly we do use the cache more now, and I can't explain why.
The security overhead is considerable here; it seems to be well in excess of the can_match cost for most Rally runs :O
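The local-node shortcut being discussed can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `LocalFanoutSketch`, its fields and `sendOverTransport` are hypothetical, and the "transport" is just a counter standing in for the lookups, listener wrapping, child-task registration and security checks mentioned above. The method name `sendExecuteQuery` only mirrors the name used in the question.

```java
import java.util.function.Consumer;
import java.util.function.Function;

final class LocalFanoutSketch {
    private final String localNodeId;
    // Stands in for executing the query phase on a shard (hypothetical).
    private final Function<String, String> queryHandler;
    // Counts requests that took the simulated transport path.
    int transportSends = 0;

    LocalFanoutSketch(String localNodeId, Function<String, String> queryHandler) {
        this.localNodeId = localNodeId;
        this.queryHandler = queryHandler;
    }

    void sendExecuteQuery(String targetNodeId, String request, Consumer<String> onResult) {
        if (localNodeId.equals(targetNodeId)) {
            // Target is this node: call the handler directly and skip the transport path.
            onResult.accept(queryHandler.apply(request));
            return;
        }
        sendOverTransport(request, onResult);
    }

    // Simulated transport hop; a real one would add lookups, wrapping and auth work.
    private void sendOverTransport(String request, Consumer<String> onResult) {
        transportSends++;
        onResult.accept(queryHandler.apply(request));
    }

    public static void main(String[] args) {
        LocalFanoutSketch sketch = new LocalFanoutSketch("node-1", request -> "hits for " + request);
        sketch.sendExecuteQuery("node-1", "q1", System.out::println);
        sketch.sendExecuteQuery("node-2", "q2", System.out::println);
        System.out.println("transport sends: " + sketch.transportSends);
    }
}
```

In this sketch the local call bypasses everything `sendOverTransport` would do, which is why any behavior the transport path provided implicitly (such as exception handling or security context setup) needs separate attention.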
It's this (couldn't zoom out further :P)

vs this

and on a transport thread.