Don't try to serialize half-baked cluster info #132756
    }
} finally {
    executionInfo.clusterInfoInitializing(false);
}
I believe this could protect against concurrent additions during serialization, but I don't follow what could trigger that.
This completes synchronously before any of the listeners in analyzedPlan could fail. Am I missing something?
The problem can happen when we're running an async query: the query runner has just started to construct the executionInfo here, and an async GET request arrives at the same time. That request also uses executionInfo, but writeCollection has a race condition when serializing it: it reads the size first and then serializes the elements. So we end up with broken serialized data.
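To make the size-then-elements race concrete, here is a minimal, deterministic sketch. It is not the Elasticsearch `writeCollection` code: the class `SizeThenElementsRace`, both methods, and the `Runnable` hook that stands in for the concurrent query runner are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a size-prefixed collection writer (hypothetical, not the real API).
public class SizeThenElementsRace {

    // Writes a size prefix, then the elements. The Runnable simulates another
    // thread mutating the collection between those two steps.
    static List<String> writeCollection(List<String> source, Runnable betweenSizeAndElements) {
        List<String> wire = new ArrayList<>();
        wire.add("size=" + source.size());  // step 1: read the size
        betweenSizeAndElements.run();       // concurrent addition happens here
        for (String element : source) {     // step 2: serialize the elements
            wire.add(element);
        }
        return wire;
    }

    // A reader trusts the prefix, so the stream is only valid if the declared
    // size matches the number of elements that follow it.
    static boolean isConsistent(List<String> wire) {
        int declared = Integer.parseInt(wire.get(0).substring("size=".length()));
        return declared == wire.size() - 1;
    }

    public static void main(String[] args) {
        List<String> clusters = new ArrayList<>(List.of("local"));
        List<String> wire = writeCollection(clusters, () -> clusters.add("remote1"));
        // prints: [size=1, local, remote1] consistent=false
        System.out.println(wire + " consistent=" + isConsistent(wire));
    }
}
```

The prefix says one element but two follow, so a reader that trusts the prefix would misinterpret the rest of the stream.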
I see. Since it is just starting, it does not contain any meaningful information worth waiting for?
Precisely. We don't want to actually show anybody half-initialized cluster info, even if it didn't cause the race. And since we didn't initialize it yet, it can't contain anything important.
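The idea of hiding cluster info until it is initialized can be sketched roughly as follows. Only the setter name `clusterInfoInitializing` and the `try`/`finally` shape come from the diff above; the class `ExecutionInfoSketch`, the methods `initClusters` and `clustersForSerialization`, the `"RUNNING"` status string, and the choice to expose an empty list while initializing are all assumptions made for illustration, not the actual change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the execution info object discussed in this thread.
public class ExecutionInfoSketch {
    private final Map<String, String> clusterInfo = new ConcurrentHashMap<>();
    // volatile so a serializing thread sees the flag change promptly
    private volatile boolean clusterInfoInitializing;

    public void clusterInfoInitializing(boolean initializing) {
        this.clusterInfoInitializing = initializing;
    }

    // Populates the map with the flag raised; finally clears it even on failure.
    public void initClusters(List<String> aliases) {
        clusterInfoInitializing(true);
        try {
            for (String alias : aliases) {
                clusterInfo.put(alias, "RUNNING"); // assumed placeholder status
            }
        } finally {
            clusterInfoInitializing(false);
        }
    }

    // Assumption: while initializing, expose nothing rather than a partial view.
    public List<String> clustersForSerialization() {
        if (clusterInfoInitializing) {
            return List.of();
        }
        return new ArrayList<>(new TreeSet<>(clusterInfo.keySet()));
    }
}
```

In this model a concurrent reader sees either an empty list or the populated one once the flag has been cleared. The sketch does not cover a reader that checks the flag just before initialization starts, so it should not be read as a complete synchronization scheme.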
Pinging @elastic/es-analytical-engine (Team:Analytics)
Only serialize clusterInfo once it has reached a consistent, initialized state.