@dnhatn dnhatn commented Aug 16, 2025

Async queries in EQL and ES|QL do not create an initial response, and the current logic does not correctly handle expiration updates when the query has already completed.

  • With initial response (no change): First, update the expiration in the async index, then update the task's expiration if the task still exists.

  • Without initial response: First, try to update the task's expiration, then attempt to get the result from the task or async index. If the result is no longer available from the task, update the expiration in the async index before retrieving it (similar to the initial response case). This second step was introduced in this fix.
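
The two flows above can be modeled in a few lines. This is a minimal sketch in Python, not the Elasticsearch Java implementation: `AsyncState`, `update_with_initial_response`, and `update_without_initial_response` are hypothetical names, and the task registry and async index are reduced to plain dictionaries that map a query id to an expiration time.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AsyncState:
    """Toy stand-in for the two places an async result can live.

    All names are hypothetical; they do not mirror Elasticsearch classes.
    """
    task_expiry: Dict[str, int] = field(default_factory=dict)   # running tasks
    index_expiry: Dict[str, int] = field(default_factory=dict)  # stored results


def update_with_initial_response(state: AsyncState, query_id: str, expiry: int) -> bool:
    """Initial response exists: update the index first, then the task if present."""
    if query_id not in state.index_expiry:
        return False
    state.index_expiry[query_id] = expiry
    if query_id in state.task_expiry:
        state.task_expiry[query_id] = expiry
    return True


def update_without_initial_response(state: AsyncState, query_id: str, expiry: int) -> Optional[str]:
    """No initial response: try the task first, then fall back to the index.

    Returns where the result would be served from, or None if it is unknown.
    """
    if query_id in state.task_expiry:
        state.task_expiry[query_id] = expiry
        return "task"
    # The task is gone, so the query has completed. Update the stored
    # document's expiration before reading it (the step this fix adds).
    if query_id in state.index_expiry:
        state.index_expiry[query_id] = expiry
        return "index"
    return None
```

In this model, a completed query (present only in `index_expiry`) still gets its expiration extended, which is the case the description says was not handled correctly before the fix.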

Ideally, we should always create the initial response up front to unify the logic for both async_search and async_query, but this fix is preferred for now as it is more contained.

When reviewing the code, I also found a race condition where async-get can return a NOT_FOUND error if the task completes but has not yet stored its result in the async index. This issue would also be resolved by storing an initial response up front. I will open a follow-up issue for it.
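
To illustrate the window described above, here is a toy Python model, assuming the task unregisters before its result is written to the async index. `async_get` and `observe_completion` are hypothetical helpers, and the sets stand in for the task registry and the async index; none of this is Elasticsearch code.

```python
from typing import List, Set


def async_get(tasks: Set[str], index: Set[str], query_id: str) -> str:
    """Hypothetical lookup: check the running tasks first, then the async index."""
    if query_id in tasks:
        return "RUNNING"
    if query_id in index:
        return "FOUND"
    return "NOT_FOUND"


def observe_completion(store_initial_response: bool) -> List[str]:
    """Call async_get at three points in a query's life and record the answers."""
    tasks: Set[str] = {"q1"}
    index: Set[str] = set()
    if store_initial_response:
        index.add("q1")  # placeholder document written when the query starts
    seen = [async_get(tasks, index, "q1")]      # query still running
    tasks.discard("q1")                         # task completes and unregisters
    seen.append(async_get(tasks, index, "q1"))  # result not stored yet
    index.add("q1")                             # final result is stored
    seen.append(async_get(tasks, index, "q1"))
    return seen
```

Under these assumptions, `observe_completion(False)` returns `["RUNNING", "NOT_FOUND", "FOUND"]`, while `observe_completion(True)` returns `["RUNNING", "FOUND", "FOUND"]`: with a document stored up front, a lookup inside the window always finds something in the index.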

Closes #130619

@dnhatn dnhatn force-pushed the fix-async-timeout branch 2 times, most recently from fbc7545 to 79fd77e Compare August 16, 2025 22:05
@dnhatn dnhatn added the :Analytics/ES|QL (AKA ESQL), >bug, v9.1.3, v9.0.6, v8.19.3, v8.18.6, and auto-backport (Automatically create backport pull requests when merged) labels Aug 16, 2025
@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn force-pushed the fix-async-timeout branch from 294ca57 to 94f19e0 Compare August 16, 2025 23:48
@dnhatn dnhatn requested review from ChrisHegarty and astefan August 17, 2025 00:53
@dnhatn dnhatn marked this pull request as ready for review August 17, 2025 00:54
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 17, 2025
@elasticsearchmachine
Collaborator

Hi @dnhatn, I've updated the changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@dnhatn
Member Author

dnhatn commented Aug 17, 2025

/cc @kertal

Contributor

@ChrisHegarty ChrisHegarty left a comment

LGTM. Thanks @dnhatn

@dnhatn
Member Author

dnhatn commented Aug 18, 2025

Thanks @ChrisHegarty!

@dnhatn dnhatn merged commit 433dbf0 into elastic:main Aug 18, 2025
34 checks passed
@dnhatn dnhatn deleted the fix-async-timeout branch August 18, 2025 04:57
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 9.1, 8.19

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Aug 18, 2025
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Aug 18, 2025
elasticsearchmachine pushed a commit that referenced this pull request Aug 18, 2025
elasticsearchmachine pushed a commit that referenced this pull request Aug 18, 2025
* Fix update expiration for async query (#133021)
* Fix compile
@kertal
Member

kertal commented Aug 18, 2025

@dnhatn thank you, appreciate it. will test this

javanna pushed a commit to javanna/elasticsearch that referenced this pull request Aug 18, 2025
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Aug 18, 2025
Labels

:Analytics/ES|QL (AKA ESQL), auto-backport (Automatically create backport pull requests when merged), >bug, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v8.19.3, v9.1.3, v9.2.0

Development

Successfully merging this pull request may close these issues.

[ES|QL] When requesting GET /_query/async/{id}?keep_alive={time} keep_alive is ignored

4 participants