Skip to content

Conversation

emreyalvac
Copy link

Summary

Fixes a bug in the ESQL async query API where updating the keep_alive value via a GET /_query/async/{id}?keep_alive=... request did not properly extend the lifetime of the async result, causing unexpected 404s after the original TTL expired.

Details

  • The initial keep_alive duration specified in the POST request was honored, but follow-up GET requests with a new keep_alive value failed to refresh the TTL.
  • This PR ensures that if a new keep_alive value is passed in the follow-up GET request, it will properly update the expiration time of the async result.

How to Reproduce (Before Fix)

  1. Submit an async ESQL query with a short keep_alive, e.g. 15s.
  2. Within the TTL window, send a GET /_query/async/{id}?keep_alive=60s
  3. Wait >15s
  4. Expect 404 (bug)

After Fix

  • Async results will persist according to the most recent keep_alive value provided in the GET call.
  • Prevents premature cleanup.

Related

Closes: #130619

@elasticsearchmachine elasticsearchmachine added v9.2.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jul 6, 2025
@emreyalvac
Copy link
Author

@kertal can you check?

@kertal
Copy link
Member

kertal commented Jul 7, 2025

thx @emreyalvac not sure who's the best to review this, since I just reported it , and it's something we need in Kibana. dear @elasticsearch-esql , can you help me (or know who can take care of it, thx) . If this is the solution I assume it needs at least one test to cover it

@ioanatia ioanatia added the :Analytics/ES|QL AKA ESQL label Jul 8, 2025
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Jul 8, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@ioanatia ioanatia added the needs:triage Requires assignment of a team area label label Jul 8, 2025
@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Jul 8, 2025
@astefan astefan self-assigned this Jul 8, 2025
@astefan astefan added the >bug label Jul 8, 2025
@astefan
Copy link
Contributor

astefan commented Jul 8, 2025

buildkite test this

@astefan
Copy link
Contributor

astefan commented Jul 8, 2025

@emreyalvac thank you for providing a fix in this PR. Would you mind looking into the test failures and, also, add a test for this fix?

@astefan
Copy link
Contributor

astefan commented Jul 11, 2025

CC @ChrisHegarty if you have a quicker turnaround on this one 🙏 .

@ChrisHegarty
Copy link
Contributor

@elasticmachine update branch

@ChrisHegarty
Copy link
Contributor

I locally hacked EsqlAsyncActionIT in order to minimally reproduce the issue, and verify the fix. I managed to reproduce the observed problem, but the fix does not work. It immediately returns org.elasticsearch.ResourceNotFoundException.

@emreyalvac did you reproduce and verity the change?

@ChrisHegarty
Copy link
Contributor

buildkite test this

@ChrisHegarty
Copy link
Contributor

ChrisHegarty commented Jul 11, 2025

One can set a keep-alive time through the initial query, and this works. I'm not sure that the keep-alive update over async get was ever implemented. It's been more than a year since I wrote this code, and it has changed quite a bit. I think that there is some more work to be done if we wanna support this - more than is in this PR.

@astefan astefan removed their assignment Jul 18, 2025
@emreyalvac
Copy link
Author

@ChrisHegarty

With this PR, I can extend the keep_alive parameter for GET requests.

@ChrisHegarty
Copy link
Contributor

ChrisHegarty commented Jul 23, 2025

With this PR, I can extend the keep_alive parameter for GET requests.

When I locally mocked up an incomplete test for this, I found that it did not work - at least not in my environment, tho it is possible that I did something wrong. That said, org.elasticsearch.xpack.esql.action.AsyncEsqlQueryActionIT.testBasicAsyncExecution fails with this change, because of org.elasticsearch.ResourceNotFoundException: ..., so something is is clearly not right with this change as it stands.

If you believe that it works, then can you please add a test that demonstrates that, as well as fixing whatever has been broken in AsyncEsqlQueryActionIT.testBasicAsyncExecution.

Finally, the test scenario coverage should include updating the keep_alive TTL both 1) when the query is still executing, as well as 2) after it has completed (but it still retained).

@kertal
Copy link
Member

kertal commented Jul 23, 2025

I also tested this PR in a Kibana PR + Elasticsearch PR a-la-carte deployment, and while the change seems to be in the right area of the code, it doesn't work, when testing it manually

@dnhatn
Copy link
Member

dnhatn commented Aug 18, 2025

@emreyalvac Thank you for working on the fix. Unfortunately, we need bigger changes than those in this PR. We need to handle the path with updateInitialResultsInStore=false separately. Since the fix is urgently needed to unblock an important Kibana feature, I had to start a fresh fix in #133021. I would have preferred to iterate on your PR if the schedule allowed. I hope you understand and don't mind if I close this PR.

Again, thank you so much for your interest in Elasticsearch!

@dnhatn dnhatn removed the v9.2.0 label Aug 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ES|QL] When requesting GET /_query/async/{id}?keep_alive={time} keep_alive is ignored

8 participants