ivancea
Contributor

@ivancea ivancea commented Aug 5, 2025

Context

// Spacetime project

The ESQL engine already allows streaming, but we currently store all the data and send it at once. The idea is to send the data in 3 chunks instead:

  1. The metadata that is available up front (after query planning), most importantly the columns
  2. The resulting pages, which we can stream easily
  3. Finally, the rest of the content, once the query has finished. This includes everything that can't be sent at the start (profile, stats, etc.)
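
As a rough illustration of the three chunks listed above, here is a minimal sketch of a writer that emits the metadata first, then one chunk per page, then the trailing content. Every name in it (`ChunkSink`, `ThreePhaseJsonWriter`, the `took` trailer) is hypothetical, and the JSON layout is simplified; it is not the actual ESQL response format nor the code in this PR.

```java
import java.util.List;

// Hypothetical sink: receives each chunk as soon as it is produced.
interface ChunkSink {
    void write(String chunk);
}

// Hypothetical writer showing the three phases; no JSON escaping is done.
final class ThreePhaseJsonWriter {
    private final ChunkSink sink;
    private boolean firstPage = true;

    ThreePhaseJsonWriter(ChunkSink sink) {
        this.sink = sink;
    }

    // Phase 1: metadata known after planning (here, only the column names).
    void start(List<String> columns) {
        StringBuilder sb = new StringBuilder("{\"columns\":[");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(columns.get(i)).append('"');
        }
        sb.append("],\"values\":[");
        sink.write(sb.toString());
    }

    // Phase 2: one chunk per page; each row is an already serialized JSON array.
    void page(List<String> rows) {
        if (rows.isEmpty()) {
            return;
        }
        StringBuilder sb = new StringBuilder();
        for (String row : rows) {
            // A comma is needed before every row except the very first one.
            if (!firstPage || sb.length() > 0) {
                sb.append(',');
            }
            sb.append(row);
        }
        firstPage = false;
        sink.write(sb.toString());
    }

    // Phase 3: content that is only known once the query has finished.
    void finish(long tookMillis) {
        sink.write("],\"took\":" + tookMillis + "}");
    }
}
```

Concatenating the chunks yields one valid JSON document, which is what lets a non-streaming client keep reading the response as before.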

Potential problems

  • Streaming is BWC by itself, but things like errors in the middle of the computation alter some parts of the response:
    • We would send an "error" field inside the query response, plus everything that was already sent (metadata and pages)
    • We would send a 200 even if it's an error, because the status is sent before the failure happens
    • So we'll probably need a query param/body field/pragma to enable/disable streaming.
  • Code duplication. An ideal solution should avoid having to maintain 2 different serializations: The current one, and the streamed one
  • Page size could make streaming useless if the pages are about as large as the LIMIT, since you won't receive anything until a full page is built. The same goes for aggregations: you won't receive anything until the aggregation finishes. The page_size pragma helps here, as long as the query has no commands that hold the full input (STATS, SORT...)
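
To make the mid-stream error problem concrete, here is a minimal sketch under stated assumptions: `FakeResponse` is a hypothetical stand-in for an HTTP response whose status is committed with the first chunk, and the body layout is invented for illustration. It shows why a failure after the first page still ends up as a 200 with an `error` field appended.

```java
import java.util.Iterator;
import java.util.List;

final class MidStreamErrorDemo {
    // Hypothetical response: the status is fixed as soon as the first chunk is written.
    static final class FakeResponse {
        int status = -1;
        final StringBuilder body = new StringBuilder();

        void write(String chunk) {
            if (status == -1) {
                status = 200; // committed on first write, cannot change afterwards
            }
            body.append(chunk);
        }
    }

    // Streams pages; on failure, closes the array and appends an "error" field.
    // The error message is not escaped, which real code would have to do.
    static FakeResponse stream(Iterator<String> pages) {
        FakeResponse response = new FakeResponse();
        response.write("{\"values\":[");
        boolean first = true;
        try {
            while (pages.hasNext()) {
                String page = pages.next();
                response.write((first ? "" : ",") + page);
                first = false;
            }
            response.write("]}");
        } catch (RuntimeException e) {
            response.write("],\"error\":\"" + e.getMessage() + "\"}");
        }
        return response;
    }

    // Test helper: yields the given pages, then throws to simulate a failure.
    static Iterator<String> failingAfter(List<String> pages, String message) {
        Iterator<String> it = pages.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return true;
            }

            @Override
            public String next() {
                if (it.hasNext()) {
                    return it.next();
                }
                throw new IllegalStateException(message);
            }
        };
    }
}
```

A client that only checks the status code would treat this response as a success, which is the reason an explicit opt-in for streaming looks necessary.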

In this PR

I'm separating the serialization (XContent) of every media type into a different class, and choosing the correct one in the RestAction:

  • EsqlQueryResponseStream: Interface, to be called by the esql/compute engine code
  • AbstractEsqlQueryResponseStream: Base class for most of the implementations
  • XContentEsqlQueryResponseStream: The base class that sends XContent responses
  • TextEsqlQueryResponseStream: For all the text formats
  • ArrowEsqlQueryResponseStream: Not implemented yet
  • NonStreamingEsqlQueryResponseStream: A dummy class used when streaming is deactivated in the config. It avoids having to keep the old code around, so that everything goes through the same path
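
The selection step could look roughly like the sketch below. It deliberately uses simplified, hypothetical class names (`ResponseStream`, `XContentStream`, `TextStream`, `NonStreamingFallback`, `ResponseStreamSelector`) instead of the classes listed above, whose real signatures are not shown here, and the media type list is an assumption for illustration.

```java
import java.util.Locale;

// Hypothetical, simplified stand-ins for the stream implementations.
interface ResponseStream {
    String describe();
}

final class XContentStream implements ResponseStream {
    @Override
    public String describe() {
        return "xcontent";
    }
}

final class TextStream implements ResponseStream {
    @Override
    public String describe() {
        return "text";
    }
}

final class NonStreamingFallback implements ResponseStream {
    @Override
    public String describe() {
        return "non-streaming";
    }
}

final class ResponseStreamSelector {
    // Picks one implementation per request, based on the flag and the media type.
    static ResponseStream select(String mediaType, boolean streamingEnabled) {
        if (!streamingEnabled) {
            // Everything still goes through the same interface when streaming is off.
            return new NonStreamingFallback();
        }
        String normalized = mediaType.toLowerCase(Locale.ROOT);
        switch (normalized) {
            case "application/json":
            case "application/yaml":
            case "application/cbor":
            case "application/smile":
                return new XContentStream();
            case "text/csv":
            case "text/tab-separated-values":
            case "text/plain":
                return new TextStream();
            default:
                // For example Arrow, which has no streaming implementation yet.
                throw new UnsupportedOperationException("no streaming implementation for " + mediaType);
        }
    }
}
```

Keeping the non-streaming case behind the same interface means the caller never branches on whether streaming is enabled.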

To be done

  • Implement all media types
  • Tests: Ensure the streamed serialization is identical to the normal one
  • Call the stream classes from the code:
  • Do not store the pages if possible; just send them, to free memory
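
The "identical serialization" test could be shaped like the following sketch: serialize once in a single shot, serialize again in chunks for several page sizes, and compare the concatenated chunks with the one-shot output. The class and method names are hypothetical and the format is a bare JSON array, not the real ESQL response.

```java
import java.util.ArrayList;
import java.util.List;

final class StreamedEqualsBuffered {
    // One-shot serialization: all rows are available before anything is written.
    static String buffered(List<String> rows) {
        return "[" + String.join(",", rows) + "]";
    }

    // Streamed serialization: one chunk per page, nothing retained between pages.
    static List<String> streamed(List<String> rows, int pageSize) {
        List<String> chunks = new ArrayList<>();
        chunks.add("[");
        for (int from = 0; from < rows.size(); from += pageSize) {
            int to = Math.min(from + pageSize, rows.size());
            String page = String.join(",", rows.subList(from, to));
            // Every page except the first starts with a separator.
            chunks.add(from == 0 ? page : "," + page);
        }
        chunks.add("]");
        return chunks;
    }

    // Checks that chunking never changes the bytes, whatever the page size.
    static boolean identicalForAllPageSizes(List<String> rows) {
        String expected = buffered(rows);
        for (int pageSize = 1; pageSize <= rows.size() + 1; pageSize++) {
            String actual = String.join("", streamed(rows, pageSize));
            if (!actual.equals(expected)) {
                return false;
            }
        }
        return true;
    }
}
```

Looping over page sizes is the useful part: separator bugs tend to show up only at page boundaries, so a single fixed page size can hide them.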

Use cases

  • Reduced memory usage in coordinator, as no output pages will be stored
  • Clients consuming JSON (Or a custom format) in streaming, for heavy queries

@ivancea ivancea requested a review from nik9000 August 5, 2025 14:57
@ivancea ivancea added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.2.0 labels Aug 5, 2025
@elasticsearchmachine
Collaborator

Hi @ivancea, I've created a changelog YAML for you.

@ivancea ivancea added the >test-mute Use for PR that only mute tests label Aug 6, 2025
@ivancea ivancea removed the >test-mute Use for PR that only mute tests label Aug 7, 2025
Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

2 participants