Skip to content

Fix FlightSQL client to display all batches from multi-partition queries#347

Open
matago wants to merge 1 commit intodatafusion-contrib:mainfrom
matago:fix/flightsql-multi-batch-results
Open

Fix FlightSQL client to display all batches from multi-partition queries#347
matago wants to merge 1 commit intodatafusion-contrib:mainfrom
matago:fix/flightsql-multi-batch-results

Conversation

@matago
Copy link

@matago matago commented Nov 26, 2025

Previously, the FlightSQL client would only display partial results when queries returned multiple batches (common with GROUP BY without ORDER BY due to partitioned execution). This happened because:

  1. run_flightsqls() only fetched one batch using match streams.next().await instead of iterating through all batches
  2. take_record_batches() only used the first batch when multiple batches existed, ignoring the rest

This fix:

  • Changes run_flightsqls() to use while let loop to collect all batches from the FlightSQL stream
  • Updates take_record_batches() to concatenate all batches before applying pagination indices, ensuring all rows are visible

Known limitations for future consideration:

  • All batches are collected eagerly into memory before display, which could be problematic for very large result sets
  • concat_batches() creates a temporary copy when concatenating
  • Unlike the SQL tab which uses lazy batch loading, FlightSQL now loads all data upfront (this matches expected behavior for aggregation queries but may warrant lazy loading for large streaming results in the future)

Fixes #346

Previously, the FlightSQL client would only display partial results when
queries returned multiple batches (common with GROUP BY without ORDER BY
due to partitioned execution). This happened because:

1. `run_flightsqls()` only fetched one batch using `match streams.next().await`
   instead of iterating through all batches
2. `take_record_batches()` only used the first batch when multiple batches
   existed, ignoring the rest

This fix:
- Changes `run_flightsqls()` to use `while let` loop to collect all batches
  from the FlightSQL stream
- Updates `take_record_batches()` to concatenate all batches before applying
  pagination indices, ensuring all rows are visible

Known limitations for future consideration:
- All batches are collected eagerly into memory before display, which could
  be problematic for very large result sets
- `concat_batches()` creates a temporary copy when concatenating, briefly
  doubling memory usage during pagination
- Unlike the SQL tab which uses lazy batch loading, FlightSQL now loads
  all data upfront (this matches expected behavior for aggregation queries
  but may warrant lazy loading for large streaming results in the future)

Fixes datafusion-contrib#346
@matthewmturner
Copy link
Collaborator

Ugh i'm sorry your running into this issue. I confess I'm not a huge fan of this fix as i think it could degrade performance considerably for many other cases to concat everything (in particular on large response sets). But i also clearly want to fix the issue you are having. Im inclined to move forward with this for now and then figure out a more robust pagination solution that would handle this case better (for example maybe concat enough batches to fill a page).

@matthewmturner
Copy link
Collaborator

I am working on another fix for this with proper pagination handling

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FlightSQL client only displays last batch from multi-partition query results

2 participants