Commit 9ad31f2
Fix sequential metadata fetching in ListingTable causing high latency (#2)
When scanning an exact list of remote Parquet files, the ListingTable was fetching file metadata (via head calls) sequentially. This was due to using `stream::iter(file_list).flatten()`, which processes each one-item stream in order. For remote blob stores, where each head call can take tens to hundreds of milliseconds, this sequential behavior significantly increased the time to create the physical plan.
This commit replaces the sequential flattening with concurrent merging using `stream::iter(file_list).flatten_unordered(meta_fetch_concurrency)`. With this change, the `head` requests are executed in parallel (up to the configured `meta_fetch_concurrency` limit), reducing latency when creating the physical plan.
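The latency difference described above can be illustrated with a minimal, standard-library-only sketch. It is not the ListingTable code: OS threads stand in for the async stream, and `fake_head`, `fetch_sequential`, and `fetch_concurrent` are hypothetical names introduced only for this illustration. The `limit` argument plays the role of `meta_fetch_concurrency`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a remote `head` request:
/// sleeps for `latency`, then returns the path with a fake size.
fn fake_head(path: &str, latency: Duration) -> (String, u64) {
    thread::sleep(latency);
    (path.to_string(), path.len() as u64)
}

/// One request at a time, in list order (models the old sequential behavior).
fn fetch_sequential(paths: &[String], latency: Duration) -> Vec<(String, u64)> {
    paths.iter().map(|p| fake_head(p, latency)).collect()
}

/// At most `limit` requests in flight at once (models a bounded concurrent
/// merge). Results are collected in completion order, not list order.
fn fetch_concurrent(paths: &[String], latency: Duration, limit: usize) -> Vec<(String, u64)> {
    let queue = Arc::new(Mutex::new(paths.to_vec()));
    let results = Arc::new(Mutex::new(Vec::new()));
    let workers: Vec<_> = (0..limit.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // Take the next path; the lock is released before the slow call.
                let next = queue.lock().unwrap().pop();
                match next {
                    Some(path) => {
                        let meta = fake_head(&path, latency);
                        results.lock().unwrap().push(meta);
                    }
                    None => break,
                }
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    let out = results.lock().unwrap().clone();
    out
}
```

With 8 paths, a 50 ms simulated latency, and a limit of 4, the sequential version needs roughly 8 round trips while the concurrent one needs roughly 2. The concurrent output contains the same entries but in an unspecified order.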
Note that the loss of ordering introduced by `flatten_unordered` is acceptable, because the file list is fully sorted by path in `split_files` before being returned.
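To see why arrival order does not matter, consider a simplified sketch of a sort-then-split step. `FileMeta` and `sort_and_split` are hypothetical names for this illustration; the real `split_files` may differ in its types and chunking details. The only property assumed here is the one the commit message states: files are sorted by path before being grouped.

```rust
/// Hypothetical, simplified file record; the real type carries more metadata.
#[derive(Debug, Clone, PartialEq)]
struct FileMeta {
    path: String,
    size: u64,
}

/// Sort by path, then split into at most `n` groups of near-equal length.
/// Because of the sort, the result does not depend on the input order.
fn sort_and_split(mut files: Vec<FileMeta>, n: usize) -> Vec<Vec<FileMeta>> {
    if files.is_empty() || n == 0 {
        return Vec::new();
    }
    files.sort_by(|a, b| a.path.cmp(&b.path));
    let chunk_size = (files.len() + n - 1) / n; // ceiling division
    files.chunks(chunk_size).map(|c| c.to_vec()).collect()
}
```

Two lists holding the same files in different arrival orders produce identical groups, so any nondeterminism in the concurrent metadata fetch is erased at this step.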
Additionally, tests have been updated to ensure that metadata fetching occurs concurrently.
1 parent e26b7a0 commit 9ad31f2
1 file changed, +4 -2 lines changed

Diff hunks (line content not preserved):
- Original lines 1105-1111, new lines 1105-1113: 1 line removed (original 1108), 3 lines added (new 1108-1110).
- Original lines 1122-1128, new lines 1124-1130: 1 line removed (original 1125), 1 line added (new 1127).