Add a multifile scan operator #3499
I believe there's already an issue for this. And yeah, it should live alongside the scan. We actually already have a vortex-scan crate from the past, if you wanted to pull out the existing single-file scan as well.
Is supporting external threading models an orthogonal concern? It feels like we could also apply that to the current scan.
I gave some thought to Spiral using this, and I think the two use cases are too different. Spiral (and, I guess, Iceberg) will want to read file-level statistics from the manifest file. For large manifests, you might even want to push stats filters into the read of the manifest files themselves. Spiral also applies a lot of special logic during the read, which I think would be hard to glue onto this.
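To picture what pushing stats filters into the manifest read could mean, here is a minimal sketch assuming a hypothetical manifest entry that carries per-file min/max statistics for a single integer column. `ManifestEntry` and `prune_files` are illustrative names only, not Spiral, Iceberg, or Vortex APIs. A file is kept only if its [min, max] range can overlap the predicate range `lo <= x <= hi`, so pruned files are never opened.

```rust
/// Hypothetical manifest entry: per-file min/max statistics for one column.
#[derive(Debug, Clone)]
struct ManifestEntry {
    path: String,
    min: i64,
    max: i64,
}

/// Returns the paths of files whose [min, max] range overlaps [lo, hi].
/// Files that cannot contain a matching row are skipped without being opened.
fn prune_files(manifest: &[ManifestEntry], lo: i64, hi: i64) -> Vec<&str> {
    manifest
        .iter()
        .filter(|e| e.max >= lo && e.min <= hi)
        .map(|e| e.path.as_str())
        .collect()
}

fn main() {
    let manifest = vec![
        ManifestEntry { path: "a.vortex".into(), min: 0, max: 9 },
        ManifestEntry { path: "b.vortex".into(), min: 10, max: 19 },
        ManifestEntry { path: "c.vortex".into(), min: 20, max: 29 },
    ];
    // Predicate 12 <= x <= 21 can only match rows in b.vortex and c.vortex.
    let selected = prune_files(&manifest, 12, 21);
    println!("{:?}", selected);
}
```

With a very large manifest, the same overlap check could be applied while the manifest itself is being read, so entries for irrelevant files are never materialized.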
Should we add a MultiFile scan operator to vortex, or should this functionality live in the duckdb crate? It would contain the following features:
- file_index
- query exprs

This operator is responsible for orchestrating the concurrent read of many files.
Ideally it will support external threading models (e.g. DuckDB using n threads to drive the scan).
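One way an operator like this could accommodate an external threading model is to own no threads at all and only hand out per-file work. Below is a minimal sketch under that assumption: `MultiFileScan`, `next_file`, and `run_scan` are hypothetical names, not existing Vortex APIs, and the `read` closure stands in for an actual single-file read. Each caller-owned thread claims the next file index from a shared atomic counter, and every result is tagged with the `file_index` it came from.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical multi-file scan state shared by every worker thread.
/// It only hands out work; the threads belong to the caller (e.g. a host engine).
struct MultiFileScan {
    files: Vec<String>,
    next: AtomicUsize,
}

impl MultiFileScan {
    fn new(files: Vec<String>) -> Self {
        Self { files, next: AtomicUsize::new(0) }
    }

    /// Claims the next unread file; returns (file_index, path), or None once all are claimed.
    fn next_file(&self) -> Option<(usize, &str)> {
        let idx = self.next.fetch_add(1, Ordering::Relaxed);
        self.files.get(idx).map(|p| (idx, p.as_str()))
    }
}

/// Simulates an external engine driving the scan from `n_threads` threads.
/// `read` stands in for reading one file and returns a row count.
fn run_scan<F>(scan: &MultiFileScan, n_threads: usize, read: F) -> Vec<(usize, u64)>
where
    F: Fn(&str) -> u64 + Sync,
{
    let out = Mutex::new(Vec::new());
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                // Each thread keeps pulling files until the scan is exhausted.
                while let Some((file_index, path)) = scan.next_file() {
                    let rows = read(path);
                    out.lock().unwrap().push((file_index, rows));
                }
            });
        }
    });
    // Results arrive in completion order; sort by file_index for a stable view.
    let mut results = out.into_inner().unwrap();
    results.sort_by_key(|&(i, _)| i);
    results
}

fn main() {
    let files: Vec<String> = (0..5).map(|i| format!("part-{i}.vortex")).collect();
    let scan = MultiFileScan::new(files);
    // Fake reader: the "row count" is just the path length.
    let results = run_scan(&scan, 3, |path| path.len() as u64);
    println!("{:?}", results);
}
```

Because the scan exposes a pull-style `next_file`, the same shape would work whether the threads come from DuckDB, a Tokio runtime, or a plain thread pool; a real operator would presumably split work at a finer grain than whole files.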