-
Notifications
You must be signed in to change notification settings - Fork 123
feat: Manifest File Reader #1500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| batch_result.and_then(|batch| { | ||
| self.sidecar_visitor.visit_rows_of(batch.actions())?; | ||
| Ok(batch) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| batch_result.and_then(|batch| { | |
| self.sidecar_visitor.visit_rows_of(batch.actions())?; | |
| Ok(batch) | |
| }) | |
| let batch = batch_result?; | |
| self.sidecar_visitor.visit_rows_of(batch.actions())?; | |
| Ok(batch) |
| ext => { | ||
| return Err(Error::generic(format!( | ||
| "Unsupported checkpoint extension: {}", | ||
| ext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ext => { | |
| return Err(Error::generic(format!( | |
| "Unsupported checkpoint extension: {}", | |
| ext | |
| _ => { | |
| return Err(Error::generic(format!( | |
| "Unsupported checkpoint extension: {extension}", |
| "json" => { | ||
| engine | ||
| .json_handler() | ||
| .read_json_files(&files, MANIFEST_READ_SCHMEA.clone(), None)? | ||
| } | ||
| "parquet" => engine.parquet_handler().read_parquet_files( | ||
| &files, | ||
| MANIFEST_READ_SCHMEA.clone(), | ||
| None, | ||
| )?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside: I'm a bit surprised fmt treated these so differently?
(but it's the same number of lines of code either way, so meh)
| .path() | ||
| .rsplit('.') | ||
| .next() | ||
| .unwrap_or(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, we don't need to unwrap... just match as:
let actions = match extension {
Some("json") => { ... }
Some("parquet") => { ... }| #[allow(unused)] | ||
| pub(crate) enum AfterManifest { | ||
| /// Has sidecars → return sidecar files | ||
| Sidecars { sidecars: Vec<FileMeta> }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a reasonable place for a tuple enum variant, since the field name is redundant?
| Sidecars { sidecars: Vec<FileMeta> }, | |
| Sidecars(Vec<FileMeta>), |
| require!( | ||
| self.is_complete, | ||
| Error::generic(format!( | ||
| "Finalized called on ManifestReader for file {:?}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "Finalized called on ManifestReader for file {:?}", | |
| "Cannot finalize in-progress ManifestReader for file {:?}", |
| )) | ||
| ); | ||
|
|
||
| let sidecars: Vec<_> = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type annotation shouldn't be necessary here because AfterManifest::Sidecars forces it
| let sidecars: Vec<_> = self | |
| let sidecars = self |
92737a4 to
698da9b
Compare
🥞 Stacked PR
Use this link to review incremental changes.
What changes are proposed in this pull request?
How was this change tested?