-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(array): add RecordBatchStream trait
#9166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| /// | ||
| /// Implementation of this trait should guarantee that all `RecordBatch`'s returned by this | ||
| /// reader should have the same schema as returned from this method. | ||
| fn schema(&self) -> SchemaRef; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debated making this async as well but (1) async trait functions gets funky (2) it's a little inconvenient; I think it'd be better overall to require resolving the schema up-front by whatever generates the stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW we have a synchronous API in DataFusion and it works very well: https://docs.rs/datafusion/latest/datafusion/execution/trait.RecordBatchStream.html
One nit I would suggest would be returning &SchemaRef rather than SchemaRef -- so that the caller can decide if they want to clone the Arc or not -- I know cloning Arc's isn't all that big a deal, but it does add up over time
|
I still need to add a sample implementation and tests. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good -- thanks @lidavidm
FYI @westonpace
| [features] | ||
| ffi = ["arrow-schema/ffi", "arrow-data/ffi"] | ||
| force_validate = [] | ||
| nonblocking = ["dep:futures"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could call this flag "async" (to match the rust async keyword)
We should probably also
- Document it somewhere
- Add the same name feature flag to the arrow crate (so people can use it without having to explicitly depend on arrow-array)
Which issue does this PR close?
Closes #7228.
Rationale for this change
Provide a common trait definition for people in the async ecosystem.
What changes are included in this PR?
An equivalent for the RecordBatchReader trait using futures::Stream.
Are these changes tested?
WIP
Are there any user-facing changes?
Yes.