Skip to content

Conversation

@lidavidm
Copy link
Member

Which issue does this PR close?

Closes #7228.

Rationale for this change

Provide a common trait definition for people in the async ecosystem.

What changes are included in this PR?

An equivalent for the RecordBatchReader trait using futures::Stream.

Are these changes tested?

WIP

Are there any user-facing changes?

Yes.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 14, 2026
///
/// Implementation of this trait should guarantee that all `RecordBatch`'s returned by this
/// reader should have the same schema as returned from this method.
fn schema(&self) -> SchemaRef;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated making this async as well but (1) async trait functions gets funky (2) it's a little inconvenient; I think it'd be better overall to require resolving the schema up-front by whatever generates the stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW we have a synchronous API in DataFusion and it works very well: https://docs.rs/datafusion/latest/datafusion/execution/trait.RecordBatchStream.html

One nit I would suggest would be returning &SchemaRef rather than SchemaRef -- so that the caller can decide if they want to clone the Arc or not -- I know cloning Arc's isn't all that big a deal, but it does add up over time

@lidavidm
Copy link
Member Author

I still need to add a sample implementation and tests.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good -- thanks @lidavidm

FYI @westonpace

[features]
ffi = ["arrow-schema/ffi", "arrow-data/ffi"]
force_validate = []
nonblocking = ["dep:futures"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could call this flag "async" (to match the rust async keyword)

We should probably also

  1. Document it somewhere
  2. Add the same name feature flag to the arrow crate (so people can use it without having to explicitly depend on arrow-array)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Asynchronous RecordBatchReader equivalent

3 participants