Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Apr 25, 2025

No description provided.

@wgtmac wgtmac force-pushed the file_reader branch 3 times, most recently from 5d2c5e9 to b3a149d Compare April 25, 2025 09:21
@wgtmac
Copy link
Member Author

wgtmac commented Apr 26, 2025

@lidavidm @zhjwpku Could you help review this?

Comment on lines +106 to +108
/// \brief The batch size to read. Only applies to implementations that support
/// batching.
int64_t batch_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the unit bytes or rows?

(Should this also be optional?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, presumably bytes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Why does Split use size_t but this uses int64_t?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to number of rows in a batch (a.k.a. ArrowArray). It seems that we can use std::optional<size_t> and define a default batch size if non provided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. int64_t might be appropriate as that's then consistent with C Data Interface. Clarifying the unit might be helpful.

DataLayout data_layout() const final { return DataLayout::kStructLike; }

private:
std::unique_ptr<Reader> reader_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m a bit confused about why keeping a Reader object here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is just an adapter. If the wrapped reader directly returns kStructLike, it does nothing. Otherwise it will try to aggregate them into an ArrowArray.

@Fokko Fokko merged commit 8aa7fd7 into apache:main Apr 29, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants