-
Notifications
You must be signed in to change notification settings - Fork 1k
Undeprecate ArrowWriter::into_serialized_writer
and add docs
#8621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/// Flushes any outstanding data before returning. | ||
/// | ||
/// This can be useful to provide more control over how files are written, for example | ||
/// to write columns in parallel. See the example on [`ArrowColumnWriter`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to drop a few breadcrumbs to find @adamreeve 's new example in #8582
/// | ||
/// You can create this structure via an [`ArrowRowGroupWriterFactory`] | ||
/// | ||
/// See the example on [`ArrowColumnWriter`] for how to encode columns in parallel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrowRowGroupWriter
is actually not public and not used in the ArrowColumnWriter
example, so this last sentence might be a little confusing.
(ArrowRowGroupWriterFactory
is internally used to create ArrowRowGroupWriter
s, but publicly it only exposes the ability to create a Vec<ArrowColumnWriter>
. If I was starting this again from scratch I might have named things a little differently...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- I tried to improve the comments in 619210d
Thank you @adamreeve and @mbrobbel |
Which issue does this PR close?
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.
57.0.0
(October 2025) #7835Rationale for this change
While testing the arrow 57 upgrade in DataFusion I found a few things that need to be fixed
in parquet-rs.
One was that the method
ArrowWriter::into_serialized_writer
was deprecated, (which I know I suggested in #8389 🤦 ). However, when testing it turns out that the constructor ofSerializedFileWriter
does a lot of work (like creating the parquet schema from the arrow schema and messing with metadata)arrow-rs/parquet/src/arrow/arrow_writer/mod.rs
Lines 230 to 263 in c4f0fc1
Creating a
RowGroupWriterFactory
directly would involve a bunch of code duplicationWhat changes are included in this PR?
So let's not deprecate this method for now and instead add some additional docs to guide people to the right lace
Are these changes tested?
I tested manually upstream
Are there any user-facing changes?
If there are user-facing changes then we may require documentation to be updated before approving the PR.
If there are any breaking changes to public APIs, please call them out.