Merged
parquet/src/arrow/arrow_writer/mod.rs (22 changes: 16 additions, 6 deletions)
@@ -450,11 +450,11 @@ impl<W: Write + Send> ArrowWriter<W> {
     }
 
     /// Converts this writer into a lower-level [`SerializedFileWriter`] and [`ArrowRowGroupWriterFactory`].
-    /// This can be useful to provide more control over how files are written.
-    #[deprecated(
-        since = "57.0.0",
-        note = "Construct a `SerializedFileWriter` and `ArrowRowGroupWriterFactory` directly instead"
-    )]
+    ///
+    /// Flushes any outstanding data before returning.
+    ///
+    /// This can be useful to provide more control over how files are written, for example
+    /// to write columns in parallel. See the example on [`ArrowColumnWriter`].

Contributor Author:

I tried to drop a few breadcrumbs so readers can find @adamreeve's new example in #8582.

     pub fn into_serialized_writer(
         mut self,
     ) -> Result<(SerializedFileWriter<W>, ArrowRowGroupWriterFactory)> {
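These doc comments point to a specific workflow. You split the writer, create one writer per column for a row group, encode each column on its own thread, and then append the finished chunks in column order. The sketch below models that workflow using only the standard library. It is a hypothetical model, not the `parquet` crate's API. `ColumnWriter`, `ColumnChunk`, `RowGroupWriterFactory`, `FileWriter` and `write_row_group_parallel` are illustrative stand-ins for `ArrowColumnWriter`, `ArrowRowGroupWriterFactory` and `SerializedFileWriter`. "Encoding" here is just little-endian serialization. The example on [`ArrowColumnWriter`] from #8582 remains the authoritative version.

```rust
use std::thread;

/// Hypothetical stand-in for an encoded column chunk (not the parquet crate's type).
#[derive(Debug)]
struct ColumnChunk {
    column: usize,
    bytes: Vec<u8>,
}

/// Hypothetical per-column writer. It owns its buffer, so each writer
/// can be moved onto its own thread.
struct ColumnWriter {
    column: usize,
    buf: Vec<u8>,
}

impl ColumnWriter {
    /// "Encodes" values as little-endian bytes (a stand-in for real encoding).
    fn write(&mut self, values: &[i32]) {
        for v in values {
            self.buf.extend_from_slice(&v.to_le_bytes());
        }
    }

    /// Finishes the column and returns its chunk.
    fn close(self) -> ColumnChunk {
        ColumnChunk { column: self.column, bytes: self.buf }
    }
}

/// Hypothetical factory: hands out one fresh writer per column for a row group.
struct RowGroupWriterFactory {
    num_columns: usize,
}

impl RowGroupWriterFactory {
    fn create_column_writers(&self, _row_group_index: usize) -> Vec<ColumnWriter> {
        (0..self.num_columns)
            .map(|column| ColumnWriter { column, buf: Vec::new() })
            .collect()
    }
}

/// Hypothetical file writer that stores finished row groups.
#[derive(Default)]
struct FileWriter {
    row_groups: Vec<Vec<ColumnChunk>>,
}

impl FileWriter {
    fn append_row_group(&mut self, chunks: Vec<ColumnChunk>) {
        // Chunks must arrive in schema (column) order.
        assert!(chunks.iter().enumerate().all(|(i, c)| c.column == i));
        self.row_groups.push(chunks);
    }
}

/// Encodes each column of one row group on its own thread, then appends
/// the chunks to the file in column order.
fn write_row_group_parallel(
    file: &mut FileWriter,
    factory: &RowGroupWriterFactory,
    row_group_index: usize,
    columns: &[Vec<i32>],
) {
    let writers = factory.create_column_writers(row_group_index);
    let chunks: Vec<ColumnChunk> = thread::scope(|s| {
        let handles: Vec<_> = writers
            .into_iter()
            .zip(columns)
            .map(|(mut writer, values)| {
                s.spawn(move || {
                    writer.write(values);
                    writer.close()
                })
            })
            .collect();
        // Joining in spawn order keeps the chunks in column order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    file.append_row_group(chunks);
}

fn main() {
    let factory = RowGroupWriterFactory { num_columns: 2 };
    let mut file = FileWriter::default();
    let columns = vec![vec![1, 2, 3], vec![10, 20, 30]];
    write_row_group_parallel(&mut file, &factory, 0, &columns);
    println!("row groups: {}", file.row_groups.len());
    println!("column 1 bytes: {}", file.row_groups[0][1].bytes.len());
}
```

Each writer owns its own buffer, so the threads share no mutable state. Joining the handles in spawn order keeps the chunks in schema order. The model checks that order with an `assert!` before appending.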
@@ -872,6 +872,12 @@ impl ArrowColumnWriter {
 }
 
 /// Encodes [`RecordBatch`] to a parquet row group
+///
+/// Note: this structure is created by [`ArrowRowGroupWriterFactory`] internally used to
+/// create [`ArrowRowGroupWriter`]s, but it is not exposed publicly.
+///
+/// See the example on [`ArrowColumnWriter`] for how to encode columns in parallel

Contributor:

ArrowRowGroupWriter is actually not public and not used in the ArrowColumnWriter example, so this last sentence might be a little confusing.

(ArrowRowGroupWriterFactory is internally used to create ArrowRowGroupWriters, but publicly it only exposes the ability to create a Vec<ArrowColumnWriter>. If I was starting this again from scratch I might have named things a little differently...)

Contributor Author:

Thanks -- I tried to improve the comments in 619210d

+#[derive(Debug)]
 struct ArrowRowGroupWriter {
     writers: Vec<ArrowColumnWriter>,
     schema: SchemaRef,
@@ -907,6 +913,10 @@ impl ArrowRowGroupWriter {
 }
 
 /// Factory that creates new column writers for each row group in the Parquet file.
+///
+/// You can create this structure via an [`ArrowWriter::into_serialized_writer`].
+/// See the example on [`ArrowColumnWriter`] for how to encode columns in parallel
+#[derive(Debug)]
 pub struct ArrowRowGroupWriterFactory {
     schema: SchemaDescPtr,
     arrow_schema: SchemaRef,
@@ -937,7 +947,7 @@ impl ArrowRowGroupWriterFactory {
         Ok(ArrowRowGroupWriter::new(writers, &self.arrow_schema))
     }
 
-    /// Create column writers for a new row group.
+    /// Create column writers for a new row group, with the given row group index
     pub fn create_column_writers(&self, row_group_index: usize) -> Result<Vec<ArrowColumnWriter>> {
         let mut writers = Vec::with_capacity(self.arrow_schema.fields.len());
         let mut leaves = self.schema.columns().iter();
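With the new `row_group_index` parameter, a caller writing several row groups presumably creates a fresh set of column writers for each one, passing 0, 1, 2 and so on. The sketch below models that loop with hypothetical std-only types (`ColumnWriter`, `RowGroupWriterFactory`, `FileWriter`). These are not the `parquet` crate's API. The check that each writer's index matches the row group's position in the file is an assumption made for illustration. The diff does not show how the real factory uses the index.

```rust
/// Hypothetical column writer that remembers its row group (not the parquet crate's type).
struct ColumnWriter {
    row_group_index: usize,
    values: Vec<i64>,
}

impl ColumnWriter {
    fn write(&mut self, batch: &[i64]) {
        self.values.extend_from_slice(batch);
    }
}

/// Hypothetical factory: the index is passed through to every writer it creates.
struct RowGroupWriterFactory {
    num_columns: usize,
}

impl RowGroupWriterFactory {
    fn create_column_writers(&self, row_group_index: usize) -> Vec<ColumnWriter> {
        (0..self.num_columns)
            .map(|_| ColumnWriter { row_group_index, values: Vec::new() })
            .collect()
    }
}

/// Hypothetical file writer that records the row count of each row group.
#[derive(Default)]
struct FileWriter {
    row_group_sizes: Vec<usize>,
}

impl FileWriter {
    /// Assumed rule for this sketch: writers must carry the index of the
    /// row group being appended (0 for the first, 1 for the second, ...).
    fn append_row_group(&mut self, writers: Vec<ColumnWriter>) -> Result<(), String> {
        let expected = self.row_group_sizes.len();
        if let Some(w) = writers.iter().find(|w| w.row_group_index != expected) {
            return Err(format!(
                "writer was created for row group {}, but the next row group is {}",
                w.row_group_index, expected
            ));
        }
        let rows = writers.first().map_or(0, |w| w.values.len());
        self.row_group_sizes.push(rows);
        Ok(())
    }
}

fn main() -> Result<(), String> {
    let factory = RowGroupWriterFactory { num_columns: 2 };
    let mut file = FileWriter::default();
    // Two row groups, each with two columns.
    let row_groups: Vec<Vec<Vec<i64>>> = vec![
        vec![vec![1, 2, 3], vec![10, 20, 30]],
        vec![vec![4, 5], vec![40, 50]],
    ];
    for (row_group_index, columns) in row_groups.iter().enumerate() {
        // Fresh writers for every row group, tagged with its index.
        let mut writers = factory.create_column_writers(row_group_index);
        for (writer, values) in writers.iter_mut().zip(columns) {
            writer.write(values);
        }
        file.append_row_group(writers)?;
    }
    println!("{:?}", file.row_group_sizes); // prints [3, 2]
    Ok(())
}
```

Writers are never reused across row groups in this model. A stale index (for example, passing 0 again for the second row group) surfaces as an error when the row group is appended.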