
Commit 4539dfb

GH-46633: [Docs][C++][Python] Update CombineChunks documentation to specify that binary columns can be combined into multiple chunks (#46638)
### Rationale for this change

The documentation for [pyarrow.Table.combine_chunks](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.combine_chunks) and [Table::CombineChunks](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow5Table13CombineChunksEP10MemoryPool) states:

> All the underlying chunks in the ChunkedArray of each column are concatenated into zero or one chunk.

However, [this comment](https://github.com/apache/arrow/blob/d7015bd6e610b6cd6752f6cd543509bd5f8853ff/cpp/src/arrow/table.cc#L567) indicates that binary columns can be combined into multiple chunks. Multiple chunks are produced when combining everything into a single chunk would cause a buffer overflow. A reproducible example is given in a comment on #46633.

### What changes are included in this PR?

The `Table::CombineChunks` and `pyarrow.Table.combine_chunks` documentation now states that binary columns can be combined into multiple chunks.

### Are these changes tested?

No; this PR only changes documentation.

### Are there any user-facing changes?

Yes, the documentation changes described above.

* GitHub Issue: #46633

Authored-by: Akum Kang
Signed-off-by: AlenkaF
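The "buffer overflow" in question comes from how variable-length binary data is laid out. The following Python sketch is a simplified model, not Arrow code: `build_offsets` is a hypothetical helper, and the limit assumes the 32-bit signed offsets used by Arrow's `binary` and `string` types (the `large_binary` and `large_string` variants use 64-bit offsets instead).

```python
# Simplified model (not Arrow code) of a variable-length binary array:
# value bytes live in one contiguous data buffer, and an offsets array with
# len(values) + 1 entries marks where each value starts and ends.
# Assumption: offsets are 32-bit signed integers, as in Arrow's `binary`
# and `string` types, so the final offset must not exceed INT32_MAX.
INT32_MAX = 2**31 - 1


def build_offsets(value_lengths):
    """Return offsets for values with the given byte lengths (hypothetical helper).

    Raises OverflowError if the total number of value bytes does not fit
    in a 32-bit signed offset.
    """
    offsets = [0]
    for length in value_lengths:
        end = offsets[-1] + length
        if end > INT32_MAX:
            raise OverflowError(f"offset {end} exceeds the int32 maximum {INT32_MAX}")
        offsets.append(end)
    return offsets


# Each chunk holds one 1 GiB value and is valid on its own...
chunk_a = [2**30]
chunk_b = [2**30]
print(build_offsets(chunk_a))  # [0, 1073741824]

# ...but concatenating them would need a final offset of 2**31.
try:
    build_offsets(chunk_a + chunk_b)
except OverflowError as exc:
    print("cannot combine into one chunk:", exc)
```

Since `2**31` is one more than the largest 32-bit signed offset, the two chunks cannot be merged into one array, which is why the combined column has to keep more than one chunk.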
1 parent dc0f5a9 commit 4539dfb


2 files changed, 6 additions, 0 deletions

cpp/src/arrow/table.h

Lines changed: 3 additions & 0 deletions
@@ -214,6 +214,9 @@ class ARROW_EXPORT Table {
 /// All the underlying chunks in the ChunkedArray of each column are
 /// concatenated into zero or one chunk.
 ///
+/// To avoid buffer overflow, binary columns may be combined into
+/// multiple chunks. Chunks will have the maximum possible length.
+///
 /// \param[in] pool The pool for buffer allocations
 Result<std::shared_ptr<Table>> CombineChunks(
 MemoryPool* pool = default_memory_pool()) const;

python/pyarrow/table.pxi

Lines changed: 3 additions & 0 deletions
@@ -4510,6 +4510,9 @@ cdef class Table(_Tabular):
 All the underlying chunks in the ChunkedArray of each column are
 concatenated into zero or one chunk.
 
+To avoid buffer overflow, binary columns may be combined into
+multiple chunks. Chunks will have the maximum possible length.
+
 Parameters
 ----------
 memory_pool : MemoryPool, default None
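The phrase "Chunks will have the maximum possible length" can be illustrated with a greedy grouping of consecutive chunks. This is a hypothetical Python sketch, not the logic in `cpp/src/arrow/table.cc`: `plan_combined_chunks` and its `limit` parameter are illustrative names, each input chunk is described only by its number of value bytes, and input chunks are assumed never to be split. The real implementation may differ in such details.

```python
# Hypothetical sketch (not Arrow's code): greedily group consecutive chunks
# of a binary column so that each group's total value bytes stays within a
# per-chunk limit. Each group would then be concatenated into one output chunk.
INT32_MAX = 2**31 - 1


def plan_combined_chunks(chunk_data_lengths, limit=INT32_MAX):
    """Group consecutive chunk indices so that each group's total bytes <= limit.

    `chunk_data_lengths[i]` is the number of value bytes in input chunk i.
    Input chunks are never split: each one lands in exactly one group.
    """
    groups = []
    current, current_bytes = [], 0
    for i, nbytes in enumerate(chunk_data_lengths):
        if nbytes > limit:
            raise ValueError(f"chunk {i} alone exceeds the limit")
        if current and current_bytes + nbytes > limit:
            # Adding this chunk would overflow: close the current group.
            groups.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += nbytes
    if current:
        groups.append(current)
    return groups


# With a toy limit of 8 bytes, four chunks collapse into two full groups.
print(plan_combined_chunks([3, 4, 2, 5], limit=8))  # [[0, 1], [2, 3]]
```

Greedy packing keeps each output chunk as full as the input chunk boundaries allow, and an empty input yields no groups, consistent with the "zero chunks" case. Code that consumes the result of `combine_chunks` should therefore iterate over all chunks of a binary column rather than assume there is exactly one.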
