GH-46633: [Docs][C++][Python] Update CombineChunks documentation to specify that binary columns can be combined into multiple chunks (#46638)

kangakum36 · web-flow · commit 4539dfb5c310 · 2025-06-02T14:50:18.000+02:00
### Rationale for this change

The documentation for [pyarrow.Table.combine_chunks](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.combine_chunks) and [Table::CombineChunks](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow5Table13CombineChunksEP10MemoryPool) states:

> All the underlying chunks in the ChunkedArray of each column are concatenated into zero or one chunk.

However, [this comment](https://github.com/apache/arrow/blob/d7015bd6e610b6cd6752f6cd543509bd5f8853ff/cpp/src/arrow/table.cc#L567) indicates that binary columns can be combined into multiple chunks. This happens when combining them into a single chunk would overflow a buffer. A reproducible example is [here](#46633 (comment)).

### What changes are included in this PR?

The `Table::CombineChunks` and `pyarrow.Table.combine_chunks` documentation is changed to specify that binary columns can be combined into multiple chunks.

### Are these changes tested?

No. These are documentation-only changes.

### Are there any user-facing changes?

Yes, documentation changes.

* GitHub Issue: #46633

Authored-by: Akum Kang <kangakum@Akums-MacBook-Pro-2.local>
Signed-off-by: AlenkaF <frim.alenka@gmail.com>
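The multi-chunk behavior described in the rationale can be pictured as a greedy grouping of a column's existing chunks. Under that model, consecutive input chunks are merged until adding the next one would push the combined value data past a size limit, and then a new output chunk is started. The following is a plain-Python sketch of that idea only. It does not call pyarrow, and it should not be read as the Arrow C++ implementation. `plan_combined_chunks` and `ASSUMED_BINARY_LIMIT` are hypothetical names. The following details are all assumptions, not taken from `table.cc`:

- grouping happens over whole input chunks;
- the cap is 2^31 - 1 bytes, the range addressable by 32-bit offsets;
- the exact boundary comparison matches what is written here.

```python
from typing import List

# Hypothetical cap: the largest byte count addressable by signed 32-bit offsets.
# The exact constant and comparison Arrow C++ uses are assumptions here.
ASSUMED_BINARY_LIMIT = 2**31 - 1


def plan_combined_chunks(
    chunk_data_sizes: List[int], limit: int = ASSUMED_BINARY_LIMIT
) -> List[List[int]]:
    """Greedily group consecutive input chunks so that each group's total
    value-data size stays at or below ``limit``.

    Returns one list of input-chunk indices per output chunk. A single input
    chunk larger than ``limit`` is kept alone in its own group, because this
    sketch never splits an input chunk.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(chunk_data_sizes):
        # Start a new output chunk if adding this input chunk would exceed the cap.
        if current and current_size + size > limit:
            groups.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        groups.append(current)
    return groups


# With a toy limit of 8 bytes, four input chunks collapse into two output chunks.
print(plan_combined_chunks([3, 4, 5, 1], limit=8))  # [[0, 1], [2, 3]]
```

In the common case, the total data fits under the cap, so every input chunk lands in one group. That matches the "zero or one chunk" wording. More than one group appears only when the combined data would not fit.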
diff --git a/cpp/src/arrow/table.h b/cpp/src/arrow/table.h
@@ -214,6 +214,9 @@ class ARROW_EXPORT Table {
   /// All the underlying chunks in the ChunkedArray of each column are
   /// concatenated into zero or one chunk.
   ///
+  /// To avoid buffer overflow, binary columns may be combined into
+  /// multiple chunks. Chunks will have the maximum possible length.
+  ///
   /// \param[in] pool The pool for buffer allocations
   Result<std::shared_ptr<Table>> CombineChunks(
       MemoryPool* pool = default_memory_pool()) const;
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
@@ -4510,6 +4510,9 @@ cdef class Table(_Tabular):
         All the underlying chunks in the ChunkedArray of each column are
         concatenated into zero or one chunk.
 
+        To avoid buffer overflow, binary columns may be combined into
+        multiple chunks. Chunks will have the maximum possible length.
+
         Parameters
         ----------
         memory_pool : MemoryPool, default None