@@ -8,6 +8,7 @@ The Sparrow PyCapsule Interface - A C++ library for exchanging Apache Arrow data
88- Exporting sparrow arrays to Python as PyCapsules (Arrow C Data Interface)
99- Importing Arrow data from Python PyCapsules into sparrow arrays
1010- Zero-copy data exchange with Python libraries like Polars, PyArrow, and pandas
11+ - A `SparrowArray` Python class that implements the Arrow PyCapsule Interface
1112
1213## Features
1314
@@ -17,6 +18,7 @@ The Sparrow PyCapsule Interface - A C++ library for exchanging Apache Arrow data
1718- ✅ **Compatible with Polars, PyArrow, pandas** and other Arrow-based libraries
1819- ✅ **Bidirectional** data flow (C++ ↔ Python)
1920- ✅ **Type-safe** with proper ownership semantics
21+ - ✅ **SparrowArray Python class** implementing `__arrow_c_array__` protocol
2022
2123## Building
2224
@@ -53,62 +55,64 @@ ctest --output-on-failure
5355
5456## Usage Example
5557
56- ### C++ Side: Exporting Data
58+ ### C++ Side: Creating a SparrowArray for Python
5759
5860``` cpp
5961#include <sparrow-pycapsule/pycapsule.hpp>
62+ #include <sparrow-pycapsule/sparrow_array_python_class.hpp>
6063#include <sparrow/array.hpp>
6164
6265// Create a sparrow array
6366sparrow::array my_array = /* ... */;
6467
65- // Export to PyCapsules for Python consumption
66- auto [schema_capsule, array_capsule] =
67- sparrow::pycapsule::export_array_to_capsules(my_array);
68+ // Create a SparrowArray Python object that implements __arrow_c_array__
69+ PyObject* sparrow_array = sparrow::pycapsule::create_sparrow_array_object(std::move(my_array));
6870
69- // Pass capsules to Python (via Python C API, pybind11, etc.)
71+ // Return to Python - it can be used directly with Polars, PyArrow, etc.
7072```
7173
72- ### Python Side: Consuming C++ Data
74+ ### Python Side: Using SparrowArray
7375
7476``` python
77+ from test_sparrow_helper import SparrowArray
7578import polars as pl
7679import pyarrow as pa
7780
78- # Receive capsules from C++
79- # schema_capsule, array_capsule = get_from_cpp()
81+ # Create SparrowArray from any Arrow-compatible object
82+ pa_array = pa.array([1, 2, None, 4, 5], type=pa.int32())
83+ sparrow_array = SparrowArray(pa_array)
8084
81- # Import into PyArrow
82- arrow_array = pa.Array._import_from_c_capsule(schema_capsule, array_capsule)
85+ # SparrowArray implements __arrow_c_array__, so it works with Polars
86+ # Using Polars internal API for primitive arrays:
87+ from polars._plr import PySeries
88+ from polars._utils.wrap import wrap_s
8389
84- # Convert to Polars
85- series = pl.from_arrow(arrow_array)
90+ ps = PySeries.from_arrow_c_array(sparrow_array)
91+ series = wrap_s(ps)
92+ print(series) # shape: (5,), dtype: Int32
8693
87- # Use in Polars DataFrame
88- df = pl.DataFrame({"my_column": series})
94+ # Get array size
95+ print(sparrow_array.size()) # 5
8996```
9097
9198### Python Side: Exporting to C++
9299
93100``` python
94- import polars as pl
95-
96- # Create Polars data
97- series = pl.Series([1, 2, None, 4, 5])
101+ import pyarrow as pa
98102
99- # Convert to Arrow and export as capsules
100- arrow_array = series.to_arrow()
101- schema_capsule, array_capsule = arrow_array.__arrow_c_array__()
103+ # Any object implementing __arrow_c_array__ can be imported by sparrow
104+ arrow_array = pa.array([1, 2, None, 4, 5])
102105
103- # Pass to C++
106+ # The SparrowArray constructor accepts any ArrowArrayExportable
107+ sparrow_array = SparrowArray(arrow_array)
104108```
105109
106110### C++ Side: Importing from Python
107111
108112``` cpp
109113#include <sparrow-pycapsule/pycapsule.hpp>
110114
111- // Receive capsules from Python
115+ // Receive capsules from Python (e.g., from __arrow_c_array__)
112116PyObject* schema_capsule = /* ... */;
113117PyObject* array_capsule = /* ... */;
114118
@@ -121,6 +125,27 @@ sparrow::array imported_array =
121125std::cout << "Array size: " << imported_array.size() << std::endl;
122126```
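
Before capsules are handed to the C++ import function, it can help to confirm on the Python side that they carry the names the Arrow PyCapsule Interface requires (`arrow_schema` and `arrow_array`). The following is a minimal sketch, assuming CPython and using `ctypes.pythonapi` to reach the C API; `check_arrow_capsules` is a hypothetical helper, not part of sparrow-pycapsule, and the demo capsules wrap a scratch buffer rather than real Arrow structures.

```python
import ctypes

# CPython C API entry points, reached through ctypes (CPython only).
_api = ctypes.pythonapi
_api.PyCapsule_New.restype = ctypes.py_object
_api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
_api.PyCapsule_IsValid.restype = ctypes.c_int
_api.PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Capsule names required by the Arrow PyCapsule Interface.
# Kept at module level so the name buffers outlive the demo capsules.
SCHEMA_NAME = b"arrow_schema"
ARRAY_NAME = b"arrow_array"


def check_arrow_capsules(schema_capsule, array_capsule):
    """Hypothetical helper: raise ValueError unless both capsules have the expected names."""
    if not _api.PyCapsule_IsValid(schema_capsule, SCHEMA_NAME):
        raise ValueError("expected a PyCapsule named 'arrow_schema'")
    if not _api.PyCapsule_IsValid(array_capsule, ARRAY_NAME):
        raise ValueError("expected a PyCapsule named 'arrow_array'")


# Demo with dummy capsules: they point at a scratch buffer, not real Arrow
# structs, and have no destructor. Never pass such capsules to an importer.
_scratch = ctypes.create_string_buffer(8)
_ptr = ctypes.addressof(_scratch)
dummy_schema = _api.PyCapsule_New(_ptr, SCHEMA_NAME, None)
dummy_array = _api.PyCapsule_New(_ptr, ARRAY_NAME, None)

check_arrow_capsules(dummy_schema, dummy_array)  # passes silently
```

The check only inspects capsule names; it says nothing about whether the wrapped structures are valid or already released.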
123127
128+ ## SparrowArray Python Class
129+
130+ The `SparrowArray` class is a Python type implemented in C++ that:
131+
132+ - **Wraps a sparrow array** and exposes it to Python
133+ - **Implements `__arrow_c_array__`** (ArrowArrayExportable protocol)
134+ - **Accepts any ArrowArrayExportable** in its constructor (PyArrow, Polars, etc.)
135+ - **Provides a `size()` method** to get the number of elements
136+
137+ ```python
138+ # Constructor accepts any object with __arrow_c_array__
139+ sparrow_array = SparrowArray(pyarrow_array)
140+ sparrow_array = SparrowArray(another_sparrow_array)
141+
142+ # Implements ArrowArrayExportable protocol
143+ schema_capsule, array_capsule = sparrow_array.__arrow_c_array__()
144+
145+ # Get array size
146+ n = sparrow_array.size()
147+ ```
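
The duck-typing rule behind "accepts any ArrowArrayExportable" can be sketched in pure Python. This is an illustration only, not how the C++ type is implemented: the `ArrowArrayExportable` protocol class, the `is_arrow_array_exportable` helper, and the `FakeExporter` stand-in are hypothetical names. Only the `__arrow_c_array__(requested_schema=None)` method name and signature come from the Arrow PyCapsule Interface.

```python
from typing import Any, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrowArrayExportable(Protocol):
    """Hypothetical protocol class: any object exposing __arrow_c_array__."""

    def __arrow_c_array__(
        self, requested_schema: Optional[Any] = None
    ) -> Tuple[Any, Any]:
        ...


def is_arrow_array_exportable(obj: Any) -> bool:
    """Return True if obj exposes __arrow_c_array__ (a presence check only)."""
    return isinstance(obj, ArrowArrayExportable)


class FakeExporter:
    """Stand-in for illustration; a real exporter returns two PyCapsules."""

    def __arrow_c_array__(self, requested_schema=None):
        return ("schema-placeholder", "array-placeholder")


print(is_arrow_array_exportable(FakeExporter()))  # True
print(is_arrow_array_exportable([1, 2, 3]))       # False
```

A runtime-checkable protocol only verifies that the method exists, so an object can pass this check and still return something other than a valid pair of capsules.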
148+
124149## Testing
125150
126151### C++ Unit Tests
@@ -130,13 +155,12 @@ cd build
130155./bin/Debug/test_sparrow_pycapsule_lib
131156```
132157
133- ### Polars Integration Tests
158+ ### Integration Tests
134159
135- Test bidirectional data exchange with Polars:
160+ Test bidirectional data exchange with Polars and PyArrow:
136161
137162``` bash
138-
139- # Or with direct execution (better output)
163+ # Run integration tests (recommended)
140164cmake --build . --target run_polars_tests_direct
141165
142166# Check dependencies first
@@ -153,35 +177,39 @@ The project provides several convenient CMake targets for testing:
153177|--------|-------------|
154178| `run_tests` | Run all C++ unit tests |
155179| `run_tests_with_junit_report` | Run C++ tests with JUnit XML output |
156- | `run_polars_tests_direct` | Run Polars test directly (recommended, better output) |
180+ | `run_polars_tests_direct` | Run integration tests (recommended) |
157181| `check_polars_deps` | Check Python dependencies (polars, pyarrow) |
182+ | `test_library_load` | Debug library loading issues |
158183
159184** Usage:**
160185``` bash
161186cd build
162187
163- # Run Polars integration tests
188+ # Run integration tests
164189cmake --build . --target run_polars_tests_direct
165190
166191# Check dependencies first
167192cmake --build . --target check_polars_deps
168193```
169194
170- ### Debugging Test Failures
195+ ## API Reference
171196
172- If you encounter segmentation faults or other issues:
197+ ### SparrowArray Python Class
173198
174- ``` bash
175- cd build
199+ ``` cpp
200+ // Create a SparrowArray Python object from a sparrow::array
201+ PyObject* create_sparrow_array_object(sparrow::array&& arr);
176202
177- # Run minimal library loading test (step-by-step debugging)
178- cmake --build . --target test_library_load
203+ // Create a SparrowArray from PyCapsules
204+ PyObject* create_sparrow_array_object_from_capsules(
205+ PyObject* schema_capsule, PyObject* array_capsule);
179206
180- # Check that libraries exist and dependencies are correct
181- cmake --build . --target check_polars_deps
182- ```
207+ // Register SparrowArray type with a Python module
208+ int register_sparrow_array_type(PyObject* module);
183209
184- ## API Reference
210+ // Get the SparrowArray type object
211+ PyTypeObject* get_sparrow_array_type();
212+ ```
185213
186214### Export Functions
187215
@@ -197,9 +225,6 @@ cmake --build . --target check_polars_deps
197225
198226### Memory Management
199227
200- - `release_arrow_schema_pycapsule(PyObject* capsule)` - PyCapsule destructor for schema
201- - `release_arrow_array_pycapsule(PyObject* capsule)` - PyCapsule destructor for array
202-
203228All capsules have destructors that properly clean up Arrow structures.
204229
205230## Supported Data Types
@@ -216,27 +241,39 @@ All types support nullable values via the Arrow null bitmap.
216241## Integration with Python Libraries
217242
218243### Polars
244+
219245```python
220- series = pl.Series([1, 2, 3])
221- arrow_array = series.to_arrow()
222- schema_capsule, array_capsule = arrow_array.__arrow_c_array__()
223- # Pass to C++
246+ from polars._plr import PySeries
247+ from polars._utils.wrap import wrap_s
248+
249+ # SparrowArray implements __arrow_c_array__, use Polars internal API
250+ sparrow_array = SparrowArray(some_arrow_array)
251+ ps = PySeries.from_arrow_c_array(sparrow_array)
252+ series = wrap_s(ps)
224253```
225254
226255### PyArrow
256+
227257``` python
228- arrow_array = pa.array([1, 2, 3])
229- schema_capsule, array_capsule = arrow_array.__arrow_c_array__()
230- # Pass to C++
258+ import pyarrow as pa
259+
260+ # Create SparrowArray from PyArrow
261+ pa_array = pa.array([1, 2, 3])
262+ sparrow_array = SparrowArray(pa_array)
263+
264+ # Export back to PyArrow
265+ schema_capsule, array_capsule = sparrow_array.__arrow_c_array__()
231266```
232267
233268### pandas (via PyArrow)
269+
234270``` python
235271import pandas as pd
272+ import pyarrow as pa
273+
236274series = pd.Series([1, 2, 3])
237275arrow_array = pa.Array.from_pandas(series)
238- schema_capsule, array_capsule = arrow_array.__arrow_c_array__()
239- # Pass to C++
276+ sparrow_array = SparrowArray(arrow_array)
240277```
241278
242279## License