Commit 0978d6f

wip
1 parent 08002fb commit 0978d6f

1 file changed: +88 -51 lines changed

README.md

Lines changed: 88 additions & 51 deletions
@@ -8,6 +8,7 @@ The Sparrow PyCapsule Interface - A C++ library for exchanging Apache Arrow data
 - Exporting sparrow arrays to Python as PyCapsules (Arrow C Data Interface)
 - Importing Arrow data from Python PyCapsules into sparrow arrays
 - Zero-copy data exchange with Python libraries like Polars, PyArrow, and pandas
+- A `SparrowArray` Python class that implements the Arrow PyCapsule Interface

 ## Features

@@ -17,6 +18,7 @@ The Sparrow PyCapsule Interface - A C++ library for exchanging Apache Arrow data
 - **Compatible with Polars, PyArrow, pandas** and other Arrow-based libraries
 - **Bidirectional** data flow (C++ ↔ Python)
 - **Type-safe** with proper ownership semantics
+- **SparrowArray Python class** implementing `__arrow_c_array__` protocol

 ## Building

@@ -53,62 +55,64 @@ ctest --output-on-failure

 ## Usage Example

-### C++ Side: Exporting Data
+### C++ Side: Creating a SparrowArray for Python

 ```cpp
 #include <sparrow-pycapsule/pycapsule.hpp>
+#include <sparrow-pycapsule/sparrow_array_python_class.hpp>
 #include <sparrow/array.hpp>

 // Create a sparrow array
 sparrow::array my_array = /* ... */;

-// Export to PyCapsules for Python consumption
-auto [schema_capsule, array_capsule] =
-    sparrow::pycapsule::export_array_to_capsules(my_array);
+// Create a SparrowArray Python object that implements __arrow_c_array__
+PyObject* sparrow_array = sparrow::pycapsule::create_sparrow_array_object(std::move(my_array));

-// Pass capsules to Python (via Python C API, pybind11, etc.)
+// Return to Python - it can be used directly with Polars, PyArrow, etc.
 ```

-### Python Side: Consuming C++ Data
+### Python Side: Using SparrowArray

 ```python
+from test_sparrow_helper import SparrowArray
 import polars as pl
 import pyarrow as pa

-# Receive capsules from C++
-# schema_capsule, array_capsule = get_from_cpp()
+# Create SparrowArray from any Arrow-compatible object
+pa_array = pa.array([1, 2, None, 4, 5], type=pa.int32())
+sparrow_array = SparrowArray(pa_array)

-# Import into PyArrow
-arrow_array = pa.Array._import_from_c_capsule(schema_capsule, array_capsule)
+# SparrowArray implements __arrow_c_array__, so it works with Polars
+# Using Polars internal API for primitive arrays:
+from polars._plr import PySeries
+from polars._utils.wrap import wrap_s

-# Convert to Polars
-series = pl.from_arrow(arrow_array)
+ps = PySeries.from_arrow_c_array(sparrow_array)
+series = wrap_s(ps)
+print(series) # shape: (5,), dtype: Int32

-# Use in Polars DataFrame
-df = pl.DataFrame({"my_column": series})
+# Get array size
+print(sparrow_array.size()) # 5
 ```

 ### Python Side: Exporting to C++

 ```python
-import polars as pl
-
-# Create Polars data
-series = pl.Series([1, 2, None, 4, 5])
+import pyarrow as pa

-# Convert to Arrow and export as capsules
-arrow_array = series.to_arrow()
-schema_capsule, array_capsule = arrow_array.__arrow_c_array__()
+# Any object implementing __arrow_c_array__ can be imported by sparrow
+arrow_array = pa.array([1, 2, None, 4, 5])

-# Pass to C++
+# The SparrowArray constructor accepts any ArrowArrayExportable
+sparrow_array = SparrowArray(arrow_array)
 ```

 ### C++ Side: Importing from Python

 ```cpp
 #include <sparrow-pycapsule/pycapsule.hpp>

-// Receive capsules from Python
+// Receive capsules from Python (e.g., from __arrow_c_array__)
 PyObject* schema_capsule = /* ... */;
 PyObject* array_capsule = /* ... */;

@@ -121,6 +125,27 @@ sparrow::array imported_array =
 std::cout << "Array size: " << imported_array.size() << std::endl;
 ```

+## SparrowArray Python Class
+
+The `SparrowArray` class is a Python type implemented in C++ that:
+
+- **Wraps a sparrow array** and exposes it to Python
+- **Implements `__arrow_c_array__`** (ArrowArrayExportable protocol)
+- **Accepts any ArrowArrayExportable** in its constructor (PyArrow, Polars, etc.)
+- **Provides a `size()` method** to get the number of elements
+
+```python
+# Constructor accepts any object with __arrow_c_array__
+sparrow_array = SparrowArray(pyarrow_array)
+sparrow_array = SparrowArray(another_sparrow_array)
+
+# Implements ArrowArrayExportable protocol
+schema_capsule, array_capsule = sparrow_array.__arrow_c_array__()
+
+# Get array size
+n = sparrow_array.size()
+```
+
 ## Testing

 ### C++ Unit Tests
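
Note on the hunk above: the README says the `SparrowArray` constructor accepts "any ArrowArrayExportable". As a rough illustration of what that structural requirement means on the Python side, the sketch below declares the protocol with `typing.Protocol` and checks an object against it. `FakeExportable` and `accepts_exportable` are hypothetical names used only for this illustration; they are not part of sparrow-pycapsule, and the real constructor is implemented in C++ and may validate its argument differently.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrowArrayExportable(Protocol):
    """Structural type for objects exposing the Arrow PyCapsule array hook."""

    def __arrow_c_array__(self, requested_schema: Any = None) -> Tuple[Any, Any]:
        ...


class FakeExportable:
    """Hypothetical stand-in; a real producer returns two PyCapsules."""

    def __arrow_c_array__(self, requested_schema=None):
        return ("schema-placeholder", "array-placeholder")


def accepts_exportable(obj: object) -> bool:
    # isinstance() on a runtime_checkable Protocol only checks that the
    # method exists; it does not validate what the method returns.
    return isinstance(obj, ArrowArrayExportable)


if __name__ == "__main__":
    print(accepts_exportable(FakeExportable()))  # object with the hook
    print(accepts_exportable([1, 2, 3]))         # plain list, no hook
```

Because the check is purely structural, PyArrow arrays, `SparrowArray` instances, and any other producer that defines `__arrow_c_array__` would all satisfy it without sharing a base class.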
@@ -130,13 +155,12 @@ cd build
 ./bin/Debug/test_sparrow_pycapsule_lib
 ```

-### Polars Integration Tests
+### Integration Tests

-Test bidirectional data exchange with Polars:
+Test bidirectional data exchange with Polars and PyArrow:

 ```bash
-
-# Or with direct execution (better output)
+# Run integration tests (recommended)
 cmake --build . --target run_polars_tests_direct

 # Check dependencies first
@@ -153,35 +177,39 @@ The project provides several convenient CMake targets for testing:
 |--------|-------------|
 | `run_tests` | Run all C++ unit tests |
 | `run_tests_with_junit_report` | Run C++ tests with JUnit XML output |
-| `run_polars_tests_direct` | Run Polars test directly (recommended, better output) |
+| `run_polars_tests_direct` | Run integration tests (recommended) |
 | `check_polars_deps` | Check Python dependencies (polars, pyarrow) |
+| `test_library_load` | Debug library loading issues |

 **Usage:**
 ```bash
 cd build

-# Run Polars integration tests
+# Run integration tests
 cmake --build . --target run_polars_tests_direct

 # Check dependencies first
 cmake --build . --target check_polars_deps
 ```

-### Debugging Test Failures
+## API Reference

-If you encounter segmentation faults or other issues:
+### SparrowArray Python Class

-```bash
-cd build
+```cpp
+// Create a SparrowArray Python object from a sparrow::array
+PyObject* create_sparrow_array_object(sparrow::array&& arr);

-# Run minimal library loading test (step-by-step debugging)
-cmake --build . --target test_library_load
+// Create a SparrowArray from PyCapsules
+PyObject* create_sparrow_array_object_from_capsules(
+    PyObject* schema_capsule, PyObject* array_capsule);

-# Check that libraries exist and dependencies are correct
-cmake --build . --target check_polars_deps
-```
+// Register SparrowArray type with a Python module
+int register_sparrow_array_type(PyObject* module);

-## API Reference
+// Get the SparrowArray type object
+PyTypeObject* get_sparrow_array_type();
+```

 ### Export Functions

@@ -197,9 +225,6 @@ cmake --build . --target check_polars_deps

 ### Memory Management

-- `release_arrow_schema_pycapsule(PyObject* capsule)` - PyCapsule destructor for schema
-- `release_arrow_array_pycapsule(PyObject* capsule)` - PyCapsule destructor for array
-
 All capsules have destructors that properly clean up Arrow structures.

 ## Supported Data Types
@@ -216,27 +241,39 @@ All types support nullable values via the Arrow null bitmap.
 ## Integration with Python Libraries

 ### Polars
+
 ```python
-series = pl.Series([1, 2, 3])
-arrow_array = series.to_arrow()
-schema_capsule, array_capsule = arrow_array.__arrow_c_array__()
-# Pass to C++
+from polars._plr import PySeries
+from polars._utils.wrap import wrap_s
+
+# SparrowArray implements __arrow_c_array__, use Polars internal API
+sparrow_array = SparrowArray(some_arrow_array)
+ps = PySeries.from_arrow_c_array(sparrow_array)
+series = wrap_s(ps)
 ```

 ### PyArrow
+
 ```python
-arrow_array = pa.array([1, 2, 3])
-schema_capsule, array_capsule = arrow_array.__arrow_c_array__()
-# Pass to C++
+import pyarrow as pa
+
+# Create SparrowArray from PyArrow
+pa_array = pa.array([1, 2, 3])
+sparrow_array = SparrowArray(pa_array)
+
+# Export back to PyArrow
+schema_capsule, array_capsule = sparrow_array.__arrow_c_array__()
 ```

 ### pandas (via PyArrow)
+
 ```python
 import pandas as pd
+import pyarrow as pa
+
 series = pd.Series([1, 2, 3])
 arrow_array = pa.Array.from_pandas(series)
-schema_capsule, array_capsule = arrow_array.__arrow_c_array__()
-# Pass to C++
+sparrow_array = SparrowArray(arrow_array)
 ```

 ## License
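
Note on the diff as a whole: several hunks pass around a `(schema_capsule, array_capsule)` pair returned by `__arrow_c_array__()`. The Arrow PyCapsule Interface names these capsules `"arrow_schema"` and `"arrow_array"`. The sketch below, which uses only the standard library, builds two dummy capsules through `ctypes` and checks their names with `PyCapsule_IsValid`. It is an illustration under assumptions: `make_dummy_capsule` and `looks_like_arrow_pair` are hypothetical helpers, the payload is a throwaway buffer rather than a real `ArrowSchema` or `ArrowArray` struct, and whether sparrow-pycapsule performs this exact check on import is not stated in the README.

```python
import ctypes

# Capsule names defined by the Arrow PyCapsule Interface.
SCHEMA_NAME = b"arrow_schema"
ARRAY_NAME = b"arrow_array"

# CPython C API entry points, reached through ctypes (CPython only).
_capsule_new = ctypes.pythonapi.PyCapsule_New
_capsule_new.restype = ctypes.py_object
_capsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

_capsule_is_valid = ctypes.pythonapi.PyCapsule_IsValid
_capsule_is_valid.restype = ctypes.c_int
_capsule_is_valid.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Dummy payload kept alive at module level. A real producer would point
# the capsule at an ArrowSchema or ArrowArray struct and set a destructor.
_payload = ctypes.create_string_buffer(8)


def make_dummy_capsule(name: bytes):
    """Hypothetical helper: wrap the dummy payload in a named capsule.

    The capsule stores the name pointer without copying it, so only the
    module-level name constants above should be passed in.
    """
    return _capsule_new(ctypes.addressof(_payload), name, None)


def looks_like_arrow_pair(pair) -> bool:
    """Check that `pair` is a 2-tuple of correctly named capsules."""
    if not (isinstance(pair, tuple) and len(pair) == 2):
        return False
    schema, array = pair
    return bool(_capsule_is_valid(schema, SCHEMA_NAME)) and bool(
        _capsule_is_valid(array, ARRAY_NAME)
    )


if __name__ == "__main__":
    pair = (make_dummy_capsule(SCHEMA_NAME), make_dummy_capsule(ARRAY_NAME))
    print(looks_like_arrow_pair(pair))
```

The name check is what lets a consumer reject a capsule of the wrong kind before dereferencing its pointer; swapping the two capsules in the tuple makes `looks_like_arrow_pair` return `False`.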
