Commit 258be86

Add polars tests (#4)
1 parent 756a767 commit 258be86

14 files changed: +1190 -10 lines changed
.github/workflows/linux.yml

Lines changed: 5 additions & 0 deletions
@@ -50,6 +50,11 @@ jobs:
       working-directory: build
       run: cmake --build . --target run_tests_with_junit_report
 
+    - name: Run sparrow integration tests
+      if: matrix.build_shared == 'ON'
+      working-directory: build
+      run: cmake --build . --target run_sparrow_tests_direct
+
     - name: Install
       working-directory: build
       run: cmake --install .

.github/workflows/osx.yml

Lines changed: 5 additions & 0 deletions
@@ -55,6 +55,11 @@ jobs:
       working-directory: build
       run: cmake --build . --target run_tests_with_junit_report
 
+    # - name: Run Sparrow integration tests
+    #   if: matrix.build_shared == 'ON'
+    #   working-directory: build
+    #   run: cmake --build . --target run_sparrow_tests_direct
+
     - name: Install
       working-directory: build
       run: cmake --install .

.github/workflows/windows.yml

Lines changed: 5 additions & 0 deletions
@@ -55,6 +55,11 @@ jobs:
       run: |
         cmake --build . --config ${{ matrix.build_type }} --target run_tests_with_junit_report
 
+    - name: Run Sparrow integration tests
+      if: matrix.build_shared == 'ON'
+      working-directory: build
+      run: cmake --build . --config ${{ matrix.build_type }} --target run_sparrow_tests_direct
+
     - name: Install
       working-directory: build
       run: cmake --install . --config ${{ matrix.build_type }}

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 /build
 /.vscode
+*.pyc

CMakeLists.txt

Lines changed: 2 additions & 0 deletions
@@ -151,10 +151,12 @@ set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY_RELEASE "${BINARY_BUILD_DIR}")
 set(SPARROW_PYCAPSULE_HEADERS
     ${SPARROW_PYCAPSULE_INCLUDE_DIR}/sparrow-pycapsule/config/sparrow_pycapsule_version.hpp
     ${SPARROW_PYCAPSULE_INCLUDE_DIR}/sparrow-pycapsule/pycapsule.hpp
+    ${SPARROW_PYCAPSULE_INCLUDE_DIR}/sparrow-pycapsule/sparrow_array_python_class.hpp
 )
 
 set(SPARROW_PYCAPSULE_SOURCES
     src/pycapsule.cpp
+    src/sparrow_array_python_class.cpp
 )
 
 option(SPARROW_PYCAPSULE_BUILD_SHARED "Build sparrow pycapsule as a shared library" ON)

README.md

Lines changed: 294 additions & 1 deletion
@@ -1,2 +1,295 @@
 # sparrow-pycapsule
-The Sparrow PyCapsuleInterface
+
+The Sparrow PyCapsule Interface - A C++ library for exchanging Apache Arrow data between C++ and Python using the Arrow C Data Interface via PyCapsules.
+
+## Overview
+
+`sparrow-pycapsule` provides a clean C++ API for:
+- Exporting sparrow arrays to Python as PyCapsules (Arrow C Data Interface)
+- Importing Arrow data from Python PyCapsules into sparrow arrays
+- Zero-copy data exchange with Python libraries like Polars, PyArrow, and pandas
+- A `SparrowArray` Python class that implements the Arrow PyCapsule Interface
+
+## Features
+
+- **Zero-copy data exchange** between C++ and Python
+- **Arrow C Data Interface** compliant
+- **PyCapsule-based** for safe memory management
+- **Compatible with Polars, PyArrow, pandas** and other Arrow-based libraries
+- **Bidirectional** data flow (C++ ↔ Python)
+- **Type-safe** with proper ownership semantics
+- **SparrowArray Python class** implementing `__arrow_c_array__` protocol
+
+## Building
+
+### Prerequisites
+
+```bash
+# Using conda (recommended)
+conda env create -f environment-dev.yml
+conda activate sparrow-pycapsule
+
+# Or install manually
+# - CMake >= 3.28
+# - C++20 compiler
+# - Python 3.x with development headers
+# - sparrow library
+```
+
+### Build Instructions
+
+```bash
+mkdir build && cd build
+cmake .. -DCMAKE_BUILD_TYPE=Release
+cmake --build .
+```
+
+### Build with Tests
+
+```bash
+mkdir build && cd build
+cmake .. -DSPARROW_PYCAPSULE_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Debug
+cmake --build .
+ctest --output-on-failure
+```
+
+## Usage Example
+
+### C++ Side: Creating a SparrowArray for Python
+
+```cpp
+#include <sparrow-pycapsule/pycapsule.hpp>
+#include <sparrow-pycapsule/sparrow_array_python_class.hpp>
+#include <sparrow/array.hpp>
+
+// Create a sparrow array
+sparrow::array my_array = /* ... */;
+
+// Create a SparrowArray Python object that implements __arrow_c_array__
+PyObject* sparrow_array = sparrow::pycapsule::create_sparrow_array_object(std::move(my_array));
+
+// Return to Python - it can be used directly with Polars, PyArrow, etc.
+```
+
+### Python Side: Using SparrowArray
+
+```python
+from test_sparrow_helper import SparrowArray
+import polars as pl
+import pyarrow as pa
+
+# Create SparrowArray from any Arrow-compatible object
+pa_array = pa.array([1, 2, None, 4, 5], type=pa.int32())
+sparrow_array = SparrowArray(pa_array)
+
+# SparrowArray implements __arrow_c_array__, so it works with Polars
+# Using Polars internal API for primitive arrays:
+from polars._plr import PySeries
+from polars._utils.wrap import wrap_s
+
+ps = PySeries.from_arrow_c_array(sparrow_array)
+series = wrap_s(ps)
+print(series) # shape: (5,), dtype: Int32
+
+# Get array size
+print(sparrow_array.size()) # 5
+```
+
+### Python Side: Exporting to C++
+
+```python
+import pyarrow as pa
+
+# Any object implementing __arrow_c_array__ can be imported by sparrow
+arrow_array = pa.array([1, 2, None, 4, 5])
+
+# The SparrowArray constructor accepts any ArrowArrayExportable
+sparrow_array = SparrowArray(arrow_array)
+```
+
+### C++ Side: Importing from Python
+
+```cpp
+#include <sparrow-pycapsule/pycapsule.hpp>
+
+// Receive capsules from Python (e.g., from __arrow_c_array__)
+PyObject* schema_capsule = /* ... */;
+PyObject* array_capsule = /* ... */;
+
+// Import into sparrow array
+sparrow::array imported_array =
+    sparrow::pycapsule::import_array_from_capsules(
+        schema_capsule, array_capsule);
+
+// Use the array
+std::cout << "Array size: " << imported_array.size() << std::endl;
+```
+
+## SparrowArray Python Class
+
+The `SparrowArray` class is a Python type implemented in C++ that:
+
+- **Wraps a sparrow array** and exposes it to Python
+- **Implements `__arrow_c_array__`** (ArrowArrayExportable protocol)
+- **Accepts any ArrowArrayExportable** in its constructor (PyArrow, Polars, etc.)
+- **Provides a `size()` method** to get the number of elements
+
+```python
+# Constructor accepts any object with __arrow_c_array__
+sparrow_array = SparrowArray(pyarrow_array)
+sparrow_array = SparrowArray(another_sparrow_array)
+
+# Implements ArrowArrayExportable protocol
+schema_capsule, array_capsule = sparrow_array.__arrow_c_array__()
+
+# Get array size
+n = sparrow_array.size()
+```
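
Annotation on the section above: the `ArrowArrayExportable` protocol it mentions is structural, so a consumer only needs to find an `__arrow_c_array__` method on the object it receives. The following is a minimal sketch of such a check using only the standard library. `FakeExportable` and `is_arrow_exportable` are hypothetical names used for illustration and are not part of sparrow-pycapsule; a real producer returns two PyCapsules rather than placeholder strings.

```python
from typing import Any, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrowArrayExportable(Protocol):
    """Structural type for objects exposing the Arrow PyCapsule array hook."""

    def __arrow_c_array__(
        self, requested_schema: Optional[object] = None
    ) -> Tuple[object, object]:
        ...


class FakeExportable:
    """Hypothetical stand-in; a real producer returns two PyCapsules."""

    def __arrow_c_array__(self, requested_schema=None):
        return ("schema-placeholder", "array-placeholder")


def is_arrow_exportable(obj: Any) -> bool:
    # runtime_checkable protocols only verify that the method exists,
    # not its signature or what it returns.
    return isinstance(obj, ArrowArrayExportable)


print(is_arrow_exportable(FakeExportable()))  # True
print(is_arrow_exportable([1, 2, 3]))         # False
```

Because the check is attribute-based, it cannot confirm that the returned objects are valid capsules; that validation still has to happen on the importing side.
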
+
+## Testing
+
+### C++ Unit Tests
+
+```bash
+cd build
+./bin/Debug/test_sparrow_pycapsule_lib
+```
+
+### Integration Tests
+
+Test bidirectional data exchange with Polars and PyArrow:
+
+```bash
+# Run integration tests (recommended)
+cmake --build . --target run_polars_tests_direct
+
+# Check dependencies first
+cmake --build . --target check_polars_deps
+```
+
+See [test/README_POLARS_TESTS.md](test/README_POLARS_TESTS.md) for detailed documentation.
+
+## CMake Targets
+
+The project provides several convenient CMake targets for testing:
+
+| Target | Description |
+|--------|-------------|
+| `run_tests` | Run all C++ unit tests |
+| `run_tests_with_junit_report` | Run C++ tests with JUnit XML output |
+| `run_polars_tests_direct` | Run integration tests (recommended) |
+| `check_polars_deps` | Check Python dependencies (polars, pyarrow) |
+| `test_library_load` | Debug library loading issues |
+
+**Usage:**
+```bash
+cd build
+
+# Run integration tests
+cmake --build . --target run_polars_tests_direct
+
+# Check dependencies first
+cmake --build . --target check_polars_deps
+```
+
+## API Reference
+
+### SparrowArray Python Class
+
+```cpp
+// Create a SparrowArray Python object from a sparrow::array
+PyObject* create_sparrow_array_object(sparrow::array&& arr);
+
+// Create a SparrowArray from PyCapsules
+PyObject* create_sparrow_array_object_from_capsules(
+    PyObject* schema_capsule, PyObject* array_capsule);
+
+// Register SparrowArray type with a Python module
+int register_sparrow_array_type(PyObject* module);
+
+// Get the SparrowArray type object
+PyTypeObject* get_sparrow_array_type();
+```
+
+### Export Functions
+
+- `export_arrow_schema_pycapsule(array& arr)` - Export schema to PyCapsule
+- `export_arrow_array_pycapsule(array& arr)` - Export array data to PyCapsule
+- `export_array_to_capsules(array& arr)` - Export both schema and array (recommended)
+
+### Import Functions
+
+- `get_arrow_schema_pycapsule(PyObject* capsule)` - Get ArrowSchema pointer from capsule
+- `get_arrow_array_pycapsule(PyObject* capsule)` - Get ArrowArray pointer from capsule
+- `import_array_from_capsules(PyObject* schema, PyObject* array)` - Import complete array
+
+### Memory Management
+
+All capsules have destructors that properly clean up Arrow structures.
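
Annotation on the Memory Management note: the README does not show how the capsule destructors are wired up. As a rough, CPython-only illustration of the general mechanism (not the library's implementation), the sketch below uses `ctypes` to create a capsule named `arrow_array`, the name the Arrow PyCapsule Interface requires for array capsules, and attaches a destructor that runs when the last reference is dropped. The `payload` buffer and the `release_payload` callback are hypothetical stand-ins for an `ArrowArray` struct and its cleanup code.

```python
import ctypes

# C signature of a capsule destructor: void (*)(PyObject *capsule).
# The argument is taken as a raw address so ctypes never touches the
# refcount of the capsule while it is being torn down.
CapsuleDestructor = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, CapsuleDestructor]

PyCapsule_IsValid = ctypes.pythonapi.PyCapsule_IsValid
PyCapsule_IsValid.restype = ctypes.c_int
PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]

# The name buffer must outlive the capsule, so keep it in a variable.
ARRAY_CAPSULE_NAME = b"arrow_array"
payload = ctypes.create_string_buffer(8)  # hypothetical stand-in for an ArrowArray
released = []


@CapsuleDestructor
def release_payload(capsule_address):
    # A real destructor would call the struct's release callback
    # (if it is still set) and then free the struct itself.
    released.append(True)


capsule = PyCapsule_New(ctypes.addressof(payload), ARRAY_CAPSULE_NAME, release_payload)

# Consumers validate the capsule name before reading the pointer.
name_matches = bool(PyCapsule_IsValid(capsule, ARRAY_CAPSULE_NAME))
wrong_name_matches = bool(PyCapsule_IsValid(capsule, b"arrow_schema"))

del capsule  # last reference dropped: CPython invokes the destructor
```

Tying cleanup to the capsule's lifetime means that a capsule nobody consumes still releases its data, which is the property the note above relies on.
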
+
+## Supported Data Types
+
+The library supports all Arrow data types that sparrow supports:
+- Integer types (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64)
+- Floating point (Float32, Float64)
+- Boolean
+- String (UTF-8)
+- And more...
+
+All types support nullable values via the Arrow null bitmap.
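
Annotation on the null bitmap: in the Arrow columnar format, a validity bitmap stores one bit per element, a set bit means the value is present, and bits are numbered from the least significant bit of each byte. A small pure-Python sketch of that layout follows, using the same `[1, 2, None, 4, 5]` data as the README examples. `validity_bitmap` and `is_valid` are hypothetical helpers for illustration and are not part of sparrow or sparrow-pycapsule.

```python
def validity_bitmap(values):
    """Pack is-not-None flags into bytes, least-significant bit first."""
    bitmap = bytearray((len(values) + 7) // 8)
    for index, value in enumerate(values):
        if value is not None:
            bitmap[index // 8] |= 1 << (index % 8)
    return bytes(bitmap)


def is_valid(bitmap, index):
    """Return True when the element at `index` is marked as present."""
    return bool((bitmap[index // 8] >> (index % 8)) & 1)


bitmap = validity_bitmap([1, 2, None, 4, 5])
print(bitmap)               # b'\x1b' (0b00011011: bits 0, 1, 3, 4 set)
print(is_valid(bitmap, 2))  # False, the third element is null
```
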
+
+## Integration with Python Libraries
+
+### Polars
+
+```python
+from polars._plr import PySeries
+from polars._utils.wrap import wrap_s
+
+# SparrowArray implements __arrow_c_array__, use Polars internal API
+sparrow_array = SparrowArray(some_arrow_array)
+ps = PySeries.from_arrow_c_array(sparrow_array)
+series = wrap_s(ps)
+```
+
+### PyArrow
+
+```python
+import pyarrow as pa
+
+# Create SparrowArray from PyArrow
+pa_array = pa.array([1, 2, 3])
+sparrow_array = SparrowArray(pa_array)
+
+# Export back to PyArrow
+schema_capsule, array_capsule = sparrow_array.__arrow_c_array__()
+```
+
+### pandas (via PyArrow)
+
+```python
+import pandas as pd
+import pyarrow as pa
+
+series = pd.Series([1, 2, 3])
+arrow_array = pa.Array.from_pandas(series)
+sparrow_array = SparrowArray(arrow_array)
+```
+
+## License
+
+See [LICENSE](LICENSE) file for details.
+
+## Contributing
+
+Contributions are welcome! Please ensure:
+- Code follows the existing style
+- All tests pass (`ctest --output-on-failure`)
+- New features include tests
+- Documentation is updated
+
+## Related Projects
+
+- [sparrow](https://github.com/man-group/sparrow) - Modern C++ library for Apache Arrow
+- [Apache Arrow](https://arrow.apache.org/) - Cross-language development platform
+- [Polars](https://www.pola.rs/) - Fast DataFrame library

environment-dev.yml

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@ dependencies:
   - python
   # Tests
   - doctest
+  - polars
+  - pyarrow
+  - pytest
   # Documentation
   - doxygen
   - graphviz
