Commit fbdc234

Added README (#34)
* Added README
* Addressed review comments
1 parent 93aaf3e commit fbdc234

1 file changed: +135 -1 lines changed

README.md

Lines changed: 135 additions & 1 deletion
@@ -1 +1,135 @@
1-
# sparrow-ipc
1+
# sparrow-ipc
2+
3+
[![GHA Linux](https://github.com/quantstack/sparrow-ipc/actions/workflows/linux.yml/badge.svg)](https://github.com/quantstack/sparrow-ipc/actions/workflows/linux.yml)
4+
[![GHA OSX](https://github.com/quantstack/sparrow-ipc/actions/workflows/osx.yml/badge.svg)](https://github.com/quantstack/sparrow-ipc/actions/workflows/osx.yml)
5+
[![GHA Windows](https://github.com/quantstack/sparrow-ipc/actions/workflows/windows.yml/badge.svg)](https://github.com/quantstack/sparrow-ipc/actions/workflows/windows.yml)
6+
7+
**!!!Sparrow-IPC is still under development and is not ready for production use!!!**
8+
9+
**!!!The documentation is still under development and may be incomplete or contain errors!!!**
10+
11+
## Introduction
12+
13+
`sparrow-ipc` provides high-performance serialization and deserialization of record batches, adhering to both [sparrow](https://github.com/man-group/sparrow) and [Apache Arrow IPC specifications](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc).
14+
15+
`sparrow-ipc` requires a modern C++ compiler supporting C++20.
16+
17+
## Installation
18+
19+
20+
### Install from sources
21+
22+
`sparrow-ipc` has a few dependencies that you can install in a mamba environment:
23+
```bash
mamba env create -f environment-dev.yml
mamba activate sparrow-ipc
```
28+
29+
You can then create a build directory, then build and install the project with CMake:
30+
```bash
mkdir build
cd build
cmake .. \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
    -DCMAKE_PREFIX_PATH=$CONDA_PREFIX \
    -DSPARROW_IPC_BUILD_TESTS=ON \
    -DSPARROW_IPC_BUILD_EXAMPLES=ON

make install
```
43+
44+
## Usage
45+
46+
### Requirements
47+
48+
Compilers:
49+
- Clang 18 or higher
50+
- GCC 11.2 or higher
51+
- Apple Clang 16 or higher
52+
- MSVC 19.41 or higher
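
The minimum versions above can be checked at compile time with the compilers' predefined version macros. The following is a minimal sketch, not part of `sparrow-ipc`: the function names `compiler_is_supported` and `compiler_name` are hypothetical, and the thresholds are simply copied from the list above (MSVC 19.41 corresponds to `_MSC_VER >= 1941`).

```cpp
// Hypothetical helpers, not part of sparrow-ipc: they only mirror the
// minimum compiler versions listed in the README.

// Returns true when the compiler building this file meets the listed minimum.
constexpr bool compiler_is_supported()
{
#if defined(__apple_build_version__)
    // Apple Clang defines __clang__ too, so it must be tested first.
    return __clang_major__ >= 16;
#elif defined(__clang__)
    return __clang_major__ >= 18;
#elif defined(__GNUC__)
    return (__GNUC__ > 11) || (__GNUC__ == 11 && __GNUC_MINOR__ >= 2);
#elif defined(_MSC_VER)
    return _MSC_VER >= 1941;
#else
    return false;  // Unknown compiler: not covered by the list.
#endif
}

// Returns a short label for the detected compiler family.
constexpr const char* compiler_name()
{
#if defined(__apple_build_version__)
    return "Apple Clang";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#elif defined(_MSC_VER)
    return "MSVC";
#else
    return "unknown";
#endif
}
```

A project that wants a hard failure could add `static_assert(compiler_is_supported(), "compiler too old for sparrow-ipc");` in one translation unit.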
53+
54+
### Serialize record batches to a memory stream
55+
```cpp
#include <cstdint>
#include <vector>
#include <sparrow_ipc/memory_output_stream.hpp>
#include <sparrow_ipc/serializer.hpp>
#include <sparrow/record_batch.hpp>

namespace sp = sparrow;
namespace sp_ipc = sparrow_ipc;

std::vector<uint8_t> serialize_batches_to_stream(const std::vector<sp::record_batch>& batches)
{
    // The memory stream appends the serialized bytes to stream_data
    std::vector<uint8_t> stream_data;
    sp_ipc::memory_output_stream stream(stream_data);
    sp_ipc::serializer serializer(stream);

    // Serialize all batches using the streaming operator
    serializer << batches << sp_ipc::end_stream;

    return stream_data;
}
```
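
For readers unfamiliar with the wire format, the sketch below shows the framing that the Apache Arrow IPC specification defines around each message: a 4-byte continuation marker `0xFFFFFFFF`, a 4-byte little-endian metadata length, then the metadata padded to an 8-byte boundary. An end-of-stream marker is the continuation marker followed by a zero length. This is an illustration of the specification only; it is not how `sparrow-ipc` is implemented, all function names are hypothetical, and the metadata is treated as opaque bytes rather than a real Flatbuffers message.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Continuation marker that starts every encapsulated Arrow IPC message.
constexpr std::uint32_t continuation_marker = 0xFFFFFFFFu;

// Appends a 32-bit value in little-endian byte order.
void append_u32_le(std::vector<std::uint8_t>& out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
    {
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFFu));
    }
}

// Rounds n up to the next multiple of 8.
constexpr std::size_t padded_to_8(std::size_t n)
{
    return (n + 7) & ~std::size_t{7};
}

// Appends the 8-byte prefix followed by the zero-padded metadata.
// Because the prefix is 8 bytes long, padding the metadata to a multiple
// of 8 keeps whatever follows 8-byte aligned relative to the message start.
void append_message_prefix(std::vector<std::uint8_t>& out,
                           const std::vector<std::uint8_t>& metadata)
{
    const std::size_t padded = padded_to_8(metadata.size());
    append_u32_le(out, continuation_marker);
    append_u32_le(out, static_cast<std::uint32_t>(padded));
    out.insert(out.end(), metadata.begin(), metadata.end());
    out.resize(out.size() + (padded - metadata.size()), 0);
}

// Appends the end-of-stream marker: continuation marker plus zero length.
void append_end_of_stream(std::vector<std::uint8_t>& out)
{
    append_u32_le(out, continuation_marker);
    append_u32_le(out, 0);
}
```

Under this framing, three bytes of metadata occupy 16 bytes on the wire (8 bytes of prefix plus 8 bytes of padded metadata), and the end-of-stream marker adds 8 more.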
77+
78+
### Pipe a source of record batches to a stream
79+
```cpp
#include <optional>
#include <ostream>
#include <vector>
#include <sparrow_ipc/memory_output_stream.hpp>
#include <sparrow_ipc/serializer.hpp>
#include <sparrow/record_batch.hpp>

namespace sp = sparrow;
namespace sp_ipc = sparrow_ipc;

// User-provided source; next() returns std::nullopt once it is exhausted
class record_batch_source
{
public:
    std::optional<sp::record_batch> next();
};

void stream_record_batches(std::ostream& os, record_batch_source& source)
{
    sp_ipc::serializer serial(os);
    std::optional<sp::record_batch> batch = std::nullopt;
    // Pull batches until the source is exhausted
    while ((batch = source.next()))
    {
        // Dereference the optional to pass the record batch itself
        serial << *batch;
    }
    serial << sp_ipc::end_stream;
}
```
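
The `record_batch_source` above is only declared. As a rough illustration of the pull pattern it relies on, here is a self-contained sketch with a vector-backed source and a generic drain loop. Both `vector_source` and `drain` are hypothetical names introduced for this example; the element type is a template parameter standing in for `sp::record_batch`, and the sink callback stands in for the serializer.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical source that hands out the elements of a vector one by one.
template <class T>
class vector_source
{
public:

    explicit vector_source(std::vector<T> items)
        : m_items(std::move(items))
    {
    }

    // Returns the next element, or std::nullopt once all were consumed.
    std::optional<T> next()
    {
        if (m_index == m_items.size())
        {
            return std::nullopt;
        }
        return std::move(m_items[m_index++]);
    }

private:

    std::vector<T> m_items;
    std::size_t m_index = 0;
};

// Pulls from the source until it is exhausted, forwarding each element
// to the sink. Returns the number of elements forwarded.
template <class Source, class Sink>
std::size_t drain(Source& source, Sink&& sink)
{
    std::size_t count = 0;
    while (auto item = source.next())
    {
        sink(*item);
        ++count;
    }
    return count;
}
```

Declaring the optional inside the `while` condition keeps its scope limited to the loop, which is an alternative to the separate `batch` variable used above.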
107+
108+
### Deserialize a stream into record batches
109+
```cpp
#include <cstdint>
#include <vector>
#include <sparrow_ipc/deserializer.hpp>
#include <sparrow/record_batch.hpp>

namespace sp = sparrow;
namespace sp_ipc = sparrow_ipc;

std::vector<sp::record_batch> deserialize_stream_to_batches(const std::vector<uint8_t>& stream_data)
{
    // Decode every record batch contained in the serialized stream
    auto batches = sp_ipc::deserialize_stream(stream_data);
    return batches;
}
```
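
When debugging a byte buffer before handing it to the deserializer, it can help to inspect the first 8 bytes by hand. The sketch below decodes the message prefix defined by the Apache Arrow IPC specification (continuation marker `0xFFFFFFFF` followed by a little-endian 32-bit metadata length, where a zero length marks the end of the stream). It is an assumption-laden illustration, not `sparrow-ipc` API: `message_prefix` and `read_message_prefix` are hypothetical names, and buffers written in the older format without the continuation marker are reported as unrecognized.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type describing the 8-byte prefix of a message.
struct message_prefix
{
    std::uint32_t metadata_size;  // Length of the padded metadata in bytes
    bool end_of_stream;           // True when metadata_size is zero
};

// Decodes the prefix at the start of bytes. Returns std::nullopt when the
// buffer is shorter than 8 bytes or does not start with the continuation
// marker.
std::optional<message_prefix> read_message_prefix(const std::vector<std::uint8_t>& bytes)
{
    if (bytes.size() < 8)
    {
        return std::nullopt;
    }
    // Reads a little-endian 32-bit value starting at the given offset.
    const auto read_u32_le = [&bytes](std::size_t offset)
    {
        return static_cast<std::uint32_t>(bytes[offset])
               | (static_cast<std::uint32_t>(bytes[offset + 1]) << 8)
               | (static_cast<std::uint32_t>(bytes[offset + 2]) << 16)
               | (static_cast<std::uint32_t>(bytes[offset + 3]) << 24);
    };
    if (read_u32_le(0) != 0xFFFFFFFFu)
    {
        return std::nullopt;
    }
    const std::uint32_t size = read_u32_le(4);
    return message_prefix{size, size == 0};
}
```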
124+
125+
## Documentation
126+
127+
The documentation (currently being written) can be found at https://quantstack.github.io/sparrow-ipc/index.html
128+
129+
## Acknowledgements
130+
131+
This project is developed by [QuantStack](https://quantstack.net), building on the foundations laid by the sparrow library and the Apache Arrow project.
132+
133+
## License
134+
135+
This software is licensed under the BSD-3-Clause license. See the [LICENSE](LICENSE) file for details.
