Commit 8f1b23b

committed
Added README
1 parent 93aaf3e commit 8f1b23b

1 file changed: +136 -1 lines changed

README.md

Lines changed: 136 additions & 1 deletion
@@ -1 +1,136 @@
1-
# sparrow-ipc
1+
# sparrow-ipc
2+
3+
[![GHA Linux](https://github.com/quantstack/sparrow-ipc/actions/workflows/linux.yml/badge.svg)](https://github.com/quantstack/sparrow-ipc/actions/workflows/linux.yml)
4+
[![GHA OSX](https://github.com/quantstack/sparrow-ipc/actions/workflows/osx.yml/badge.svg)](https://github.com/quantstack/sparrow-ipc/actions/workflows/osx.yml)
5+
[![GHA Windows](https://github.com/quantstack/sparrow-ipc/actions/workflows/windows.yml/badge.svg)](https://github.com/quantstack/sparrow-ipc/actions/workflows/windows.yml)
6+
[![GHA Docs](https://github.com/quantstack/sparrow-ipc/actions/workflows/deploy-pages.yaml/badge.svg)](https://github.com/quantstack/sparrow-ipc/actions/workflows/deploy-pages.yaml)
7+
8+
**!!!Sparrow-IPC is still under development and is not ready for production use!!!**
9+
10+
**!!!The documentation is still under development and may be incomplete or contain errors!!!**
11+
12+
## Introduction
13+
14+
`sparrow-ipc` provides high-performance, **zero-copy** serialization and deserialization of record batches, adhering to both [sparrow](https://github.com/man-group/sparrow) and [Apache Arrow IPC specifications](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc).
15+
16+
`sparrow-ipc` requires a modern C++ compiler supporting C++20.
17+
18+
## Installation
19+
20+
21+
### Install from sources
22+
23+
`sparrow-ipc` has a few dependencies that you can install in a mamba environment:
24+
25+
```bash
mamba env create -f environment-dev.yml
mamba activate sparrow-ipc
```
29+
30+
You can then create a build directory, then configure, build, and install the project with CMake:
31+
32+
```bash
mkdir build
cd build
cmake .. \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
    -DCMAKE_PREFIX_PATH=$CONDA_PREFIX \
    -DSPARROW_IPC_BUILD_TESTS=ON \
    -DSPARROW_IPC_BUILD_EXAMPLES=ON

make install
```
44+
45+
## Usage
46+
47+
### Requirements
48+
49+
Compilers:
50+
- Clang 18 or higher
51+
- GCC 11.2 or higher
52+
- Apple Clang 16 or higher
53+
- MSVC 19.41 or higher
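
The compiler minimums above can be checked at compile time with predefined macros. The following is a minimal sketch, not part of `sparrow-ipc`: the function names `meets_documented_minimum` and `compiler_description` are hypothetical, and the version thresholds are copied from the list above.

```cpp
#include <string>

// Hypothetical helper: true when the compiler building this translation unit
// meets one of the minimum versions listed in the README.
// Apple Clang and Clang are tested before GCC because Clang also defines __GNUC__.
constexpr bool meets_documented_minimum()
{
#if defined(__apple_build_version__)
    return __clang_major__ >= 16;
#elif defined(__clang__)
    return __clang_major__ >= 18;
#elif defined(__GNUC__)
    return (__GNUC__ > 11) || (__GNUC__ == 11 && __GNUC_MINOR__ >= 2);
#elif defined(_MSC_VER)
    return _MSC_VER >= 1941;  // _MSC_VER 1941 corresponds to MSVC 19.41
#else
    return false;
#endif
}

// Hypothetical helper: human-readable compiler name and version.
std::string compiler_description()
{
#if defined(__apple_build_version__)
    return "Apple Clang " + std::to_string(__clang_major__) + "." + std::to_string(__clang_minor__);
#elif defined(__clang__)
    return "Clang " + std::to_string(__clang_major__) + "." + std::to_string(__clang_minor__);
#elif defined(__GNUC__)
    return "GCC " + std::to_string(__GNUC__) + "." + std::to_string(__GNUC_MINOR__);
#elif defined(_MSC_VER)
    return "MSVC " + std::to_string(_MSC_VER / 100) + "." + std::to_string(_MSC_VER % 100);
#else
    return "unknown compiler";
#endif
}
```

Adding `static_assert(meets_documented_minimum(), "compiler too old for sparrow-ipc");` to a project would turn an unsupported toolchain into an early, readable build error.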
54+
55+
### Serialize record batches to a memory stream
56+
57+
```cpp
#include <cstdint>
#include <vector>
#include <sparrow_ipc/memory_output_stream.hpp>
#include <sparrow_ipc/serializer.hpp>
#include <sparrow/record_batch.hpp>

namespace sp = sparrow;
namespace sp_ipc = sparrow_ipc;

std::vector<uint8_t> serialize_batches_to_stream(const std::vector<sp::record_batch>& batches)
{
    std::vector<uint8_t> stream_data;
    sp_ipc::memory_output_stream stream(stream_data);
    sp_ipc::serializer serializer(stream);

    // Serialize all batches using the streaming operator
    serializer << batches << sp_ipc::end_stream;

    return stream_data;
}
```
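
In the Arrow IPC streaming format, the end of a stream is signalled by 8 bytes: the `0xFFFFFFFF` continuation indicator followed by a zero metadata length. Assuming `sp_ipc::end_stream` writes this standard marker (an assumption, not confirmed by this README), a serialized buffer can be sanity-checked with a small standalone helper. The name `ends_with_eos_marker` is hypothetical and not part of `sparrow-ipc`.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// End-of-stream marker of the Arrow IPC streaming format:
// continuation indicator 0xFFFFFFFF followed by a 32-bit zero metadata length.
constexpr std::array<std::uint8_t, 8> arrow_eos_marker = {0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00};

// Hypothetical helper: true when the buffer ends with the end-of-stream marker.
bool ends_with_eos_marker(const std::vector<std::uint8_t>& stream_data)
{
    if (stream_data.size() < arrow_eos_marker.size())
    {
        return false;
    }
    // Compare the marker against the tail of the buffer, walking both backwards.
    return std::equal(arrow_eos_marker.rbegin(), arrow_eos_marker.rend(), stream_data.rbegin());
}
```

A check like this is only a cheap guard against truncated buffers; it does not validate the messages that precede the marker.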
78+
79+
### Pipe a source of record batches to a stream
80+
81+
```cpp
#include <optional>
#include <ostream>
#include <vector>
#include <sparrow_ipc/memory_output_stream.hpp>
#include <sparrow_ipc/serializer.hpp>
#include <sparrow/record_batch.hpp>

namespace sp = sparrow;
namespace sp_ipc = sparrow_ipc;

class record_batch_source
{
public:
    // Returns the next batch, or std::nullopt once the source is exhausted.
    std::optional<sp::record_batch> next();
};

void stream_record_batches(std::ostream& os, record_batch_source& source)
{
    sp_ipc::serializer serial(os);
    std::optional<sp::record_batch> batch = std::nullopt;
    // The extra parentheses mark the assignment in the condition as intentional.
    while ((batch = source.next()))
    {
        // Dereference the optional to serialize the record batch it holds.
        serial << *batch;
    }
    serial << sp_ipc::end_stream;
}
```
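
The `record_batch_source` class above only declares `next()`. As an illustration of the pull-style contract the loop relies on (a value while data remains, then `std::nullopt`), here is a standalone sketch. `vector_source` and `for_each_item` are hypothetical names and not part of `sparrow-ipc`; the template parameter stands in for `sparrow::record_batch` so the sketch compiles without the library.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical pull-style source: hands out the stored items one at a time,
// then returns std::nullopt on every further call.
template <class T>
class vector_source
{
public:

    explicit vector_source(std::vector<T> items)
        : m_items(std::move(items))
    {
    }

    std::optional<T> next()
    {
        if (m_pos == m_items.size())
        {
            return std::nullopt;
        }
        return std::move(m_items[m_pos++]);
    }

private:

    std::vector<T> m_items;
    std::size_t m_pos = 0;
};

// Drains a source with the same loop shape as stream_record_batches,
// passing each item to the callback.
template <class Source, class Callback>
void for_each_item(Source& source, Callback&& callback)
{
    while (auto item = source.next())
    {
        callback(*item);
    }
}
```

With `T = sparrow::record_batch`, the callback would be the place to forward each batch to the serializer.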
108+
109+
### Deserialize a stream into record batches
110+
111+
```cpp
#include <cstdint>
#include <vector>
#include <sparrow_ipc/deserializer.hpp>
#include <sparrow/record_batch.hpp>

namespace sp = sparrow;
namespace sp_ipc = sparrow_ipc;

std::vector<sp::record_batch> deserialize_stream_to_batches(const std::vector<uint8_t>& stream_data)
{
    auto batches = sp_ipc::deserialize_stream(stream_data);
    return batches;
}
```
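
Before handing a buffer to the deserializer, it can help to confirm that it starts like an Arrow IPC stream. In the Arrow IPC format, each encapsulated message begins with the `0xFFFFFFFF` continuation indicator followed by a little-endian 32-bit metadata length. The sketch below reads that prefix from the first message; `first_metadata_length` is a hypothetical helper, not part of `sparrow-ipc`, and it does not parse the Flatbuffers metadata that follows.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns the metadata length announced by the first
// encapsulated message, or std::nullopt when the buffer is too short or does
// not start with the 0xFFFFFFFF continuation indicator.
std::optional<std::int32_t> first_metadata_length(const std::vector<std::uint8_t>& data)
{
    constexpr std::size_t prefix_size = 8;  // 4-byte continuation + 4-byte length
    if (data.size() < prefix_size)
    {
        return std::nullopt;
    }
    for (std::size_t i = 0; i < 4; ++i)
    {
        if (data[i] != 0xFF)
        {
            return std::nullopt;
        }
    }
    // Assemble the little-endian 32-bit length byte by byte,
    // independently of the host byte order.
    std::uint32_t raw = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        raw |= static_cast<std::uint32_t>(data[4 + i]) << (8 * i);
    }
    return static_cast<std::int32_t>(raw);
}
```

A returned length of zero means the buffer starts directly with the end-of-stream marker, that is, the stream carries no messages.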
125+
126+
## Documentation
127+
128+
The documentation (currently being written) can be found at https://quantstack.github.io/sparrow-ipc/index.html
129+
130+
## Acknowledgements
131+
132+
This project is developed by [QuantStack](https://quantstack.net), building on the foundations laid by the sparrow library and the Apache Arrow project.
133+
134+
## License
135+
136+
This software is licensed under the BSD-3-Clause license. See the [LICENSE](LICENSE) file for details.
