# sparrow-ipc

[Linux](https://github.com/quantstack/sparrow-ipc/actions/workflows/linux.yml)
[OSX](https://github.com/quantstack/sparrow-ipc/actions/workflows/osx.yml)
[Windows](https://github.com/quantstack/sparrow-ipc/actions/workflows/windows.yml)
[Docs](https://github.com/quantstack/sparrow-ipc/actions/workflows/deploy-pages.yaml)

**!!!Sparrow-IPC is still under development and is not ready for production use!!!**

**!!!The documentation is still under development and may be incomplete or contain errors!!!**

## Introduction

`sparrow-ipc` provides high-performance, **zero-copy** serialization and deserialization of [sparrow](https://github.com/man-group/sparrow) record batches, following the [Apache Arrow IPC specification](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc).

`sparrow-ipc` requires a modern C++ compiler supporting C++20.
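
To make "zero-copy" concrete: in the Arrow IPC format, the record-batch metadata describes each buffer as an (offset, length) pair relative to the start of the message body, so a reader can point into the received bytes instead of copying them. The sketch below illustrates that idea in plain C++. `buffer_descriptor`, `buffer_view`, and `view_body_buffer` are hypothetical names chosen for this example, not part of the sparrow-ipc API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical mirror of an Arrow IPC buffer descriptor: a byte range
// relative to the start of the message body.
struct buffer_descriptor
{
    std::size_t offset;
    std::size_t length;
};

// Non-owning view into bytes owned by someone else.
struct buffer_view
{
    const std::uint8_t* data;
    std::size_t size;
};

// Returns a view of one body buffer without copying any bytes.
buffer_view view_body_buffer(const std::vector<std::uint8_t>& message,
                             std::size_t body_start,
                             buffer_descriptor desc)
{
    // Check body_start + offset + length <= message.size() without overflow.
    if (body_start > message.size()
        || desc.offset > message.size() - body_start
        || desc.length > message.size() - body_start - desc.offset)
    {
        throw std::out_of_range("buffer lies outside the message");
    }
    return buffer_view{message.data() + body_start + desc.offset, desc.length};
}
```

The returned view is only valid while `message` is alive and unmodified, which is the usual trade-off of zero-copy reads.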

## Installation

### Install from sources

`sparrow-ipc` has a few dependencies that you can install in a mamba environment:

```bash
mamba env create -f environment-dev.yml
mamba activate sparrow-ipc
```

You can then create a build directory, configure the project with CMake, and build and install it:

```bash
mkdir build
cd build
cmake .. \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
    -DCMAKE_PREFIX_PATH=$CONDA_PREFIX \
    -DSPARROW_IPC_BUILD_TESTS=ON \
    -DSPARROW_IPC_BUILD_EXAMPLES=ON

make install
```

## Usage

### Requirements

Compilers:
- Clang 18 or higher
- GCC 11.2 or higher
- Apple Clang 16 or higher
- MSVC 19.41 or higher

### Serialize record batches to a memory stream

```cpp
#include <cstdint>
#include <vector>
#include <sparrow_ipc/memory_output_stream.hpp>
#include <sparrow_ipc/serializer.hpp>
#include <sparrow/record_batch.hpp>

namespace sp = sparrow;
namespace sp_ipc = sparrow_ipc;

std::vector<uint8_t> serialize_batches_to_stream(const std::vector<sp::record_batch>& batches)
{
    std::vector<uint8_t> stream_data;
    sp_ipc::memory_output_stream stream(stream_data);
    sp_ipc::serializer serializer(stream);

    // Serialize all batches with the streaming operator, then close the stream with end_stream
    serializer << batches << sp_ipc::end_stream;

    return stream_data;
}
```
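
For reference, the Arrow IPC streaming format frames each message as a `0xFFFFFFFF` continuation marker, a 32-bit little-endian metadata length, the Flatbuffer-encoded `Message` padded to an 8-byte boundary, and then the message body; a stream ends with the continuation marker followed by a zero length. The sketch below writes that framing by hand. It illustrates the specification rather than sparrow-ipc's implementation: `append_le32`, `append_encapsulated_message`, and `append_end_of_stream` are hypothetical helpers, and the Flatbuffer metadata is assumed to be already encoded.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a 32-bit value in little-endian byte order.
void append_le32(std::vector<std::uint8_t>& out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
    {
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFFu));
    }
}

// Hypothetical helper: frames an already-encoded Flatbuffer `Message` and its body.
// Assumes out.size() is a multiple of 8 on entry and that `body` is already
// padded to a multiple of 8 bytes, so every message stays 8-byte aligned.
void append_encapsulated_message(std::vector<std::uint8_t>& out,
                                 const std::vector<std::uint8_t>& metadata,
                                 const std::vector<std::uint8_t>& body)
{
    // Round the metadata size up to the next multiple of 8.
    const std::size_t padded_size = (metadata.size() + 7) & ~std::size_t{7};

    append_le32(out, 0xFFFFFFFFu);                              // continuation marker
    append_le32(out, static_cast<std::uint32_t>(padded_size));  // metadata length, padding included
    out.insert(out.end(), metadata.begin(), metadata.end());
    out.insert(out.end(), padded_size - metadata.size(), std::uint8_t{0});  // zero padding
    out.insert(out.end(), body.begin(), body.end());
}

// Hypothetical helper: writes the end-of-stream marker (continuation + zero length).
void append_end_of_stream(std::vector<std::uint8_t>& out)
{
    append_le32(out, 0xFFFFFFFFu);
    append_le32(out, 0u);
}
```

Padding the metadata rather than the prefix keeps the body of every message on an 8-byte boundary, which is what lets a reader map body buffers in place.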

### Pipe a source of record batches to a stream

```cpp
#include <optional>
#include <ostream>
#include <vector>
#include <sparrow_ipc/memory_output_stream.hpp>
#include <sparrow_ipc/serializer.hpp>
#include <sparrow/record_batch.hpp>

namespace sp = sparrow;
namespace sp_ipc = sparrow_ipc;

// Any producer of record batches; next() returns std::nullopt once exhausted
class record_batch_source
{
public:
    std::optional<sp::record_batch> next();
};

void stream_record_batches(std::ostream& os, record_batch_source& source)
{
    sp_ipc::serializer serial(os);
    // Pull batches until the source is exhausted, serializing each one
    while (std::optional<sp::record_batch> batch = source.next())
    {
        serial << *batch;
    }
    serial << sp_ipc::end_stream;
}
```
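
`record_batch_source` is only declared above. As an illustration, here is a hypothetical in-memory source with the same `next()` contract, written as a template so it can be exercised without sparrow; `vector_batch_source` and `drain` are illustrative names, not part of sparrow-ipc. Instantiated with `sp::record_batch`, the sink passed to `drain` would be the place to write each batch to the serializer.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical source that hands out pre-built batches one at a time.
template <class Batch>
class vector_batch_source
{
public:
    explicit vector_batch_source(std::vector<Batch> batches)
        : m_batches(std::move(batches))
    {
    }

    // Returns the next batch, or std::nullopt once the source is exhausted.
    std::optional<Batch> next()
    {
        if (m_index == m_batches.size())
        {
            return std::nullopt;
        }
        return std::move(m_batches[m_index++]);
    }

private:
    std::vector<Batch> m_batches;
    std::size_t m_index = 0;
};

// Pulls every batch from any source exposing `std::optional<T> next()`
// and hands it to `sink`; returns how many batches were forwarded.
template <class Source, class Sink>
std::size_t drain(Source& source, Sink&& sink)
{
    std::size_t count = 0;
    while (auto batch = source.next())
    {
        sink(std::move(*batch));
        ++count;
    }
    return count;
}
```

Pulling from the source, rather than collecting all batches first, keeps only one batch in memory at a time, which is the point of streaming them.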

### Deserialize a stream into record batches

```cpp
#include <cstdint>
#include <vector>
#include <sparrow_ipc/deserializer.hpp>
#include <sparrow/record_batch.hpp>

namespace sp = sparrow;
namespace sp_ipc = sparrow_ipc;

std::vector<sp::record_batch> deserialize_stream_to_batches(const std::vector<uint8_t>& stream_data)
{
    // Decode every record batch contained in the serialized stream
    return sp_ipc::deserialize_stream(stream_data);
}
```
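
On the reading side, a deserializer has to walk the same framing. The sketch below is independent of sparrow-ipc and uses hypothetical names (`read_le32`, `message_prefix`, `read_message_prefix`). It reads one message prefix, treats both an explicit end-of-stream marker and the end of the buffer as the end of the stream, and also accepts the legacy framing (Arrow versions before 0.15) that omits the continuation marker. Locating the body still requires decoding the Flatbuffer `Message`, whose `bodyLength` field gives the body size.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Reads a 32-bit little-endian value, throwing if the buffer is too short.
std::uint32_t read_le32(const std::vector<std::uint8_t>& buf, std::size_t pos)
{
    if (pos > buf.size() || buf.size() - pos < 4)
    {
        throw std::out_of_range("truncated IPC stream");
    }
    return static_cast<std::uint32_t>(buf[pos])
         | (static_cast<std::uint32_t>(buf[pos + 1]) << 8)
         | (static_cast<std::uint32_t>(buf[pos + 2]) << 16)
         | (static_cast<std::uint32_t>(buf[pos + 3]) << 24);
}

// Location of one message's Flatbuffer metadata inside the stream.
struct message_prefix
{
    std::size_t metadata_offset;    // first byte of the Flatbuffer metadata
    std::uint32_t metadata_length;  // metadata size, padding included
};

// Returns std::nullopt when the stream ends at `pos`.
std::optional<message_prefix> read_message_prefix(const std::vector<std::uint8_t>& buf, std::size_t pos)
{
    if (pos == buf.size())
    {
        return std::nullopt;  // the writer simply closed the stream
    }
    std::uint32_t length = read_le32(buf, pos);
    pos += 4;
    if (length == 0xFFFFFFFFu)
    {
        length = read_le32(buf, pos);  // current format: marker, then length
        pos += 4;
    }
    // Otherwise: legacy format, the first 4 bytes were already the length.
    if (length == 0)
    {
        return std::nullopt;  // explicit end-of-stream marker
    }
    return message_prefix{pos, length};
}
```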

## Documentation

The documentation (currently being written) is available at <https://quantstack.github.io/sparrow-ipc/index.html>.

## Acknowledgements

This project is developed by [QuantStack](https://quantstack.net), building on the foundations laid by the sparrow library and the Apache Arrow project.

## License

This software is licensed under the BSD-3-Clause license. See the [LICENSE](LICENSE) file for details.