Added README #34
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,135 @@ | ||
| # sparrow-ipc | ||
| # sparrow-ipc | ||
|
|
||
| [](https://github.com/quantstack/sparrow-ipc/actions/workflows/linux.yml) | ||
| [](https://github.com/quantstack/sparrow-ipc/actions/workflows/osx.yml) | ||
| [](https://github.com/quantstack/sparrow-ipc/actions/workflows/windows.yml) | ||
|
|
||
| **!!!Sparrow-IPC is still under development and is not ready for production use!!!** | ||
|
|
||
| **!!!The documentation is still under development and may be incomplete or contain errors!!!** | ||
|
|
||
| ## Introduction | ||
|
|
||
| `sparrow-ipc` provides high-performance, **zero-copy** serialization and deserialization of record batches, adhering to both [sparrow](https://github.com/man-group/sparrow) and [Apache Arrow IPC specifications](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc). | ||
|
|
||
| `sparrow-ipc` requires a modern C++ compiler supporting C++20. | ||
|
|
||
| ## Installation | ||
|
|
||
|
|
||
| ### Install from sources | ||
|
|
||
| `sparrow-ipc` has a few dependencies that you can install in a mamba environment: | ||
|
|
||
| ```bash | ||
| mamba env create -f environment-dev.yml | ||
| mamba activate sparrow-ipc | ||
| ``` | ||
|
|
||
| You can then create a build directory, then build and install the project with CMake: | ||
|
|
||
| ```bash | ||
| mkdir build | ||
| cd build | ||
| cmake .. \ | ||
| -DCMAKE_BUILD_TYPE=Debug \ | ||
| -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \ | ||
| -DCMAKE_PREFIX_PATH=$CONDA_PREFIX \ | ||
| -DSPARROW_IPC_BUILD_TESTS=ON \ | ||
| -DSPARROW_IPC_BUILD_EXAMPLES=ON | ||
|
|
||
| make install | ||
| ``` | ||
|
|
||
| ## Usage | ||
|
|
||
| ### Requirements | ||
|
|
||
| Compilers: | ||
| - Clang 18 or higher | ||
| - GCC 11.2 or higher | ||
| - Apple Clang 16 or higher | ||
| - MSVC 19.41 or higher | ||
|
|
||
| ### Serialize record batches to a memory stream | ||
|
|
||
| ```cpp | ||
| #include <cstdint> | ||
| #include <vector> | ||
| #include <sparrow_ipc/memory_output_stream.hpp> | ||
| #include <sparrow_ipc/serializer.hpp> | ||
| #include <sparrow/record_batch.hpp> | ||
|
|
||
| namespace sp = sparrow; | ||
| namespace sp_ipc = sparrow_ipc; | ||
|
|
||
| std::vector<uint8_t> serialize_batches_to_stream(const std::vector<sp::record_batch>& batches) | ||
| { | ||
| std::vector<uint8_t> stream_data; | ||
| sp_ipc::memory_output_stream stream(stream_data); | ||
| sp_ipc::serializer serializer(stream); | ||
|
|
||
| // Serialize all batches using the streaming operator | ||
| serializer << batches << sp_ipc::end_stream; | ||
|
|
||
| return stream_data; | ||
| } | ||
| ``` | ||
|
|
||
| ### Pipe a source of record batches to a stream | ||
|
|
||
| ```cpp | ||
| #include <optional> | ||
| #include <ostream> | ||
| #include <vector> | ||
| #include <sparrow_ipc/memory_output_stream.hpp> | ||
| #include <sparrow_ipc/serializer.hpp> | ||
| #include <sparrow/record_batch.hpp> | ||
|
|
||
| namespace sp = sparrow; | ||
| namespace sp_ipc = sparrow_ipc; | ||
|
|
||
| class record_batch_source | ||
| { | ||
| public: | ||
| std::optional<sp::record_batch> next(); | ||
| }; | ||
|
|
||
| void stream_record_batches(std::ostream& os, record_batch_source& source) | ||
| { | ||
| sp_ipc::serializer serial(os); | ||
| std::optional<sp::record_batch> batch = std::nullopt; | ||
| while (batch = source.next()) | ||
| { | ||
| serial << *batch; | ||
| } | ||
| serial << sp_ipc::end_stream; | ||
| } | ||
| ``` | ||
|
|
||
| ### Deserialize a stream into record batches | ||
|
|
||
| ```cpp | ||
| #include <cstdint> | ||
| #include <vector> | ||
| #include <sparrow_ipc/deserializer.hpp> | ||
| #include <sparrow/record_batch.hpp> | ||
|
|
||
| namespace sp = sparrow; | ||
| namespace sp_ipc = sparrow_ipc; | ||
|
|
||
| std::vector<sp::record_batch> deserialize_stream_to_batches(const std::vector<uint8_t>& stream_data) | ||
| { | ||
| auto batches = sp_ipc::deserialize_stream(stream_data); | ||
| return batches; | ||
| } | ||
| ``` | ||
|
|
||
| ## Documentation | ||
|
|
||
| The documentation (currently being written) can be found at https://quantstack.github.io/sparrow-ipc/index.html | ||
|
|
||
| ## Acknowledgements | ||
|
|
||
| This project is developed by [QuantStack](https://quantstack.net), building on the foundations laid by the sparrow library and the Apache Arrow project. | ||
|
|
||
| ## License | ||
|
|
||
| This software is licensed under the BSD-3-Clause license. See the [LICENSE](LICENSE) file for details. | ||
Zero-copy, except when compression is used.
Ok then let's remove the zero-copy mention.