Rework serializing #15
base: main
85ef02d to f82d723
private:

    const uint8_t* m_buf_ptr;
If you don't store the buffer length, how can you make sure you're not accessing memory out of bounds?
    const uint8_t* m_buf_ptr;
};

[[nodiscard]] EncapsulatedMessage create_encapsulated_message(const uint8_t* buf_ptr);
Perhaps you want an API like:
// Return the encapsulated message and the rest of the span
std::pair<EncapsulatedMessage, std::span<const uint8_t>> extract_encapsulated_message(std::span<const uint8_t>);
namespace sparrow_ipc
{
    template <typename T>
    [[nodiscard]] sparrow::primitive_array<T> deserialize_primitive_array_bis(
Maybe rename deserialize_primitive_array_bis to deserialize_primitive_array_from_record_batch and deserialize_primitive_array to deserialize_primitive_array_from_buffer?
OK, renamed.
src/encapsulated_message.cpp
Outdated
{
    const size_t offset = sizeof(uint32_t) * 2  // 4 bytes continuation + 4 bytes metadata size
                          + metadata_length();
    const size_t padded_offset = (offset + 7) & ~7;  // Round up to 8-byte boundary
We should use the align_to_8 function everywhere instead of replicating it. (You can change & -8 to & ~7 if preferred.)
👍
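For reference, a possible shape for the shared helper. The name `align_to_8` comes from the comment above, but its actual signature in the project is not shown in this review, so the one below is an assumption. `body_offset` is a hypothetical wrapper that mirrors the offset computation quoted above.

```cpp
#include <cstddef>
#include <cstdint>

// Rounds `n` up to the next multiple of 8; aligned values are returned unchanged.
[[nodiscard]] constexpr std::size_t align_to_8(std::size_t n)
{
    return (n + 7) & ~std::size_t{7};
}

// For an unsigned std::size_t operand, `& -8` and `& ~7` apply the same mask.
static_assert((std::size_t{13} & static_cast<std::size_t>(-8)) == (std::size_t{13} & ~std::size_t{7}));
static_assert(align_to_8(0) == 0);
static_assert(align_to_8(1) == 8);
static_assert(align_to_8(8) == 8);
static_assert(align_to_8(13) == 16);

// Hypothetical wrapper mirroring the reviewed code:
// 4 bytes continuation + 4 bytes metadata size + the metadata itself.
[[nodiscard]] constexpr std::size_t body_offset(std::size_t metadata_length)
{
    return align_to_8(sizeof(std::uint32_t) * 2 + metadata_length);
}
```

Keeping the rounding in one `constexpr` function means the mask is written once and can be checked at compile time.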
src/encapsulated_message.cpp
Outdated
{
    const size_t offset = sizeof(uint32_t) * 2  // 4 bytes continuation + 4 bytes metadata size
                          + metadata_length();
    const size_t padded_offset = (offset + 7) & ~7;  // Round up to 8-byte boundary
Same here.
👍
As a side note, for relatively big PRs that touch multiple files, like this one, splitting the work into several commits whose messages describe what each one does would make the review easier. I don't know whether the additional code breaks the build, but it may be best to remove all the versioning and install parts so that this PR focuses on the core changes.
Are we planning eventually to use functions from |
9347b29 to 620ea81
Pull Request Overview
This PR implements a comprehensive rework of the serialization system to improve code organization, add new functionality, and enhance testing infrastructure. The changes restructure the codebase with proper namespacing, introduce new deserialization capabilities, and add extensive integration testing with Arrow data files.
- Reorganized headers and source files with proper sparrow_ipc namespace structure
- Added new deserialization functionality for streams and various array types
- Introduced comprehensive integration testing with Arrow testing data files
Reviewed Changes
Copilot reviewed 41 out of 44 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
tests/test_utils.cpp | Reformatted test assertions for better readability |
tests/test_primitive_array_with_files.cpp | New integration tests comparing stream vs JSON deserialization |
tests/test_primitive_array_serialization.cpp | Updated includes and minor formatting improvements |
tests/test_null_array_serialization.cpp | Updated includes to use new header structure |
tests/test_arrow_schema.cpp | New comprehensive tests for Arrow schema functionality |
tests/metadata_sample.hpp | New helper for metadata testing with endianness support |
tests/CMakeLists.txt | Added new test files and dependencies |
src/utils.cpp | Updated includes and improved code formatting |
src/serialize_null_array.cpp | Updated to use new deserialization functions |
src/serialize.cpp | Moved deserialization functions and improved formatting |
src/metadata.cpp | New utility for metadata conversion |
src/encapsulated_message.cpp | New class for handling encapsulated Arrow messages |
src/deserialize_utils.cpp | New utilities for deserialization operations |
src/deserialize_fixedsizebinary_array.cpp | New deserialization for fixed-size binary arrays |
src/deserialize.cpp | New comprehensive deserialization implementation |
Multiple header files | Reorganized with proper namespace structure and new functionality |
Comments suppressed due to low confidence (3)
src/utils.cpp:1
- [nitpick] These multi-line trailing comments are hard to read and maintain. Consider moving them above the variable declarations or making them single-line comments.
#include "sparrow_ipc/utils.hpp"
src/utils.cpp
Outdated
const auto map_type = org::apache::arrow::flatbuf::CreateMap(builder, false);  // not
                                                                               // sorted
                                                                               // keys
[nitpick] The multi-line comment is unnecessarily fragmented. Consider using a single-line comment (// not sorted keys) or moving the comment above the line.
- const auto map_type = org::apache::arrow::flatbuf::CreateMap(builder, false);  // not
-                                                                                // sorted
-                                                                                // keys
+ const auto map_type = org::apache::arrow::flatbuf::CreateMap(builder, false);  // not sorted keys
@@ -0,0 +1,44 @@
#include <cstdint>
Missing header guard. Add #pragma once at the beginning of the file to prevent multiple inclusions.
private:

    std::string m_format;
    const char* m_name;
Why not store an optional string instead?
We keep a direct pointer to the name in the flatbuffer. The format is not stored as a string in the flatbuffer.
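To illustrate the trade-off discussed in this thread, here is a small sketch, not the PR's class: `schema_sketch` and its accessors are hypothetical. It assumes that the name pointer is either null or points to a null-terminated string in memory that outlives the object, and it shows that a nullable pointer can still be exposed as an optional view without copying.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for the class under review.
class schema_sketch
{
public:

    // `name` may be null. When it is not, it must point to a null-terminated
    // string that outlives this object (for example, memory inside a buffer).
    schema_sketch(std::string format, const char* name)
        : m_format(std::move(format))
        , m_name(name)
    {
    }

    // Exposes the nullable pointer as an optional, non-owning view.
    [[nodiscard]] std::optional<std::string_view> name() const
    {
        if (m_name == nullptr)
        {
            return std::nullopt;
        }
        return std::string_view(m_name);
    }

    [[nodiscard]] const std::string& format() const
    {
        return m_format;
    }

private:

    std::string m_format;  // owned: built at runtime, so it needs its own storage
    const char* m_name;    // non-owning: points into externally owned memory
};
```

Storing `std::optional<std::string>` instead would copy the name and remove the lifetime dependency on the buffer, at the cost of an allocation per object.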
constexpr int SPARROW_IPC_VERSION_MINOR = 1;
constexpr int SPARROW_IPC_VERSION_PATCH = 0;

constexpr int SPARROW_IPC_BINARY_CURRENT = 9;
Nitpicking: I think we can keep 1 for the binary_current version.

#include <sparrow/record_batch.hpp>

#include "config/config.hpp"
- #include "config/config.hpp"
+ #include "sparrow_ipc/config/config.hpp"
{
    switch (bit_width)
    {
        // clang-format off
Why?
@@ -50,18 +53,22 @@ namespace sparrow_ipc
     }

     template <typename T>
-    sparrow::primitive_array<T> deserialize_primitive_array(const std::vector<uint8_t>& buffer) {
+    sparrow::primitive_array<T> deserialize_primitive_array(const std::vector<uint8_t>& buffer)
This should live in deserialize_primitive_array.hpp, or deserialize_primitive_array.hpp should be merged into this file.

namespace sparrow_ipc
{
    class EncapsulatedMessage
Code convention: should be snake case.
#include <sparrow/buffer/dynamic_bitset/dynamic_bitset_view.hpp>
#include <sparrow/record_batch.hpp>

#include "config/config.hpp"
- #include "config/config.hpp"
+ #include "sparrow_ipc/config/config.hpp"
    const ArrowArray& arrow_arr,
    const std::vector<int64_t>& buffers_sizes,
    std::vector<uint8_t>& final_buffer
);
Why are these methods nested in a details namespace while their deserialize "counterparts" are not?
#include "deserialize.hpp"
#include "serialize.hpp"
#include "utils.hpp"
- #include "deserialize.hpp"
- #include "serialize.hpp"
- #include "utils.hpp"
+ #include "sparrow_ipc/deserialize.hpp"
+ #include "sparrow_ipc/serialize.hpp"
+ #include "sparrow_ipc/utils.hpp"
#include <string_view>
#include <utility>

#include "config/config.hpp"
- #include "config/config.hpp"
+ #include "sparrow_ipc/config/config.hpp"
No description provided.