
Conversation

Alex-PLACET
Member

No description provided.

Alex-PLACET self-assigned this Sep 2, 2025
Alex-PLACET requested a review from Copilot September 3, 2025 15:05
Alex-PLACET force-pushed the rework_serializing branch 2 times, most recently from 85ef02d to f82d723 September 3, 2025 15:07
Copilot

This comment was marked as outdated.

Alex-PLACET marked this pull request as ready for review September 4, 2025 09:27

private:

const uint8_t* m_buf_ptr;
Member

If you don't store the buffer length, how can you make sure you're not accessing memory out of bounds?

const uint8_t* m_buf_ptr;
};

[[nodiscard]] EncapsulatedMessage create_encapsulated_message(const uint8_t* buf_ptr);
Member

Perhaps you want an API like:

// Return the encapsulated message and the rest of the span
std::pair<EncapsulatedMessage, std::span<const uint8_t>> extract_encapsulated_message(std::span<const uint8_t>);

namespace sparrow_ipc
{
template <typename T>
[[nodiscard]] sparrow::primitive_array<T> deserialize_primitive_array_bis(
Member

Maybe rename deserialize_primitive_array_bis to deserialize_primitive_array_from_record_batch and deserialize_primitive_array to deserialize_primitive_array_from_buffer?

Member Author

ok renamed

{
const size_t offset = sizeof(uint32_t) * 2 // 4 bytes continuation + 4 bytes metadata size
+ metadata_length();
const size_t padded_offset = (offset + 7) & ~7; // Round up to 8-byte boundary
Member

We should use the align_to_8 function everywhere instead of duplicating its logic.
(You can change & -8 to & ~7 there if you prefer.)
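For reference, a minimal sketch of such a helper. The signature is an assumption; the project's actual `align_to_8` may be declared differently.

```cpp
#include <cstddef>

// Rounds `value` up to the next multiple of 8 (hypothetical signature).
[[nodiscard]] constexpr std::size_t align_to_8(std::size_t value) noexcept
{
    return (value + 7) & ~std::size_t{7};
}

static_assert(align_to_8(0) == 0);
static_assert(align_to_8(1) == 8);
static_assert(align_to_8(8) == 8);
static_assert(align_to_8(13) == 16);

// The two mask spellings mentioned above are the same value in two's complement.
static_assert(~7 == -8);
```

With this in place, the duplicated expression `(offset + 7) & ~7` becomes `align_to_8(offset)` at each call site.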

Member Author

👍

{
const size_t offset = sizeof(uint32_t) * 2 // 4 bytes continuation + 4 bytes metadata size
+ metadata_length();
const size_t padded_offset = (offset + 7) & ~7; // Round up to 8-byte boundary
Member

Same here.

Member Author

👍

Member

Hind-M commented Sep 4, 2025

As a side note: for relatively large PRs that touch many files, like this one, splitting the work into multiple commits whose messages describe each change would make the review easier.
The linting changes also add noise; it would be better to keep them in separate, dedicated PRs in the future.

I don't know whether the additional code breaks the build, but it may be best to remove all the versioning and install parts so that this PR focuses on the core changes.

Member

Hind-M commented Sep 4, 2025

Are we eventually planning to use functions from sparrow directly (everything related to arrow_interface, and the comparison functions in the tests)?

Copilot AI left a comment

Pull Request Overview

This PR implements a comprehensive rework of the serialization system to improve code organization, add new functionality, and enhance testing infrastructure. The changes restructure the codebase with proper namespacing, introduce new deserialization capabilities, and add extensive integration testing with Arrow data files.

  • Reorganized headers and source files with proper sparrow_ipc namespace structure
  • Added new deserialization functionality for streams and various array types
  • Introduced comprehensive integration testing with Arrow testing data files

Reviewed Changes

Copilot reviewed 41 out of 44 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/test_utils.cpp | Reformatted test assertions for better readability |
| tests/test_primitive_array_with_files.cpp | New integration tests comparing stream vs JSON deserialization |
| tests/test_primitive_array_serialization.cpp | Updated includes and minor formatting improvements |
| tests/test_null_array_serialization.cpp | Updated includes to use new header structure |
| tests/test_arrow_schema.cpp | New comprehensive tests for Arrow schema functionality |
| tests/metadata_sample.hpp | New helper for metadata testing with endianness support |
| tests/CMakeLists.txt | Added new test files and dependencies |
| src/utils.cpp | Updated includes and improved code formatting |
| src/serialize_null_array.cpp | Updated to use new deserialization functions |
| src/serialize.cpp | Moved deserialization functions and improved formatting |
| src/metadata.cpp | New utility for metadata conversion |
| src/encapsulated_message.cpp | New class for handling encapsulated Arrow messages |
| src/deserialize_utils.cpp | New utilities for deserialization operations |
| src/deserialize_fixedsizebinary_array.cpp | New deserialization for fixed-size binary arrays |
| src/deserialize.cpp | New comprehensive deserialization implementation |
| Multiple header files | Reorganized with proper namespace structure and new functionality |
Comments suppressed due to low confidence (3)

src/utils.cpp:1

  • [nitpick] These multi-line trailing comments are hard to read and maintain. Consider moving them above the variable declarations or making them single-line comments.
#include "sparrow_ipc/utils.hpp"

src/utils.cpp:1

  • [nitpick] These multi-line trailing comments are hard to read and maintain. Consider moving them above the variable declarations or making them single-line comments.
#include "sparrow_ipc/utils.hpp"

src/utils.cpp:1

  • [nitpick] These multi-line trailing comments are hard to read and maintain. Consider moving them above the variable declarations or making them single-line comments.
#include "sparrow_ipc/utils.hpp"


src/utils.cpp Outdated
Comment on lines 380 to 382
const auto map_type = org::apache::arrow::flatbuf::CreateMap(builder, false); // not
// sorted
// keys
Copilot AI Sep 5, 2025

[nitpick] The multi-line comment is unnecessarily fragmented. Consider using a single-line comment: // not sorted keys or moving the comment above the line.

Suggested change
- const auto map_type = org::apache::arrow::flatbuf::CreateMap(builder, false); // not
- // sorted
- // keys
+ const auto map_type = org::apache::arrow::flatbuf::CreateMap(builder, false); // not sorted keys


@@ -0,0 +1,44 @@
#include <cstdint>
Copilot AI Sep 5, 2025

Missing header guard. Add #pragma once at the beginning of the file to prevent multiple inclusions.


private:

std::string m_format;
const char* m_name;
Member

Why not store an optional string instead?

Member Author

We keep a direct pointer to the name in the flatbuffer. The format is not stored as a string in the flatbuffer, which is why it is held as a std::string.
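The two positions can be combined, as a sketch: keep the non-owning pointer internally but expose it through an accessor returning `std::optional<std::string_view>`. The `field_view` class below is hypothetical and only loosely modelled on the two members quoted above. It assumes the pointer is either null or points to a null-terminated string that outlives the object.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical wrapper, not the class from the PR.
class field_view
{
public:

    field_view(std::string format, const char* name)
        : m_format(std::move(format))
        , m_name(name)
    {
    }

    // No copy is made: the view refers to the bytes that m_name points to.
    [[nodiscard]] std::optional<std::string_view> name() const
    {
        if (m_name == nullptr)
        {
            return std::nullopt;
        }
        return std::string_view(m_name);
    }

    [[nodiscard]] const std::string& format() const
    {
        return m_format;
    }

private:

    std::string m_format;  // owned: not stored as a string in the source buffer
    const char* m_name;    // non-owning: may be null, must outlive this object
};
```

The trade-off is lifetime: the returned view is only valid while the underlying buffer is alive, whereas an owning `std::optional<std::string>` would cost one copy per field.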

constexpr int SPARROW_IPC_VERSION_MINOR = 1;
constexpr int SPARROW_IPC_VERSION_PATCH = 0;

constexpr int SPARROW_IPC_BINARY_CURRENT = 9;
Member

Nitpicking: I think we can keep 1 for the binary_current version.


#include <sparrow/record_batch.hpp>

#include "config/config.hpp"
Member

Suggested change
- #include "config/config.hpp"
+ #include "sparrow_ipc/config/config.hpp"

{
switch (bit_width)
{
// clang-format off
Member

Why?

@@ -50,18 +53,22 @@ namespace sparrow_ipc
}

template <typename T>
- sparrow::primitive_array<T> deserialize_primitive_array(const std::vector<uint8_t>& buffer) {
+ sparrow::primitive_array<T> deserialize_primitive_array(const std::vector<uint8_t>& buffer)
Member

This should live in deserialize_primitive_array.hpp, or deserialize_primitive_array.hpp should be merged into this file.


namespace sparrow_ipc
{
class EncapsulatedMessage
Member

Code convention: should be snake case.

#include <sparrow/buffer/dynamic_bitset/dynamic_bitset_view.hpp>
#include <sparrow/record_batch.hpp>

#include "config/config.hpp"
Member

Suggested change
- #include "config/config.hpp"
+ #include "sparrow_ipc/config/config.hpp"

const ArrowArray& arrow_arr,
const std::vector<int64_t>& buffers_sizes,
std::vector<uint8_t>& final_buffer
);
Member

Why are these methods nested in a details namespace while their deserialize "counterparts" are not?

Comment on lines +7 to 9
#include "deserialize.hpp"
#include "serialize.hpp"
#include "utils.hpp"
Member

Suggested change
- #include "deserialize.hpp"
- #include "serialize.hpp"
- #include "utils.hpp"
+ #include "sparrow_ipc/deserialize.hpp"
+ #include "sparrow_ipc/serialize.hpp"
+ #include "sparrow_ipc/utils.hpp"

#include <string_view>
#include <utility>

#include "config/config.hpp"
Member

Suggested change
- #include "config/config.hpp"
+ #include "sparrow_ipc/config/config.hpp"

4 participants