-
Notifications
You must be signed in to change notification settings - Fork 21
Many small fixes on ArrowStream #616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances the arrow_array_stream_proxy class with improved move semantics support and better resource management. The changes add helper functions for moving ArrowArrayStream C structs, refactor the internal API to use references instead of pointers, and reorganize tests into a more structured format with comprehensive coverage of move semantics.
Key changes:
- Added
move_array_stream()helper functions to properly transfer ownership of C struct resources - Changed
get_private_data()to return a reference instead of a pointer for cleaner API usage - Added
owns_stream()method to check stream ownership and improved null checking inpop()
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/arrow_interface/arrow_array_stream.cpp | Implements move_array_stream() helper functions for transferring ownership of ArrowArrayStream resources |
| src/arrow_interface/arrow_array_stream_proxy.cpp | Updates constructor to use move_array_stream(), changes get_private_data() return type to reference, and adds owns_stream() method |
| include/sparrow/arrow_interface/arrow_array_stream_proxy.hpp | Updates API signatures including reference return for get_private_data() and adds owns_stream() declaration |
| include/sparrow/arrow_interface/arrow_array_stream.hpp | Declares new move_array_stream() helper functions and removes extraneous blank line |
| test/test_arrow_array_stream_proxy.cpp | Reorganizes tests into subcases and adds comprehensive test coverage for move semantics, ownership checking, and edge cases |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| auto test_array = make_test_primitive_array<int32_t>(10); | ||
| ArrowArrayStream stream; | ||
| fill_arrow_array_stream(stream); | ||
| arrow_array_stream_proxy src(std::move(stream)); | ||
| src.push(std::move(test_array)); | ||
|
|
||
| ArrowArrayStream stream2; | ||
| fill_arrow_array_stream(stream2); | ||
| arrow_array_stream_proxy dst(std::move(stream2)); | ||
| dst = std::move(src); | ||
|
|
||
| auto* src_str = src.export_stream(); | ||
| REQUIRE_EQ(src_str->release, nullptr); | ||
| auto dst_arr = dst.pop(); | ||
| REQUIRE(dst_arr.has_value()); | ||
| delete src_str; | ||
| } | ||
|
|
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous test case, this has potential for memory leaks if assertions fail. The src_str pointer allocated on line 159 may not be deleted if REQUIRE_EQ on line 160 fails. Consider using RAII (e.g., std::unique_ptr) to ensure proper cleanup:
std::unique_ptr<ArrowArrayStream> src_str(src.export_stream());
REQUIRE_EQ(src_str->release, nullptr);| auto test_array = make_test_primitive_array<int32_t>(10); | ||
| ArrowArrayStream stream; | ||
| fill_arrow_array_stream(stream); | ||
| arrow_array_stream_proxy src(&stream); | ||
| src.push(std::move(test_array)); | ||
|
|
||
| ArrowArrayStream stream2; | ||
| fill_arrow_array_stream(stream2); | ||
| arrow_array_stream_proxy dst(&stream2); | ||
| dst = std::move(src); | ||
|
|
||
| auto* src_str = src.export_stream(); | ||
| REQUIRE_EQ(src_str, nullptr); | ||
| auto dst_arr = dst.pop(); | ||
| REQUIRE(dst_arr.has_value()); | ||
| } | ||
| } |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case has memory leaks. Both stream and stream2 are stack-allocated and filled with fill_arrow_array_stream() which allocates private_data. When the test ends, these streams go out of scope without calling their release functions to free the private_data.
You should add cleanup code before the closing brace:
if (stream.release != nullptr)
{
stream.release(&stream);
}
if (stream2.release != nullptr)
{
stream2.release(&stream2);
}| ArrowArrayStream stream{}; | ||
| fill_arrow_array_stream(stream); | ||
| arrow_array_stream_proxy proxy(&stream); | ||
| REQUIRE(!proxy.owns_stream()); |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case has a memory leak. The stack-allocated stream is filled with fill_arrow_array_stream(stream) which allocates private_data. The stream is passed by pointer to the proxy (not owned), and when the test ends, the stack-allocated stream goes out of scope without calling its release function to free the private_data.
You should add cleanup code before the closing brace:
if (stream.release != nullptr)
{
stream.release(&stream);
}| REQUIRE(!proxy.owns_stream()); | |
| REQUIRE(!proxy.owns_stream()); | |
| if (stream.release != nullptr) | |
| { | |
| stream.release(&stream); | |
| } |
| fill_arrow_array_stream(stream); | ||
| arrow_array_stream_proxy src(&stream); | ||
| arrow_array_stream_proxy dst(std::move(src)); | ||
| CHECK_THROWS_AS(src.pop(), std::runtime_error); |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case has a memory leak. The stack-allocated stream is filled with fill_arrow_array_stream(stream) which allocates private_data. After being passed to src by pointer and then moved to dst, the original stream goes out of scope without calling its release function to free the private_data.
You should add cleanup code before the closing brace:
if (stream.release != nullptr)
{
stream.release(&stream);
}| CHECK_THROWS_AS(src.pop(), std::runtime_error); | |
| CHECK_THROWS_AS(src.pop(), std::runtime_error); | |
| if (stream.release != nullptr) | |
| { | |
| stream.release(&stream); | |
| } |
| /** | ||
| * @brief Gets the private data (mutable version). | ||
| * | ||
| * @return Mutable pointer to the stream's private data. |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation states "Mutable pointer to the stream's private data" but the return type was changed from pointer to reference. Update the documentation to say "Mutable reference to the stream's private data".
| * @return Mutable pointer to the stream's private data. | |
| * @return Mutable reference to the stream's private data. |
| SPARROW_API ArrowArrayStream move_array_stream(ArrowArrayStream&& source); | ||
| SPARROW_API ArrowArrayStream move_array_stream(ArrowArrayStream& source); | ||
| } |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The move_array_stream functions lack documentation. These functions appear to implement move semantics for ArrowArrayStream (a C struct without native move constructor). Consider adding documentation that explains:
- What the function does (transfers ownership of the stream's resources)
- The state of the source after the move (all pointers zeroed)
- When this should be used instead of regular assignment
- Any preconditions or postconditions
Example:
/**
* @brief Move an ArrowArrayStream by transferring ownership of its resources.
*
* Performs a shallow copy of the stream structure and zeros out the source.
* After this operation, the source stream is left in a released-like state
* with all function pointers and data pointers set to null.
*
* @param source The source stream to move from.
* @return A new ArrowArrayStream with ownership of the source's resources.
*
* @post source will have all pointers set to nullptr/null
*/| SPARROW_API ArrowArrayStream move_array_stream(ArrowArrayStream&& source); | |
| SPARROW_API ArrowArrayStream move_array_stream(ArrowArrayStream& source); | |
| } | |
| /** | |
| * @brief Move an ArrowArrayStream by transferring ownership of its resources. | |
| * | |
| * Performs a shallow copy of the stream structure and zeros out the source. | |
| * After this operation, the source stream is left in a released-like state | |
| * with all function pointers and data pointers set to nullptr or zero. | |
| * | |
| * This should be used when you need to transfer ownership of the stream's resources, | |
| * rather than copying. Regular assignment does not transfer ownership and may lead to | |
| * double-free or resource leaks. | |
| * | |
| * @param source The source stream to move from (rvalue reference). | |
| * Must not be released or nullptr. | |
| * @return A new ArrowArrayStream with ownership of the source's resources. | |
| * | |
| * @post source will have all pointers set to nullptr/null/zero. | |
| * | |
| * @see https://arrow.apache.org/docs/format/CStreamInterface.html | |
| */ | |
| SPARROW_API ArrowArrayStream move_array_stream(ArrowArrayStream&& source); | |
| /** | |
| * @brief Move an ArrowArrayStream by transferring ownership of its resources. | |
| * | |
| * Performs a shallow copy of the stream structure and zeros out the source. | |
| * After this operation, the source stream is left in a released-like state | |
| * with all function pointers and data pointers set to nullptr or zero. | |
| * | |
| * This should be used when you need to transfer ownership of the stream's resources, | |
| * rather than copying. Regular assignment does not transfer ownership and may lead to | |
| * double-free or resource leaks. | |
| * | |
| * @param source The source stream to move from (lvalue reference). | |
| * Must not be released or nullptr. | |
| * @return A new ArrowArrayStream with ownership of the source's resources. | |
| * | |
| * @post source will have all pointers set to nullptr/null/zero. | |
| * | |
| * @see https://arrow.apache.org/docs/format/CStreamInterface.html | |
| */ | |
| SPARROW_API ArrowArrayStream move_array_stream(ArrowArrayStream& source); |
| { | ||
| ArrowArrayStream stream{}; | ||
| fill_arrow_array_stream(stream); | ||
| arrow_array_stream_proxy src(&stream); | ||
| arrow_array_stream_proxy dst(std::move(src)); | ||
|
|
||
| auto* src_str = src.export_stream(); | ||
| auto* dst_str = dst.export_stream(); | ||
| REQUIRE_EQ(src_str, nullptr); | ||
| REQUIRE_NE(dst_str, nullptr); | ||
| } |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case has a memory leak. The stack-allocated stream is filled with fill_arrow_array_stream(stream) which allocates private_data using new. The stream is passed by pointer to the proxy (not owned), and when the test ends, the stack-allocated stream goes out of scope without calling its release function to free the private_data.
You should add cleanup code before the closing brace:
if (stream.release != nullptr)
{
stream.release(&stream);
}| { | ||
| ArrowArrayStream stream{}; | ||
| fill_arrow_array_stream(stream); | ||
| arrow_array_stream_proxy src(std::move(stream)); | ||
| arrow_array_stream_proxy dst(std::move(src)); | ||
|
|
||
| auto* src_str = src.export_stream(); | ||
| auto* dst_str = dst.export_stream(); | ||
| REQUIRE_EQ(src_str->release, nullptr); | ||
| REQUIRE_NE(dst_str->release, nullptr); | ||
| delete src_str; | ||
| delete dst_str; | ||
| } |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test has potential for memory leaks if assertions fail. The src_str and dst_str pointers are allocated by export_stream() but may not be deleted if a REQUIRE_EQ or REQUIRE_NE assertion fails before reaching the delete statements. Consider using RAII (e.g., std::unique_ptr) to ensure proper cleanup even when tests fail.
Example:
std::unique_ptr<ArrowArrayStream> src_str(src.export_stream());
std::unique_ptr<ArrowArrayStream> dst_str(dst.export_stream());
REQUIRE_EQ(src_str->release, nullptr);
REQUIRE_NE(dst_str->release, nullptr);
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #616 +/- ##
==========================================
+ Coverage 88.82% 89.18% +0.36%
==========================================
Files 107 107
Lines 8279 8288 +9
==========================================
+ Hits 7354 7392 +38
+ Misses 925 896 -29
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Alex-PLACET
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok once the copilot comments fixed
No description provided.