GH-1142: Fix buffer ownership in VectorSchemaRoot addVector/removeVector#1145
GH-1142: Fix buffer ownership in VectorSchemaRoot addVector/removeVector#1145John-W-Lewis wants to merge 2 commits into
Conversation
…oveVector Use TransferPair to properly transfer buffer ownership when creating a new VectorSchemaRoot, preventing use-after-free when the original root is closed. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Thank you for opening a pull request! Please label the PR with one or more of:
Also, add the 'breaking-change' label if appropriate. See CONTRIBUTING.md for details. |
|
Being neither a chicken nor an egg, I understand that I am unable to label this as "bug-fix". |
|
There are also a couple of enhancements I would suggest for VectorSchemaRoot but, based on the advice I've seen, it seems better to describe those in a separate PR. |
|
Hi @John-W-Lewis , thank you for the bug-fix. Yes, we need to wait for the maintainers to add the label and trigger the other checks. In the meantime, you can build locally, for example using Maven. Now, there is a format violation in You can fix it with the suggested command.
With the formatting fix, the build was successful for me, which means that the unit tests passed. |
axreldable
left a comment
There was a problem hiding this comment.
Other than the nit (nitpick) and the doc clarification, LGTM.
| FieldVector v = fieldVectors.get(i); | ||
| TransferPair transferPair = v.getTransferPair(v.getAllocator()); | ||
| transferPair.transfer(); | ||
| newVectors.add((FieldVector) transferPair.getTo()); |
There was a problem hiding this comment.
nit: Consider extracting the transferring logic into a separate method to avoid code duplication.
e.g.
private static FieldVector transferVector(FieldVector vector) {
TransferPair transferPair = vector.getTransferPair(vector.getAllocator());
transferPair.transfer();
return (FieldVector) transferPair.getTo();
}
| * | ||
| * <p>Buffer ownership is transferred to the returned root via {@link TransferPair}. After this | ||
| * operation, the vectors in this root are left in a transferred (empty) state. The removed | ||
| * vector's data is not transferred and is released. This root can be reused by calling {@link |
There was a problem hiding this comment.
The removed vector's data is not transferred and is released.
is released is not 100% accurate and can be confusing. The vector remains in the original schema root until it's closed/cleared.
There was a problem hiding this comment.
Thank you for spotting this. You're right that "is released" was inaccurate.
Looking more closely, it was also a code issue. For the source vectors which are not removed, transfer() creates new ArrowBufs in the corresponding target vector, each referencing the same underlying allocation as its source counterpart, while the source vector's ArrowBufs release their references. So, after the operation, those source vectors' ArrowBufs no longer own any allocated memory. But for the removed vector, we weren't doing anything -- its ArrowBufs still held references to their underlying allocations, which would only be freed when the original root was eventually closed.
I've added a clear() on the removed vector so that its buffers also release their underlying allocations, making it consistent with the other vectors in the source root after the operation.
…d vector - Extract transferVector() helper to reduce duplication in addVector/removeVector - Fix Spotless formatting violation in TestVectorSchemaRoot.java - Clear the removed vector's buffers in removeVector() so the original root is consistently empty after the operation, matching the documented behaviour Co-authored-by: Cursor <cursoragent@cursor.com>
|
@axreldable Thank you. TestVectorSchemaRoot.java formatting fixed. |
|
Closing in favour of #1149, which provides a better fix using a new Thank you @axreldable for your helpful review feedback on this PR -- it contributed to the thinking that led to the improved solution. |

What's Changed
VectorSchemaRoot#addVectorand#removeVectorpreviously created a newVectorSchemaRootby sharing rawFieldVectorreferences between the original and new root. This violates Arrow's ownership model: closing either root would release buffers still in use by the other, leading to use-after-free errors.This fix uses
TransferPairto properly transfer buffer ownership to the returned root, consistent with howVectorSchemaRoot#slicealready handles this. After the operation, the original root's vectors are left in a transferred (empty) state and can be reused viaallocateNew().Tests added to verify that the returned root's data remains valid after the original root and input vector are closed.
Closes #1142.
Made with Cursor