fix: Destroy the previously created accumulator for AggregateWindow to fix memory leak#15679
fix: Destroy the previously created accumulator for AggregateWindow to fix memory leak#15679jiangjiangtian wants to merge 2 commits intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
88504b7 to
f684fd0
Compare
f684fd0 to
7ecefee
Compare
|
I find this code https://github.com/facebookincubator/velox/pull/5632/files#diff-b9ed2eb619a914241f656abe59881d169602eed734af2df443f6480d611850aaR122-R124 Also, maybe fix that in virtual void initializeNewGroups( } The initializeNewGroupsInternal should only be called when the condition is satisfied. |
Yes, now
Do you means that if the
Maybe not. I find that NonNumericArbitrary does not override
So this may be not correct. |
|
I mean destroy the allocator in initializeNewGroups not initializeNewGroupsInternal, this can help all the group initialization issue, but looks like community prefers to directly call the destroy function, so let us align with the current solution, find one and solve one. |
|
In the code and review, we may be carefully call the initializeNewGroups, decide if it should call destroy first |
jinchengchenghh
left a comment
There was a problem hiding this comment.
Thanks for your fix!
|
@mbasmanova @xiaoxmeng @Yuhta Could you please help review this PR? |
| // aggregate_ function object should be initialized. | ||
| auto singleGroup = std::vector<vector_size_t>{0}; | ||
| aggregate_->clear(); | ||
| aggregate_->destroy(folly::Range<char**>(&rawSingleGroupRow_, 1)); |
There was a problem hiding this comment.
Do we still need to call 'clear' if we are calling 'destroy'?
There was a problem hiding this comment.
Maybe we need. clear clears some states of the aggregate function and when we start processing a new window frame, we need to clear the states.
There was a problem hiding this comment.
@jiangjiangtian @mbasmanova I too feel its confusing to call both these. Why does clearing an aggregate not imply destroying all the groups ?
There was a problem hiding this comment.
Shall we add call to destroy in clear?
There was a problem hiding this comment.
@mbasmanova @aditi-pandit how do you think? should we add call to destroy in clear?
There was a problem hiding this comment.
@jiangjiangtian clear() is clearing the state, but doesn't free memory. Similar to std::vector::clear. destroy() frees up memory (and also clears the state). Similar to std::vector's destructor. I would imagine it is sufficient to call destroy(). Calling clear() before destroy() is redundant. Is this not the case?
There was a problem hiding this comment.
@mbasmanova Now destroy won't clear the states of Aggregate functions. So should we call clear() in destroy()?
There was a problem hiding this comment.
destroy won't clear the states
That's strange. Why would we need to explicitly clear the state if are releasing memory used by these. Perhaps, there are bugs in some implementations of destroy. What happens when you remove 'clear' call before 'destroy'?
There was a problem hiding this comment.
What happens when you remove 'clear' call before 'destroy'?
Sorry, I have not tried it yet. But I find that 'clear' just sets the null count in Aggregate to 0 and the null count is just used to determine whether we need to actually extract bits from 'group' when we check if the 'group' is null or when we want to clear null bits of the 'group'.
After reading some code, I realize that we have to call clearNull in destroy. I think that is what we need to do. Before we destroy the accumulator, we need to check whether it is null. If it is, we need to subtract it from the null count in Aggregate to keep it aligned with the number of null groups allocated. Now we don't perform this check, so maybe that's the bug in destroy.
Based on the investigation above, I agree with you that the 'clear' call is confusing. So if we call clearNull in destroy, we can resolve the confusion. What do you think? Thanks! cc @mbasmanova
There was a problem hiding this comment.
@jiangjiangtian Yes, I believe you are correct that destroy needs to clear the null flag.
mbasmanova
left a comment
There was a problem hiding this comment.
Wondering if it would be possible to reproduce this issue with a simple test and verify it is fixed?
|
CC: @jiangjiangtian |
f57a9b1 to
c86c41e
Compare
c86c41e to
bfff504
Compare
cf703b3 to
56d83c9
Compare
|
@mbasmanova I iterate this PR based on #15680. Please take a look again, thanks! |
mbasmanova
left a comment
There was a problem hiding this comment.
@jiangjiangtian Would you confirm that the test fails without the changes?
|
/claude-review |
✅ Claude Code ReviewRequested by @mbasmanova SummaryThis PR addresses a legitimate memory leak in the AggregateWindow operator by properly destroying previously created accumulators before reinitializing them. The fix is correct and necessary for memory safety. Issues Found🟢 Positive: Fix addresses real memory leakThe PR correctly identifies and fixes a memory leak where 🟢 Positive: Proper API usageThe change from
🟢 Positive: Test demonstrates fix effectivenessThe test changes show the memory capacity was reduced from 256MB to 192MB, demonstrating that the fix actually reduces memory usage as expected. 🟢 Positive: Thread safety maintainedThe changes don't introduce any concurrency issues. The 🟢 Positive: Consistent fix across code pathsBoth incremental and non-incremental aggregation code paths in Edge Cases Examined
Performance ImpactThe performance impact should be minimal since:
Correctness VerificationI traced through the logic:
The fix is correct and complete. The memory leak is properly addressed without introducing new issues. ℹ️ About this responseThis review was generated by Claude Code. It analyzed the PR diff and codebase to provide feedback. Limitations:
Available commands:
|
@mbasmanova Yes! The test will fail if I revert this PR or #15680. |
56d83c9 to
e6e7d0a
Compare
One example I found is as follows: if the method initializeNewGroupsInternal in FirstLastAggregateBase is called multiple times and the function is not numeric, the previously created
SingleValueAccumulatorobject will be overwritten without callingdestroyto free its memory. This leads to memory leak and eventually causes OOM. This situation will occur inWindowoperator when there are multiple window frames to process.So this PR fixes the issue by calling
destroyonrawSingleGroupRow_when processing a new window frame.