fix(logger): nil out completed cancel entries in activeCancels#646
Conversation
activeCancels grew unbounded until Exit(). Now each cancel wrapper nils its slot so the reference is released. Exit() still iterates all entries for safety, skipping nil entries. Audit finding NVIDIA#30 (LOW). Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Pull Request Test Coverage Report for Build 21961633528Details
💛 - Coveralls |
There was a problem hiding this comment.
Pull request overview
This pull request addresses a memory leak in the logger's activeCancels slice where completed cancel functions (and their associated closures/contexts) were retained until Exit() was called. The fix introduces a wrapper function that nils out the cancel function reference immediately after it's invoked, releasing the memory held by closures.
Changes:
- Wrapper function added to nil out cancel function references after invocation
- Nil check added in
Exit()to safely handle nilled entries - Index captured before releasing lock to ensure thread-safety
| if idx < len(l.activeCancels) { | ||
| l.activeCancels[idx] = nil |
There was a problem hiding this comment.
While this change successfully prevents the unbounded growth of cancel function references (which hold closures and contexts), the activeCancels slice capacity itself will still grow with each Loading() call. The memory overhead of a slice of nil pointers is minimal compared to the closures, so this effectively addresses the main memory leak issue.
However, for long-running applications with many transient loading operations, you might consider adding a compaction mechanism. For example, periodically removing nil entries when they exceed a certain threshold, or truncating the slice in specific scenarios. Given that holodeck appears to be primarily a CLI tool that exits after operations complete, this may not be necessary.
| if idx < len(l.activeCancels) { | |
| l.activeCancels[idx] = nil | |
| if idx < len(l.activeCancels) { | |
| // Mark this slot as inactive | |
| l.activeCancels[idx] = nil | |
| // Compact trailing nil entries so the slice length (and eventually | |
| // its backing array) does not grow without bound across many | |
| // transient loading operations. | |
| for len(l.activeCancels) > 0 { | |
| lastIdx := len(l.activeCancels) - 1 | |
| if l.activeCancels[lastIdx] != nil { | |
| break | |
| } | |
| l.activeCancels = l.activeCancels[:lastIdx] | |
| } | |
| // When all entries are inactive, reset the slice so the backing | |
| // array can be garbage-collected and capacity reset. | |
| if len(l.activeCancels) == 0 { | |
| l.activeCancels = nil | |
| } |
| return func(cause error) { | ||
| cancel(cause) | ||
| l.mu.Lock() | ||
| if idx < len(l.activeCancels) { |
There was a problem hiding this comment.
There is a potential race condition if the wrapper cancel function is called after Exit() has set l.activeCancels = nil. The bounds check if idx < len(l.activeCancels) will correctly prevent a panic (since len(nil) is 0), but this is a subtle correctness property that may not be immediately obvious to future maintainers.
Consider adding a comment explaining this safety property, or explicitly check for nil:
l.mu.Lock()
if l.activeCancels != nil && idx < len(l.activeCancels) {
l.activeCancels[idx] = nil
}
l.mu.Unlock()This makes the intention clearer and is more defensive against future refactoring.
| if idx < len(l.activeCancels) { | |
| if l.activeCancels != nil && idx < len(l.activeCancels) { |
| // Return a wrapper that cancels and marks the slot as nil | ||
| return func(cause error) { | ||
| cancel(cause) | ||
| l.mu.Lock() | ||
| if idx < len(l.activeCancels) { | ||
| l.activeCancels[idx] = nil | ||
| } | ||
| l.mu.Unlock() | ||
| } |
There was a problem hiding this comment.
The new behavior of nil-ing out cancel function entries lacks test coverage. While existing tests verify that Loading works correctly, there are no tests that verify:
- The cancel function slot is actually set to nil after calling the returned wrapper
- Multiple sequential Loading calls don't leak memory via the slice
- Calling the wrapper after Exit() doesn't cause issues
Consider adding a test that verifies the nil-ing behavior and slice state after cancel is called.
Summary
activeCancelsslice in the logger grew unbounded untilExit()was called, holding references to completed cancel functionsLoading()call now returns a wrapper that nils out its slot inactiveCancelswhen invoked, releasing the reference immediatelyExit()skips nil entries when iterating for safetyAudit Finding
activeCancelsgrows unbounded untilExit()Test plan
go test ./internal/logger/... -v -race)gofmt— no formatting issuesgolangci-lint run ./...— 0 issuesgo test ./pkg/...— all package tests passgo build -o bin/holodeck cmd/cli/main.go— compilesgo mod tidy && go mod verify— no changes