fix(logger): nil out completed cancel entries in activeCancels #646

Merged
ArangoGutierrez merged 1 commit into NVIDIA:main from ArangoGutierrez:fix/logger-active-cancels on Feb 13, 2026
Conversation

@ArangoGutierrez
Collaborator

Summary

  • activeCancels slice in the logger grew unbounded until Exit() was called, holding references to completed cancel functions
  • Each Loading() call now returns a wrapper that nils out its slot in activeCancels when invoked, releasing the reference immediately
  • Exit() skips nil entries when iterating for safety
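
The pattern described in the bullets above can be sketched as follows. This is a minimal illustration, not the code from this PR: `sketchLogger` is a hypothetical stand-in for the real logger type, and the use of `context.WithCancelCause` is an assumption inferred from the `func(cause error)` signature shown in the review snippets.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// sketchLogger is a hypothetical stand-in for the real logger type.
type sketchLogger struct {
	mu            sync.Mutex
	activeCancels []context.CancelCauseFunc
}

// Loading registers a cancel function and returns a wrapper that
// cancels and then clears its own slot in activeCancels.
func (l *sketchLogger) Loading() context.CancelCauseFunc {
	_, cancel := context.WithCancelCause(context.Background())

	l.mu.Lock()
	idx := len(l.activeCancels) // capture the slot index while holding the lock
	l.activeCancels = append(l.activeCancels, cancel)
	l.mu.Unlock()

	return func(cause error) {
		cancel(cause)
		l.mu.Lock()
		if idx < len(l.activeCancels) {
			l.activeCancels[idx] = nil // release the reference right away
		}
		l.mu.Unlock()
	}
}

// Exit cancels whatever is still active, skipping cleared slots.
func (l *sketchLogger) Exit() {
	l.mu.Lock()
	defer l.mu.Unlock()
	for _, c := range l.activeCancels {
		if c != nil {
			c(errors.New("logger exiting"))
		}
	}
	l.activeCancels = nil
}

func main() {
	l := &sketchLogger{}
	done := l.Loading()
	pending := l.Loading()

	done(nil) // slot 0 is cleared, slot 1 is still held
	fmt.Println(l.activeCancels[0] == nil, l.activeCancels[1] == nil)

	l.Exit()     // cancels the remaining entry and drops the slice
	pending(nil) // safe: the bounds check sees a zero-length slice
}
```

In this sketch the slice length still grows by one per `Loading()` call; only the references held in the slots are released early.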

Audit Finding

Test plan

  • All existing logger tests pass (go test ./internal/logger/... -v -race)
  • gofmt — no formatting issues
  • golangci-lint run ./... — 0 issues
  • go test ./pkg/... — all package tests pass
  • go build -o bin/holodeck cmd/cli/main.go — compiles
  • go mod tidy && go mod verify — no changes

activeCancels grew unbounded until Exit(). Now each cancel wrapper
nils its slot so the reference is released. Exit() still iterates
all entries for safety, skipping nil entries.

Audit finding NVIDIA#30 (LOW).

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

Copilot AI review requested due to automatic review settings February 12, 2026 19:42

@coveralls

Pull Request Test Coverage Report for Build 21961633528

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 47.501%

Totals Coverage Status
Change from base Build 21955389842: 0.0%
Covered Lines: 2500
Relevant Lines: 5263

💛 - Coveralls

Copilot AI left a comment

Pull request overview

This pull request addresses a memory leak in the logger's activeCancels slice where completed cancel functions (and their associated closures/contexts) were retained until Exit() was called. The fix introduces a wrapper function that nils out the cancel function reference immediately after it's invoked, releasing the memory held by closures.

Changes:

  • Wrapper function added to nil out cancel function references after invocation
  • Nil check added in Exit() to safely handle nilled entries
  • Index captured before releasing lock to ensure thread-safety

Comment on lines +224 to +225
if idx < len(l.activeCancels) {
    l.activeCancels[idx] = nil
Copilot AI Feb 12, 2026

While this change successfully prevents the unbounded growth of cancel function references (which hold closures and contexts), the activeCancels slice capacity itself will still grow with each Loading() call. The memory overhead of a slice of nil pointers is minimal compared to the closures, so this effectively addresses the main memory leak issue.

However, for long-running applications with many transient loading operations, you might consider adding a compaction mechanism. For example, periodically removing nil entries when they exceed a certain threshold, or truncating the slice in specific scenarios. Given that holodeck appears to be primarily a CLI tool that exits after operations complete, this may not be necessary.

Suggested change
- if idx < len(l.activeCancels) {
-     l.activeCancels[idx] = nil
+ if idx < len(l.activeCancels) {
+     // Mark this slot as inactive
+     l.activeCancels[idx] = nil
+     // Compact trailing nil entries so the slice length (and eventually
+     // its backing array) does not grow without bound across many
+     // transient loading operations.
+     for len(l.activeCancels) > 0 {
+         lastIdx := len(l.activeCancels) - 1
+         if l.activeCancels[lastIdx] != nil {
+             break
+         }
+         l.activeCancels = l.activeCancels[:lastIdx]
+     }
+     // When all entries are inactive, reset the slice so the backing
+     // array can be garbage-collected and capacity reset.
+     if len(l.activeCancels) == 0 {
+         l.activeCancels = nil
+     }

return func(cause error) {
    cancel(cause)
    l.mu.Lock()
    if idx < len(l.activeCancels) {
Copilot AI Feb 12, 2026

There is a potential race condition if the wrapper cancel function is called after Exit() has set l.activeCancels = nil. The bounds check if idx < len(l.activeCancels) will correctly prevent a panic (since len(nil) is 0), but this is a subtle correctness property that may not be immediately obvious to future maintainers.

Consider adding a comment explaining this safety property, or explicitly check for nil:

l.mu.Lock()
if l.activeCancels != nil && idx < len(l.activeCancels) {
    l.activeCancels[idx] = nil
}
l.mu.Unlock()

This makes the intention clearer and is more defensive against future refactoring.
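
To illustrate why the bounds check alone is already safe, here is a small standalone sketch. `clearSlot` is a hypothetical helper written only for this example; it mirrors the guarded write but is not part of the logger.

```go
package main

import "fmt"

// clearSlot is a hypothetical helper that mirrors the guarded write:
// it clears slots[idx] only if idx is within bounds, and reports
// whether a write happened.
func clearSlot(slots []func(error), idx int) bool {
	if idx < len(slots) {
		slots[idx] = nil
		return true
	}
	return false
}

func main() {
	// A nil slice models the state of activeCancels after Exit().
	var afterExit []func(error)

	// len of a nil slice is 0, so the guard rejects every index
	// and no write (and therefore no panic) occurs.
	fmt.Println(len(afterExit), clearSlot(afterExit, 0)) // prints "0 false"
}
```

The explicit `!= nil` check therefore does not change behavior; its value is in documenting the intent.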

Suggested change
- if idx < len(l.activeCancels) {
+ if l.activeCancels != nil && idx < len(l.activeCancels) {

Comment on lines +220 to +228
// Return a wrapper that cancels and marks the slot as nil
return func(cause error) {
    cancel(cause)
    l.mu.Lock()
    if idx < len(l.activeCancels) {
        l.activeCancels[idx] = nil
    }
    l.mu.Unlock()
}
Copilot AI Feb 12, 2026

The new behavior of nil-ing out cancel function entries lacks test coverage. While existing tests verify that Loading works correctly, there are no tests that verify:

  1. The cancel function slot is actually set to nil after calling the returned wrapper
  2. Multiple sequential Loading calls don't leak memory via the slice
  3. Calling the wrapper after Exit() doesn't cause issues

Consider adding a test that verifies the nil-ing behavior and slice state after cancel is called.
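
One possible shape for such checks is sketched below. It runs against `fakeLogger`, a hypothetical stand-in that reproduces the wrapper logic quoted above, since the real logger type and its test helpers are not shown in this conversation. The `check*` functions are illustrative names and would map naturally onto `testing.T` test functions in the actual package.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// fakeLogger is a hypothetical stand-in for the real logger.
type fakeLogger struct {
	mu            sync.Mutex
	activeCancels []context.CancelCauseFunc
}

func (l *fakeLogger) Loading() context.CancelCauseFunc {
	_, cancel := context.WithCancelCause(context.Background())
	l.mu.Lock()
	idx := len(l.activeCancels)
	l.activeCancels = append(l.activeCancels, cancel)
	l.mu.Unlock()
	return func(cause error) {
		cancel(cause)
		l.mu.Lock()
		if idx < len(l.activeCancels) {
			l.activeCancels[idx] = nil
		}
		l.mu.Unlock()
	}
}

func (l *fakeLogger) Exit() {
	l.mu.Lock()
	defer l.mu.Unlock()
	for _, c := range l.activeCancels {
		if c != nil {
			c(errors.New("logger exiting"))
		}
	}
	l.activeCancels = nil
}

// liveSlots counts the non-nil entries still held by the slice.
func (l *fakeLogger) liveSlots() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	n := 0
	for _, c := range l.activeCancels {
		if c != nil {
			n++
		}
	}
	return n
}

// Check 1: the slot is nil after the wrapper runs.
func checkSlotCleared() error {
	l := &fakeLogger{}
	cancel := l.Loading()
	cancel(nil)
	if l.activeCancels[0] != nil {
		return errors.New("slot 0 still holds a cancel function")
	}
	return nil
}

// Check 2: sequential Loading/cancel cycles leave no live references.
func checkSequentialCalls(n int) error {
	l := &fakeLogger{}
	for i := 0; i < n; i++ {
		l.Loading()(nil)
	}
	if live := l.liveSlots(); live != 0 {
		return fmt.Errorf("%d live slots after %d cycles", live, n)
	}
	return nil
}

// Check 3: calling the wrapper after Exit must not panic.
func checkCancelAfterExit() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("wrapper panicked after Exit: %v", r)
		}
	}()
	l := &fakeLogger{}
	cancel := l.Loading()
	l.Exit()
	cancel(nil)
	return nil
}

func main() {
	fmt.Println(checkSlotCleared(), checkSequentialCalls(100), checkCancelAfterExit())
}
```

Check 2 verifies only that no cancel functions remain referenced; it does not assert anything about slice length or capacity, which still grow in this design.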

@ArangoGutierrez ArangoGutierrez merged commit 7421b7d into NVIDIA:main Feb 13, 2026
25 checks passed