Skip to content

Commit 0472a3a

Browse files
fix(plugin): close health channel properly in cleanup
Close the health channel in cleanup() before niling to prevent panics in ListAndWatch() if cleanup happens during channel read. The WaitGroup ensures the health goroutine has completed before we close the channel, making this operation safe. Fixes critical blocker: health channel was never closed, leading to potential panics and resource leaks. Refs: NVIDIA#1601 Task: 4/6
1 parent e7c3cf0 commit 0472a3a

File tree

2 files changed

+9
-2
lines changed

2 files changed

+9
-2
lines changed

AGENTS.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -30,15 +30,16 @@
3030
- Lines: 120-129
3131
- Changes: Recreate `healthCtx` and `healthCancel` after cancelling for restart support
3232
- Addresses: Elezar's concern #2 (line 128 - why nil these fields), fixes plugin restart
33-
- Commit: (pending)
33+
- Commit: cc2a0a77c
3434

3535
### Phase 3: Health Channel Lifecycle
3636

37-
- [TODO] **Task 4**: Close health channel properly in `cleanup()`
37+
- [DONE] **Task 4**: Close health channel properly in `cleanup()`
3838
- File: `internal/plugin/server.go`
3939
- Lines: 120-129
4040
- Changes: Close channel before niling to prevent panics
4141
- Addresses: Devil's advocate blocker - channel never closed
42+
- Commit: (pending)
4243

4344
- [TODO] **Task 5**: Handle closed channel in `ListAndWatch()`
4445
- File: `internal/plugin/server.go`

internal/plugin/server.go

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -133,6 +133,12 @@ func (plugin *nvidiaDevicePlugin) cleanup() {
133133
plugin.healthCtx, plugin.healthCancel = context.WithCancel(plugin.ctx)
134134
}
135135
plugin.healthWg.Wait()
136+
137+
// Close health channel before niling to prevent panics in ListAndWatch()
138+
if plugin.health != nil {
139+
close(plugin.health)
140+
}
141+
136142
plugin.server = nil
137143
plugin.health = nil
138144
// Do not nil healthCtx or healthCancel - they are needed for restart

0 commit comments

Comments
 (0)