
Commit 9308c8e

fix(plugin): handle closed health channel in ListAndWatch
Add an ok check when receiving from the health channel so that ListAndWatch handles channel closure during cleanup. This prevents potential panics and ensures a clean shutdown when the plugin stops. A receive from a closed channel yields the zero value and ok=false; we now check ok and return without touching the received value.

Refs: NVIDIA#1601
Task: 5/6
1 parent 0472a3a commit 9308c8e


2 files changed: 8 additions, 3 deletions


AGENTS.md

Lines changed: 3 additions & 2 deletions
@@ -39,13 +39,14 @@
 - Lines: 120-129
 - Changes: Close channel before niling to prevent panics
 - Addresses: Devil's advocate blocker - channel never closed
-- Commit: (pending)
+- Commit: 795807362

-- [TODO] **Task 5**: Handle closed channel in `ListAndWatch()`
+- [DONE] **Task 5**: Handle closed channel in `ListAndWatch()`
 - File: `internal/plugin/server.go`
 - Lines: 287-298
 - Changes: Add `ok` check when receiving from health channel
 - Addresses: Graceful handling of channel closure
+- Commit: (pending)

 ### Phase 4: Error Handling Improvements

internal/plugin/server.go

Lines changed: 5 additions & 1 deletion
@@ -305,7 +305,11 @@ func (plugin *nvidiaDevicePlugin) ListAndWatch(e *pluginapi.Empty, s pluginapi.D
 		select {
 		case <-healthCtx.Done():
 			return nil
-		case d := <-health:
+		case d, ok := <-health:
+			if !ok {
+				// Health channel closed, health checks stopped
+				return nil
+			}
 			// FIXME: there is no way to recover from the Unhealthy state.
 			d.Health = pluginapi.Unhealthy
 			klog.Infof("'%s' device marked unhealthy: %s", plugin.rm.Resource(), d.ID)
