Commit e7c3cf0

fix(plugin): recreate health context in cleanup for restart support
Recreate healthCtx and healthCancel in cleanup() after cancellation to support plugin restart. Remove nil assignments for these fields as they need to persist across restart cycles.

This addresses Elezar's concern #2 about why we nil these fields: we no longer do. The context is recreated fresh for each restart, ensuring health checks work correctly when the plugin is restarted.

Fixes plugin restart blocker identified in architecture review.

Refs: NVIDIA#1601
Task: 3/6
Co-authored-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
1 parent 8b37c6b commit e7c3cf0

2 files changed: 8 additions, 4 deletions
AGENTS.md

Lines changed: 3 additions & 2 deletions
@@ -21,15 +21,16 @@
 - Lines: 114-118
 - Changes: Remove `context.WithCancel()` call (already done in constructor)
 - Addresses: Cleanup redundant initialization
-- Commit: (pending)
+- Commit: d055f1e0c
 
 ### Phase 2: Restart-Safe Cleanup
 
-- [TODO] **Task 3**: Modify `cleanup()` to recreate context after cancellation
+- [DONE] **Task 3**: Modify `cleanup()` to recreate context after cancellation
 - File: `internal/plugin/server.go`
 - Lines: 120-129
 - Changes: Recreate `healthCtx` and `healthCancel` after cancelling for restart support
 - Addresses: Elezar's concern #2 (line 128 - why nil these fields), fixes plugin restart
+- Commit: (pending)
 
 ### Phase 3: Health Channel Lifecycle

internal/plugin/server.go

Lines changed: 5 additions & 2 deletions
@@ -128,12 +128,15 @@ func (plugin *nvidiaDevicePlugin) initialize() {
 func (plugin *nvidiaDevicePlugin) cleanup() {
 	if plugin.healthCancel != nil {
 		plugin.healthCancel()
+		// Recreate context for potential plugin restart. The same plugin instance
+		// may be restarted via Start() after Stop(), so we need a fresh context.
+		plugin.healthCtx, plugin.healthCancel = context.WithCancel(plugin.ctx)
 	}
 	plugin.healthWg.Wait()
 	plugin.server = nil
 	plugin.health = nil
-	plugin.healthCtx = nil
-	plugin.healthCancel = nil
+	// Do not nil healthCtx or healthCancel - they are needed for restart
+	// and are recreated above if they were cancelled
 }
 
 // Devices returns the full set of devices associated with the plugin.
