This is the actionable checklist for remaining fixes and validation work.
It is derived from docs/PROJECT_STATUS.md.
- Persist
nvidia-smiwrapper via Warewulf overlay/chroot workflow. - Define and enforce statistical acceptance gates for
.2SM behavior. - Harden direct-suite artifact capture for transient
no softmig logcases.
- Run and review overnight soak/stress cycles for stability metrics.
- Calibrate
suite_soak.shthresholds (shrreg_delta, FD stability) from overnight data.
- Continue low-risk dead code/header cleanup where safe.
-
nvsmileak stress remainsothers=0under concurrent pressure. -
.2SM metrics stay within agreed tolerance bands. - OOM enforcement remains PASS across CUDA versions and slice types.
- Direct-linked hook registration remains stable.
- Wrapper persistence is verified after reboot/reprovision.