Prerequisites
Question
Dear NVSentinel Team,
I learned from the GTC news about NVSentinel, a new tool for self-remediation of an organization's GPU resources. I am interested in the software's hardware/node health detection components.
Our system is not Kubernetes- or cloud-based; it is a more traditional on-prem HPC setup. We already use DCGM Exporter and have found it useful for logging and gathering insight into system status and utilization. This motivates my question:
Would it be possible to run NVSentinel without all the bells and whistles, more like a traditional alerting system?
I think this would be a very valuable tool for our local sysadmin team. :)
Thanks!
Category
Installation/Deployment
Context
No response