From df6fe575d777abd413c2db9ae09675ce3248c99a Mon Sep 17 00:00:00 2001
From: Claudia <c.misale@ibm.com>
Date: Wed, 19 Mar 2025 22:47:14 -0400
Subject: [PATCH] deploy dashboards

---
 setup.KubeConEU25/README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/setup.KubeConEU25/README.md b/setup.KubeConEU25/README.md
index 164e6ba..2363f51 100644
--- a/setup.KubeConEU25/README.md
+++ b/setup.KubeConEU25/README.md
@@ -195,6 +195,20 @@ export POD_NAME=$(kubectl --namespace prometheus get pod -l "app.kubernetes.io/n
   kubectl --namespace prometheus port-forward $POD_NAME 3000
 ```
 
+To import the Autopilot and NVIDIA DCGM dashboards in Grafana:
+
+- Open the `+` drop-down menu in the top right corner and select **Import dashboard**.
+- In the `Grafana.com dashboard URL or ID` box, enter [https://grafana.com/grafana/dashboards/23123-autopilot-metrics/](https://grafana.com/grafana/dashboards/23123-autopilot-metrics/), click **Load**, select the Prometheus data source, and click **Import**. Then repeat these steps for the NVIDIA DCGM exporter dashboard [https://grafana.com/grafana/dashboards/12239-nvidia-dcgm-exporter-dashboard/](https://grafana.com/grafana/dashboards/12239-nvidia-dcgm-exporter-dashboard/).
+
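+For the dashboards to show data, Prometheus must scrape these metrics. To enable this, we label the `ServiceMonitor` objects in the `autopilot` and `nvidia-gpu-operator` namespaces with the Prometheus release name (`release=kube-prometheus-stack`), so that the Prometheus instance installed by the `kube-prometheus-stack` Helm chart picks them up.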
+
+```bash
+kubectl label servicemonitors.monitoring.coreos.com -n autopilot autopilot-metrics-monitor release=kube-prometheus-stack --overwrite
+```
+```bash
+kubectl label servicemonitors.monitoring.coreos.com -n nvidia-gpu-operator nvidia-dcgm-exporter gpu-operator nvidia-node-status-exporter release=kube-prometheus-stack --overwrite
+```
+
 ### MLBatch Cluster Setup
 
 We follow instructions from [CLUSTER-SETUP.md](../setup.k8s/CLUSTER-SETUP.md).