|
1 | 1 | # Abstract |
2 | 2 |
|
3 | | - For the sake of simplicity, this article provides only one possible way to ultimately use prometheus to capture monitoring metrics as a data source and grafana to present monitoring information. |
| 3 | +For simplicity, this article describes only one possible setup: Prometheus collects the monitoring metrics and serves as the data source, and Grafana presents the monitoring information. |
4 | 4 |
|
5 | | - Many users feedback from creating issues that they do not know how to install and configure related components, resulting in failure to use related dashboard normally. The installation and configuration steps are described as follows, Hope you use it smoothly! Any feedback is welcome. |
| 5 | +Many users have reported in issues that they do not know how to install and configure the related components, so the dashboard does not work properly. The installation and configuration steps are described below. We hope everything goes smoothly; any feedback is welcome. |
6 | 6 |
|
7 | | - This article assumes that Kubernetes cluster and HAMi has been deployed successfully. The following components are installed in a kubernetes cluster. The components or software versions are as follows: |
| 7 | +This article assumes that a Kubernetes cluster and HAMi have been deployed successfully. The following components are installed in the Kubernetes cluster, with these versions: |
8 | 8 |
|
9 | 9 | | components or software name | version | remark | |
10 | 10 | | --------------------------- | ------------------- | ---------------- | |
|
16 | 16 |
|
17 | 17 | ## Deploy kube-prometheus stack |
18 | 18 |
|
19 | | -**Note:**See the version compatibility matrix for kubernetes and kube-prometheus stack in:https://github.com/prometheus-operator/kube-prometheus?tab=readme-ov-file#compatibility |
| 19 | +**Note:** See the version compatibility matrix for Kubernetes and the kube-prometheus stack at: https://github.com/prometheus-operator/kube-prometheus?tab=readme-ov-file#compatibility |
20 | 20 |
|
21 | 21 | ```shell |
22 | 22 | #Clone kube-prometheus code repository(using release-0.11 here) |
@@ -48,19 +48,19 @@ grafana NodePort 10.233.56.112 <none> 3000:30300/TCP |
48 | 48 | prometheus-k8s NodePort 10.233.38.113 <none> 9090:30090/TCP,8080:31273/TCP 19h |
49 | 49 | ``` |
50 | 50 |
|
51 | | - If ip address of controller node is 10.0.0.21, then grafana, prometheus, and alertmanager can be accessed using the following urls: http://10.0.0.21:30300 , http://10.0.0.21:30090 , and http://10.0.0.21:30093 , and the default user name and password for accessing grafana are admin |
| 51 | +If the IP address of the controller node is 10.0.0.21, then Grafana, Prometheus, and Alertmanager can be accessed at http://10.0.0.21:30300 , http://10.0.0.21:30090 , and http://10.0.0.21:30093 respectively. The default user name and password for Grafana are both admin. |
52 | 52 |
|
53 | 53 | ## Configure grafana |
54 | 54 |
|
55 | 55 | ### Create Datasource ALL |
56 | 56 |
|
57 | | - Go to the "Configuration" -> "Data soutces" page in grafana and create a datasource named "ALL", and keep the value of HTTP.URL be same with the counterpart in default "prometheus" datasource. |
| 57 | +Go to the "Configuration" -> "Data sources" page in Grafana and create a datasource named "ALL", keeping its HTTP URL value the same as in the default "prometheus" datasource. |
58 | 58 |
|
59 | 59 | ### Import dashboard |
60 | 60 |
|
61 | | - Go to the "Configuration" -> "Data soutces" page in grafana and import the dashboard from https://grafana.com/grafana/dashboards/22043-hami-vgpu-metrics-dashboard/ , and a dashboard page named "hami-vgpu-metrics-dashboard" will be created. 22043-hami-vgpu-metrics-dashboard is valid in grafana8.5.5 and grafana9.1.0, and it's grealty possible that this dashboard is vaild in grafana version later than 9.1.0. Now data of some panels in this dashboard page are missing, which requires you read the rest of the document. |
| 61 | +Go to the "Configuration" -> "Data sources" page in Grafana and import the dashboard from https://grafana.com/grafana/dashboards/22043-hami-vgpu-metrics-dashboard/ ; a dashboard page named "hami-vgpu-metrics-dashboard" will be created. The 22043-hami-vgpu-metrics-dashboard dashboard works in Grafana 8.5.5 and Grafana 9.1.0, and very likely also in Grafana versions later than 9.1.0. At this point, data for some panels in the dashboard is still missing; follow the rest of this document to populate them. |
62 | 62 |
|
63 | | - For versions earlier than grafana8.5.5, such as grafana7.5.17, please refer to:https://grafana.com/grafana/dashboards/21833-hami-vgpu-dashboard/ |
| 63 | +For versions earlier than Grafana 8.5.5, such as Grafana 7.5.17, please refer to: https://grafana.com/grafana/dashboards/21833-hami-vgpu-dashboard/ |
64 | 64 |
|
65 | 65 | # Deploy dcgm-exporter |
66 | 66 |
|
@@ -231,6 +231,6 @@ NAME READY STATUS RESTARTS AGE IP NODE |
231 | 231 | gpu-pod-01 0/1 Completed 0 52s 10.233.81.70 controller01 <none> <none> |
232 | 232 | ``` |
233 | 233 |
|
234 | | - You can see the monitoring details in the dashboard. The contents are as follows: |
| 234 | +You can now see the monitoring details in the dashboard, as shown below: |
235 | 235 |
|
236 | 236 |  |