Changes from 3 commits
20 changes: 20 additions & 0 deletions charts/hami/README.md
@@ -137,6 +137,16 @@ This document provides detailed descriptions of all configurable values paramete
| `scheduler.service.monitorPort` | Monitor port | `31993` |
| `scheduler.service.monitorTargetPort` | Monitor target port | `9395` |

### Scheduler ServiceMonitor Configuration

| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `scheduler.servicemonitor.enabled` | Whether to enable ServiceMonitor for Prometheus monitoring | `false` |
| `scheduler.servicemonitor.labels` | Additional labels for ServiceMonitor | `{}` |
| `scheduler.servicemonitor.annotations` | Additional annotations for ServiceMonitor | `{}` |
| `scheduler.servicemonitor.interval` | Scrape interval for metrics collection | `"15s"` |
| `scheduler.servicemonitor.honorLabels` | Whether to honor labels from the target | `false` |
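
As a sketch of how the parameters above might be used, the following values override enables the scheduler ServiceMonitor and lengthens the scrape interval. The file name and the annotation key are hypothetical, and the example assumes the Prometheus Operator CRDs (`monitoring.coreos.com/v1`) are already installed in the cluster.

```yaml
# values-override.yaml (hypothetical file name)
scheduler:
  servicemonitor:
    enabled: true        # render the scheduler ServiceMonitor
    interval: "30s"      # overrides the "15s" default
    honorLabels: false   # same as the default, shown for completeness
    annotations:
      example.com/owner: platform-team   # hypothetical annotation
```

Keys that are left out keep the defaults listed in the table.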

## Device Plugin Configuration

| Parameter | Description | Default Value |
@@ -158,6 +168,16 @@ This document provides detailed descriptions of all configurable values paramete
| `devicePlugin.monitor.image.pullSecrets` | Monitor image pull secrets | `[]` |
| `devicePlugin.monitor.ctrPath` | Container path | `/usr/local/vgpu/containers` |

### Device Plugin ServiceMonitor Configuration

| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `devicePlugin.monitor.servicemonitor.enabled` | Whether to enable ServiceMonitor for Prometheus monitoring | `false` |
| `devicePlugin.monitor.servicemonitor.labels` | Additional labels for ServiceMonitor | `{}` |
| `devicePlugin.monitor.servicemonitor.annotations` | Additional annotations for ServiceMonitor | `{}` |
| `devicePlugin.monitor.servicemonitor.interval` | Scrape interval for metrics collection | `"15s"` |
| `devicePlugin.monitor.servicemonitor.honorLabels` | Whether to honor labels from the target | `false` |
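
A Prometheus Operator instance typically discovers ServiceMonitors by label, so `labels` is likely the setting most installs need to adjust. The sketch below assumes a kube-prometheus-stack install whose Helm release is named `prometheus`, in which case the default selector matches `release: prometheus`. Both the release name and the label value are assumptions, so check the `serviceMonitorSelector` of your own Prometheus resource.

```yaml
devicePlugin:
  monitor:
    servicemonitor:
      enabled: true
      labels:
        # Assumption: must match the serviceMonitorSelector of your
        # Prometheus resource; "prometheus" is a hypothetical release name.
        release: prometheus
```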

### Device Plugin Other Configuration

| Parameter | Description | Default Value |
33 changes: 33 additions & 0 deletions charts/hami/templates/device-plugin/servicemonitor.yaml
@@ -0,0 +1,33 @@
{{- if .Values.scheduler.servicemonitor.enabled }}
Review comment from a contributor (critical):

There appears to be a copy-paste error in this conditional check. The creation of the device-plugin ServiceMonitor should be controlled by its own configuration flag, .Values.devicePlugin.monitor.servicemonitor.enabled, not the scheduler's flag. As it is, this ServiceMonitor would be incorrectly created only when the scheduler's monitor is enabled.

{{- if .Values.devicePlugin.monitor.servicemonitor.enabled }}

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  {{- if .Values.devicePlugin.monitor.servicemonitor.annotations }}
  annotations:
    {{ toYaml .Values.devicePlugin.monitor.servicemonitor.annotations | nindent 4 }}
  {{- end }}
  name: {{ include "hami-vgpu.device-plugin" . }}
  namespace: {{ include "hami-vgpu.namespace" . }}
  labels:
    {{- include "hami-vgpu.labels" . | nindent 4 }}
Review comment from a contributor:

Consider adding a default label, release: prometheus, so that Prometheus selects this ServiceMonitor.

    {{- if .Values.devicePlugin.monitor.servicemonitor.labels }}
{{ toYaml .Values.devicePlugin.monitor.servicemonitor.labels | indent 4 }}
    {{- end }}
spec:
  endpoints:
  - path: /metrics
    port: monitorport
    scheme: http
    interval: {{ .Values.devicePlugin.monitor.servicemonitor.interval | default "15s" }}
    honorLabels: {{ .Values.devicePlugin.monitor.servicemonitor.honorLabels | default false }}
  namespaceSelector:
    matchNames:
    - {{ include "hami-vgpu.namespace" . }}
  selector:
    matchLabels:
      app.kubernetes.io/component: hami-device-plugin
      {{- include "hami-vgpu.labels" . | nindent 6 }}
      {{- if .Values.devicePlugin.service.labels }} # Use devicePlugin instead of scheduler
Review comment from a contributor (medium):

This comment appears to be a leftover from copying another template. It should be removed to improve code clarity.

      {{- if .Values.devicePlugin.service.labels }}

{{ toYaml .Values.devicePlugin.service.labels | indent 6 }}
      {{- end }}
{{- end }}
Review comment from a contributor (medium):

It's a best practice for files to end with a newline character. This ensures compatibility with various tools and follows POSIX standards.

{{- end }}

33 changes: 33 additions & 0 deletions charts/hami/templates/scheduler/servicemonitor.yaml
@@ -0,0 +1,33 @@
{{- if .Values.scheduler.servicemonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  {{- if .Values.scheduler.servicemonitor.annotations }}
  annotations:
    {{ toYaml .Values.scheduler.servicemonitor.annotations | nindent 4 }}
  {{- end }}
  name: {{ include "hami-vgpu.scheduler" . }}
  namespace: {{ include "hami-vgpu.namespace" . }}
  labels:
    {{- include "hami-vgpu.labels" . | nindent 4 }}
    {{- if .Values.scheduler.servicemonitor.labels }}
{{ toYaml .Values.scheduler.servicemonitor.labels | indent 4 }}
    {{- end }}
spec:
  endpoints:
  - path: /metrics
    port: monitor
    scheme: http
    interval: {{ .Values.scheduler.servicemonitor.interval | default "15s" }}
    honorLabels: {{ .Values.scheduler.servicemonitor.honorLabels | default false }}
  namespaceSelector:
    matchNames:
    - {{ include "hami-vgpu.namespace" . }}
  selector:
    matchLabels:
      app.kubernetes.io/component: hami-scheduler
      {{- include "hami-vgpu.labels" . | nindent 6 }}
      {{- if .Values.scheduler.service.labels }}
{{ toYaml .Values.scheduler.service.labels | indent 6 }}
      {{- end }}
{{- end }}
Review comment from a contributor (medium):

It's a best practice for files to end with a newline character. This ensures compatibility with various tools and follows POSIX standards.

{{- end }}

14 changes: 14 additions & 0 deletions charts/hami/values.yaml
@@ -235,6 +235,13 @@ scheduler:
    httpTargetPort: 443
    labels: {}
    annotations: {}
  # scheduler ServiceMonitor configuration
  servicemonitor:
    enabled: false
    labels: {}
    annotations: {}
    interval: "15s"
    honorLabels: false

devicePlugin:
  enabled: true
@@ -283,6 +290,13 @@ devicePlugin:
      pullSecrets: []
    ctrPath: /usr/local/vgpu/containers
    resyncInterval: "5m"
    # ServiceMonitor configuration
    servicemonitor:
      enabled: false
      labels: {}
      annotations: {}
      interval: "15s"
      honorLabels: false
  deviceSplitCount: 10
  deviceMemoryScaling: 1
  deviceCoreScaling: 1