Commit ca6aa73

sallyom authored and kfswain committed
epp servicemonitor (#1425)
* epp servicemonitor and clusterpodmonitor templates

Signed-off-by: sallyom <[email protected]>

* add monitoring chart doc

Signed-off-by: sallyom <[email protected]>

---------

Signed-off-by: sallyom <[email protected]>
1 parent b3f4213 commit ca6aa73

6 files changed, +89 -7 lines changed

config/charts/inferencepool/README.md

Lines changed: 28 additions & 0 deletions
@@ -97,6 +97,30 @@ Then apply it with:
 helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
 ```
 
+### Install with Monitoring
+
+To enable metrics collection and monitoring for the EndpointPicker, you can configure Prometheus ServiceMonitor creation:
+
+```yaml
+inferenceExtension:
+  monitoring:
+    interval: "10s"
+    prometheus:
+      enabled: true
+    secret:
+      name: inference-gateway-sa-metrics-reader-secret
+```
+
+**Note:** Prometheus monitoring requires the Prometheus Operator and ServiceMonitor CRD to be installed in the cluster.
+
+For GKE environments, monitoring is automatically configured when `provider.name` is set to `gke`.
+
+Then apply it with:
+
+```txt
+helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
+```
+
 ## Uninstall
 
 Run the following command to uninstall the chart:
@@ -125,6 +149,10 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
 | `inferenceExtension.flags` | List of flags which are passed through to endpoint picker. Example flags, enable-pprof, grpc-port etc. Refer [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for complete list. |
 | `inferenceExtension.flags.has-enable-leader-election` | Enable leader election for high availability. When enabled, only one EPP pod (the leader) will be ready to serve traffic. |
+| `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
+| `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
+| `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
+| `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |
 
 ## Notes
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+{{- if or .Values.inferenceExtension.monitoring.prometheus.enabled .Values.inferenceExtension.monitoring.gke.enabled }}
+apiVersion: v1
+kind: Secret
+metadata:
+  name: {{ .Values.inferenceExtension.monitoring.secret.name }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
+  annotations:
+    kubernetes.io/service-account.name: {{ include "gateway-api-inference-extension.name" . }}
+type: kubernetes.io/service-account-token
+{{- end }}
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+{{- if .Values.inferenceExtension.monitoring.prometheus.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: {{ include "gateway-api-inference-extension.name" . }}-monitor
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
+spec:
+  endpoints:
+  - interval: {{ .Values.inferenceExtension.monitoring.interval }}
+    port: "http-metrics"
+    path: "/metrics"
+    authorization:
+      credentials:
+        key: token
+        name: {{ .Values.inferenceExtension.monitoring.secret.name }}
+  jobLabel: {{ include "gateway-api-inference-extension.name" . }}
+  namespaceSelector:
+    matchNames:
+    - {{ .Release.Namespace }}
+  selector:
+    matchLabels:
+      {{- include "gateway-api-inference-extension.labels" . | nindent 6 }}
+{{- end }}

config/charts/inferencepool/templates/gke.yaml

Lines changed: 3 additions & 3 deletions
@@ -46,15 +46,15 @@ spec:
   endpoints:
   - port: metrics
     scheme: http
-    interval: 5s
+    interval: {{ .Values.inferenceExtension.monitoring.interval }}
     path: /metrics
     authorization:
       type: Bearer
       credentials:
         secret:
-          name: {{ .Values.gke.monitoringSecret.name }}
+          name: {{ .Values.inferenceExtension.monitoring.secret.name }}
           key: token
-          namespace: {{ .Values.gke.monitoringSecret.namespace }}
+          namespace: {{ .Release.Namespace }}
   selector:
     matchLabels:
       {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 8 }}

config/charts/inferencepool/templates/rbac.yaml

Lines changed: 6 additions & 0 deletions
@@ -17,6 +17,12 @@ rules:
   - subjectaccessreviews
   verbs:
   - create
+{{- if .Values.inferenceExtension.monitoring.prometheus.enabled }}
+- nonResourceURLs:
+  - "/metrics"
+  verbs:
+  - get
+{{- end }}
 ---
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1

config/charts/inferencepool/values.yaml

Lines changed: 15 additions & 4 deletions
@@ -36,6 +36,21 @@ inferenceExtension:
     - name: v
       value: 1
 
+  affinity: {}
+
+  tolerations: []
+
+  # Monitoring configuration for EPP
+  monitoring:
+    interval: "10s"
+    # Service account token secret for authentication
+    secret:
+      name: inference-gateway-sa-metrics-reader-secret
+
+    # Prometheus ServiceMonitor will be created when enabled for EPP metrics collection
+    prometheus:
+      enabled: false
+
 inferencePool:
   targetPorts:
     - number: 8000
@@ -52,7 +67,3 @@ inferencePool:
 provider:
   name: none
 
-gke:
-  monitoringSecret:
-    name: inference-gateway-sa-metrics-reader-secret
-    namespace: default
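
For anyone carrying overrides written against the removed `gke.monitoringSecret` block, the settings map onto the new `inferenceExtension.monitoring` keys roughly as sketched below. The file name, the secret name `my-metrics-reader-secret`, and the `30s` interval are hypothetical values chosen for illustration; only the key paths and defaults come from the diff above. There is no replacement for the old `namespace` key, since the templates in this commit use the release namespace instead.

```yaml
# values-override.yaml (hypothetical file name)
#
# Before this commit, the secret was configured under the top-level `gke` key:
#   gke:
#     monitoringSecret:
#       name: my-metrics-reader-secret
#       namespace: default
#
# After this commit, the equivalent settings live under inferenceExtension.monitoring.
inferenceExtension:
  monitoring:
    # Scrape interval; the chart default is "10s".
    interval: "30s"
    secret:
      # Hypothetical name; the chart default is
      # inference-gateway-sa-metrics-reader-secret.
      name: my-metrics-reader-secret
    prometheus:
      # Renders the ServiceMonitor template. Leave this false on clusters
      # that do not have the Prometheus Operator CRDs installed.
      enabled: true
```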
