Description
Feature Request
Summary
Add native Prometheus Operator support (ServiceMonitor and PrometheusRule) to all ACK controller Helm charts. This would enable seamless integration with Prometheus-based monitoring stacks and provide out-of-the-box alerting capabilities.
Motivation
Currently, ACK controllers expose metrics but don't include Prometheus Operator resources in their Helm charts. Users running Prometheus Operator (which is very common in Kubernetes environments) need to manually create ServiceMonitor and PrometheusRule resources for each ACK controller they deploy.
Adding these resources to the Helm charts would:
- Reduce manual effort for users setting up monitoring
- Ensure consistency across ACK controller deployments
- Provide sensible default alerts for common issues like sync errors
- Follow best practices established by other Kubernetes operators (e.g., cert-manager, external-dns)
Proposed Solution
I've created a reference implementation for the RDS controller: aws-controllers-k8s/rds-controller#254
The changes include:
1. ServiceMonitor template (helm/templates/service-monitor.yaml)
{{- if .Values.metrics.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "ack-controller.fullname" . }}-metrics
  labels:
    {{- include "ack-controller.labels" . | nindent 4 }}
    {{- with .Values.metrics.serviceMonitor.additionalLabels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  selector:
    matchLabels:
      {{- include "ack-controller.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: http
      interval: {{ .Values.metrics.serviceMonitor.interval }}
      scrapeTimeout: {{ .Values.metrics.serviceMonitor.scrapeTimeout }}
{{- end }}

2. PrometheusRule template (helm/templates/prometheus-rule.yaml)
{{- if .Values.metrics.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ include "ack-controller.fullname" . }}
  labels:
    {{- include "ack-controller.labels" . | nindent 4 }}
    {{- with .Values.metrics.prometheusRule.additionalLabels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  groups:
    - name: {{ include "ack-controller.fullname" . }}
      rules:
        {{- toYaml .Values.metrics.prometheusRule.rules | nindent 8 }}
{{- end }}

3. Values additions (helm/values.yaml)
metrics:
  serviceMonitor:
    enabled: false
    additionalLabels: {}
    interval: 30s
    scrapeTimeout: 10s
  prometheusRule:
    enabled: false
    additionalLabels: {}
    rules:
      - alert: ACKControllerSyncErrors
        expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total{job="<controller>-metrics"}[10m])) > 0.5
        for: 5m
        labels:
          severity: critical
        annotations:
          description: ACK controller having sync errors in the last 10 minutes
          summary: ACK controller having sync errors with one or more objects

Request
Could this be added as a standard feature across all ACK controller Helm charts? Ideally it would be:
- Added to the common Helm chart templates/generator so all controllers get it automatically
- Disabled by default (so existing deployments are not affected)
- Shipped with sensible default alerting rules that can be customized
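
To illustrate the opt-in behavior, here is a minimal sketch of a values override a user might pass at install time, assuming the `metrics.serviceMonitor` and `metrics.prometheusRule` keys proposed above. The file name, the `release: kube-prometheus-stack` label, and the relaxed threshold and severity are hypothetical; the label in particular depends on how the cluster's Prometheus instance selects ServiceMonitor and PrometheusRule objects.

```yaml
# values-monitoring.yaml (hypothetical override file)
metrics:
  serviceMonitor:
    enabled: true
    # Assumption: the Prometheus instance selects monitors by this label.
    additionalLabels:
      release: kube-prometheus-stack
    interval: 60s
  prometheusRule:
    enabled: true
    # Assumption: the same label is used to select rule objects.
    additionalLabels:
      release: kube-prometheus-stack
    # This list replaces the chart's default rules rather than extending them.
    rules:
      - alert: ACKControllerSyncErrors
        expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total{job="<controller>-metrics"}[10m])) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          description: ACK controller having sync errors in the last 10 minutes
          summary: ACK controller having sync errors with one or more objects
```

Because Helm merges maps but replaces lists when combining values, setting `rules` here swaps out the default alert list entirely. A user who only wants to add a rule would need to repeat the defaults alongside it.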
Additional Context
- Prometheus Operator is a CNCF project and widely adopted
- Many users deploy ACK controllers in environments where Prometheus Operator is the standard monitoring solution
- This follows patterns established by other popular Kubernetes projects
Related
- Reference PR for RDS controller: "Feat/add servicemonitor and prometheusrule" (aws-controllers-k8s/rds-controller#254)
Thank you for considering this feature!