[Feature Request] Add ServiceMonitor and PrometheusRule support to all ACK controller Helm charts #2717

@oded-dd

Feature Request

Summary

Add native Prometheus Operator support (ServiceMonitor and PrometheusRule) to all ACK controller Helm charts. This would let clusters running Prometheus Operator discover and scrape ACK controller metrics without hand-written resources, and would ship default alerts with each chart.

Motivation

ACK controllers expose metrics, but their Helm charts don't include Prometheus Operator resources. Users running Prometheus Operator (which is very common in Kubernetes environments) must manually create ServiceMonitor and PrometheusRule resources for each ACK controller they deploy.

Adding these resources to the Helm charts would:

  • Reduce manual effort for users setting up monitoring
  • Ensure consistency across ACK controller deployments
  • Provide sensible default alerts for common issues like sync errors
  • Follow best practices established by other Kubernetes projects (e.g., cert-manager, external-dns)

Proposed Solution

I've created a reference implementation for the RDS controller: aws-controllers-k8s/rds-controller#254

The changes include:

1. ServiceMonitor template (helm/templates/service-monitor.yaml)

{{- if .Values.metrics.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "ack-controller.fullname" . }}-metrics
  labels:
    {{- include "ack-controller.labels" . | nindent 4 }}
    {{- with .Values.metrics.serviceMonitor.additionalLabels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  selector:
    matchLabels:
      {{- include "ack-controller.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: http
      interval: {{ .Values.metrics.serviceMonitor.interval }}
      scrapeTimeout: {{ .Values.metrics.serviceMonitor.scrapeTimeout }}
{{- end }}
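
As a usage sketch, a values override that enables this template might look like the following. The `release: kube-prometheus-stack` label is an assumption: it only matters if the cluster's Prometheus selects ServiceMonitors by that label, so substitute whatever the local `serviceMonitorSelector` expects. The file name is hypothetical as well.

```yaml
# my-values.yaml (hypothetical override file)
metrics:
  serviceMonitor:
    enabled: true
    # Assumed label; it must match the serviceMonitorSelector of your Prometheus
    additionalLabels:
      release: kube-prometheus-stack
    # Scrape less often than the 30s default; the timeout must not exceed the interval
    interval: 60s
    scrapeTimeout: 30s
```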

2. PrometheusRule template (helm/templates/prometheus-rule.yaml)

{{- if .Values.metrics.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ include "ack-controller.fullname" . }}
  labels:
    {{- include "ack-controller.labels" . | nindent 4 }}
    {{- with .Values.metrics.prometheusRule.additionalLabels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  groups:
    - name: {{ include "ack-controller.fullname" . }}
      rules:
        {{- toYaml .Values.metrics.prometheusRule.rules | nindent 8 }}
{{- end }}
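
A possible refinement, not part of the reference PR: the condition could also check that the cluster serves the `monitoring.coreos.com/v1` API, so that enabling the flag on a cluster without the Prometheus Operator CRDs skips the resource instead of failing the install. This sketch uses Helm's built-in `.Capabilities.APIVersions.Has`; whether silently skipping is preferable to a failed install is a design choice for the maintainers.

```yaml
{{- /* Render only when enabled AND the Prometheus Operator API is available */ -}}
{{- if and .Values.metrics.prometheusRule.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ include "ack-controller.fullname" . }}
  labels:
    {{- include "ack-controller.labels" . | nindent 4 }}
    {{- with .Values.metrics.prometheusRule.additionalLabels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  groups:
    - name: {{ include "ack-controller.fullname" . }}
      rules:
        {{- toYaml .Values.metrics.prometheusRule.rules | nindent 8 }}
{{- end }}
```

When rendering offline with `helm template`, the API version is not detected from a cluster, so it has to be passed explicitly with `--api-versions monitoring.coreos.com/v1` for the resource to appear.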

3. Values additions (helm/values.yaml)

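metrics:
  serviceMonitor:
    # Create a ServiceMonitor; requires the Prometheus Operator CRDs in the cluster
    enabled: false
    # Extra labels, e.g. to match the Prometheus serviceMonitorSelector
    additionalLabels: {}
    interval: 30s
    # Must not exceed interval
    scrapeTimeout: 10s
  prometheusRule:
    # Create a PrometheusRule; requires the Prometheus Operator CRDs in the cluster
    enabled: false
    additionalLabels: {}
    rules:
      # Fires when reconcile errors for a controller average more than 0.5 per second
      # over a 10-minute window, sustained for 5 minutes.
      # Replace <controller> so the job matcher matches the controller's scrape job.
      - alert: ACKControllerSyncErrors
        expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total{job="<controller>-metrics"}[10m])) > 0.5
        for: 5m
        labels:
          severity: critical
        annotations:
          description: ACK controller has had sync errors in the last 10 minutes
          summary: ACK controller has sync errors with one or more objects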
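
To customize the default alert, a user would override `metrics.prometheusRule.rules`. Helm replaces list values rather than merging them, so the override must restate every rule it wants to keep. The sketch below assumes a hypothetical scrape job named `ack-rds-controller-metrics` and lowers the severity to `warning`; both are illustrative choices, not values from the reference PR.

```yaml
# my-values.yaml (hypothetical override file)
metrics:
  prometheusRule:
    enabled: true
    rules:
      - alert: ACKControllerSyncErrors
        # Job name is an assumption; use the job label your Prometheus actually assigns
        expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total{job="ack-rds-controller-metrics"}[10m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: ACK RDS controller has sync errors with one or more objects
```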

Request

Could this be added as a standard feature across all ACK controller Helm charts? Ideally it would be:

  1. Added to the common Helm chart templates/generator so all controllers get it automatically
  2. Disabled by default, so existing deployments are unaffected
  3. Shipped with sensible default alerting rules that can be customized

Additional Context

  • Prometheus Operator is a CNCF project and widely adopted
  • Many users deploy ACK controllers in environments where Prometheus Operator is the standard monitoring solution
  • This follows patterns established by other popular Kubernetes projects

Related

Thank you for considering this feature!
