The Alert API defines how events are filtered by severity and involved object, and what provider to use for dispatching.
The following is an example of how to send alerts to Slack when Flux fails to reconcile the flux-system namespace.
---
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Provider
metadata:
  name: slack-bot
  namespace: flux-system
spec:
  type: slack
  channel: general
  address: https://slack.com/api/chat.postMessage
  secretRef:
    name: slack-bot-token
---
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: slack
  namespace: flux-system
spec:
  summary: "Cluster addons impacted in us-east-2"
  providerRef:
    name: slack-bot
  eventSeverity: error
  eventSources:
    - kind: GitRepository
      name: '*'
    - kind: Kustomization
      name: '*'

In the above example:
- A Provider named slack-bot is created, indicated by the Provider.metadata.name field.
- An Alert named slack is created, indicated by the Alert.metadata.name field.
- The Alert references the slack-bot provider, indicated by the Alert.spec.providerRef field.
- The notification-controller starts listening for events sent for all GitRepositories and Kustomizations in the flux-system namespace.
- When an event with severity error is received, the controller posts a message to the Slack channel set in .spec.channel, containing the summary text and the reconciliation error.
You can run this example by saving the manifests into slack-alerts.yaml.
- First create a secret with the Slack bot token:
    kubectl -n flux-system create secret generic slack-bot-token --from-literal=token=xoxb-YOUR-TOKEN
- Apply the resources on the cluster:
    kubectl -n flux-system apply --server-side -f slack-alerts.yaml
- Run kubectl -n flux-system describe alert slack to see its status:
    ...
    Status:
      Conditions:
        Last Transition Time:  2022-11-16T23:43:38Z
        Message:               Initialized
        Observed Generation:   1
        Reason:                Succeeded
        Status:                True
        Type:                  Ready
      Observed Generation:     1
    Events:
      Type    Reason     Age   From                     Message
      ----    ------     ----  ----                     -------
      Normal  Succeeded  82s   notification-controller  Initialized
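If you prefer to manage the token declaratively, the secret from the first step could also be written as a manifest. This is a minimal sketch that mirrors the kubectl command above. The stringData form is a choice made here for readability, and the token value is the same placeholder, not a real credential.

```yaml
# Sketch of a manifest equivalent to the kubectl create secret command above.
# The token value is a placeholder; do not commit a real token in plain text.
apiVersion: v1
kind: Secret
metadata:
  name: slack-bot-token      # must match Provider.spec.secretRef.name
  namespace: flux-system     # same namespace as the Provider
stringData:
  token: xoxb-YOUR-TOKEN
```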
As with all other Kubernetes config, an Alert needs apiVersion,
kind, and metadata fields. The name of an Alert object must be a
valid DNS subdomain name.
An Alert also needs a
.spec section.
.spec.summary is an optional field to specify a short description of the
impact and affected cluster.
The summary cannot exceed 255 characters.
.spec.providerRef.name is a required field to specify a name reference to a
Provider in the same namespace as the Alert.
.spec.eventSources is a required field to specify a list of references to
Flux objects for which events are forwarded to the alert provider API.
To select events issued by Flux objects, each entry in the .spec.eventSources list
must contain the following fields:
- kind is the Flux Custom Resource Kind such as GitRepository, HelmRelease, Kustomization, etc.
- name is the Flux Custom Resource .metadata.name, or it can be set to the * wildcard.
- namespace is the Flux Custom Resource .metadata.namespace. When not specified, the Alert .metadata.namespace is used instead.
To select events issued by a single Flux object, set the kind, name and namespace:
eventSources:
  - kind: GitRepository
    name: webapp
    namespace: apps

The * wildcard can be used to select events issued by all Flux objects of a particular kind in a namespace:

eventSources:
  - kind: HelmRelease
    name: '*'
    namespace: apps

The * wildcard can also be used to select events issued by Flux objects from any namespace:

eventSources:
  - kind: HelmRelease
    name: 'service1'
    namespace: '*'

This requires cross-namespace references to be enabled for the controller.
To select events issued by all Flux objects of a particular kind with specific labels:
eventSources:
  - kind: HelmRelease
    name: '*'
    namespace: apps
    matchLabels:
      team: app-dev

Note: On multi-tenant clusters, platform admins can disable cross-namespace references by
starting the controller with the --no-cross-namespace-refs=true flag.
When this flag is set, alerts can only refer to event sources in the same namespace as the alert object,
preventing tenants from subscribing to another tenant's events.
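One way a platform admin might set this flag is with a Kustomize patch on the controller Deployment. The sketch below assumes a cluster bootstrapped with Flux, where the flux-system kustomization.yaml lists gotk-components.yaml and gotk-sync.yaml. Those file names, and the assumption that the controller is the first container in the pod, are not stated in this document and should be checked against your installation.

```yaml
# Sketch only: appends the flag to the notification-controller container args.
# Assumes a bootstrap-style layout (gotk-components.yaml, gotk-sync.yaml) and
# that the controller is container index 0 in the Deployment.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --no-cross-namespace-refs=true
    target:
      kind: Deployment
      name: notification-controller
```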
.spec.eventSeverity is an optional field to filter events based on severity. When not specified, or
when the value is set to info, all events are forwarded to the alert provider API, including errors.
To receive alerts only on errors, set the field value to error.
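As an illustration, the following sketch reuses the slack-bot Provider from the example at the top of this page and forwards every event, not just failures. The Alert name slack-all-events is hypothetical.

```yaml
# Sketch: forward all events (info and error) from Kustomizations.
# Setting eventSeverity to info behaves the same as omitting the field.
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: slack-all-events     # hypothetical name
  namespace: flux-system
spec:
  providerRef:
    name: slack-bot
  eventSeverity: info        # change to error to receive failures only
  eventSources:
    - kind: Kustomization
      name: '*'
```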
.spec.exclusionList is an optional field to specify a list of regular expressions used to filter
events based on message content.
An alert is skipped if the event message matches any Go regular expression in the exclusion list:
---
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: <name>
spec:
  eventSources:
    - kind: GitRepository
      name: '*'
  exclusionList:
    - "waiting.*socket"

The above definition will not send alerts for transient Git clone errors like:
unable to clone 'ssh://git@ssh.dev.azure.com/v3/...', error: SSH could not read data: Error waiting on socket
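The list can hold more than one pattern, and an event is dropped as soon as any of them matches. The sketch below is a complete Alert that combines the pattern above with a second one. The second pattern and the Alert name are hypothetical and only illustrate the shape of the field. The Provider is the slack-bot one from the first example.

```yaml
# Sketch: an Alert with two exclusion patterns (Go regular expression syntax).
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: slack-filtered       # hypothetical name
  namespace: flux-system
spec:
  providerRef:
    name: slack-bot
  eventSeverity: error
  eventSources:
    - kind: GitRepository
      name: '*'
  exclusionList:
    - "waiting.*socket"              # from the example above
    - "context deadline exceeded"    # hypothetical second pattern
```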
.spec.suspend is an optional field to suspend the alerting.
When set to true, the controller stops processing events for this Alert.
When the field is set to false or removed, processing resumes.
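For example, the slack Alert from the top of this page could be paused during a maintenance window with a change like the following sketch. Only the suspend line differs from a minimal version of that Alert.

```yaml
# Sketch: the slack Alert with event processing suspended.
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: slack
  namespace: flux-system
spec:
  suspend: true              # remove or set to false to resume alerting
  providerRef:
    name: slack-bot
  eventSeverity: error
  eventSources:
    - kind: GitRepository
      name: '*'
    - kind: Kustomization
      name: '*'
```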
An Alert enters various states during its lifecycle, reflected as Kubernetes Conditions. It can be ready, or it can fail during reconciliation.
The Alert API is compatible with the kstatus specification,
and reports the Reconciling condition where applicable.
The notification-controller marks an Alert as ready when it has the following characteristics:
- The Alert's Provider referenced in .spec.providerRef.name is found on the cluster.
- The Alert's Provider Ready status condition is set to True.
When the Alert is "ready", the controller sets a Condition with the following
attributes in the Alert's .status.conditions:
- type: Ready
- status: "True"
- reason: Succeeded
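Viewed as YAML, for instance through kubectl get alert slack -o yaml, the status of a ready Alert would look roughly like the sketch below. The values are taken from the describe output shown earlier on this page, and the exact set of fields may differ between controller versions.

```yaml
# Sketch of .status for a ready Alert, using values from the describe output above.
status:
  conditions:
    - type: Ready
      status: "True"
      reason: Succeeded
      message: Initialized
      observedGeneration: 1
      lastTransitionTime: "2022-11-16T23:43:38Z"
  observedGeneration: 1
```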
The notification-controller may get stuck trying to reconcile an Alert if its Provider can't be found or if the Provider is not ready.
When this happens, the controller sets the Ready Condition status to False,
and adds a Condition with the following attributes:
- type: Reconciling
- status: "True"
- reason: ProgressingWithRetry
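In YAML form, the conditions of an Alert in this state would look roughly like the following sketch. Only the attributes named above are shown. The reason on the Ready condition, the messages and the timestamps are left out because this document does not specify them.

```yaml
# Sketch of .status.conditions while the Provider is missing or not ready.
# Fields not described in this document are intentionally omitted.
status:
  conditions:
    - type: Reconciling
      status: "True"
      reason: ProgressingWithRetry
    - type: Ready
      status: "False"
```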
The notification-controller reports an
observed generation
in the Alert's .status.observedGeneration. The observed generation is the
latest .metadata.generation which resulted in a ready state.
The notification-controller reports the last reconcile.fluxcd.io/requestedAt
annotation value it acted on in the .status.lastHandledReconcileAt field.
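The relationship between the annotation and the two status fields can be sketched as follows. The timestamp is a hypothetical value chosen for illustration. The controller copies whatever string the annotation holds into .status.lastHandledReconcileAt once it has acted on it.

```yaml
# Sketch: an Alert after the controller has handled a reconcile request.
# The timestamp is a hypothetical example value.
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: slack
  namespace: flux-system
  generation: 1
  annotations:
    reconcile.fluxcd.io/requestedAt: "2022-11-16T23:50:00Z"
status:
  observedGeneration: 1                          # latest generation that reached a ready state
  lastHandledReconcileAt: "2022-11-16T23:50:00Z" # last annotation value acted on
```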