KubeAid uses ArgoCD to implement GitOps principles, ensuring that your cluster state always matches what's defined in Git. This document explains how drift detection works and how to configure alerting.
ArgoCD continuously monitors your cluster and compares it against the desired state in Git:

```mermaid
flowchart LR
    subgraph Git["Git Repository"]
        Desired["Desired State"]
    end
    subgraph ArgoCD["ArgoCD"]
        Sync["Sync Engine"]
        Compare["Comparison"]
    end
    subgraph Cluster["Kubernetes Cluster"]
        Actual["Actual State"]
    end
    Desired --> Compare
    Actual --> Compare
    Compare -->|"Match"| Synced["✓ Synced"]
    Compare -->|"Mismatch"| OutOfSync["⚠ OutOfSync"]
    style Synced fill:#6b8e23,stroke:#4a6319,color:#fff
    style OutOfSync fill:#e8833a,stroke:#b35c1e,color:#fff
```

| Sync Status | Meaning |
|---|---|
| Synced | Cluster matches Git - desired state achieved |
| OutOfSync | Cluster differs from Git - manual changes detected or pending updates |
| Unknown | ArgoCD cannot determine the state |

| Health Status | Meaning |
|---|---|
| Healthy | All resources are running correctly |
| Progressing | Resources are being deployed/updated |
| Degraded | Some resources have issues |
| Suspended | Resources are paused |
| Missing | Resources don't exist yet |
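
Sync status and health status are reported independently, so an application can be OutOfSync while still Healthy. As a rough illustration, the excerpt below shows where the two values sit in an Application's `status`; the field paths match the `app.status.sync.status` and `app.status.health.status` expressions used by notification triggers, and the values shown are made up for the example.

```yaml
# Excerpt of an Application's status (illustrative values only).
status:
  sync:
    status: OutOfSync   # One of the sync statuses listed above
  health:
    status: Healthy     # One of the health statuses listed above
```
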
When someone manually modifies a resource:

```bash
# This creates drift!
kubectl edit deployment my-app -n production
```

ArgoCD detects this and marks the application as OutOfSync.

Orphaned resources are resources that exist in the cluster but are not tracked by ArgoCD. These can be:
- Manually created resources
- Resources from other deployment tools
- Leftover resources from deleted applications

ArgoCD can warn about resources that no Application manages (orphaned resources). This is configured per AppProject, not per Application:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: my-project
  namespace: argocd
spec:
  # ... other config ...
  # Enable orphaned resource monitoring
  orphanedResources:
    warn: true  # Show a warning on Applications whose namespaces contain orphaned resources
```

In the ArgoCD UI:
- Navigate to your Application
- Click on "App Details"
- Look for resources marked with a warning icon
Via CLI:

```bash
argocd app resources <app-name> --orphaned
```

ArgoCD exposes metrics that Prometheus can scrape. Add these alert rules:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: monitoring
spec:
  groups:
    - name: argocd
      rules:
        # Alert when application is out of sync
        - alert: ArgoCDApplicationOutOfSync
          expr: |
            argocd_app_info{sync_status="OutOfSync"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD Application {{ $labels.name }} is out of sync"
            description: "Application {{ $labels.name }} has been OutOfSync for more than 15 minutes"
        # Alert when application is unhealthy
        - alert: ArgoCDApplicationUnhealthy
          expr: |
            argocd_app_info{health_status!~"Healthy|Progressing"} == 1
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD Application {{ $labels.name }} is unhealthy"
            description: "Application {{ $labels.name }} health status is {{ $labels.health_status }}"
        # Alert on sync failures
        - alert: ArgoCDSyncFailed
          expr: |
            argocd_app_sync_total{phase="Failed"} > 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD sync failed for {{ $labels.name }}"
            description: "Application {{ $labels.name }} sync operation failed"
```

ArgoCD has a built-in notification system. Configure it for Slack, email, or other channels:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-sync-status-unknown: |
    - when: app.status.sync.status == 'Unknown'
      send: [app-sync-status]
  trigger.on-health-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [app-health-degraded]
  template.app-sync-status: |
    message: |
      Application {{.app.metadata.name}} sync status is {{.app.status.sync.status}}
  template.app-health-degraded: |
    message: |
      Application {{.app.metadata.name}} health has degraded
  service.slack: |
    token: $slack-token
```

ArgoCD can automatically revert manual changes:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
spec:
  syncPolicy:
    automated:
      prune: true     # Remove resources not in Git
      selfHeal: true  # Revert manual changes
```

Warning: Enable `selfHeal` carefully. It will override any manual changes!

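
For teams that are not ready for automatic reverts, one possible intermediate step is to turn on automated sync with both options disabled. The sketch below assumes the same hypothetical `my-app` Application: ArgoCD still syncs when Git changes and still reports manual edits as OutOfSync, but it does not revert them on its own. A later Git-triggered sync would still overwrite those edits.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
spec:
  syncPolicy:
    automated:
      prune: false     # Keep resources that were removed from Git
      selfHeal: false  # Sync on Git changes, but do not revert live edits
```

Drift then surfaces through the OutOfSync status and any alerts built on it, which gives time to review what is being changed by hand before enabling `selfHeal`.
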
All changes should go through Git:

```bash
# ❌ Don't do this
kubectl edit deployment my-app

# ✅ Do this instead
# 1. Edit the YAML in your Git repository
# 2. Create a Pull Request
# 3. Merge and let ArgoCD sync
```

Prevent syncs during critical periods:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
spec:
  syncWindows:
    - kind: deny
      schedule: '0 22 * * *'  # Window opens daily at 10 PM
      duration: 8h            # Syncs are denied for the next 8 hours
      applications:
        - '*'
```

Always inspect what will change:

```bash
argocd app diff <app-name>
```

Or use the ArgoCD UI to see a visual diff.

Configure alerts for:
- OutOfSync states lasting more than 15 minutes
- Health degradation
- Sync failures
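
The sync-failure case can also be covered by the notification system. The following is a minimal sketch rather than a drop-in configuration: the trigger and template names (`on-sync-failed`, `app-sync-failed`) are chosen for illustration, `my-channel` is a hypothetical Slack channel, and the `data` entries are meant to be merged into the existing `argocd-notifications-cm` ConfigMap.

```yaml
# Fragment to merge into the data section of argocd-notifications-cm.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # Fires when the most recent sync operation ended in Error or Failed.
  trigger.on-sync-failed: |
    - when: app.status.operationState != nil and app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]
  template.app-sync-failed: |
    message: |
      Application {{.app.metadata.name}} sync failed: {{.app.status.operationState.message}}
---
# Subscribe one Application to triggers via annotations.
# Format: notifications.argoproj.io/subscribe.<trigger>.<service>: <recipient>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  annotations:
    notifications.argoproj.io/subscribe.on-sync-failed.slack: my-channel      # hypothetical channel
    notifications.argoproj.io/subscribe.on-health-degraded.slack: my-channel  # hypothetical channel
# ... spec omitted ...
```

A trigger only sends messages for Applications that subscribe to it, so the annotations are as important as the ConfigMap entries.
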

When an application stays OutOfSync:

1. Check the diff:

   ```bash
   argocd app diff <app-name>
   ```

2. Common causes:
   - Immutable fields (e.g., `selector` in Deployments)
   - Defaulted fields that differ from the manifest
   - Resources modified by controllers
3. Solutions:
   - Use `ignoreDifferences` for known discrepancies
   - Update your Git manifests to match the expected state

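
As an illustration of the `ignoreDifferences` option, the sketch below assumes a Deployment named `my-app` whose replica count is managed by an autoscaler, so the live value legitimately differs from Git. The resource name and the ignored path are assumptions for the example; adjust them to the field that actually drifts.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
spec:
  # ... other config ...
  ignoreDifferences:
    # Assumption: a controller (e.g. an autoscaler) owns the replica count.
    - group: apps
      kind: Deployment
      name: my-app
      jsonPointers:
        - /spec/replicas  # Exclude this field when comparing live state to Git
```

Keeping the ignore list narrow (specific kind, name, and path) avoids hiding drift that does matter.
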
Check sync history:

```bash
argocd app history <app-name>
```

View detailed sync operation:

```bash
argocd app get <app-name> --show-operation
```