|
| 1 | +# 🧭 External-DNS Version Upgrade Playbook |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This playbook describes the best practices and steps to safely upgrade **External-DNS** in Kubernetes clusters. |
| 6 | + |
| 7 | +Upgrading External-DNS involves validating configuration compatibility, testing changes, and ensuring no unintended DNS record modifications occur. |
| 8 | + |
| 9 | +> Note: We strongly encourage the community to help the maintainers validate changes before they are merged or released. |
| 10 | +> Early validation and feedback are key to ensuring stable upgrades for everyone. |
| 11 | +
|
| 12 | +--- |
| 13 | + |
| 14 | +## 1. Review Release Notes |
| 15 | + |
| 16 | +- Visit the official [External-DNS Releases](https://github.com/kubernetes-sigs/external-dns/releases). |
| 17 | +- Review all versions between your current and target release. |
| 18 | +- Pay attention to: |
| 19 | + - **Breaking changes** (flags, CRD fields, provider behaviors). Not every behavior change is labeled as breaking, so read the full notes. |
| 20 | + - **Deprecations** |
| 21 | + - **Provider-specific updates** |
| 22 | + - **Bug fixes** |
| 23 | + |
| 24 | +> ⚠️ Breaking CLI flag or annotation changes are common in `0.x` releases. |
| 25 | +
|
| 26 | +--- |
| 27 | + |
| 28 | +## 2. Review Helm Chart and Configuration |
| 29 | + |
| 30 | +If using Helm: |
| 31 | + |
| 32 | +- Compare your current Helm chart version with the chart version that supports the new app release. |
| 33 | +- Check for: |
| 34 | + - `values.yaml` structural changes |
| 35 | + - Default arguments under `extraArgs` |
| 36 | + - Updates to RBAC, ServiceAccounts, or Deployment templates |
| 37 | + |
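One way to spot `values.yaml` structural changes is to diff the chart's default values between two chart versions. The sketch below assumes the chart repository was added under the alias `external-dns` (so the chart is `external-dns/external-dns`); the function name `diff_chart_values` is hypothetical.

```sh
# Sketch: diff the default values of two chart versions.
# Assumption: the chart repo alias is "external-dns".
# Returns 0 if identical, 1 if the values differ, 2 on error.
diff_chart_values() {
  old_version="$1"
  new_version="$2"
  tmp_dir="$(mktemp -d)" || return 2
  status=2
  if helm show values external-dns/external-dns --version "$old_version" > "$tmp_dir/old.yaml" &&
     helm show values external-dns/external-dns --version "$new_version" > "$tmp_dir/new.yaml"; then
    # diff exits 1 when the files differ; that is a finding, not a failure.
    if diff -u "$tmp_dir/old.yaml" "$tmp_dir/new.yaml"; then
      status=0
    else
      status=$?
    fi
  fi
  rm -rf "$tmp_dir"
  return "$status"
}
```

Call it with your current and target chart versions, for example `diff_chart_values OLD_CHART_VERSION NEW_CHART_VERSION`, and review every changed key against your own overrides.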
| 38 | +--- |
| 39 | + |
| 40 | +## 3. Check Compatibility |
| 41 | + |
| 42 | +Before upgrading, confirm: |
| 43 | + |
| 44 | +- The new version supports your **Kubernetes version** (e.g., 1.25+). |
| 45 | +- The **DNS provider** integration you use is still supported. |
| 46 | + |
| 47 | +> 💡 Watch out for deprecated Kubernetes API versions (e.g., `v1/endpoints` → `discovery.k8s.io/v1/endpointslices`). |
| 48 | +
|
| 49 | +--- |
| 50 | + |
| 51 | +## 4. Test in Non-Production or with the Dry-Run Flag |
| 52 | + |
| 53 | +Run the new External-DNS version in a **staging cluster**. |
| 54 | + |
| 55 | +- Use `--dry-run` mode to preview intended changes: |
| 56 | + - Validate logs for any unexpected record changes. |
| 57 | + - Ensure `external-dns` correctly identifies and plans updates without actually applying them. |
| 58 | + - Submit a feature request if `--dry-run` is not supported for a specific case. |
| 59 | + |
| 60 | + ```yaml |
| 61 | + args: |
| 62 | + - --dry-run |
| 63 | + ``` |
| 64 | +
|
| 65 | +--- |
| 66 | +
|
| 67 | +## 5. Back Up DNS State |
| 68 | +
|
| 69 | +Before applying the upgrade, take a snapshot of your DNS zone(s). |
| 70 | +
|
| 71 | +**Example (AWS Route53):** |
| 72 | +
|
| 73 | + ```sh |
| 74 | + aws route53 list-resource-record-sets --hosted-zone-id ZONE_ID > backup.json |
| 75 | + ``` |
| 76 | + |
| 77 | +Use equivalent tooling for your DNS provider (Cloudflare, Google Cloud DNS, etc.). |
| 78 | + |
| 79 | +> A backup lets you restore records if External-DNS misconfigures entries, and it is a core part of a solid disaster recovery (DR) plan. |
| 80 | +
|
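If you manage several hosted zones, the Route53 command above can be wrapped in a small loop that writes one timestamped file per zone. This is a minimal sketch: the function name `backup_route53_zones` and the file naming scheme are assumptions, not part of External-DNS or the AWS CLI.

```sh
# Sketch: snapshot several Route53 zones into timestamped JSON files.
# Usage: backup_route53_zones BACKUP_DIR ZONE_ID [ZONE_ID...]
# Prints the path of each backup file; stops at the first failure.
backup_route53_zones() {
  backup_dir="$1"
  shift
  mkdir -p "$backup_dir" || return 1
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  for zone_id in "$@"; do
    out_file="$backup_dir/${zone_id}-${stamp}.json"
    if ! aws route53 list-resource-record-sets \
        --hosted-zone-id "$zone_id" --output json > "$out_file"; then
      echo "backup failed for zone $zone_id" >&2
      rm -f "$out_file"
      return 1
    fi
    echo "$out_file"
  done
}
```

Keeping the timestamp in the file name makes it easier to compare the state before and after the upgrade.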
| 81 | +## 6. Perform a Controlled Rollout |
| 82 | + |
| 83 | +Instead of upgrading in place, use a phased rollout across multiple environments or clusters. |
| 84 | + |
| 85 | +**Recommended Approaches** |
| 86 | + |
| 87 | +**a. Multi-Cluster Rollout and Progression** |
| 88 | + |
| 89 | + 1. Deploy the new `external-dns` version first in sandbox, then staging, and finally production. |
| 90 | + 2. Monitor each environment for correct record syncing and absence of unexpected deletions. |
| 91 | + 3. Promote the configuration only after validation in the lower environment. |
| 92 | + |
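The progression above could be scripted roughly as follows. This is only a sketch under several assumptions: kubeconfig contexts are named after the environments, the release and namespace are both called `external-dns`, the chart is `external-dns/external-dns`, and per-environment values live in a hypothetical `values/<context>.yaml`. The `validate_environment` hook is a placeholder that only waits for the Deployment rollout; record syncing and unexpected deletions still need their own checks.

```sh
# Placeholder validation hook: extend it with log and DNS checks.
validate_environment() {
  kubectl --context "$1" --namespace external-dns \
    rollout status deploy/external-dns --timeout=120s
}

# Usage: rollout_external_dns CHART_VERSION CONTEXT [CONTEXT...]
# Upgrades one context at a time and stops at the first failure,
# so later environments are never touched after a bad stage.
rollout_external_dns() {
  chart_version="$1"
  shift
  for ctx in "$@"; do
    echo "upgrading external-dns in context: $ctx"
    helm upgrade --install external-dns external-dns/external-dns \
      --kube-context "$ctx" --namespace external-dns \
      --version "$chart_version" --values "values/${ctx}.yaml" || return 1
    if ! validate_environment "$ctx"; then
      echo "validation failed in $ctx; stopping before later environments" >&2
      return 1
    fi
  done
}
```

For example, `rollout_external_dns CHART_VERSION sandbox staging production` walks the environments in order. In practice you may prefer to run one context per invocation and promote manually once the lower environment has been observed for a while.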
| 93 | +**b. Read-Only Parallel Deployment** |
| 94 | + |
| 95 | + 1. Run a second External-DNS instance (e.g., `external-dns-readonly`) with: |
| 96 | + |
| 97 | + ```yaml |
| 98 | + args: |
| 99 | + - --dry-run |
| 100 | + - ...other flags |
| 101 | + ``` |
| 102 | +
|
| 103 | + 2. Observe logs and planned record updates to confirm behavior. |
| 105 | +
|
| 106 | +## 7. Monitor and Validate |
| 107 | +
|
| 108 | +After deploying the new version, continuously observe both application logs and DNS synchronization metrics to ensure External-DNS behaves as expected. |
| 109 | +
|
| 110 | +**Logging** |
| 111 | +
|
| 112 | +Check logs for anomalies or unexpected record changes: |
| 113 | +
|
| 114 | +```sh |
| 115 | +kubectl logs -n external-dns deploy/external-dns --tail=100 -f |
| 116 | +``` |
| 117 | + |
| 118 | +Look for: |
| 119 | + |
| 120 | +- `Creating record` or `Deleting record` entries: validate that these match expected changes. |
| 121 | +- `WARN` or `ERROR` messages, particularly related to provider authentication or permissions. |
| 122 | +- `TXT` registry conflicts (ownership issues between multiple instances). |
| 123 | + |
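To get a quick overview instead of reading the stream line by line, the log output can be piped through a small counter. The sketch below only assumes that the relevant lines contain the phrases `Creating record` and `Deleting record` mentioned above; the function name `summarize_record_changes` is hypothetical.

```sh
# Sketch: count record creations and deletions in logs read from stdin.
summarize_record_changes() {
  awk '
    /Creating record/ { created++ }
    /Deleting record/ { deleted++ }
    END { printf "created=%d deleted=%d\n", created, deleted }
  '
}
```

For example: `kubectl logs -n external-dns deploy/external-dns --since=1h | summarize_record_changes`. A non-zero deletion count right after an upgrade is worth investigating before promoting the change.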
| 124 | +If using a centralized logging stack (e.g., Loki, Elasticsearch, or CloudWatch Logs): |
| 125 | + |
| 126 | +- Create a temporary dashboard or saved query filtering for "Creating record" OR "Deleting record". |
| 127 | +- Correlate `external-dns` logs with DNS provider API logs to detect mismatches. |
| 128 | + |
| 129 | +**Metrics and Observability** |
| 130 | + |
| 131 | +Check the metrics exposed by External-DNS (if Prometheus scraping is enabled), focusing on: |
| 134 | + |
| 135 | +- Error rate (`*_errors_total`) |
| 136 | +- Sync count and duration per interval (`*_sync_duration_seconds`) |
| 137 | +- Provider API call spikes |
| 138 | + |
| 139 | +Example PromQL checks: |
| 140 | + |
| 141 | +```promql |
| 142 | +rate(external_dns_registry_errors_total[5m]) > 0 |
| 143 | +rate(external_dns_provider_requests_total{operation="DELETE"}[5m]) |
| 144 | +``` |
| 145 | + |
| 146 | +**External Verification** |
| 147 | + |
| 148 | +Ideally, you should have a set of automated tests that check key DNS records after each upgrade. |
| 149 | + |
| 150 | +Query key DNS records directly: |
| 151 | + |
| 152 | + ```sh |
| 153 | + dig +short myapp.example.com |
| 154 | + nslookup api.staging.example.com |
| 155 | + ``` |
| 156 | + |
| 157 | +Ensure that A, CNAME, and TXT records remain correct and point to expected endpoints. |
| 158 | + |
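A first step toward automating this check is a helper that compares the `dig` answer with an expected value. This is a minimal sketch; the function name `check_dns_record` and the sample address are illustrative only. Note that `dig +short` prints CNAME targets with a trailing dot and TXT values in double quotes, so the expected value has to be written the same way.

```sh
# Sketch: succeed if the expected value is among the answers for NAME/TYPE.
# Usage: check_dns_record NAME TYPE EXPECTED
check_dns_record() {
  name="$1"
  record_type="$2"
  expected="$3"
  # -F: fixed string, -x: whole-line match, -q: quiet
  if dig +short "$name" "$record_type" | grep -Fxq -- "$expected"; then
    echo "OK $name $record_type"
  else
    echo "MISMATCH $name $record_type (expected $expected)" >&2
    return 1
  fi
}
```

For example, `check_dns_record myapp.example.com A 203.0.113.10` returns a non-zero exit code on a mismatch, so it can gate a CI/CD step.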
| 159 | +## Additional Tips |
| 160 | + |
| 161 | +- Automate upgrade testing with CI/CD pipelines. |
| 162 | +- Maintain clear CHANGELOGs and migration notes for internal users. |
| 163 | +- Tag known good versions in Git or Helm values for rollback. |
| 164 | +- Avoid skipping multiple minor versions when possible. |