From 95137532c1966f1a576cdd709dece484b301a19e Mon Sep 17 00:00:00 2001 From: ivan katliarchuk Date: Tue, 7 Oct 2025 10:38:25 +0100 Subject: [PATCH 1/7] docs(release): update release docs Signed-off-by: ivan katliarchuk --- docs/faq.md | 15 +++ docs/release.md | 7 ++ docs/version-update-playbook.md | 164 ++++++++++++++++++++++++++++++++ mkdocs.yml | 1 + 4 files changed, 187 insertions(+) create mode 100644 docs/version-update-playbook.md diff --git a/docs/faq.md b/docs/faq.md index 090938caf4..ce54b37326 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -320,3 +320,18 @@ FATA[0060] failed to sync cache: timed out waiting for the condition You may not have the correct permissions required to query all the necessary resources in your kubernetes cluster. Specifically, you may be running in a `namespace` that you don't have these permissions in. By default, commands are run against the `default` namespace. Try changing this to your particular namespace to see if that fixes the issue. + +## When we plan to release a v1.0, our first breaking release? + +> We should really get away from 0.x only if we have APIs that we can declare stable. + +The jump to `1.0` isn’t just symbolic—it’s a promise. If the `External-DNS` maintainers can confidently say that config structures, CRDs, and flags won’t break unexpectedly, that’s the moment to move to `1.0` + +Before moving to 1.0, review and lock down: + +- CRD schemas (especially DNSEndpoint and ExternalDNS if applicable) +- Command-line flags and configuration behavior +- Environment variables and metrics +- Provider interface stability (e.g., AWS, GCP, Cloudflare, etc.) +- Helm chart values — these often break user setups more than code changes +- Once these are considered stable and documented, then a 1.0 tag makes sense. 
diff --git a/docs/release.md b/docs/release.md index 58b05b2ef8..a9c5513ce5 100644 --- a/docs/release.md +++ b/docs/release.md @@ -28,6 +28,13 @@ These are the conventions that we will be using for releases following `0.7.6`: - **Major** version should be upgraded if we introduce breaking changes. +### Semantic Versioning Discipline + +External-DNS follows semantic versioning principles: + +- `0.x` → pre-stable, APIs subject to change. +- `1.x` → stable APIs and core functionality; breaking changes require a major bump. + ## How to release a new image ### Prerequisite diff --git a/docs/version-update-playbook.md b/docs/version-update-playbook.md new file mode 100644 index 0000000000..cb4d9c8e1b --- /dev/null +++ b/docs/version-update-playbook.md @@ -0,0 +1,164 @@ +# 🧭 External-DNS Version Upgrade Playbook + +## Overview + +This playbook describes the best practices and steps to safely upgrade **External-DNS** in Kubernetes clusters. + +Upgrading External-DNS involves validating configuration compatibility, testing changes, and ensuring no unintended DNS record modifications occur. + +> Note: We strongly encourage the community to help the maintainers validate changes before they are merged or released. +> Early validation and feedback are key to ensuring stable upgrades for everyone. + +--- + +## 1. Review Release Notes + +- Visit the official [External-DNS Releases](https://github.com/kubernetes-sigs/external-dns/releases). +- Review all versions between your current and target release. +- Pay attention to: + - **Breaking changes** (flags, CRD fields, provider behaviors). Not every breaking change is labeled as such in the release notes. + - **Deprecations** + - **Provider-specific updates** + - **Bug fixes** + +> ⚠️ Breaking CLI flag or annotation changes are common in `0.x` releases. + +--- + +## 2. Review Helm Chart and Configuration + +If using Helm: + +- Compare your current Helm chart version with the chart version that supports the new app release. 
+- Check for: + - `values.yaml` structural changes + - Default arguments under `extraArgs` + - Updates to RBAC, ServiceAccounts, or Deployment templates + +--- + +## 3. Check Compatibility + +Before upgrading, confirm: + +- The new version supports your **Kubernetes version** (e.g., 1.25+). +- The **DNS provider** integration you use is still supported. + +> 💡 Watch out for deprecated Kubernetes API versions (e.g., `v1/endpoints` → `discovery.k8s.io/v1/endpointslices`). + +--- + +## 4. Test in Non-Production or with the Dry-Run Flag + +Run the new External-DNS version in a **staging cluster**. + +- Use `--dry-run` mode to preview intended changes: + - Validate logs for any unexpected record changes. + - Ensure `external-dns` correctly identifies and plans updates without actually applying them. + - **submit a feature request if `dry-run` is not supported for a specific case + + ```yaml + args: + - --dry-run + ``` + +--- + +## 5. Backup DNS State + +Before applying the upgrade, take a snapshot of your DNS zone(s). + +**Example (AWS Route53):** + + ```sh + aws route53 list-resource-record-sets --hosted-zone-id ZONE_ID > backup.json + ``` + +Use equivalent tooling for your DNS provider (Cloudflare, Google Cloud DNS, etc.). + +> A backup lets you restore records if External-DNS misconfigures entries and gives you a solid disaster-recovery (DR) baseline. + +## 6. Perform a Controlled Rollout + +Instead of upgrading in place, use a phased rollout across multiple environments or clusters. + +Recommended Approaches + +a. Multi-Cluster Rollout and Progression + + 1. Deploy the new `external-dns` version first in sandbox, then staging, and finally production. + 2. Monitor each environment for correct record syncing and absence of unexpected deletions. + 3. Promote the configuration only after validation in the lower environment. + +b. Read-Only Parallel Deployment + + 1. 
Run a second External-DNS instance (e.g., external-dns-readonly) with: + + ```yaml + args: + - --dry-run + - ...other flags + ``` + + 1. Observe logs and planned record updates to confirm behavior. + 2. Observe logs and planned record updates to confirm behavior. + + ## 7. Monitor and Validate + +After deploying the new version, continuously observe both application logs and DNS synchronization metrics to ensure External-DNS behaves as expected. + +**Logging** + +Check logs for anomalies or unexpected record changes: + +```sh +kubectl logs -n external-dns deploy/external-dns --tail=100 -f +``` + +Look for: + +- Creating record or Deleting record entries: validate that these match the expected changes. +- `WARN` or `ERROR` messages, particularly related to provider authentication or permissions. +- `TXT` registry conflicts (ownership issues between multiple instances). + +If using a centralized logging stack (e.g., Loki, Elasticsearch, or CloudWatch Logs): + +- Create a temporary dashboard or saved query filtering for "Creating record" OR "Deleting record". +- Correlate `external-dns` logs with DNS provider API logs to detect mismatches. + +**Metrics and Observability** + +Check metrics exposed by External-DNS (if Prometheus scraping is enabled): + +Focus on: + +- Error rate (*_errors_total) +- Number of syncs per interval (*_sync_duration_seconds) +- Provider API call spikes + +Example PromQL checks: + +```promql +rate(external_dns_registry_errors_total[5m]) > 0 +rate(external_dns_provider_requests_total{operation="DELETE"}[5m]) +``` + +## External Verification + +Ideally, you should have a set of automated tests. + +Query key DNS records directly: + + ```sh + dig +short myapp.example.com + nslookup api.staging.example.com + ``` + +Ensure that A, CNAME, and TXT records remain correct and point to expected endpoints. + +## Additional Tips + +- Automate upgrade testing with CI/CD pipelines. +- Maintain clear CHANGELOGs and migration notes for internal users. 
+- Tag known good versions in Git or Helm values for rollback. +- Avoid skipping multiple minor versions when possible. diff --git a/mkdocs.yml b/mkdocs.yml index ba43d28930..297f986ff7 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -17,6 +17,7 @@ nav: - Code of Conduct: code-of-conduct.md - License: LICENSE.md - Providers: docs/providers.md + - Version Update: docs/version-update-playbook.md - Tutorials: docs/tutorials/* - Annotations: - About: docs/annotations/annotations.md From c2281dc4edceea14f049fe584ffaca7cce22805a Mon Sep 17 00:00:00 2001 From: ivan katliarchuk Date: Tue, 7 Oct 2025 10:40:55 +0100 Subject: [PATCH 2/7] docs(release): update release docs Signed-off-by: ivan katliarchuk --- docs/faq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/faq.md b/docs/faq.md index ce54b37326..adaf972f00 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -321,7 +321,7 @@ FATA[0060] failed to sync cache: timed out waiting for the condition You may not have the correct permissions required to query all the necessary resources in your kubernetes cluster. Specifically, you may be running in a `namespace` that you don't have these permissions in. By default, commands are run against the `default` namespace. Try changing this to your particular namespace to see if that fixes the issue. -## When we plan to release a v1.0, our first breaking release? +## When we plan to release a v1.0, our first `major` release? > We should really get away from 0.x only if we have APIs that we can declare stable. 
From ba03489ee73df3b648dd8f663e7c3c6e8735c423 Mon Sep 17 00:00:00 2001 From: ivan katliarchuk Date: Tue, 7 Oct 2025 10:42:40 +0100 Subject: [PATCH 3/7] docs(release): update release docs Signed-off-by: ivan katliarchuk --- docs/faq.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/faq.md b/docs/faq.md index adaf972f00..0fd6e5f4a7 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -329,9 +329,10 @@ The jump to `1.0` isn’t just symbolic—it’s a promise. If the `External-DNS Before moving to 1.0, review and lock down: -- CRD schemas (especially DNSEndpoint and ExternalDNS if applicable) +- CRD schemas (especially DNSEndpoint if applicable) +- Annotations support - Command-line flags and configuration behavior - Environment variables and metrics -- Provider interface stability (e.g., AWS, GCP, Cloudflare, etc.) +- Provider interface stability - Helm chart values — these often break user setups more than code changes -- Once these are considered stable and documented, then a 1.0 tag makes sense. +- Once these are considered stable and documented, then a `1.0` tag makes sense. From d9e179e092bbfdeee80b36c50a3c082f6f12d617 Mon Sep 17 00:00:00 2001 From: ivan katliarchuk Date: Wed, 8 Oct 2025 09:14:39 +0100 Subject: [PATCH 4/7] docs(release): update release docs Signed-off-by: ivan katliarchuk --- docs/faq.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/faq.md b/docs/faq.md index 0fd6e5f4a7..5a2d628db5 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -334,5 +334,4 @@ Before moving to 1.0, review and lock down: - Command-line flags and configuration behavior - Environment variables and metrics - Provider interface stability -- Helm chart values — these often break user setups more than code changes - Once these are considered stable and documented, then a `1.0` tag makes sense. 
From 8756408c6901a8f656ea07e4b4fa393390c938f2 Mon Sep 17 00:00:00 2001 From: ivan katliarchuk Date: Wed, 8 Oct 2025 09:26:37 +0100 Subject: [PATCH 5/7] docs(release): update release docs Signed-off-by: ivan katliarchuk --- docs/faq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/faq.md b/docs/faq.md index 5a2d628db5..5c4e347746 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -327,7 +327,7 @@ By default, commands are run against the `default` namespace. Try changing this The jump to `1.0` isn’t just symbolic—it’s a promise. If the `External-DNS` maintainers can confidently say that config structures, CRDs, and flags won’t break unexpectedly, that’s the moment to move to `1.0` -Before moving to 1.0, review and lock down: +Before moving to `1.0`, review and lock down: - CRD schemas (especially DNSEndpoint if applicable) - Annotations support From 97381b9ab8bcf1d4a8c5c34b37aaee76b90716da Mon Sep 17 00:00:00 2001 From: ivan katliarchuk Date: Sat, 11 Oct 2025 10:08:38 +0100 Subject: [PATCH 6/7] docs(release): update release docs Signed-off-by: ivan katliarchuk --- docs/release.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/release.md b/docs/release.md index a9c5513ce5..5bf2cb2c71 100644 --- a/docs/release.md +++ b/docs/release.md @@ -33,7 +33,11 @@ These are the conventions that we will be using for releases following `0.7.6`: External-DNS follows semantic versioning principles: - `0.x` → pre-stable, APIs subject to change. -- `1.x` → stable APIs and core functionality; breaking changes require a major bump. +- `1.x` → not yet considered. + +> **Versioning & Releases** +> External-DNS opts to stay within the `0.x` versioning scheme. +> We strive for stability, but reserve the right to introduce breaking changes in minor version bumps when necessary. 
## How to release a new image From 05b8a755cf35e8fee61944839e8408ee5a8f5816 Mon Sep 17 00:00:00 2001 From: Ivan Ka <5395690+ivankatliarchuk@users.noreply.github.com> Date: Mon, 13 Oct 2025 08:10:54 +0100 Subject: [PATCH 7/7] docs(release): update release docs Co-authored-by: Michel Loiseleur <97035654+mloiseleur@users.noreply.github.com> --- docs/version-update-playbook.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/version-update-playbook.md b/docs/version-update-playbook.md index cb4d9c8e1b..22e20b4c59 100644 --- a/docs/version-update-playbook.md +++ b/docs/version-update-playbook.md @@ -55,7 +55,7 @@ Run the new External-DNS version in a **staging cluster**. - Use `--dry-run` mode to preview intended changes: - Validate logs for any unexpected record changes. - Ensure `external-dns` correctly identifies and plans updates without actually applying them. - - **submit a feature request if `dry-run` is not supported for a specific case + - **submit a feature request** if `dry-run` is not supported for a specific case ```yaml args: @@ -70,9 +70,9 @@ Before applying the upgrade, take a snapshot of your DNS zone(s). **Example (AWS Route53):** - ```sh - aws route53 list-resource-record-sets --hosted-zone-id ZONE_ID > backup.json - ``` +```sh +aws route53 list-resource-record-sets --hosted-zone-id ZONE_ID > backup.json +``` Use equivalent tooling for your DNS provider (Cloudflare, Google Cloud DNS, etc.).