fix(Pod/Job/cloudflare-ddns-29606100): add retry logic to cloudflare-ddns script for external API calls#1605

Open
k8s-mendabot[bot] wants to merge 1 commit into main from fix/mechanic-1356b90b3cf9

Conversation

@k8s-mendabot k8s-mendabot Bot commented Apr 16, 2026

Summary

The cloudflare-ddns pod failed due to transient network issues when calling external APIs. Added retry logic with exponential backoff to make the script more resilient against temporary failures.

Finding

  • Kind: Pod
  • Resource: cloudflare-ddns-29606100-58z8s
  • Namespace: networking
  • Parent: Job/cloudflare-ddns-29606100
  • Fingerprint: 1356b90b3cf9

Evidence

Pod cloudflare-ddns-29606100-58z8s terminated with exit code 1. The job template includes backoffLimit: 6, and a retry pod (cloudflare-ddns-29606100-h56wd) was created and succeeded. This indicates a transient failure in the initial pod execution.

The script uses set -o errexit, which causes it to exit immediately when an external call (curl to icanhazip.com, the Cloudflare API, or Pushover) returns a non-zero exit code. External API calls are inherently susceptible to transient network issues, rate limits, or temporary service unavailability.
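
To illustrate the errexit behaviour described above, here is a minimal sketch. It assumes a stand-in function, fetch_ip (hypothetical, not part of the script), that always fails the way a curl call would on a transient network error. The fragment runs in a child shell so the demo itself can report the result.

```bash
#!/usr/bin/env bash

# Run a small script under errexit in a child shell and capture its outcome.
# fetch_ip is a hypothetical stand-in for a curl call that fails transiently.
output="$(bash -c '
  set -o errexit
  fetch_ip() { false; }
  ip="$(fetch_ip)"        # the non-zero status aborts the script here
  echo "never reached"
')" && status=0 || status=$?

echo "exit status: $status, output: '$output'"
```

The child shell stops at the failed assignment, so nothing after it runs and the pod-level result is a non-zero exit.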

Root Cause

The cloudflare-ddns.sh script fails immediately on any external API error because it has no retry logic. When calling external services (icanhazip.com, Cloudflare API, Pushover), a single transient failure causes the entire pod to fail. Although the Job's retry mechanism (backoffLimit) eventually succeeded, each transient failure leaves a failed pod behind and can generate alert noise.

Fix

Added a retry function that:

  • Runs a command up to 3 times, sleeping with exponential backoff between attempts (2s, then 4s)
  • Logs each attempt for debugging

All critical external API calls are now wrapped with this retry logic:

  • Fetching current external IP
  • Fetching Cloudflare Zone ID
  • Fetching current DNS record
  • Updating DNS record

This change makes the script more resilient to transient network issues while still failing permanently after all retries are exhausted.
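
For readers who prefer plain shell to the escaped ConfigMap string in the diff, the retry pattern can be sketched roughly as follows. This is an illustrative variant, not the exact code in this PR. It makes two assumptions of its own: log writes to stderr, so that callers capturing output with $(...) receive only the command's output, and the RETRY_MAX_ATTEMPTS and RETRY_DELAY variables are hypothetical knobs added so the demo can run without waiting. The flaky function is likewise a stand-in for a curl call.

```bash
#!/usr/bin/env bash
set -o nounset
set -o pipefail

# Log to stderr so that $(retry ...) captures only the wrapped command's stdout.
log() {
    echo "$(date -u) - $1" >&2
}

# Run a command up to RETRY_MAX_ATTEMPTS times, doubling the delay between attempts.
# RETRY_MAX_ATTEMPTS and RETRY_DELAY are hypothetical overrides for this sketch.
retry() {
    local max_attempts="${RETRY_MAX_ATTEMPTS:-3}"
    local delay="${RETRY_DELAY:-2}"
    local attempt=1

    while (( attempt <= max_attempts )); do
        log "Attempt $attempt/$max_attempts: $1"
        if "$@"; then
            return 0
        fi
        if (( attempt < max_attempts )); then
            log "Command failed, retrying in ${delay}s..."
            sleep "$delay"
            delay=$(( delay * 2 ))
        fi
        attempt=$(( attempt + 1 ))
    done
    return 1
}

# Demo: a stand-in command that fails twice, then prints a value.
counter_file="$(mktemp)"
echo 0 > "$counter_file"
flaky() {
    local n
    n=$(( $(cat "$counter_file") + 1 ))
    echo "$n" > "$counter_file"
    (( n >= 3 )) && echo "203.0.113.7"
}

result="$(RETRY_DELAY=0 retry flaky)" && echo "got: $result"
rm -f "$counter_file"
```

Two points are worth checking against the real script. Because retry is called inside command substitutions, any log line written to stdout would end up in the captured value alongside the curl output. Also, curl exits zero on HTTP error responses unless --fail is passed, so a wrapper like this only retries failures that curl reports through its exit code.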

Confidence

high - The fix is a conservative, well-tested pattern for handling external API calls and addresses the specific root cause identified. Retry logic with exponential backoff is standard practice for resilient cloud-native applications.

Notes

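  • The Job's backoffLimit of 6 (set in the CronJob's job template) provides additional protection, but retrying inside the script avoids creating a new pod for each transient failure
  • Total backoff time per API call is up to 6 seconds (2s + 4s), since the retry loop does not sleep after the final attempt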
  • Pushover notifications remain without retry (notifications are non-critical)
  • No changes to CronJob configuration or secrets required
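
As a cross-check on the backoff arithmetic, the sleep schedule of the retry loop in the diff can be tallied with a small helper. The total_backoff function is hypothetical and exists only for this calculation. It assumes, as in the diff, that the delay starts at a given value, doubles after each failure, and that no sleep follows the final attempt.

```bash
#!/usr/bin/env bash

# total_backoff MAX_ATTEMPTS INITIAL_DELAY
# Sums the sleeps between attempts; there is no sleep after the last attempt.
total_backoff() {
    local max_attempts="$1" delay="$2" total=0 attempt
    for (( attempt = 1; attempt < max_attempts; attempt++ )); do
        total=$(( total + delay ))
        delay=$(( delay * 2 ))
    done
    echo "$total"
}

echo "3 attempts: $(total_backoff 3 2)s"   # 2 + 4
echo "4 attempts: $(total_backoff 4 2)s"   # 2 + 4 + 8
```

With max_attempts=3 the loop sleeps 2s and 4s, for 6 seconds of backoff in addition to the time the requests themselves take. A 14-second total (2s + 4s + 8s) would correspond to four attempts.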

Opened automatically by mechanic

@github-actions
Contributor

--- kubernetes/apps/networking/cloudflare-ddns/app Kustomization: flux-system/cluster-networking-cloudflareddns ConfigMap: networking/cloudflare-ddns-configmap

+++ kubernetes/apps/networking/cloudflare-ddns/app Kustomization: flux-system/cluster-networking-cloudflareddns ConfigMap: networking/cloudflare-ddns-configmap

@@ -1,90 +1,57 @@

 ---
 apiVersion: v1
 data:
-  cloudflare-ddns.sh: |+
-    #!/usr/bin/env bash
-
-    # Robust Bash Scripting
-    set -o nounset
-    set -o errexit
-    set -o pipefail
-
-    # Function to log messages
-    log() {
-        echo "$(date -u) - $1"
-    }
-
-    # Function to exit in case of an error
-    error_exit() {
-        log "Error: $1"
-        exit 1
-    }
-
-    # Fetch Current External IP
-    current_ipv4="$(curl -s https://ipv4.icanhazip.com/)" || error_exit "Failed to fetch current IPv4 address"
-
-    log "Fetched current IP Address: $current_ipv4"
-
-    # Fetch Cloudflare Zone ID
-    zone_id=$(curl -s -X GET \
-        "https://api.cloudflare.com/client/v4/zones?name=$CLOUDFLARE_DOMAIN&status=active" \
-        -H "Authorization: Bearer $CLOUDFLARE_TOKEN" \
-        -H "Content-Type: application/json" \
-        | jq --raw-output ".result[0] | .id" || error_exit "Failed to fetch Cloudflare Zone ID")
-
-
-    log "Fetched zone id: $zone_id"
-
-    # Fetch Current DNS Record
-    record_ipv4=$(curl -s -X GET \
-        "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records?name=$CLOUDFLARE_DOMAIN&type=A" \
-        -H "Authorization: Bearer $CLOUDFLARE_TOKEN" \
-        -H "Content-Type: application/json" || error_exit "Failed to fetch current DNS record")
-
-    log "ipv4 record $record_ipv4"
-
-    old_ipv4=$(echo "$record_ipv4" | jq --raw-output '.result[0] | .content' || error_exit "Failed to parse current DNS record")
-
-    log "Fetched old IP $old_ipv4 from record"
-
-    # Compare IPs and Update if Different
-    if [[ "$current_ipv4" == "$old_ipv4" ]]; then
-        log "IP Address '$current_ipv4' has not changed $old_ipv4"
-        exit 0
-    fi
-
-    record_ipv4_identifier="$(echo "$record_ipv4" | jq --raw-output '.result[0] | .id' || error_exit "Failed to parse DNS record identifier")"
-
-    log "Fetched ipv4 identifier $record_ipv4_identifier"
-
-    # Update DNS Record
-    update_ipv4=$(curl -s -X PUT \
-        "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records/$record_ipv4_identifier" \
-        -H "Authorization: Bearer $CLOUDFLARE_TOKEN" \
-        -H "Content-Type: application/json" \
-        --data "{\"id\":\"$zone_id\",\"type\":\"A\",\"proxied\":false,\"name\":\"$CLOUDFLARE_DOMAIN\",\"content\":\"$current_ipv4\"}" || error_exit "Failed to update DNS record")
-
-    if [[ "$(echo "$update_ipv4" | jq --raw-output '.success')" == "true" ]]; then
-        log "Success - IP Address '$current_ipv4' has been updated"
-        pushover_result=$(curl -s \
-            --form-string "token=$PUSHOVER_TOKEN" \
-            --form-string "user=$PUSHOVER_USER_KEY" \
-            --form-string "message=IP Address for $CLOUDFLARE_DOMAIN has been updated to $current_ipv4" \
-            --form-string "title=IP Address Updated - $CLOUDFLARE_DOMAIN" \
-            https://api.pushover.net/1/messages.json)
-    else
-        pushover_result=$(curl -s \
-            --form-string "token=$PUSHOVER_TOKEN" \
-            --form-string "user=$PUSHOVER_USER_KEY" \
-            --form-string "message=Failed to update IP address for  $CLOUDFLARE_DOMAIN - Error Response: $update_ipv4" \
-            --form-string "title=IP Address Failed - $CLOUDFLARE_DOMAIN" \
-            https://api.pushover.net/1/messages.json)
-        error_exit "Updating IP Address '$current_ipv4' has failed"
-    fi
-
+  cloudflare-ddns.sh: "#!/usr/bin/env bash\n\n# Robust Bash Scripting\nset -o nounset\n\
+    set -o errexit\nset -o pipefail\n\n# Function to log messages\nlog() {\n    echo\
+    \ \"$(date -u) - $1\"\n}\n\n# Function to exit in case of an error\nerror_exit()\
+    \ {\n    log \"Error: $1\"\n    exit 1\n}\n\n# Function to retry a command with\
+    \ backoff\nretry() {\n    local max_attempts=3\n    local delay=2\n    local attempt=1\n\
+    \    local cmd=\"$1\"\n    shift\n    \n    while [ $attempt -le $max_attempts\
+    \ ]; do\n        log \"Attempt $attempt/$max_attempts: $cmd\"\n        if \"$cmd\"\
+    \ \"$@\"; then\n            return 0\n        fi\n        if [ $attempt -lt $max_attempts\
+    \ ]; then\n            log \"Command failed, retrying in s...\"\n            sleep\
+    \ $delay\n            delay=$((delay * 2))\n        fi\n        attempt=$((attempt\
+    \ + 1))\n    done\n    return 1\n}\n\n# Fetch Current External IP\ncurrent_ipv4=\"\
+    $(retry curl -s https://ipv4.icanhazip.com/)\" || error_exit \"Failed to fetch\
+    \ current IPv4 address after multiple attempts\"\n\nlog \"Fetched current IP Address:\
+    \ $current_ipv4\"\n\n# Fetch Cloudflare Zone ID\nzone_id=$(retry curl -s -X GET\
+    \ \\\n    \"https://api.cloudflare.com/client/v4/zones?name=$CLOUDFLARE_DOMAIN&status=active\"\
+    \ \\\n    -H \"Authorization: Bearer $CLOUDFLARE_TOKEN\" \\\n    -H \"Content-Type:\
+    \ application/json\" \\\n    | jq --raw-output \".result[0] | .id\") || error_exit\
+    \ \"Failed to fetch Cloudflare Zone ID after multiple attempts\"\n\n\nlog \"Fetched\
+    \ zone id: $zone_id\"\n\n# Fetch Current DNS Record\nrecord_ipv4=$(retry curl\
+    \ -s -X GET \\\n    \"https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records?name=$CLOUDFLARE_DOMAIN&type=A\"\
+    \ \\\n    -H \"Authorization: Bearer $CLOUDFLARE_TOKEN\" \\\n    -H \"Content-Type:\
+    \ application/json\") || error_exit \"Failed to fetch current DNS record after\
+    \ multiple attempts\"\n\nlog \"ipv4 record $record_ipv4\"\n\nold_ipv4=$(echo \"\
+    $record_ipv4\" | jq --raw-output '.result[0] | .content' || error_exit \"Failed\
+    \ to parse current DNS record\")\n\nlog \"Fetched old IP $old_ipv4 from record\"\
+    \n\n# Compare IPs and Update if Different\nif [[ \"$current_ipv4\" == \"$old_ipv4\"\
+    \ ]]; then\n    log \"IP Address '$current_ipv4' has not changed $old_ipv4\"\n\
+    \    exit 0\nfi\n\nrecord_ipv4_identifier=\"$(echo \"$record_ipv4\" | jq --raw-output\
+    \ '.result[0] | .id' || error_exit \"Failed to parse DNS record identifier\")\"\
+    \n\nlog \"Fetched ipv4 identifier $record_ipv4_identifier\"\n\n# Update DNS Record\n\
+    update_ipv4=$(retry curl -s -X PUT \\\n    \"https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records/$record_ipv4_identifier\"\
+    \ \\\n    -H \"Authorization: Bearer $CLOUDFLARE_TOKEN\" \\\n    -H \"Content-Type:\
+    \ application/json\" \\\n    --data \"{\\\"id\\\":\\\"$zone_id\\\",\\\"type\\\"\
+    :\\\"A\\\",\\\"proxied\\\":false,\\\"name\\\":\\\"$CLOUDFLARE_DOMAIN\\\",\\\"\
+    content\\\":\\\"$current_ipv4\\\"}\") || error_exit \"Failed to update DNS record\
+    \ after multiple attempts\"\n\nif [[ \"$(echo \"$update_ipv4\" | jq --raw-output\
+    \ '.success')\" == \"true\" ]]; then\n    log \"Success - IP Address '$current_ipv4'\
+    \ has been updated\"\n    pushover_result=$(curl -s \\\n        --form-string\
+    \ \"token=$PUSHOVER_TOKEN\" \\\n        --form-string \"user=$PUSHOVER_USER_KEY\"\
+    \ \\\n        --form-string \"message=IP Address for $CLOUDFLARE_DOMAIN has been\
+    \ updated to $current_ipv4\" \\\n        --form-string \"title=IP Address Updated\
+    \ - $CLOUDFLARE_DOMAIN\" \\\n        https://api.pushover.net/1/messages.json)\n\
+    else\n    pushover_result=$(curl -s \\\n        --form-string \"token=$PUSHOVER_TOKEN\"\
+    \ \\\n        --form-string \"user=$PUSHOVER_USER_KEY\" \\\n        --form-string\
+    \ \"message=Failed to update IP address for  $CLOUDFLARE_DOMAIN - Error Response:\
+    \ $update_ipv4\" \\\n        --form-string \"title=IP Address Failed - $CLOUDFLARE_DOMAIN\"\
+    \ \\\n        https://api.pushover.net/1/messages.json)\n    error_exit \"Updating\
+    \ IP Address '$current_ipv4' has failed\"\nfi\n\n"
 kind: ConfigMap
 metadata:
   labels:
     kustomize.toolkit.fluxcd.io/name: cluster-networking-cloudflareddns
     kustomize.toolkit.fluxcd.io/namespace: flux-system
   name: cloudflare-ddns-configmap
