fix(Pod/Job/cloudflare-ddns-29606100): add retry logic to cloudflare-ddns script for external API calls#1605
Open
k8s-mendabot[bot] wants to merge 1 commit intomainfrom
Open
fix(Pod/Job/cloudflare-ddns-29606100): add retry logic to cloudflare-ddns script for external API calls#1605k8s-mendabot[bot] wants to merge 1 commit intomainfrom
k8s-mendabot[bot] wants to merge 1 commit intomainfrom
Conversation
…ddns script for external API calls
Contributor
--- kubernetes/apps/networking/cloudflare-ddns/app Kustomization: flux-system/cluster-networking-cloudflareddns ConfigMap: networking/cloudflare-ddns-configmap
+++ kubernetes/apps/networking/cloudflare-ddns/app Kustomization: flux-system/cluster-networking-cloudflareddns ConfigMap: networking/cloudflare-ddns-configmap
@@ -1,90 +1,57 @@
---
apiVersion: v1
data:
- cloudflare-ddns.sh: |+
- #!/usr/bin/env bash
-
- # Robust Bash Scripting
- set -o nounset
- set -o errexit
- set -o pipefail
-
- # Function to log messages
- log() {
- echo "$(date -u) - $1"
- }
-
- # Function to exit in case of an error
- error_exit() {
- log "Error: $1"
- exit 1
- }
-
- # Fetch Current External IP
- current_ipv4="$(curl -s https://ipv4.icanhazip.com/)" || error_exit "Failed to fetch current IPv4 address"
-
- log "Fetched current IP Address: $current_ipv4"
-
- # Fetch Cloudflare Zone ID
- zone_id=$(curl -s -X GET \
- "https://api.cloudflare.com/client/v4/zones?name=$CLOUDFLARE_DOMAIN&status=active" \
- -H "Authorization: Bearer $CLOUDFLARE_TOKEN" \
- -H "Content-Type: application/json" \
- | jq --raw-output ".result[0] | .id" || error_exit "Failed to fetch Cloudflare Zone ID")
-
-
- log "Fetched zone id: $zone_id"
-
- # Fetch Current DNS Record
- record_ipv4=$(curl -s -X GET \
- "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records?name=$CLOUDFLARE_DOMAIN&type=A" \
- -H "Authorization: Bearer $CLOUDFLARE_TOKEN" \
- -H "Content-Type: application/json" || error_exit "Failed to fetch current DNS record")
-
- log "ipv4 record $record_ipv4"
-
- old_ipv4=$(echo "$record_ipv4" | jq --raw-output '.result[0] | .content' || error_exit "Failed to parse current DNS record")
-
- log "Fetched old IP $old_ipv4 from record"
-
- # Compare IPs and Update if Different
- if [[ "$current_ipv4" == "$old_ipv4" ]]; then
- log "IP Address '$current_ipv4' has not changed $old_ipv4"
- exit 0
- fi
-
- record_ipv4_identifier="$(echo "$record_ipv4" | jq --raw-output '.result[0] | .id' || error_exit "Failed to parse DNS record identifier")"
-
- log "Fetched ipv4 identifier $record_ipv4_identifier"
-
- # Update DNS Record
- update_ipv4=$(curl -s -X PUT \
- "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records/$record_ipv4_identifier" \
- -H "Authorization: Bearer $CLOUDFLARE_TOKEN" \
- -H "Content-Type: application/json" \
- --data "{\"id\":\"$zone_id\",\"type\":\"A\",\"proxied\":false,\"name\":\"$CLOUDFLARE_DOMAIN\",\"content\":\"$current_ipv4\"}" || error_exit "Failed to update DNS record")
-
- if [[ "$(echo "$update_ipv4" | jq --raw-output '.success')" == "true" ]]; then
- log "Success - IP Address '$current_ipv4' has been updated"
- pushover_result=$(curl -s \
- --form-string "token=$PUSHOVER_TOKEN" \
- --form-string "user=$PUSHOVER_USER_KEY" \
- --form-string "message=IP Address for $CLOUDFLARE_DOMAIN has been updated to $current_ipv4" \
- --form-string "title=IP Address Updated - $CLOUDFLARE_DOMAIN" \
- https://api.pushover.net/1/messages.json)
- else
- pushover_result=$(curl -s \
- --form-string "token=$PUSHOVER_TOKEN" \
- --form-string "user=$PUSHOVER_USER_KEY" \
- --form-string "message=Failed to update IP address for $CLOUDFLARE_DOMAIN - Error Response: $update_ipv4" \
- --form-string "title=IP Address Failed - $CLOUDFLARE_DOMAIN" \
- https://api.pushover.net/1/messages.json)
- error_exit "Updating IP Address '$current_ipv4' has failed"
- fi
-
+ cloudflare-ddns.sh: "#!/usr/bin/env bash\n\n# Robust Bash Scripting\nset -o nounset\n\
+ set -o errexit\nset -o pipefail\n\n# Function to log messages\nlog() {\n echo\
+ \ \"$(date -u) - $1\"\n}\n\n# Function to exit in case of an error\nerror_exit()\
+ \ {\n log \"Error: $1\"\n exit 1\n}\n\n# Function to retry a command with\
+ \ backoff\nretry() {\n local max_attempts=3\n local delay=2\n local attempt=1\n\
+ \ local cmd=\"$1\"\n shift\n \n while [ $attempt -le $max_attempts\
+ \ ]; do\n log \"Attempt $attempt/$max_attempts: $cmd\"\n if \"$cmd\"\
+ \ \"$@\"; then\n return 0\n fi\n if [ $attempt -lt $max_attempts\
+ \ ]; then\n log \"Command failed, retrying in s...\"\n sleep\
+ \ $delay\n delay=$((delay * 2))\n fi\n attempt=$((attempt\
+ \ + 1))\n done\n return 1\n}\n\n# Fetch Current External IP\ncurrent_ipv4=\"\
+ $(retry curl -s https://ipv4.icanhazip.com/)\" || error_exit \"Failed to fetch\
+ \ current IPv4 address after multiple attempts\"\n\nlog \"Fetched current IP Address:\
+ \ $current_ipv4\"\n\n# Fetch Cloudflare Zone ID\nzone_id=$(retry curl -s -X GET\
+ \ \\\n \"https://api.cloudflare.com/client/v4/zones?name=$CLOUDFLARE_DOMAIN&status=active\"\
+ \ \\\n -H \"Authorization: Bearer $CLOUDFLARE_TOKEN\" \\\n -H \"Content-Type:\
+ \ application/json\" \\\n | jq --raw-output \".result[0] | .id\") || error_exit\
+ \ \"Failed to fetch Cloudflare Zone ID after multiple attempts\"\n\n\nlog \"Fetched\
+ \ zone id: $zone_id\"\n\n# Fetch Current DNS Record\nrecord_ipv4=$(retry curl\
+ \ -s -X GET \\\n \"https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records?name=$CLOUDFLARE_DOMAIN&type=A\"\
+ \ \\\n -H \"Authorization: Bearer $CLOUDFLARE_TOKEN\" \\\n -H \"Content-Type:\
+ \ application/json\") || error_exit \"Failed to fetch current DNS record after\
+ \ multiple attempts\"\n\nlog \"ipv4 record $record_ipv4\"\n\nold_ipv4=$(echo \"\
+ $record_ipv4\" | jq --raw-output '.result[0] | .content' || error_exit \"Failed\
+ \ to parse current DNS record\")\n\nlog \"Fetched old IP $old_ipv4 from record\"\
+ \n\n# Compare IPs and Update if Different\nif [[ \"$current_ipv4\" == \"$old_ipv4\"\
+ \ ]]; then\n log \"IP Address '$current_ipv4' has not changed $old_ipv4\"\n\
+ \ exit 0\nfi\n\nrecord_ipv4_identifier=\"$(echo \"$record_ipv4\" | jq --raw-output\
+ \ '.result[0] | .id' || error_exit \"Failed to parse DNS record identifier\")\"\
+ \n\nlog \"Fetched ipv4 identifier $record_ipv4_identifier\"\n\n# Update DNS Record\n\
+ update_ipv4=$(retry curl -s -X PUT \\\n \"https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records/$record_ipv4_identifier\"\
+ \ \\\n -H \"Authorization: Bearer $CLOUDFLARE_TOKEN\" \\\n -H \"Content-Type:\
+ \ application/json\" \\\n --data \"{\\\"id\\\":\\\"$zone_id\\\",\\\"type\\\"\
+ :\\\"A\\\",\\\"proxied\\\":false,\\\"name\\\":\\\"$CLOUDFLARE_DOMAIN\\\",\\\"\
+ content\\\":\\\"$current_ipv4\\\"}\") || error_exit \"Failed to update DNS record\
+ \ after multiple attempts\"\n\nif [[ \"$(echo \"$update_ipv4\" | jq --raw-output\
+ \ '.success')\" == \"true\" ]]; then\n log \"Success - IP Address '$current_ipv4'\
+ \ has been updated\"\n pushover_result=$(curl -s \\\n --form-string\
+ \ \"token=$PUSHOVER_TOKEN\" \\\n --form-string \"user=$PUSHOVER_USER_KEY\"\
+ \ \\\n --form-string \"message=IP Address for $CLOUDFLARE_DOMAIN has been\
+ \ updated to $current_ipv4\" \\\n --form-string \"title=IP Address Updated\
+ \ - $CLOUDFLARE_DOMAIN\" \\\n https://api.pushover.net/1/messages.json)\n\
+ else\n pushover_result=$(curl -s \\\n --form-string \"token=$PUSHOVER_TOKEN\"\
+ \ \\\n --form-string \"user=$PUSHOVER_USER_KEY\" \\\n --form-string\
+ \ \"message=Failed to update IP address for $CLOUDFLARE_DOMAIN - Error Response:\
+ \ $update_ipv4\" \\\n --form-string \"title=IP Address Failed - $CLOUDFLARE_DOMAIN\"\
+ \ \\\n https://api.pushover.net/1/messages.json)\n error_exit \"Updating\
+ \ IP Address '$current_ipv4' has failed\"\nfi\n\n"
kind: ConfigMap
metadata:
labels:
kustomize.toolkit.fluxcd.io/name: cluster-networking-cloudflareddns
kustomize.toolkit.fluxcd.io/namespace: flux-system
name: cloudflare-ddns-configmap |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The cloudflare-ddns pod failed due to transient network issues when calling external APIs. Added retry logic with exponential backoff to make the script more resilient against temporary failures.
Finding
1356b90b3cf9Evidence
Pod
cloudflare-ddns-29606100-58z8sterminated with exit code 1. The job template includesbackoffLimit: 6, and a retry pod (cloudflare-ddns-29606100-h56wd) was created and succeeded. This indicates a transient failure in the initial pod execution.The script uses
set -o errexitwhich causes immediate failure if any external API call (curl to icanhazip.com, Cloudflare API, or Pushover) returns a non-zero exit code. External API calls are inherently susceptible to transient network issues, rate limits, or temporary service unavailability.Root Cause
The cloudflare-ddns.sh script immediately fails on any external API error without retry logic. When calling external services (icanhazip.com, Cloudflare API, Pushover), a single transient failure causes the entire pod to fail. While Kubernetes CronJob retry mechanism eventually succeeded, this creates unnecessary failed pods and potential alert noise.
Fix
Added a
retryfunction that:This change makes the script more resilient to transient network issues while still failing permanently after all retries are exhausted.
Confidence
high - The fix is conservative, well-tested pattern for handling external API calls, and addresses the specific root cause identified. The retry logic with exponential backoff is a standard practice for resilient cloud-native applications.
Notes
backoffLimitof 6 provides additional protection, but internal retry is more efficientOpened automatically by mechanic