fix: show errors in webhook #34
Merged
Fix: Expose Soft Errors in Infoblox Webhook to Prevent PTR Lookup Issues
Issue: PTR Record Lookup Fails After Node Replacement
Description of the Problem
We encountered an issue where external-dns fails to handle PTR records correctly when nodes are replaced. The problem occurs in the following sequence:
1. A node with fqdn1 and IP1 is added.
2. external-dns creates the corresponding records in Infoblox.
3. The node with fqdn1 and IP1 is removed.
4. A node with the same fqdn1, but IP2, is added.
5. external-dns still sees fqdn1, but the PTR record is missing for IP2 in Infoblox.
If steps 3 and 4 happen between two external-dns runs, the process breaks at step 5. From that point on, external-dns is unable to recover properly.
Root Cause
- external-dns reports a soft error, but the Infoblox webhook silently ignores it.
- This prevents external-dns from proceeding correctly.
Proposed Solution
This pull request modifies the Infoblox webhook to expose soft errors instead of ignoring them. By making these errors visible, administrators can more easily trace issues in Infoblox and manually correct them if needed.
Impact
- Soft errors are surfaced when external-dns encounters missing PTR records.
Workaround for Rancher
In Rancher-managed clusters, nodes may be created dynamically when a resource pool lacks sufficient workers. Rancher assigns a sequential number to new nodes, which increases the likelihood of reusing the same FQDN. To mitigate this issue, consider the following workarounds:
- Adjust the external-dns interval: ensure that the external-dns update interval is shorter than the time Rancher needs to create a new worker node.
- Instead of reusing FQDNs, create a new resource pool and use a scale-down and add-worker strategy to ensure that each node receives a unique FQDN.