feat: enable localdns hosts plugin to cache critical AKS FQDNs #7639
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -1384,6 +1384,172 @@ func ValidateLocalDNSResolution(ctx context.Context, s *Scenario, server string) | |||||||||||||||||||||||||||||||||||||||||||||||||||
| assert.Contains(s.T, execResult.stdout, fmt.Sprintf("SERVER: %s", server)) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| // ValidateLocalDNSHostsFile checks that /etc/localdns/hosts contains entries for critical FQDNs. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| // It dynamically resolves IPs on the VM and verifies they match what's in the hosts file. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| // This avoids hardcoding IPs that can change over time. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| func ValidateLocalDNSHostsFile(ctx context.Context, s *Scenario, fqdns []string) { | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| s.T.Helper() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| // Build script that resolves each FQDN and checks it exists in hosts file | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| script := fmt.Sprintf(`set -euo pipefail | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| hosts_file="/etc/localdns/hosts" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| fqdns=(%s) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| echo "=== Validating /etc/localdns/hosts contains resolved IPs for critical FQDNs ===" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| echo "" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| echo "Current hosts file contents:" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| cat "$hosts_file" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| echo "" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| errors=0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| for fqdn in "${fqdns[@]}"; do | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| echo "Checking FQDN: $fqdn" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Resolve IPv4 addresses using the Azure DNS (168.63.129.16) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| ipv4_addrs=$(nslookup -type=A "$fqdn" 168.63.129.16 2>/dev/null | awk '/^Address: / && !/^Address: .*#/ {print $2}' | grep -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$' || true) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| if [ -z "$ipv4_addrs" ]; then | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| echo " WARNING: Could not resolve IPv4 for $fqdn, skipping IP validation" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| # At minimum, check the FQDN exists in the file | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| if ! grep -qF "$fqdn" "$hosts_file"; then | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| echo " ERROR: FQDN $fqdn not found in hosts file at all" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| errors=$((errors + 1)) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| continue | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Check each resolved IP exists in the hosts file for this FQDN | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| for ip in $ipv4_addrs; do | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| expected_entry="$ip $fqdn" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| if grep -qF "$expected_entry" "$hosts_file"; then | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| echo " OK: Found '$expected_entry' in hosts file" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines 1422 to 1425
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| else | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| echo " ERROR: Expected entry '$expected_entry' not found in hosts file" | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| errors=$((errors + 1)) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| done | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines 1421 to 1430
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Check each resolved IP exists in the hosts file for this FQDN | |
| for ip in $ipv4_addrs; do | |
| expected_entry="$ip $fqdn" | |
| if grep -q "$expected_entry" "$hosts_file"; then | |
| echo " OK: Found '$expected_entry' in hosts file" | |
| else | |
| echo " ERROR: Expected entry '$expected_entry' not found in hosts file" | |
| errors=$((errors + 1)) | |
| fi | |
| done | |
| # Check that at least one resolved IP exists in the hosts file for this FQDN | |
| found_match=0 | |
| for ip in $ipv4_addrs; do | |
| expected_entry="$ip $fqdn" | |
| if grep -q "$expected_entry" "$hosts_file"; then | |
| echo " OK: Found '$expected_entry' in hosts file" | |
| found_match=1 | |
| break | |
| fi | |
| done | |
| if [ "$found_match" -eq 0 ]; then | |
| echo " ERROR: None of the resolved IPs for $fqdn were found in hosts file" | |
| errors=$((errors + 1)) | |
| fi |
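Both the original loop and the suggested change look up `"$ip $fqdn"` with `grep`, which matches substrings: a hosts line for `20.1.2.3 mcr.microsoft.com` also satisfies a search for `0.1.2.3 mcr.microsoft.com`. One way to make the check exact is to compare whole fields instead. The following is a minimal sketch under that assumption; `has_hosts_entry` is a hypothetical helper name and is not part of this PR.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): succeed only when the hosts file has a
# line whose first field equals the IP and one of whose later fields equals
# the FQDN. Comment lines never match because their first field is "#...".
has_hosts_entry() {
    local hosts_file="$1" ip="$2" fqdn="$3"
    awk -v ip="$ip" -v name="$fqdn" '
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i == name) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}
```

Inside the validation loop this could stand in for the `grep` call, for example `if has_hosts_entry "$hosts_file" "$ip" "$fqdn"; then ...`.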
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| [Unit] | ||
| Description=Populate /etc/localdns/hosts with critical AKS FQDN addresses | ||
| After=network-online.target | ||
| Wants=network-online.target | ||
| Before=kubelet.service localdns.service | ||
|
|
||
| [Service] | ||
| Type=oneshot | ||
| ExecStart=/opt/azure/containers/aks-hosts-setup.sh | ||
|
Comment on lines +7 to +9
|
||
|
|
||
| [Install] | ||
| WantedBy=multi-user.target kubelet.service | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| #!/bin/bash | ||
| set -uo pipefail | ||
|
|
||
| # aks-hosts-setup.sh | ||
| # Resolves A and AAAA records for critical AKS FQDNs and populates /etc/localdns/hosts | ||
|
|
||
| HOSTS_FILE="/etc/localdns/hosts" | ||
|
|
||
| # Ensure the directory exists | ||
| mkdir -p "$(dirname "$HOSTS_FILE")" | ||
|
|
||
| # Critical AKS FQDNs that should be cached for DNS reliability | ||
| CRITICAL_FQDNS=( | ||
| "acs-mirror.azureedge.net" | ||
| "eastus.data.mcr.microsoft.com" | ||
| "login.microsoftonline.com" | ||
| "management.azure.com" | ||
| "mcr.microsoft.com" | ||
| "packages.aks.azure.com" | ||
| "packages.microsoft.com" | ||
| ) | ||
|
|
||
| # Function to resolve IPv4 addresses for a domain | ||
| # Filters output to only include valid IPv4 addresses (rejects NXDOMAIN, SERVFAIL, hostnames, etc.) | ||
| resolve_ipv4() { | ||
| local domain="$1" | ||
| local output | ||
| output=$(nslookup -type=A "${domain}" 2>/dev/null) || return 0 | ||
| # Parse Address lines (skip server address with #), validate IPv4 format (4 octets of 1-3 digits) | ||
| echo "${output}" | awk '/^Address: / && !/^Address: .*#/ {print $2}' | grep -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$' || return 0 | ||
|
||
| } | ||
|
|
||
| # Function to resolve IPv6 addresses for a domain | ||
| # Filters output to only include valid IPv6 addresses (rejects NXDOMAIN, SERVFAIL, hostnames, etc.) | ||
| resolve_ipv6() { | ||
| local domain="$1" | ||
| local output | ||
| output=$(nslookup -type=AAAA "${domain}" 2>/dev/null) || return 0 | ||
| # Parse Address lines (skip server address with #), validate IPv6 format (must contain : and only hex/colons, min 3 chars) | ||
| echo "${output}" | awk '/^Address: / && !/^Address: .*#/ {print $2}' | grep -E '^[0-9a-fA-F:]{3,}$' | grep ':' || return 0 | ||
| } | ||
|
|
||
| echo "Starting AKS critical FQDN hosts resolution at $(date)" | ||
|
|
||
| # Track if we resolved at least one address | ||
| RESOLVED_ANY=false | ||
|
|
||
| # Start building the hosts file content | ||
| HOSTS_CONTENT="# AKS critical FQDN addresses resolved at $(date) | ||
| # This file is automatically generated by aks-hosts-setup.service | ||
| " | ||
|
|
||
| # Resolve each FQDN | ||
| for DOMAIN in "${CRITICAL_FQDNS[@]}"; do | ||
| echo "Resolving addresses for ${DOMAIN}..." | ||
|
|
||
| # Get IPv4 and IPv6 addresses using helper functions | ||
| IPV4_ADDRS=$(resolve_ipv4 "${DOMAIN}") | ||
| IPV6_ADDRS=$(resolve_ipv6 "${DOMAIN}") | ||
|
|
||
| # Check if we got any results for this domain | ||
| if [ -z "${IPV4_ADDRS}" ] && [ -z "${IPV6_ADDRS}" ]; then | ||
| echo " WARNING: No IP addresses resolved for ${DOMAIN}" | ||
| continue | ||
| fi | ||
|
|
||
| RESOLVED_ANY=true | ||
| HOSTS_CONTENT+=" | ||
| # ${DOMAIN}" | ||
|
|
||
| if [ -n "${IPV4_ADDRS}" ]; then | ||
| for addr in ${IPV4_ADDRS}; do | ||
| HOSTS_CONTENT+=" | ||
| ${addr} ${DOMAIN}" | ||
| done | ||
| fi | ||
|
|
||
| if [ -n "${IPV6_ADDRS}" ]; then | ||
| for addr in ${IPV6_ADDRS}; do | ||
| HOSTS_CONTENT+=" | ||
| ${addr} ${DOMAIN}" | ||
| done | ||
| fi | ||
| done | ||
|
|
||
| # Check if we resolved at least one domain | ||
| if [ "${RESOLVED_ANY}" != "true" ]; then | ||
| echo "WARNING: No IP addresses resolved for any domain at $(date)" | ||
| echo "This is likely a temporary DNS issue. Timer will retry later." | ||
| # Keep existing hosts file intact and exit successfully so systemd doesn't mark unit as failed | ||
| exit 0 | ||
| fi | ||
|
|
||
| # Write the hosts file | ||
| echo "Writing addresses to ${HOSTS_FILE}..." | ||
| echo "${HOSTS_CONTENT}" > "${HOSTS_FILE}" | ||
|
Comment on lines +94 to +96
|
||
|
|
||
| echo "AKS critical FQDN hosts resolution completed at $(date)" | ||
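The write step in `aks-hosts-setup.sh` redirects the generated content straight into `/etc/localdns/hosts`, so a reader such as localdns could in principle observe a partially written file while the timer refreshes it. A common hardening, shown here only as a sketch and not as something this PR does, is to write to a temporary file in the same directory and rename it into place. `write_file_atomically` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): write content to a sibling temp file,
# then rename it over the target. A rename within one filesystem replaces the
# target in a single step, so readers see either the old or the new file.
write_file_atomically() {
    local target="$1" content="$2" tmp
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if printf '%s\n' "$content" > "$tmp" &&
        chmod 0644 "$tmp" &&
        mv -f "$tmp" "$target"; then
        return 0
    fi
    # Clean up the temp file if any step failed; the old target is untouched.
    rm -f "$tmp"
    return 1
}
```

In the script this would replace the final redirect, for example `write_file_atomically "${HOSTS_FILE}" "${HOSTS_CONTENT}"`. The `0644` mode is an assumption about the permissions the hosts file should have.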
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| [Unit] | ||
| Description=Run AKS hosts setup periodically | ||
| Before=localdns.service | ||
|
|
||
| [Timer] | ||
| # Run immediately on boot | ||
| OnBootSec=0 | ||
| # Run 15 minutes after the last activation (AKS critical FQDN IPs don't change frequently) | ||
| OnUnitActiveSec=15min | ||
| # Timer accuracy (how much systemd can delay) | ||
| AccuracySec=1s | ||
| # Add randomization to avoid thundering herd if multiple nodes boot simultaneously | ||
| RandomizedDelaySec=60s | ||
|
|
||
| [Install] | ||
| WantedBy=timers.target |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1194,6 +1194,23 @@ enableLocalDNS() { | |
| echo "Enable localdns succeeded." | ||
| } | ||
|
|
||
| # This function enables and starts the aks-hosts-setup timer. | ||
| # The timer periodically resolves critical AKS FQDN DNS records and populates /etc/localdns/hosts. | ||
| enableAKSHostsSetup() { | ||
|
Member
Do not make this fail. Just log the error and create an empty hosts file.
||
| local hosts_file="/etc/localdns/hosts" | ||
|
|
||
| # Run the script once immediately to resolve live DNS before kubelet starts | ||
| echo "Running initial aks-hosts-setup to resolve DNS..." | ||
| mkdir -p "$(dirname "${hosts_file}")" | ||
| /opt/azure/containers/aks-hosts-setup.sh || echo "Warning: Initial hosts setup failed" | ||
|
|
||
| # Enable the timer for periodic refresh (every 15 minutes) | ||
| # This will update the hosts file with fresh IPs from live DNS | ||
| echo "Enabling aks-hosts-setup timer..." | ||
| systemctlEnableAndStart aks-hosts-setup.timer 30 || exit $ERR_SYSTEMCTL_START_FAIL | ||
| echo "aks-hosts-setup timer enabled successfully." | ||
|
Comment on lines +1207 to +1211
|
||
| } | ||
|
|
||
| configureManagedGPUExperience() { | ||
| if [ "${GPU_NODE}" != "true" ] || [ "${skip_nvidia_driver_install}" = "true" ]; then | ||
| return | ||
|
|
||
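The reviewer's request on `enableAKSHostsSetup` (log the error and fall back to an empty hosts file rather than failing provisioning) could look roughly like the sketch below. `ensure_hosts_file` is a hypothetical helper name, and the choice to always return success is an assumption drawn from that comment, not code from this PR.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): make sure the hosts file exists, even
# if it is empty, and never fail the caller. Existing content is preserved.
ensure_hosts_file() {
    local hosts_file="$1"
    if ! mkdir -p "$(dirname "$hosts_file")"; then
        echo "Warning: could not create directory for ${hosts_file}" >&2
        return 0
    fi
    # touch creates an empty file only when none exists yet.
    if [ ! -e "$hosts_file" ] && ! touch "$hosts_file"; then
        echo "Warning: could not create ${hosts_file}" >&2
    fi
    return 0
}
```

Calling `ensure_hosts_file /etc/localdns/hosts` after the initial `aks-hosts-setup.sh` run would leave a file in place for localdns even when resolution failed.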
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -297,6 +297,11 @@ EOF | |||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||
| # This is to enable localdns using scriptless. | ||||||||||||||||||||||||||||||||||||||||
| if [ "${SHOULD_ENABLE_LOCALDNS}" = "true" ]; then | ||||||||||||||||||||||||||||||||||||||||
| # Write hosts file BEFORE starting LocalDNS so it has entries to serve | ||||||||||||||||||||||||||||||||||||||||
| # Enable aks-hosts-setup timer to periodically resolve and cache critical AKS FQDN DNS addresses | ||||||||||||||||||||||||||||||||||||||||
| logs_to_events "AKS.CSE.enableAKSHostsSetup" enableAKSHostsSetup || exit $ERR_SYSTEMCTL_START_FAIL | ||||||||||||||||||||||||||||||||||||||||
|
Member
We should always enable the hosts systemd unit so that the hosts file is always available, and mount the hosts file in the Corefile if enableHostplugin == true is passed in.
||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||
| # Start LocalDNS after hosts file is populated | ||||||||||||||||||||||||||||||||||||||||
|
Comment on lines +302 to +304
|
||||||||||||||||||||||||||||||||||||||||
| logs_to_events "AKS.CSE.enableAKSHostsSetup" enableAKSHostsSetup || exit $ERR_SYSTEMCTL_START_FAIL | |
| # Start LocalDNS after hosts file is populated | |
| aks_hosts_setup_supported="false" | |
| if command -v systemctl >/dev/null 2>&1; then | |
| if systemctl list-unit-files 2>/dev/null | grep -q '^aks-hosts-setup.service'; then | |
| if systemctl list-unit-files 2>/dev/null | grep -q '^aks-hosts-setup.timer'; then | |
| aks_hosts_setup_supported="true" | |
| fi | |
| fi | |
| fi | |
| if [ "${aks_hosts_setup_supported}" = "true" ]; then | |
| logs_to_events "AKS.CSE.enableAKSHostsSetup" enableAKSHostsSetup || exit $ERR_SYSTEMCTL_START_FAIL | |
| else | |
| echo "aks-hosts-setup systemd units not found or systemctl unavailable; skipping AKS hosts setup" | |
| fi | |
| # Start LocalDNS after hosts file is populated (or skipped gracefully) |
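The suggested change runs `systemctl list-unit-files` twice and filters with patterns such as `'^aks-hosts-setup.service'`, where each `.` matches any character and the match is prefix-only. As a sketch of a stricter alternative, the listing can be read once and compared by exact unit name. `all_units_listed` is a hypothetical helper name and is not part of this PR or the suggestion.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): read a `systemctl list-unit-files`
# style listing on stdin and succeed only if every unit named in the
# arguments appears as the first field of some line.
all_units_listed() {
    local listing unit
    listing=$(cat)
    for unit in "$@"; do
        printf '%s\n' "$listing" |
            awk -v u="$unit" '$1 == u { found = 1 } END { exit found ? 0 : 1 }' ||
            return 1
    done
    return 0
}
```

A possible call site: `systemctl list-unit-files 2>/dev/null | all_units_listed aks-hosts-setup.service aks-hosts-setup.timer`.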
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -1888,6 +1888,12 @@ health-check.localdns.local:53 { | |||||
| {{- end }} | ||||||
| bind {{$.NodeListenerIP}} | ||||||
| {{- if $isRootDomain}} | ||||||
| # Check /etc/localdns/hosts first for critical AKS FQDNs (mcr.microsoft.com, packages.aks.azure.com, etc.) | ||||||
| hosts /etc/localdns/hosts { | ||||||
| fallthrough | ||||||
|
Comment on lines +1891 to +1893
|
||||||
| } | ||||||
| {{- end}} | ||||||
| {{- if $isRootDomain}} | ||||||
|
Comment on lines +1895 to +1896
|
||||||
| {{- end}} | |
| {{- if $isRootDomain}} |
Copilot
AI
Feb 14, 2026
This enables the CoreDNS hosts plugin for the root domain unconditionally, but the PR also introduces LocalDNSProfile.EnableHostsPlugin and a separate service that populates /etc/localdns/hosts. As written, the plugin will be enabled even when the hosts file/population service isn’t present or the feature is meant to be disabled, which risks localdns startup/runtime errors and makes the new EnableHostsPlugin flag ineffective. Suggest gating this block on EnableHostsPlugin (and/or ensuring an empty hosts file is always created before localdns starts).
The proto source file marks field 6 (critical_hosts_entries) as reserved, indicating it was removed. However, the generated protobuf code (localdns_config.pb.go) still contains the CriticalHostsEntries field and related methods. This mismatch indicates the protobuf code was not regenerated after modifying the .proto file.
You need to run `make generate` or the proto code generation command to regenerate the .pb.go files from the updated .proto files.