Commit c3d11e3

Revert "fix(localdns): wait for resolv.conf update after networkctl reload to prevent race condition (#7749)"
This reverts commit bcdecfa.
1 parent 776a7d2 commit c3d11e3

83 files changed (+399, -1365 lines)

.github/copilot-instructions.md

Lines changed: 0 additions & 283 deletions
This file was deleted.

.github/copilot-instructions.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1 @@
1+
../AGENTS.md

AGENTS.md

Lines changed: 2 additions & 100 deletions
@@ -136,90 +136,6 @@ Analyze PRs for these compatibility scenarios:
136136
- Package manager assumptions (apt vs dnf/tdnf)
137137
- Systemd differences between distributions
138138

139-
**5. Package/Dependency Update PRs (Renovate)**
140-
- **Context**: Renovate bot automatically creates PRs to update component versions in `parts/common/components.json`. These components are cached on VHDs during build and directly affect node stability, GPU workloads, networking, and security. Updated packages are downloaded from `packages.aks.azure.com` or upstream registries during VHD build.
141-
- **What to check**: Every version bump—even patch versions—can introduce regressions that affect production nodes.
142-
- **Analysis steps for every package update PR**:
143-
1. **Identify the component and version change**: Parse the diff in `parts/common/components.json` to extract exact old → new versions for each OS/release entry.
144-
2. **Determine the update type**: Classify as major, minor, or patch using semver. Major and minor updates carry higher risk than patch updates.
145-
3. **Research upstream changelog**: Look up the project's release notes, changelog, or GitHub releases to understand what changed between the old and new versions. Summarize:
146-
- New features introduced
147-
- Bug fixes included
148-
- Breaking changes or deprecations
149-
- Security fixes (CVEs patched)
150-
4. **Assess OS coverage**: Check if the update covers all OS variants where the component is used (Ubuntu 22.04, 24.04, Azure Linux 3.0, etc.). Flag if some OS entries are updated but others are not — partial updates can cause inconsistency across node pools.
151-
5. **Evaluate VHD size impact**: For components downloaded as binaries or packages, consider whether the new version significantly increases VHD size. Large size increases can affect VHD build time and storage costs.
152-
6. **Check for configuration or API changes**: If the component exposes configuration files, CLI flags, systemd units, or APIs consumed by CSE scripts, verify that the update doesn't change defaults or remove options that provisioning scripts depend on.
153-
7. **Verify download URL validity**: Confirm that the `downloadLocation` and `downloadURIs` structure in components.json remains valid for the new version. New versions sometimes change the artifact naming convention or repository layout.
154-
155-
- **Risk assessment for package updates**:
156-
- 🔴 **High Risk**: Major version bumps, components critical to node boot (kubelet, containerd, runc), GPU drivers (nvidia-driver, dcgm-exporter), or networking (azure-cni, cilium). Also high risk if upstream changelog mentions breaking changes or behavioral changes.
157-
- 🟡 **Medium Risk**: Minor version bumps of non-critical components, updates that only affect specific OS variants, or updates where upstream changelog shows feature additions that could subtly change behavior.
158-
- 🟢 **Low Risk**: Patch version bumps with only bug fixes or security patches, no breaking changes in upstream changelog, and full OS coverage.
159-
160-
- **Review output for package update PRs must include a detailed version diff analysis**:
161-
162-
**Header:**
163-
```
164-
## Package Update Analysis: <component-name>
165-
**Version change**: X.Y.Z → A.B.C (<major|minor|patch> update)
166-
**OS variants affected**: Ubuntu 22.04, Ubuntu 24.04, Azure Linux 3.0 (list all)
167-
**OS variants NOT updated**: <list any missing, or "None — full coverage">
168-
```
169-
170-
**Detailed changelog between versions:**
171-
Use web search, GitHub releases, or upstream project documentation to find the exact differences between the old and new version. Present each change as a line item with its own risk tag:
172-
173-
```
174-
### Changes between X.Y.Z and A.B.C
175-
176-
| Change | Description | Risk |
177-
|--------|-------------|------|
178-
| Feature | <brief description of new feature> | 🟢 Low / 🟡 Medium / 🔴 High |
179-
| Bug fix | <brief description of bug fixed> | 🟢 Low / 🟡 Medium / 🔴 High |
180-
| Breaking | <description of breaking change> | 🔴 High |
181-
| Security | CVE-YYYY-XXXXX: <description> | 🟢 Low / 🟡 Medium / 🔴 High |
182-
| Deprecation | <what was deprecated and migration path> | 🟡 Medium / 🔴 High |
183-
| Config change | <default value changed or option removed> | 🟡 Medium / 🔴 High |
184-
| Performance | <perf improvement or regression> | 🟢 Low / 🟡 Medium |
185-
```
186-
187-
For each individual change, assess risk by considering:
188-
- Does it alter runtime behavior on AKS nodes?
189-
- Does it change CLI flags, config file formats, or systemd unit behavior that CSE scripts depend on?
190-
- Does it affect GPU workloads, networking, container runtime, or kubelet interaction?
191-
- Could it increase binary size significantly (VHD bloat)?
192-
- Does it introduce new system dependencies or kernel requirements?
193-
194-
**If upstream changelog is unavailable**, explicitly state: _"Upstream changelog not found for this version range. Manual testing recommended before merge."_
195-
196-
**Overall risk assessment:**
197-
```
198-
### Overall Risk: 🟢 Low / 🟡 Medium / 🔴 High
199-
**Justification**: <1-2 sentence summary of why this risk level was chosen>
200-
**Recommendation**: Approve / Request more info / Flag for manual testing
201-
```
202-
203-
**Example** (for a PR like dcgm-exporter 4.7.1 → 4.8.0):
204-
```
205-
## Package Update Analysis: dcgm-exporter
206-
**Version change**: 4.7.1 → 4.8.0 (minor update)
207-
**OS variants affected**: Ubuntu 22.04, Ubuntu 24.04
208-
**OS variants NOT updated**: Azure Linux 3.0 (still on 4.7.1-1.azl3) — flag for follow-up
209-
210-
### Changes between 4.7.1 and 4.8.0
211-
| Change | Description | Risk |
212-
|--------|-------------|------|
213-
| Feature | Added support for new DCGM field IDs for Blackwell GPUs | 🟢 Low |
214-
| Feature | New metrics endpoint configuration options | 🟡 Medium |
215-
| Bug fix | Fixed memory leak in long-running metric collection | 🟢 Low |
216-
| Deprecation | Removed legacy CSV export format | 🟡 Medium |
217-
218-
### Overall Risk: 🟡 Medium
219-
**Justification**: Minor version bump of GPU monitoring component. No breaking changes to core metrics pipeline, but Azure Linux 3.0 is not updated which creates version skew across OS variants.
220-
**Recommendation**: Approve, but file follow-up issue for Azure Linux 3.0 alignment.
221-
```
222-
223139
### Analysis Approach
224140

225141
**Dynamic Dependency Tracing**:
@@ -253,22 +169,8 @@ Provide targeted inline comments on specific lines where you detect issues:
253169
- Include actionable next steps (e.g., "Verify this function is not used by checking references in `vhdbuilder/packer/`")
254170

255171
**Risk indicators to include:**
256-
257-
- **Severity** (pick one):
258-
- 🔴 **High Risk** — Could break production VM provisioning, cause node failures, or introduce security vulnerabilities
259-
- 🟡 **Medium Risk** — Could cause issues in specific configurations, edge cases, or degrade performance
260-
- 🟢 **Low Risk** — Unlikely to cause issues but worth noting for awareness
261-
262-
- **Category** (pick one):
263-
- 🔧 **Script Logic** — Syntax errors, incorrect commands, broken control flow, wrong exit codes
264-
- 🖥️ **Cross-OS** — Incompatibility between Ubuntu, Azure Linux/Mariner, or Windows
265-
- 🌐 **External Dependency** — Unauthorized downloads, missing components.json entries, broken URLs
266-
- 🧪 **Test Coverage** — Missing or insufficient test coverage for changed behavior
267-
- 📦 **Package Update** — Component version changes, upstream regressions, VHD size impact
268-
- 🔄 **Backward Compatibility** — Breaking changes affecting VHDs in production (6-month window)
269-
- 🔒 **Security** — Credential exposure, privilege escalation, insecure defaults
270-
- **Performance** — VHD build time regression, node provisioning latency increase
271-
- 🏗️ **Architecture** — Structural changes affecting multiple components or deployment modes
172+
- Severity: 🔴 High Risk | 🟡 Medium Risk | 🟢 Low Risk
173+
- Category: Script Logic | Cross-OS | External Dependency | Test Coverage | etc.
272174

273175
**Only comment when you have substantive findings** - avoid noise on trivial or obviously safe changes.
274176
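The "Package/Dependency Update PRs" guidance removed above asks the reviewer to classify each version bump as major, minor, or patch. A minimal Go sketch of that classification follows. It assumes versions of the form `X.Y.Z` with an optional packaging suffix after the first `-` (for example `-ubuntu22.04u1`). The names `parseCore` and `classifyUpdate`, and the `suffix-only` label, are hypothetical and do not come from the repository.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCore extracts the numeric major, minor, and patch parts of a version
// string, ignoring everything after the first "-" (assumed packaging suffix).
func parseCore(v string) ([3]int, error) {
	var out [3]int
	core, _, _ := strings.Cut(v, "-")
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected X.Y.Z, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("non-numeric component in %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// classifyUpdate names the most significant component that differs between
// two versions. "suffix-only" (a hypothetical label) means the X.Y.Z cores match.
func classifyUpdate(oldV, newV string) (string, error) {
	o, err := parseCore(oldV)
	if err != nil {
		return "", err
	}
	n, err := parseCore(newV)
	if err != nil {
		return "", err
	}
	switch {
	case o[0] != n[0]:
		return "major", nil
	case o[1] != n[1]:
		return "minor", nil
	case o[2] != n[2]:
		return "patch", nil
	default:
		return "suffix-only", nil
	}
}

func main() {
	// Version pair taken from the dcgm-exporter example in the removed text.
	kind, err := classifyUpdate("4.7.1", "4.8.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(kind)
}
```

Under these assumptions, a change such as `1.7.30-ubuntu22.04u2` to `1.7.30-ubuntu22.04u1` is reported as `suffix-only`, because only the packaging suffix differs. Whether such a change deserves the same scrutiny as a patch bump is a judgment the sketch does not make.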

e2e/components/components.go

Lines changed: 10 additions & 12 deletions
@@ -43,18 +43,16 @@ func GetExpectedPackageVersions(packageName, distro, release string) []string {
4343
release = strings.ReplaceAll(release, ".", "\\.")
4444

4545
for _, packageItem := range packages.Array() {
46-
// Check if versionsV2 exists. Assume the DEFAULT OS variant.
47-
versions := packageItem.Get(fmt.Sprintf("%s.DEFAULT/%s.versionsV2", distro, release)).Array()
48-
if len(versions) == 0 {
49-
versions = packageItem.Get(fmt.Sprintf("%s.%s.versionsV2", distro, release)).Array()
50-
}
51-
52-
for _, version := range versions {
53-
// get versions.latestVersion and append to expectedVersions
54-
expectedVersions = append(expectedVersions, version.Get("latestVersion").String())
55-
// get versions.previousLatestVersion (if exists) and append to expectedVersions
56-
if version.Get("previousLatestVersion").Exists() {
57-
expectedVersions = append(expectedVersions, version.Get("previousLatestVersion").String())
46+
// check if versionsV2 exists
47+
if packageItem.Get(fmt.Sprintf("%s.%s.versionsV2", distro, release)).Exists() {
48+
versions := packageItem.Get(fmt.Sprintf("%s.%s.versionsV2", distro, release))
49+
for _, version := range versions.Array() {
50+
// get versions.latestVersion and append to expectedVersions
51+
expectedVersions = append(expectedVersions, version.Get("latestVersion").String())
52+
// get versions.previousLatestVersion (if exists) and append to expectedVersions
53+
if version.Get("previousLatestVersion").Exists() {
54+
expectedVersions = append(expectedVersions, version.Get("previousLatestVersion").String())
55+
}
5856
}
5957
}
6058
}
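This hunk restores the lookup that reads `versionsV2` only from the `<distro>.<release>` key. The reverted version first tried `<distro>.DEFAULT/<release>` and fell back to the plain key when that returned nothing. The difference can be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's code: the types `versionV2`, `releaseEntry`, and `packageItem`, the helpers `parsePackage` and `expectedVersions`, the `tryDefaultVariant` switch, and the sample JSON with its version strings are all hypothetical. It also uses `encoding/json` maps instead of the path-style `Get` calls in the diff, so no dot escaping of the release is needed, and it treats an empty `previousLatestVersion` as absent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// versionV2 mirrors the two fields the diff reads from each versionsV2 entry.
type versionV2 struct {
	LatestVersion         string `json:"latestVersion"`
	PreviousLatestVersion string `json:"previousLatestVersion,omitempty"`
}

type releaseEntry struct {
	VersionsV2 []versionV2 `json:"versionsV2"`
}

// packageItem maps distro -> release key -> release entry (hypothetical shape).
type packageItem map[string]map[string]releaseEntry

// Illustrative data only; the version strings are made up.
const sample = `{
  "azurelinux": {
    "DEFAULT/v3.0": {"versionsV2": [{"latestVersion": "0.0.2-example"}]},
    "v3.0": {"versionsV2": [{"latestVersion": "0.0.1-example", "previousLatestVersion": "0.0.0-example"}]}
  }
}`

func parsePackage(data string) (packageItem, error) {
	var p packageItem
	err := json.Unmarshal([]byte(data), &p)
	return p, err
}

// expectedVersions collects latestVersion and, when present, previousLatestVersion.
// With tryDefaultVariant set it mimics the reverted behavior (DEFAULT/<release>
// first, plain key as fallback); otherwise it reads only the plain key.
func expectedVersions(p packageItem, distro, release string, tryDefaultVariant bool) []string {
	var versions []versionV2
	if tryDefaultVariant {
		versions = p[distro]["DEFAULT/"+release].VersionsV2
	}
	if len(versions) == 0 {
		versions = p[distro][release].VersionsV2
	}
	var out []string
	for _, v := range versions {
		out = append(out, v.LatestVersion)
		if v.PreviousLatestVersion != "" {
			out = append(out, v.PreviousLatestVersion)
		}
	}
	return out
}

func main() {
	p, err := parsePackage(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("plain key only:", expectedVersions(p, "azurelinux", "v3.0", false))
	fmt.Println("DEFAULT first: ", expectedVersions(p, "azurelinux", "v3.0", true))
}
```

In this sketch, an entry stored only under `DEFAULT/v3.0` would be invisible to the plain-key lookup, which is consistent with the same commit renaming the `DEFAULT/v3.0` keys in `components.json` back to `v3.0`.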

parts/common/components.json

Lines changed: 5 additions & 33 deletions
@@ -942,16 +942,6 @@
942942
],
943943
"downloadURL": "https://github.com/oras-project/oras/releases/download/v${version}/oras_${version}_linux_${CPU_ARCH}.tar.gz"
944944
}
945-
},
946-
"azurelinux": {
947-
"OSGUARD/v3.0": {
948-
"versionsV2": [
949-
{
950-
"renovateTag": "<DO_NOT_UPDATE>",
951-
"latestVersion": "<SKIP>"
952-
}
953-
]
954-
}
955945
}
956946
}
957947
},
@@ -1039,15 +1029,15 @@
10391029
"versionsV2": [
10401030
{
10411031
"renovateTag": "name=moby-containerd, repository=production, os=ubuntu, release=22.04",
1042-
"latestVersion": "1.7.30-ubuntu22.04u2"
1032+
"latestVersion": "1.7.30-ubuntu22.04u1"
10431033
}
10441034
]
10451035
},
10461036
"r2004": {
10471037
"versionsV2": [
10481038
{
10491039
"renovateTag": "name=moby-containerd, repository=production, os=ubuntu, release=20.04",
1050-
"latestVersion": "1.7.30-ubuntu20.04u2"
1040+
"latestVersion": "1.7.30-ubuntu20.04u1"
10511041
}
10521042
]
10531043
}
@@ -1080,14 +1070,6 @@
10801070
"latestVersion": "2.0.0-16.azl3"
10811071
}
10821072
]
1083-
},
1084-
"OSGUARD/v3.0": {
1085-
"versionsV2": [
1086-
{
1087-
"renovateTag": "<DO_NOT_UPDATE>",
1088-
"latestVersion": "<SKIP>"
1089-
}
1090-
]
10911073
}
10921074
},
10931075
"azurelinuxkata": {
@@ -1100,16 +1082,6 @@
11001082
]
11011083
}
11021084
},
1103-
"flatcar": {
1104-
"current": {
1105-
"versionsV2": [
1106-
{
1107-
"renovateTag": "<DO_NOT_UPDATE>",
1108-
"latestVersion": "<SKIP>"
1109-
}
1110-
]
1111-
}
1112-
},
11131085
"windows": {
11141086
"ws2019": {
11151087
"versionsV2": [
@@ -1710,7 +1682,7 @@
17101682
}
17111683
},
17121684
"azurelinux": {
1713-
"DEFAULT/v3.0": {
1685+
"v3.0": {
17141686
"versionsV2": [
17151687
{
17161688
"renovateTag": "RPM_registry=https://developer.download.nvidia.com/compute/cuda/repos/azl3/x86_64/repodata, name=datacenter-gpu-manager-4-core, repository=nvidia, os=azurelinux, release=3.0",
@@ -1744,7 +1716,7 @@
17441716
}
17451717
},
17461718
"azurelinux": {
1747-
"DEFAULT/v3.0": {
1719+
"v3.0": {
17481720
"versionsV2": [
17491721
{
17501722
"renovateTag": "RPM_registry=https://developer.download.nvidia.com/compute/cuda/repos/azl3/x86_64/repodata, name=datacenter-gpu-manager-4-proprietary, repository=nvidia, os=azurelinux, release=3.0",
@@ -1778,7 +1750,7 @@
17781750
}
17791751
},
17801752
"azurelinux": {
1781-
"DEFAULT/v3.0": {
1753+
"v3.0": {
17821754
"versionsV2": [
17831755
{
17841756
"renovateTag": "RPM_registry=https://packages.microsoft.com/azurelinux/3.0/prod/cloud-native/x86_64/repodata, name=dcgm-exporter, os=azurelinux, release=3.0",

parts/linux/cloud-init/artifacts/README-COMPONENTS.md

Lines changed: 0 additions & 3 deletions
@@ -111,8 +111,6 @@ Please refer to [components.cue](../../../../schemas/components.cue) for the mos
111111
}
112112
#AzureLinuxOSDistro: {
113113
"v3.0"?: #ReleaseDownloadURI
114-
"DEFAULT/v3.0"?: #ReleaseDownloadURI
115-
"OSGUARD/v3.0"?: #ReleaseDownloadURI
116114
current?: #ReleaseDownloadURI
117115
}
118116
```
@@ -140,7 +138,6 @@ Here are the explanation of the above schema.
140138
- In `MarinerOSDistro`, we only have `current` now, which implies that single configurations will be applied to all Mariner release versions. We can distinguish them in needed. Note, we confirmed with Mariner team, Azure Linux 2.0 is reporting itself as `mariner` in the file `/etc/os-release`. So for Azure Linux 2.0 case, it will still read the package versions from `mariner` block.
141139
- In `AzureLinuxOSDistro`, `v3.0` is for Azure Linux v3.0. `current` is for otherwise but we are not using it now. Azure Linux 2.0 case is described in the `MarinerOSDistro` above.
142140
- `DefaultOSDistro` means the default case of OS Distro. If an OSDistro metadata is not defined, it will fetch it from `default`. For example, if a node is Ubuntu 20.04, but we don't specify `ubuntu` in components.json, then it will fetch `default.current`. For another example, if only `default.current` is specified in the components.json, No matter what OSDistro is the node running, it will only fetch `default.current` because it's the default metadata. This provides flexibility while elimiating unnecessary duplication when defining the metadata.
143-
1. The OS release version can optionally be prefixed with the OS variant. `OSGUARD/v3.0` and `DEFAULT/v3.0` are currently supported for `azurelinux`. The latter specifically matches Azure Linux 3.0 without a variant, i.e. not OS Guard.
144141
1. In `ReleaseDownloadURI`, you can see 2 keys.
145142
- `versionsV2`: This is updated from `versions`. You can define a list of `VersionV2` for a particular package. And in the codes, it's up to the feature developer to determine how to use the list. For example, install all versions in the list or just pick the latest one. Note that in package `containerd`, `marinerkata`, the `versionV2s` array is defined as `<SKIP>`. This is to tell the install-dependencies.sh not to install any `containerd` version for Kata SKU.
146143
- `downloadURL`: you can define a downloadURL with unresolved variables. For example, `https://acs-mirror.azureedge.net/azure-cni/v${version}/binaries/azure-vnet-cni-linux-${CPU_ARCH}-v${version}.tgz`. But the feature developer needs to make sure all variables are resolvable in the codes. In this example, `${CPU_ARCH}` is resolvable as it's defined at global scope. `${version}` is resovled based on the `versions` list above.
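The README context above notes that a `downloadURL` may contain unresolved variables such as `${version}` and `${CPU_ARCH}`, and that the feature developer must make sure each one is resolvable. A minimal Go sketch of that substitution follows, using `os.Expand` from the standard library. It is not how the provisioning scripts resolve the URL: the function name `resolveDownloadURL` and the values `1.2.3` and `amd64` are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"os"
)

// resolveDownloadURL substitutes ${name} placeholders in tmpl from vars and
// returns an error listing any placeholder that has no value.
func resolveDownloadURL(tmpl string, vars map[string]string) (string, error) {
	var missing []string
	resolved := os.Expand(tmpl, func(name string) string {
		val, ok := vars[name]
		if !ok {
			missing = append(missing, name)
		}
		return val
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unresolved variables: %v", missing)
	}
	return resolved, nil
}

func main() {
	// Template copied from the README text; the variable values are hypothetical.
	tmpl := "https://acs-mirror.azureedge.net/azure-cni/v${version}/binaries/azure-vnet-cni-linux-${CPU_ARCH}-v${version}.tgz"
	url, err := resolveDownloadURL(tmpl, map[string]string{
		"version":  "1.2.3",
		"CPU_ARCH": "amd64",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(url)
}
```

Failing on a missing variable, rather than silently substituting an empty string, surfaces a broken URL before any download is attempted.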

parts/linux/cloud-init/artifacts/azlosguard/cse_install_osguard.sh

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ installCredentialProviderFromPMC() {
7777
os_version="${OS_VERSION}"
7878
fi
7979
PACKAGE_VERSION=""
80-
getLatestPkgVersionFromK8sVersion "$k8sVersion" "azure-acr-credential-provider-pmc" "$os" "$os_version" "${OS_VARIANT}"
80+
getLatestPkgVersionFromK8sVersion "$k8sVersion" "azure-acr-credential-provider-pmc" "$os" "$os_version"
8181
packageVersion=$(echo $PACKAGE_VERSION | cut -d "-" -f 1)
8282
echo "installing azure-acr-credential-provider package version: $packageVersion"
8383
mkdir -p "${CREDENTIAL_PROVIDER_BIN_DIR}"

parts/linux/cloud-init/artifacts/cse_config.sh

Lines changed: 0 additions & 2 deletions
@@ -340,8 +340,6 @@ ExecStartPost=/sbin/iptables -P FORWARD ACCEPT
340340
EOF
341341

342342
mkdir -p /etc/containerd
343-
# Remove in case this is an existing symlink
344-
rm -f /etc/containerd/config.toml
345343
if [ "${GPU_NODE}" = "true" ]; then
346344
# Check VM tag directly to determine if GPU drivers should be skipped
347345
export -f should_skip_nvidia_drivers
