Skip to content

feat: skip cloud-init ready report and add standalone report_ready script#8056

Draft
awesomenix wants to merge 2 commits intomainfrom
nishchay/exciting-work
Draft

feat: skip cloud-init ready report and add standalone report_ready script#8056
awesomenix wants to merge 2 commits intomainfrom
nishchay/exciting-work

Conversation

@awesomenix
Copy link
Contributor

Bake experimental_skip_ready_report into VHD via cloud.cfg.d to skip
cloud-init's built-in health ready report to Azure fabric. Add a
standalone Python script (report_ready.py) that can be invoked from
CSE to report ready at the appropriate time during node provisioning.

Depends on canonical/cloud-init#6771.

I tried importing the cloud init library directly and copilot suggested this

The problems:

  1. Heavy import chain — importing cloudinit.sources.helpers.azure pulls in cloudinit.url_helper → requests, cloudinit.distros, cloudinit.subp, and dozens
    more. It works on the VM (cloud-init's deps are installed), but it's a large dependency surface.
  2. get_metadata_from_fabric() requires a distro object — it's a full cloud-init Distro class instance (needed for ISO ejection). You'd have to either
    construct one or monkey-patch it.
  3. report_success_to_host() depends on cloud-init's reporting handler registry — kvp.get_kvp_handler() looks up
    instantiated_handler_registry.registered_items["telemetry"], which is only populated during cloud-init's normal boot. Outside of cloud-init's lifecycle, this
    returns None and silently skips KVP reporting.
  4. Version coupling — your script would break if cloud-init refactors these internal APIs (they're not public API). Different VHDs may ship different
    cloud-init versions.

A middle ground: you could import just the KVP handler class directly and the low-level wireserver pieces, bypassing the high-level functions:

from cloudinit.reporting.handlers import HyperVKvpReportingHandler
handler = HyperVKvpReportingHandler()
handler.write_key("PROVISIONING_REPORT", report_string)

But the wireserver reporting (GoalState + health POST) has deep entanglement with url_helper, distro objects, and telemetry decorators that make it hard to
call standalone.

Copilot AI review requested due to automatic review settings March 10, 2026 20:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a VHD-baked mechanism to suppress cloud-init’s built-in “ready” report and replaces it with an explicit, CSE-invoked readiness/failure report to Azure wireserver.

Changes:

  • Add a standalone report_ready.py script that writes Hyper-V KVP provisioning status and POSTs health to wireserver.
  • Bake report_ready.py into multiple VHD build variants and copy it to /opt/azure/containers/.
  • Add skipCloudInitReadyReport to write cloud-init config, and invoke the reporting script from cse_start.sh.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
vhdbuilder/packer/vhd-image-builder-mariner.json Adds report_ready.py to files uploaded into the build VM.
vhdbuilder/packer/vhd-image-builder-mariner-cvm.json Same: stage report_ready.py into build VM.
vhdbuilder/packer/vhd-image-builder-mariner-arm64.json Same: stage report_ready.py into build VM.
vhdbuilder/packer/vhd-image-builder-flatcar.json Same: stage report_ready.py into build VM.
vhdbuilder/packer/vhd-image-builder-flatcar-arm64.json Same: stage report_ready.py into build VM.
vhdbuilder/packer/vhd-image-builder-cvm.json Same: stage report_ready.py into build VM.
vhdbuilder/packer/vhd-image-builder-base.json Same: stage report_ready.py into build VM.
vhdbuilder/packer/vhd-image-builder-arm64-gen2.json Same: stage report_ready.py into build VM.
vhdbuilder/packer/vhd-image-builder-acl.json Same: stage report_ready.py into build VM.
vhdbuilder/packer/packer_source.sh Copies report_ready.py into /opt/azure/containers/ with execute perms.
vhdbuilder/packer/install-dependencies.sh Invokes skipCloudInitReadyReport during VHD build.
vhdbuilder/packer/imagecustomizer/azlosguard/azlosguard.yml Ensures OSGuard imagecustomizer also places report_ready.py into /opt/azure/containers/.
parts/linux/cloud-init/artifacts/report_ready.py New standalone readiness/failure reporting implementation.
parts/linux/cloud-init/artifacts/cse_start.sh Calls report_ready.py on provisioning success/failure if present on the VHD.
parts/linux/cloud-init/artifacts/cse_install.sh Adds skipCloudInitReadyReport() helper that writes cloud-init config to skip built-in ready report.

addMarinerNvidiaRepo
updateDnfWithNvidiaPkg
overrideNetworkConfig || exit 1
skipCloudInitReadyReport || exit 1
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipCloudInitReadyReport is already invoked unconditionally earlier in this script, so calling it again in the Mariner/AzureLinux block is redundant and makes the flow harder to reason about. Consider removing this second call to keep the configuration applied in a single place.

Suggested change
skipCloudInitReadyReport || exit 1

Copilot uses AI. Check for mistakes.
Comment on lines +101 to +106
if [ -x /opt/azure/containers/report_ready.py ]; then
if [ "$EXIT_CODE" -eq 0 ]; then
python3 /opt/azure/containers/report_ready.py -v || echo "WARNING: Failed to report ready to Azure fabric"
else
python3 /opt/azure/containers/report_ready.py -v --failure --description "ExitCode: ${EXIT_CODE}, ${message_string}" || echo "WARNING: Failed to report failure to Azure fabric"
fi
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This report_ready.py invocation runs synchronously before log upload/exit, and can block provisioning for up to ~100s on wireserver timeouts/retries (GET/POST timeouts are 30s with multiple retries). If this is intended to be best-effort (as suggested by || echo "WARNING"), consider running it in the background on success and/or moving it after upload_logs (especially on failure) or passing tighter retry/timeout settings to avoid delaying provisioning and log upload.

Copilot uses AI. Check for mistakes.
Comment on lines +101 to +106
if [ -x /opt/azure/containers/report_ready.py ]; then
if [ "$EXIT_CODE" -eq 0 ]; then
python3 /opt/azure/containers/report_ready.py -v || echo "WARNING: Failed to report ready to Azure fabric"
else
python3 /opt/azure/containers/report_ready.py -v --failure --description "ExitCode: ${EXIT_CODE}, ${message_string}" || echo "WARNING: Failed to report failure to Azure fabric"
fi
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change updates cloud-init/CSE scripts under parts/, which are covered by snapshot-style golden tests in pkg/agent/testdata/* (e.g., baker_test.go reads ./testdata/<folder>/CustomData). Please run make generate (or regenerate the testdata via the repo’s standard workflow) and include the updated golden files in this PR; otherwise CI is likely to fail due to mismatched expected CustomData/CSE outputs.

Copilot uses AI. Check for mistakes.
…ript

   Bake experimental_skip_ready_report into VHD via cloud.cfg.d to skip
   cloud-init's built-in health ready report to Azure fabric. Add a
   standalone Python script (report_ready.py) that can be invoked from
   CSE to report ready at the appropriate time during node provisioning.

   Depends on canonical/cloud-init#6771.

   Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 22 changed files in this pull request and generated 2 comments.

Comment on lines +295 to +304
cpAndMode $CLOUD_INIT_BASE_PKG_SRC $CLOUD_INIT_BASE_PKG_DEST 0644

CLOUD_INIT_AZURE_PKG_SRC=/home/packer/cloud-init-packages/cloud-init-azure_all.deb
CLOUD_INIT_AZURE_PKG_DEST=/opt/azure/cloud-init-packages/cloud-init-azure_all.deb
cpAndMode $CLOUD_INIT_AZURE_PKG_SRC $CLOUD_INIT_AZURE_PKG_DEST 0644

CLOUD_INIT_PKG_SRC=/home/packer/cloud-init-packages/cloud-init_all.deb
CLOUD_INIT_PKG_DEST=/opt/azure/cloud-init-packages/cloud-init_all.deb
cpAndMode $CLOUD_INIT_PKG_SRC $CLOUD_INIT_PKG_DEST 0644

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copyPackerFiles() now unconditionally copies Ubuntu-only cloud-init .deb artifacts from /home/packer/cloud-init-packages into /opt/azure/cloud-init-packages. Those source files are only provisioned in vhd-image-builder-base.json, so Mariner/Flatcar/ACL packer templates will not have them and the VHD build will fail when copyPackerFiles runs. Guard these cpAndMode calls behind an Ubuntu check (and/or a file existence check) so non-Ubuntu builds don’t require these artifacts.

Suggested change
cpAndMode $CLOUD_INIT_BASE_PKG_SRC $CLOUD_INIT_BASE_PKG_DEST 0644
CLOUD_INIT_AZURE_PKG_SRC=/home/packer/cloud-init-packages/cloud-init-azure_all.deb
CLOUD_INIT_AZURE_PKG_DEST=/opt/azure/cloud-init-packages/cloud-init-azure_all.deb
cpAndMode $CLOUD_INIT_AZURE_PKG_SRC $CLOUD_INIT_AZURE_PKG_DEST 0644
CLOUD_INIT_PKG_SRC=/home/packer/cloud-init-packages/cloud-init_all.deb
CLOUD_INIT_PKG_DEST=/opt/azure/cloud-init-packages/cloud-init_all.deb
cpAndMode $CLOUD_INIT_PKG_SRC $CLOUD_INIT_PKG_DEST 0644
CLOUD_INIT_AZURE_PKG_SRC=/home/packer/cloud-init-packages/cloud-init-azure_all.deb
CLOUD_INIT_AZURE_PKG_DEST=/opt/azure/cloud-init-packages/cloud-init-azure_all.deb
CLOUD_INIT_PKG_SRC=/home/packer/cloud-init-packages/cloud-init_all.deb
CLOUD_INIT_PKG_DEST=/opt/azure/cloud-init-packages/cloud-init_all.deb
# Ubuntu-only cloud-init .deb artifacts may not be present for non-Ubuntu images.
# Guard these copies so Mariner/Flatcar/ACL builds do not fail when the source
# directory or files are missing.
if [ -d "/home/packer/cloud-init-packages" ]; then
cpAndMode $CLOUD_INIT_BASE_PKG_SRC $CLOUD_INIT_BASE_PKG_DEST 0644
cpAndMode $CLOUD_INIT_AZURE_PKG_SRC $CLOUD_INIT_AZURE_PKG_DEST 0644
cpAndMode $CLOUD_INIT_PKG_SRC $CLOUD_INIT_PKG_DEST 0644
fi

Copilot uses AI. Check for mistakes.
Comment on lines 122 to 136
"source": "vhdbuilder/packer/cloud-init-packages/cloud-init-base_all.deb",
"destination": "/home/packer/cloud-init-packages/cloud-init-base_all.deb"
},
{
"type": "file",
"source": "vhdbuilder/packer/cloud-init-packages/cloud-init-azure_all.deb",
"destination": "/home/packer/cloud-init-packages/cloud-init-azure_all.deb"
},
{
"type": "file",
"source": "vhdbuilder/packer/cloud-init-packages/cloud-init_all.deb",
"destination": "/home/packer/cloud-init-packages/cloud-init_all.deb"
},
{
"type": "file",
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds prebuilt cloud-init .deb artifacts into the repo and bakes them into the VHD. This creates a hard-to-audit dependency surface (no explicit version metadata/checksums, no provenance, and manual updates), and it increases repo/VHD build maintenance burden. Prefer downloading versioned packages during the VHD build from an approved source (and tracking the version via components.json/renovate where applicable), or at least include explicit versioning + integrity verification for these artifacts.

Suggested change
"source": "vhdbuilder/packer/cloud-init-packages/cloud-init-base_all.deb",
"destination": "/home/packer/cloud-init-packages/cloud-init-base_all.deb"
},
{
"type": "file",
"source": "vhdbuilder/packer/cloud-init-packages/cloud-init-azure_all.deb",
"destination": "/home/packer/cloud-init-packages/cloud-init-azure_all.deb"
},
{
"type": "file",
"source": "vhdbuilder/packer/cloud-init-packages/cloud-init_all.deb",
"destination": "/home/packer/cloud-init-packages/cloud-init_all.deb"
},
{
"type": "file",

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 11, 2026 04:16
@awesomenix awesomenix force-pushed the nishchay/exciting-work branch from e1d2a33 to bb6f742 Compare March 11, 2026 04:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.


if [ -x /opt/azure/containers/report_ready.py ]; then
if [ "$EXIT_CODE" -eq 0 ]; then
python3 /opt/azure/containers/report_ready.py -v || echo "WARNING: Failed to report ready to Azure fabric"
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running report_ready.py synchronously here can add noticeable tail latency to successful provisioning (each attempt can block up to the HTTP timeout(s) plus retry delay). Consider running the success-path report in the background (similar to log upload) and/or tightening the per-request timeouts so a transient wireserver issue doesn't extend CSE completion by ~minutes.

Suggested change
python3 /opt/azure/containers/report_ready.py -v || echo "WARNING: Failed to report ready to Azure fabric"
python3 /opt/azure/containers/report_ready.py -v || echo "WARNING: Failed to report ready to Azure fabric" &

Copilot uses AI. Check for mistakes.
Comment on lines +141 to +144
def write_provisioning_kvp(report: str) -> None:
"""Write a provisioning report to the KVP pool file."""
if len(report) >= KVP_AZURE_MAX_VALUE_SIZE:
report = report[:KVP_AZURE_MAX_VALUE_SIZE - 1]
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KVP_AZURE_MAX_VALUE_SIZE is a byte limit, but the current truncation uses Python string length/slicing, which can still exceed the byte limit for non-ASCII input (e.g., descriptions containing UTF-8 multibyte chars). Truncate based on the encoded UTF-8 byte length before packing the record to avoid producing oversized KVP values.

Suggested change
def write_provisioning_kvp(report: str) -> None:
"""Write a provisioning report to the KVP pool file."""
if len(report) >= KVP_AZURE_MAX_VALUE_SIZE:
report = report[:KVP_AZURE_MAX_VALUE_SIZE - 1]
def _truncate_utf8_to_max_bytes(text: str, max_bytes: int) -> str:
"""Truncate a string so its UTF-8 encoding is at most max_bytes - 1 bytes."""
if max_bytes <= 0:
return ""
limit = max_bytes - 1
encoded = text.encode("utf-8")
if len(encoded) <= limit:
return text
truncated = encoded[:limit]
# Ignore incomplete multibyte sequences at the end to keep valid UTF-8.
return truncated.decode("utf-8", errors="ignore")
def write_provisioning_kvp(report: str) -> None:
"""Write a provisioning report to the KVP pool file."""
report = _truncate_utf8_to_max_bytes(report, KVP_AZURE_MAX_VALUE_SIZE)

Copilot uses AI. Check for mistakes.
retry_delay=args.retry_delay,
)
return 0
except RuntimeError:
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main() only catches RuntimeError; other unexpected exceptions (e.g., parsing issues, subprocess errors) will emit a full traceback. Since this is invoked from provisioning, consider catching Exception at the top-level and returning a non-zero exit with a concise logged error message to keep logs readable while still signaling failure.

Suggested change
except RuntimeError:
except Exception as exc:
LOG.error("report_ready failed: %s", exc)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants