Skip to content

Preserve image plan metadata in VMSS model conversion to prevent false model drift#6129

Open
stelucz wants to merge 1 commit intokubernetes-sigs:mainfrom
stelucz:vmss-model-check
Open

Preserve image plan metadata in VMSS model conversion to prevent false model drift#6129
stelucz wants to merge 1 commit intokubernetes-sigs:mainfrom
stelucz:vmss-model-check

Conversation

@stelucz
Copy link
Contributor

@stelucz stelucz commented Mar 2, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR fixes a VMSS model comparison issue in CAPZ where plan metadata was lost during Azure SDK → CAPZ image conversion.
As a result, CAPZ could detect a model mismatch even when AzureMachinePool spec was unchanged, and repeatedly roll MachinePool instances.

Root Cause

  • LatestModelApplied for AzureMachinePoolMachine depends on comparing:
    • desired image from MachinePoolScope.GetVMImage(...)
    • observed VM/VMSS instance image from Azure APIs.
  • For Compute/Shared Gallery images that include a marketplace-like plan, CAPZ expected plan data in desired model.
  • Before this change, converter logic SDKImageToImage did not carry Plan fields for gallery-based image references.
  • This produced structurally different image objects (desired had plan metadata, observed often did not), leading to persistent LatestModelApplied=false.

Actual State Before Change

  • No user-visible spec mutation on MachinePool/AzureMachinePool.
  • CAPZ logs repeatedly showed model-change predicates firing and ongoing reconciles/deletes:
    • MachinePoolModelHasChanged ... shouldUpdate=true
    • repeated Deleting AzureMachinePoolMachine ...
  • Node/VM churn continued despite stable target configuration.

Change Introduced

  • Added plan-aware conversion path in VMSS converter:
    • SDKImageToImageWithPlan(imageRef, isThirdParty, sdkPlan)
    • Existing SDKImageToImage(imageRef, isThirdParty) preserved as compatibility wrapper.
  • Updated VM/VMSS conversion call sites to pass both:
    • existing isThirdParty signal (sdkPlan != nil)
    • full sdkPlan object when available.
  • Added plan propagation for gallery images:
    • Compute Gallery: map Plan into ComputeGallery.Plan
    • Shared Gallery: map Plan into Publisher/Offer/SKU fields
  • Updated/extended unit tests in converter package to cover the plan-carrying behavior.

Expected Result

  • Observed image model now preserves relevant plan metadata, reducing false differences against desired model.
  • LatestModelApplied is evaluated against a more faithful observed state.
  • Prevents unnecessary VMSS instance replacement loops caused by plan metadata loss.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:


@k8s-ci-robot
Copy link
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 2, 2026
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 2, 2026
@k8s-ci-robot
Copy link
Contributor

Hi @stelucz. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 2, 2026
@stelucz
Copy link
Contributor Author

stelucz commented Mar 2, 2026

/assign vincepri

@codecov
Copy link

codecov bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.47%. Comparing base (012443d) to head (c380ed5).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6129      +/-   ##
==========================================
+ Coverage   44.41%   44.47%   +0.05%     
==========================================
  Files         280      280              
  Lines       25379    25381       +2     
==========================================
+ Hits        11273    11289      +16     
+ Misses      13293    13281      -12     
+ Partials      813      811       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@stelucz stelucz force-pushed the vmss-model-check branch from bdf24b8 to c304453 Compare March 2, 2026 13:31
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 2, 2026
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from vincepri. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stelucz
Copy link
Contributor Author

stelucz commented Mar 2, 2026

added missing test

@github-project-automation github-project-automation bot moved this from Todo to Wait-On-Author in CAPZ Planning Mar 2, 2026
Signed-off-by: Lukas Stehlik <stehlik.lukas@gmail.com>
@stelucz stelucz force-pushed the vmss-model-check branch from c304453 to c380ed5 Compare March 3, 2026 13:25
@stelucz stelucz requested a review from jackfrancis March 5, 2026 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/bug Categorizes issue or PR as related to a bug. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

Status: Wait-On-Author

Development

Successfully merging this pull request may close these issues.

4 participants