@harikrishna-patnala
Contributor

@harikrishna-patnala harikrishna-patnala commented Sep 17, 2025

Description

This PR fixes a regression in the scaleKubernetesCluster API that was introduced in main by #11598.

The issue is that the scaleKubernetesCluster API does not scale the nodes of a running k8s cluster. This also causes smoke test failures.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

  1. Created a k8s cluster with size 1.
  2. Scaled the cluster to size 2; it scaled successfully. This is where the issue occurred previously: the scaling was skipped.
  3. Stopped the cluster, then scaled the node offerings up and down. Everything worked, and no errors were observed in the logs.
  4. Started the cluster; it worked fine.

How did you try to break this feature and the system with this change?

@harikrishna-patnala
Contributor Author

@blueorangutan package

@blueorangutan

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 17.39%. Comparing base (96ccd7e) to head (b7bb702).
⚠️ Report is 25 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11652      +/-   ##
============================================
- Coverage     17.39%   17.39%   -0.01%     
- Complexity    15283    15284       +1     
============================================
  Files          5889     5889              
  Lines        526141   526184      +43     
  Branches      64234    64242       +8     
============================================
+ Hits          91542    91544       +2     
- Misses       424265   424296      +31     
- Partials      10334    10344      +10     
Flag         Coverage Δ
uitests      3.62% <ø> (+<0.01%) ⬆️
unittests    18.44% <100.00%> (-0.01%) ⬇️

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15053

@weizhouapache
Member

Is this an issue with the main branch only, @harikrishna-patnala?

The smoke test results of PR #11598 look good.

Can you test scaling a CKS cluster with the same offering?

@harikrishna-patnala
Contributor Author

harikrishna-patnala commented Sep 17, 2025

It affects main and 4.21. I think this happened when 4.20 was forward-merged to main. 4.20 does not have this loop, so the return statement works there.

@weizhouapache
Member

Thanks @harikrishna-patnala
makes sense

Maybe we can remove those lines and add state transitions for the case where nothing happens because the offering is the same as before.

Stopped -> OperationSucceeded -> Stopped

and the same for the Running state.

@bernardodemarco
Member

@harikrishna-patnala, thanks for raising the PR.

main and 4.21. I think this happened as part of forward merging 4.20 to main. 4.20 does not have this loop, so return statement is working there.

Yes, exactly. #11598 worked fine in the 4.20 branch. However, the scale Kubernetes workflow now iterates over each possible node type, so we should have been more cautious when porting it to the main branch.
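
To illustrate the kind of regression described here, the following is a minimal sketch rather than the actual CloudStack code. It assumes a hypothetical `NodeType` enum and offerings represented as plain strings; the class and method names are made up for illustration. It shows how a "nothing to do" `return` that is harmless in straight-line code can skip the remaining work once it sits inside a per-node-type loop, and how `continue` is one loop-safe alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a per-node-type scaling loop; not the CloudStack implementation.
class ScaleLoopSketch {
    enum NodeType { CONTROL, WORKER }

    // Regressed shape: a "nothing to do" return that was safe outside a loop
    // now ends the whole operation at the first unchanged node type.
    static List<NodeType> scaleWithEarlyReturn(Map<NodeType, String> current,
                                               Map<NodeType, String> requested) {
        List<NodeType> scaled = new ArrayList<>();
        for (NodeType type : NodeType.values()) {
            if (Objects.equals(current.get(type), requested.get(type))) {
                return scaled; // leaves the method; later node types are never visited
            }
            scaled.add(type);
        }
        return scaled;
    }

    // Loop-safe shape: skip only the unchanged node type and keep iterating.
    static List<NodeType> scaleWithContinue(Map<NodeType, String> current,
                                            Map<NodeType, String> requested) {
        List<NodeType> scaled = new ArrayList<>();
        for (NodeType type : NodeType.values()) {
            if (Objects.equals(current.get(type), requested.get(type))) {
                continue; // nothing to do for this type, move on to the next one
            }
            scaled.add(type);
        }
        return scaled;
    }

    public static void main(String[] args) {
        Map<NodeType, String> current = Map.of(NodeType.CONTROL, "small", NodeType.WORKER, "small");
        Map<NodeType, String> requested = Map.of(NodeType.CONTROL, "small", NodeType.WORKER, "large");
        System.out.println(scaleWithEarlyReturn(current, requested)); // prints []
        System.out.println(scaleWithContinue(current, requested));    // prints [WORKER]
    }
}
```

In this sketch the control offering is unchanged, so the early-return variant exits before the worker nodes are considered, which mirrors the "scaling is skipped" symptom reported in the description.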

Member

@bernardodemarco bernardodemarco left a comment

As @weizhouapache mentioned (see #11652 (comment)), the following transitions need to be added so that no exception is thrown when a Stopped or Running cluster is scaled to the attributes it already has:

  • From the Stopped state, on an OperationSucceeded event, transition to the Stopped state
  • From the Running state, on an OperationSucceeded event, transition to the Running state
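
The two transitions above can be sketched with a small table-driven state machine. This is an illustrative sketch under stated assumptions, not CloudStack's state machine class: `ClusterFsmSketch`, its `addTransition` and `next` methods, and the enum constants are all hypothetical names chosen to mirror the states and event named in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical table-driven state machine; names mirror the discussion, not a real API.
class ClusterFsmSketch {
    enum State { RUNNING, STOPPED }
    enum Event { OPERATION_SUCCEEDED }

    private final Map<State, Map<Event, State>> transitions = new HashMap<>();

    // Registers "from --event--> to" in the transition table.
    void addTransition(State from, Event event, State to) {
        transitions.computeIfAbsent(from, s -> new HashMap<>()).put(event, to);
    }

    // Looks up the next state; an unregistered (state, event) pair is an error.
    State next(State from, Event event) {
        Map<Event, State> byEvent = transitions.get(from);
        State to = (byEvent == null) ? null : byEvent.get(event);
        if (to == null) {
            throw new IllegalStateException("No transition from " + from + " on " + event);
        }
        return to;
    }

    // Builds a machine with the two self-transitions suggested in the review.
    static ClusterFsmSketch withSelfTransitions() {
        ClusterFsmSketch fsm = new ClusterFsmSketch();
        fsm.addTransition(State.STOPPED, Event.OPERATION_SUCCEEDED, State.STOPPED);
        fsm.addTransition(State.RUNNING, Event.OPERATION_SUCCEEDED, State.RUNNING);
        return fsm;
    }
}
```

Without the self-transitions, a no-op scale that reports success from Stopped or Running has no registered target state, so the lookup throws. With them, the cluster simply stays in its current state.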

@harikrishna-patnala
Contributor Author

Thanks for the suggestions, @weizhouapache and @bernardodemarco. I've added them and tested scaling both a stopped and a running k8s cluster with the same offering; no errors were observed.

@harikrishna-patnala
Contributor Author

@blueorangutan package

@blueorangutan

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 15094

@harikrishna-patnala
Contributor Author

@blueorangutan package

Member

@weizhouapache weizhouapache left a comment

code lgtm

thanks @harikrishna-patnala for the fix

Contributor

@Pearl1594 Pearl1594 left a comment

code lgtm

Member

@bernardodemarco bernardodemarco left a comment

code lgtm, thanks @harikrishna-patnala

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15106

@weizhouapache
Member

@blueorangutan test

@blueorangutan

@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-14399)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 51264 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11652-t14399-kvm-ol8.zip
Smoke tests completed. 147 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@harikrishna-patnala harikrishna-patnala merged commit 3ef2556 into apache:main Sep 20, 2025
28 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Apache CloudStack 4.22.0 Sep 20, 2025
@harikrishna-patnala harikrishna-patnala deleted the fixScaleK8s branch September 20, 2025 12:27
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Oct 17, 2025
* Fix scaleKubernetesCluster

* Added more state transitions
6 participants