
Conversation

@nikParasyr
Contributor

@nikParasyr nikParasyr commented Oct 26, 2025

What this PR does / why we need it:
Enable users to set QoSPolicyID on ports. This can be done by either defining the ID or a filter to query the QoS Policy.
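For illustration, a sketch of what the two options could look like in the API types. The names below are assumptions modeled on CAPO's other ID-or-filter parameters (and on the id/filter.name/filter.shared usage mentioned later in this thread); only QoSPolicyID on the resolved port spec is confirmed by the diff:

// Hypothetical sketch of an ID-or-filter parameter for QoS policies.
type QoSPolicyParam struct {
	// ID of the QoS policy. Mutually exclusive with Filter.
	ID *string `json:"id,omitempty"`
	// Filter to query the QoS policy by its attributes.
	Filter *QoSPolicyFilter `json:"filter,omitempty"`
}

// QoSPolicyFilter matches a QoS policy by name and/or shared flag.
type QoSPolicyFilter struct {
	Name   string `json:"name,omitempty"`
	Shared *bool  `json:"shared,omitempty"`
}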

Which issue(s) this PR fixes:
Fixes #2672

Special notes for your reviewer:

  • Chose to add both ID and Filter to keep UX consistent with other fields
  • I have not added e2e tests; from a quick look, the qos extension doesn't seem to be enabled in the CI devstack. Let me know if this is required

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 26, 2025
@netlify

netlify bot commented Oct 26, 2025

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

🔨 Latest commit: 976b0a3
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-cluster-api-openstack/deploys/6915f39c52701a0008b01bbc
😎 Deploy Preview: https://deploy-preview-2800--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 26, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cecilerobertmichon for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Hi @nikParasyr. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 26, 2025
@nikParasyr nikParasyr force-pushed the qos_policy branch 2 times, most recently from 4ca468e to c2eba8d on October 29, 2025 at 08:52
@lentzi90
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 29, 2025
builder = portSecurityOpts
}

if portSpec.QoSPolicyID != nil {
Contributor


This only works in the Create scenario; there is no handling for the Update case.

There is already an existing ensurePortTagsAndTrunk function that handles the update scenario, but it takes care of just the port tags and trunk. I think we should modify it to handle the update case for QoS as well: rename it to something like ensurePortProperties and add QoS handling there.
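A minimal sketch of what the update branch inside such an ensurePortProperties function could look like, assuming gophercloud v2 (context, gophercloud/v2, its networking/v2/ports package, and the qos policies extension this PR already imports). The function signature and the current-policy lookup are illustrative, not the actual CAPO code:

// Sketch only: reconcile a port's QoS policy on update. currentPolicyID
// is whatever the caller extracted from the existing port; desired comes
// from the resolved port spec.
func ensureQoSPolicy(ctx context.Context, client *gophercloud.ServiceClient, portID, currentPolicyID string, desired *string) error {
	if desired == nil || *desired == currentPolicyID {
		return nil // already in sync, nothing to update
	}
	// policies.PortUpdateOptsExt wraps the base update opts and adds
	// qos_policy_id to the request body.
	opts := policies.PortUpdateOptsExt{
		UpdateOptsBuilder: ports.UpdateOpts{},
		QoSPolicyID:       desired,
	}
	_, err := ports.Update(ctx, client, portID, opts).Extract()
	return err
}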

Contributor Author


Updated to also handle the Update scenario, and added a unit test for that use case.
Question: do I also need to update the webhooks to allow mutations for qosPolicy, and if so, are there any pointers for that?


// QoSPolicyID is the ID of the qos policy the port will use.
// +optional
QoSPolicyID *string `json:"qosPolicyID"`
Contributor


omitempty missing?

Contributor Author


Indeed. Fixed.
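For reference, the declaration with the tag fixed (presumably what the follow-up commit contains):

// QoSPolicyID is the ID of the qos policy the port will use.
// +optional
QoSPolicyID *string `json:"qosPolicyID,omitempty"`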

Contributor Author

@nikParasyr nikParasyr left a comment


@bnallapeta thanks for the review.
Your suggestions have been implemented; I have a minor question on the update comment.
I also updated CI and e2e to cover this case (I decided to add it to the existing multi-network scenario, since it allows testing various ways of setting the qos policy and I didn't want to create another scenario just for this).


@bnallapeta
Contributor

@bnallapeta thanks for the review. Your suggestions have been implemented; I have a minor question on the update comment. I also updated CI and e2e to cover this case (I decided to add it to the existing multi-network scenario, since it allows testing various ways of setting the qos policy and I didn't want to create another scenario just for this).

Awesome! thanks for this. I have replied on that comment. Looks good overall.

@bnallapeta
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 12, 2025
@bnallapeta
Contributor

/test pull-cluster-api-provider-openstack-e2e-test

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 12, 2025
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@nikParasyr
Contributor Author

The e2e tests fail on the multi-network one. I'm looking into it, but it might take a while.

@nikParasyr
Contributor Author

@bnallapeta / @lentzi90 can I get some help on where to look / how to troubleshoot?

The first control-plane node doesn't come up.

What I have checked so far:

  1. qos on devstack seems to be configured correctly:
  • I can see it in the cloud-init logs, and the various ml2 extensions are added correctly and automatically
  • the qos policy gets created (grepping the controller devstack logs for qos-policy shows the creation plus a couple of GETs)
  • the extension checks in the code don't fail (see the sketch after this list)
  2. qosPolicyID gets resolved properly:
  • no matter whether I pass it by id, filter.name, or filter.shared, the OpenStackServer.status.resolved of the control-plane machine has the correct qosPolicyID
  3. Only 1 port gets created; I can see that both in the logs and in ports.json:
  • I can see that it contains qos_policy in the request body (and no error is returned)
  • not sure why only 1 port gets created and not 3
  4. The OpenStackServer for the control-plane doesn't get created:
  • the status is .status.Ready: false
  • it is not dumped in servers.json
  • I cannot find any logs on the controller or worker devstack. Given that I have the port ID and server name, I would expect to at least find the request, but I can't
  • k-orc logs are completely empty
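For reference on the extension check in point 1, a sketch of how such a check can be done, assuming gophercloud v2's common extensions package (the function name and error wrapping are illustrative):

// Sketch: confirm the Neutron "qos" extension is enabled before using it.
// Assumes github.com/gophercloud/gophercloud/v2/openstack/common/extensions.
func qosExtensionEnabled(ctx context.Context, networkClient *gophercloud.ServiceClient) error {
	// Neutron returns 404 for an extension that is not enabled,
	// which surfaces here as a non-nil error.
	if _, err := extensions.Get(ctx, networkClient, "qos").Extract(); err != nil {
		return fmt.Errorf("qos extension not available: %w", err)
	}
	return nil
}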

It sort of feels to me that this might be k-orc for some reason not creating the server, but I'm not sure whether that's correct, or why it wouldn't. (Do we maybe need downstream support for QoS in k-orc? While working on this it didn't feel like that, but maybe I'm wrong.)

Any insights/help would be much appreciated.

@bnallapeta
Contributor

@nikParasyr curious: why did you comment out the qos here? I think the first time the e2e ran on this PR these were enabled, yes? And still the same failure?

Also, instead of using shared, how about we give the same policy to all ports and see if that works? I'm thinking that if there are multiple policies and, for some reason, policy resolution becomes an issue, the port won't come up and that causes the whole thing to fail. Perhaps you can test this locally.

I will try to find some time tomorrow to look into this deeply.

Enable users to set QoSPolicyID on ports. This can be done by either defining the ID or a filter to query the QoS Policy.
Ensure that the qos extension is enabled on the openstack deployment before creating ports with QoS policy set.
Allow the QoSPolicyID of a port to be updated when different from spec.
@lentzi90
Contributor

The missing logs from ORC (and the very minimal ones from CAPO) are strange and disturbing. Maybe it is because the test gets interrupted and fails to clean up properly?

Regarding ORC, we are actually not using it for the OpenStackServers yet 🙁
See #2814.
This is also an issue because we are now introducing changes in CAPO that diverge from ORC. You will probably need to check and think about how to implement this in ORC as well, or instead of in CAPO.

For progressing here, though, I think we need to figure out why we are not getting proper logs from CAPO and fix that. Then it will be clear why the server is not created. Maybe try focusing on only the multi-network test? It will be easier (and faster) to see what is going on.


"github.com/go-logr/logr/testr"
"github.com/gophercloud/gophercloud/v2/openstack/networking/v2/extensions/qos/policies"
. "github.com/onsi/gomega" //nolint:revive
Contributor


Is the nolint really needed? We should be covered by this part of the config:

- linters:
    - revive
    - staticcheck
  path: (test)/.*.go

or, if we are not covered by that, we should update it so that we are.

Contributor Author


done

@nikParasyr
Contributor Author

@bnallapeta

why did you comment out the qos here?

This is from one of my later runs, to see if I still get the same problems with one port with qosPolicy instead of all three. I have made CI runs both with the same qosPolicy.id defined on all three ports and with the lines uncommented; they all fail the same way. The history of runs is here: https://prow.k8s.io/pr-history/?org=kubernetes-sigs&repo=cluster-api-provider-openstack&pr=2800

I think the first time the e2e ran on this PR, these were enabled yes? And still the same failure?

Initially e2e tests were not added; I added them after tackling your input (validating that the qos extension is enabled, etc.), and since then they have failed consistently.

Also, instead of using shared, how about we give the same policy for all ports and see if that works. I'm thinking that if there are multiple policies and for some reason, the policy resolution becomes an issue, the port won't come up and that causes the whole thing to fail.

I am passing the same policy on all three ports, just in different ways (to test that the filter works as expected). In openstackServer.Status.Resolved I can see that all three are resolved properly to the same qosPolicyID.


@lentzi90

The missing logs from ORC (and also very minimal from CAPO) is strange and disturbing. Maybe it is because the test gets interrupted and fails to cleanup properly?

I have noticed in general that orc doesn't produce many logs (we had servers fail due to quotas multiple times and the orc logs were silent, probably because, as you mention below, k-orc is still not used by capo for servers...).

Regarding ORC, we are actually not using it for the OpenStackServers yet 🙁
See #2814.
This is also an issue because we are now introducing changes in CAPO that diverges from ORC. You will probably need to check and think about how to implement this in ORC as well, or instead of CAPO.

Ok, this is good to know. If this is going to happen, it's probably better to wait for it and implement this once in k-orc (and the APIs etc. here), instead of doing it now in capo, then in k-orc, and patching capo again. From my side this doesn't have the highest priority at the moment and can wait a bit for the migration to happen. I'll look a bit more in-depth at the related ticket and k-orc development (maybe I can start implementing qos there and we meet in the middle :P)

For progressing here though, I think we need to figure out why we are not getting proper logs from CAPO and fix that. Then it will be clear why the server is not created. Maybe try focusing on only the multi-network test? It will be easier (and faster) to see what is going on.

So the last commit of this PR has multiple runs (because I was amending): https://prow.k8s.io/pr-history/?org=kubernetes-sigs&repo=cluster-api-provider-openstack&pr=2800. All of them fail on multi-network (which is the only e2e that I "touched"). The CAPO logs there are consistently empty (other than the first ~50 lines of starting controllers etc.), unless I'm looking in the wrong place.

Update the e2e tests to add coverage for a port with QoS policy set. For this:
- enable the q-qos neutron plugin on the CI devstack
- update the multi-network scenario to add qos policies using ID and filter
@k8s-ci-robot
Contributor

@nikParasyr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-cluster-api-provider-openstack-test | 976b0a3 | link | true | /test pull-cluster-api-provider-openstack-test
pull-cluster-api-provider-openstack-e2e-test | 976b0a3 | link | true | /test pull-cluster-api-provider-openstack-e2e-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
