Skip to content

Conversation

@minmzzhang
Copy link
Collaborator

@minmzzhang minmzzhang commented Jul 29, 2025

  • kind/chore

What this PR does / why we need it:

  • Add the template variable of SriovCniVersion instead of hardcoded "0.3.1"
  • Add unit tests for rendering SR-IOV configuration
  • Update SR-IOV NAD rendering from multi-nic-cni network args

Which issue(s) this PR is related to:

Closes #213, closes #233

When the multi-nic-cni for SR-IOV (multinic-sriov) firstly created, it had the correct info passed through from the definition:

apiVersion: multinic.fms.io/v1
kind: MultiNicNetwork
metadata:
  name: multinic-sriov
spec:
  subnet: "172.34.0.0/16"
  ipam: |
    {
      "type": "multi-nic-ipam",
      "hostBlock": 8,
      "interfaceBlock": 2,
      "vlanMode": "l2"
    }
  multiNICIPAM: true
  plugin:
    cniVersion: "1.0.0"
    type: sriov
    args:
      vlan: "1134"
      numVfs: "2"
      isRdma: "true"
spec:
  config: '{"cniVersion":"0.3.0","name":"multinic-sriov","type":"multi-nic","ipam":{"hostBlock":8,"interfaceBlock":2,"type":"multi-nic-ipam","vlanMode":"l2"},"dns":{},"plugin":{"cniVersion":"1.0.0","type":"sriov","vlan":1134},"subnet":"172.34.0.0/16","masterNets":["172.12.0.0/16"],"multiNICIPAM":true,"daemonIP":"","daemonPort":11000}'

However, after a while, the NAD get updated again, but with incorrect vlan info:

spec:
  config: '{"cniVersion":"0.3.0","name":"multinic-sriov","type":"multi-nic","ipam":{"hostBlock":8,"interfaceBlock":2,"type":"multi-nic-ipam","vlanMode":"l2"},"dns":{},"plugin":{"cniVersion":"1.0.0","type":"sriov","vlan":0},"subnet":"172.34.0.0/16","masterNets":["172.12.0.0/16"],"multiNICIPAM":true,"daemonIP":"","daemonPort":11000}'

This is due to the reconcile loop invoked by controller which impacts the SR-IOV plugin configuration. On the first call, SR-IOV plugin generates the correct config with a nested plugin object: "plugin":{"cniVersion":"1.0.0","type":"sriov","vlan":1134}. Then "RemoveEmpty" applied and the "plugin" field got stripped out becaues it is not in "netConfKeys"

var netConfKeys []string = []string{"cniVersion", "type"}

When the second reconcile call kicked in, the controller tries to update the NAD because CheckDefChanged detects the config difference, which resulted in a malformed config that breaks the plugin.

@minmzzhang minmzzhang added the kind/chore For maintenance purpose (documentation, CI, testing, code refactor) label Jul 29, 2025
@minmzzhang minmzzhang force-pushed the sriov-validation branch 6 times, most recently from d02ffac to 412c1cd Compare July 31, 2025 14:39
@minmzzhang minmzzhang requested a review from sunya-ch July 31, 2025 15:18
@minmzzhang minmzzhang force-pushed the sriov-validation branch 4 times, most recently from 020be39 to e172bc3 Compare August 1, 2025 18:45
Copy link
Collaborator

@sunya-ch sunya-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thank you so much. Wonderful work!

Additionally, I think we need to add doc about compatible version for each CNI too.

Created a follow-up issue to track here:

@sunya-ch sunya-ch added kind/feat New feature or change present behavior kind/fix Fix bugs kind/chore For maintenance purpose (documentation, CI, testing, code refactor) and removed kind/chore For maintenance purpose (documentation, CI, testing, code refactor) kind/feat New feature or change present behavior labels Aug 4, 2025
@sunya-ch sunya-ch merged commit cf27354 into foundation-model-stack:main Aug 4, 2025
9 checks passed
@sunya-ch sunya-ch mentioned this pull request Aug 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/chore For maintenance purpose (documentation, CI, testing, code refactor) kind/fix Fix bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Revisit connection testing steps for each scenario revisit sriov support

2 participants