Add MIG config support when MIG-backed vGPU type#93

Merged
cdesiniotis merged 2 commits into NVIDIA:main from mresvanis:add-mig-configuration
Jul 7, 2025

Conversation

Contributor

@mresvanis mresvanis commented Jun 5, 2025

This PR adds MIG configuration support via nvidia-mig-parted: when MIG-backed vGPU types are included in the selected vGPU configuration, MIG is configured before the vGPU configuration is applied.

The changes include:

  • Add CLI flags for MIG configuration options.
  • Include the nvidia-mig-parted binary and busybox in the container image.
  • Parse the selected vGPU configuration to detect MIG requirements, convert it to the corresponding MIG configuration, and configure MIG before vGPUs via nvidia-mig-parted.
  • nvidia-mig-parted requires NVML. The NVIDIA driver library path is searched and used with the LD_PRELOAD env var when running nvidia-mig-parted commands. When NVML is not available, skip MIG configuration and proceed to vGPU configuration. This keeps the vGPU Device Manager backwards compatible with components that do not make NVML available to it.
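The NVML discovery and `LD_PRELOAD` handling described above could be sketched roughly as follows. This is an illustrative assumption, not the PR's actual code: the candidate library paths and the helper names (`firstExisting`, `runMigParted`) are made up for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// firstExisting returns the first path that exists on disk, or "".
func firstExisting(paths []string) string {
	for _, p := range paths {
		if _, err := os.Stat(p); err == nil {
			return p
		}
	}
	return ""
}

// runMigParted runs an nvidia-mig-parted command with LD_PRELOAD pointing
// at the discovered NVML library. When no library is found, it reports
// that, so the caller can skip MIG configuration and proceed straight to
// vGPU configuration (the backwards-compatible path described above).
func runMigParted(args ...string) error {
	// Candidate locations are illustrative; real driver installs vary.
	lib := firstExisting([]string{
		"/usr/lib64/libnvidia-ml.so.1",
		"/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1",
		"/run/nvidia/driver/usr/lib64/libnvidia-ml.so.1",
	})
	if lib == "" {
		return fmt.Errorf("NVML library not found, skipping MIG configuration")
	}
	cmd := exec.Command("nvidia-mig-parted", args...)
	cmd.Env = append(os.Environ(), "LD_PRELOAD="+lib)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	if err := runMigParted("export"); err != nil {
		fmt.Println(err)
	}
}
```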

Design document

@mresvanis mresvanis force-pushed the add-mig-configuration branch from 65cac5a to 6348f09 Compare June 5, 2025 13:02
@mresvanis
Contributor Author

/cc @cdesiniotis

@mresvanis mresvanis force-pushed the add-mig-configuration branch 5 times, most recently from 9c10da3 to 18cd4c8 Compare June 13, 2025 10:53
@mresvanis mresvanis marked this pull request as ready for review June 13, 2025 10:54
@mresvanis mresvanis force-pushed the add-mig-configuration branch 5 times, most recently from 9198b86 to 0b7ae0f Compare June 17, 2025 12:50
Contributor

@cdesiniotis cdesiniotis left a comment


Thanks @mresvanis for the work on this! This is a great start.

The main feedback I have concerns the logic which converts vGPU configuration to mig-parted configuration.

}

func determineMIGConfig(selectedConfig string) (string, error) {
	vgpuType, err := types.ParseVGPUType(selectedConfig)
Contributor


This will only work if selectedConfig is a valid vGPU type string, e.g. A40-48Q. However, this is not guaranteed as the keys in the vGPU config file can take on arbitrary values. With that said, I understand why you made this assumption -- all of the configs (except the default config) in the default vGPU ConfigMap included in the gpu-operator are named as valid vGPU type strings: https://github.com/NVIDIA/gpu-operator/blob/6324d2aca562edf46d93cbf9d2a0837ab5c12e59/assets/state-vgpu-device-manager/0500_configmap.yaml

This will definitely return an error when the default configuration is applied. For your reference, the default configuration is applied when a node is not labeled with nvidia.com/vgpu.config:

// Apply initial vGPU configuration. If the node is not labeled with an
// explicit config, apply the default configuration.
selectedConfig, err := getNodeLabelValue(clientset, vGPUConfigLabel)
if err != nil {
	return fmt.Errorf("unable to get vGPU config label: %v", err)
}
if selectedConfig == "" {
	log.Infof("No vGPU config specified for node. Proceeding with default config: %s", defaultVGPUConfigFlag)
	selectedConfig = defaultVGPUConfigFlag
} else {
	selectedConfig = vGPUConfig.Get()
}

A "proper" solution would likely entail:

  1. Parse the vGPU config file
  2. Get the selectedConfig entry
  3. Convert the vGPU config into an equivalent mig-parted config
  4. Write the mig-parted config to a file
  5. Invoke reconfigureMIG and set the MigPartedConfigFileFlag to the file created in Step 4.

For example, the below vGPU configuration

custom-config:
  - devices: [0]
    vgpu-devices:
      "A100-4C": 10
  - devices: [1]
    vgpu-devices:
      "A100-1-5C": 2
      "A100-2-10C": 1
      "A100-3-20C": 1 

would get converted to the following mig-parted configuration:

custom-config:
  - devices: [0]
    mig-enabled: false
  - devices: [1]
    mig-enabled: true
    mig-devices:
      "1g.5gb": 2
      "2g.10gb": 1
      "3g.20gb": 1 

My proposed solution has the following benefits when compared to the current implementation:

  1. We no longer depend on the mig-parted configuration file. This simplifies UX -- for example, if one applied the custom-config from my above example, the custom-config entry would only need to be present in the vGPU ConfigMap and not also in the mig-parted ConfigMap. Additionally, this simplifies the GPU Operator implementation -- the GPU Operator no longer needs to deploy the mig-parted ConfigMap for this use case.
  2. No assumptions are made concerning how configurations are named.
  3. We naturally can support custom / mixed configurations like my example above. The current implementation assumes a single configuration, where a single vGPU type is used across all GPUs on a node.
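The conversion in Step 3 could be sketched as below. The name parsing is a deliberate simplification (the regex `migBackedRE` and helper `toMIGDeviceName` are assumptions for this example; the project's types package does the real parsing): a MIG-backed vGPU type name like `A100-1-5C` carries a `<slices>-<memoryGB>` pair that maps to the mig-parted device name `1g.5gb`, while a time-sliced type like `A100-4C` has a single numeric component and maps to a `mig-enabled: false` entry.

```go
package main

import (
	"fmt"
	"regexp"
)

// migBackedRE matches MIG-backed vGPU type names such as "A100-1-5C",
// i.e. <gpu>-<slices>-<memoryGB><suffix>. Time-sliced types such as
// "A100-4C" or "A40-48Q" have one numeric component and do not match.
var migBackedRE = regexp.MustCompile(`^[A-Za-z0-9]+-(\d+)-(\d+)[A-Z]+$`)

// toMIGDeviceName maps a MIG-backed vGPU type name to the corresponding
// mig-parted device name, e.g. "A100-1-5C" -> "1g.5gb". The second
// return value is false for time-sliced (non-MIG) vGPU types.
func toMIGDeviceName(vgpuType string) (string, bool) {
	m := migBackedRE.FindStringSubmatch(vgpuType)
	if m == nil {
		return "", false
	}
	return fmt.Sprintf("%sg.%sgb", m[1], m[2]), true
}

func main() {
	for _, t := range []string{"A100-4C", "A100-1-5C", "A100-2-10C", "A100-3-20C"} {
		if name, ok := toMIGDeviceName(t); ok {
			fmt.Printf("%s -> %s\n", t, name)
		} else {
			fmt.Printf("%s -> time-sliced (mig-enabled: false)\n", t)
		}
	}
}
```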

Contributor Author


Thank you for the clear and detailed explanation! That definitely makes sense, and thank you for reiterating the proper solution logic you first proposed in the design document, which I had completely missed until now (I apologize).

I have already made the respective adjustments and I'm now looking through additional edge cases. PTAL and let me know whether this matches what you described above (or is at least heading in the same direction) 🙏

Contributor


@mresvanis thanks for writing this in Go!

Out of scope: In my opinion, the goal state should be to update the mig-parted repository so that https://github.com/NVIDIA/mig-parted/blob/c0ae956dd7d5414d8a450d478f58f1e17f83ab2e/deployments/container/reconfigure-mig.sh is rewritten in Go. That way, both mig-manager and vgpu-device-manager can leverage the same code. Since the vgpu-device-manager does not need all of the functionality from reconfigure-mig.sh (e.g. vgpu-device-manager does not need reconfigure-mig.sh to stop / restart GPU clients in the operator namespace), we may need to introduce some new options to opt-out of certain functionality.

For now, I am okay if we proceed with introducing reconfigure_mig.go in this PR. Once we rewrite the code in Go in the mig-parted repo, we can update vgpu-device-manager to reuse that code accordingly. I will take the action item to update mig-parted.
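The opt-out idea above could be sketched as an options struct in a shared reconfigure package. All field names here are hypothetical; this is only an illustration of how callers might disable functionality they do not need:

```go
package main

import "fmt"

// Options sketches what a shared reconfigure package in mig-parted might
// expose so that different callers can opt out of functionality.
type Options struct {
	ConfigFile     string // path to the mig-parted config file to apply
	SelectedConfig string // name of the entry within the config file
	// SkipClientRestart lets vgpu-device-manager opt out of stopping
	// and restarting GPU clients in the operator namespace, which
	// mig-manager needs but vgpu-device-manager does not.
	SkipClientRestart bool
}

func main() {
	opts := Options{
		ConfigFile:        "/tmp/mig-config.yaml",
		SelectedConfig:    "custom-config",
		SkipClientRestart: true,
	}
	fmt.Printf("%+v\n", opts)
}
```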

Contributor Author


> @mresvanis thanks for writing this in Go!

No worries, I saw that you were using Go in this project and I assumed the intention is to move the mig-parted reconfigure-mig.sh script to Go as well eventually.

> Out of scope: In my opinion, the goal state should be to update the mig-parted repository so that https://github.com/NVIDIA/mig-parted/blob/c0ae956dd7d5414d8a450d478f58f1e17f83ab2e/deployments/container/reconfigure-mig.sh is rewritten in Go. That way, both mig-manager and vgpu-device-manager can leverage the same code. Since the vgpu-device-manager does not need all of the functionality from reconfigure-mig.sh (e.g. vgpu-device-manager does not need reconfigure-mig.sh to stop / restart GPU clients in the operator namespace), we may need to introduce some new options to opt-out of certain functionality.

I agree 100%, I also thought I should keep the scope of this change as tight as possible.

> For now, I am okay if we proceed with introducing reconfigure_mig.go in this PR. Once we rewrite the code in Go in the mig-parted repo, we can update vgpu-device-manager to reuse that code accordingly. I will take the action item to update mig-parted.

Awesome, please feel free to let me know if I can help with this refactoring. Happy to take this up (since I'm also the one introducing the tech debt here :) )

Contributor


cc @elezar

Contributor Author

@mresvanis mresvanis Jun 30, 2025


Driven by the gosec linter error (G204: Audit use of command execution) in reconfigure_mig.go, I added validation for the options (at least for the ones where I think it makes sense).

Please let me know whether adding the https://github.com/go-playground/validator package just for this purpose makes sense; otherwise I can roll back these changes, and I think we can safely ignore the respective gosec lint errors.
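As a dependency-free alternative for this narrow case, values interpolated into the command invocation could be checked against a simple allow-list. This is a sketch, not the PR's actual validation; the pattern and the helper name `validateFlagValue` are assumptions:

```go
package main

import (
	"fmt"
	"regexp"
)

// safeValue allow-lists characters that can safely appear in values
// passed on to an exec.Command invocation, which is the concern behind
// gosec G204. (The exact character set is an illustrative assumption.)
var safeValue = regexp.MustCompile(`^[A-Za-z0-9._/-]+$`)

// validateFlagValue rejects option values containing whitespace or shell
// metacharacters before they reach the nvidia-mig-parted command line.
func validateFlagValue(name, value string) error {
	if !safeValue.MatchString(value) {
		return fmt.Errorf("invalid characters in flag %s: %q", name, value)
	}
	return nil
}

func main() {
	fmt.Println(validateFlagValue("config-file", "/etc/mig/config.yaml"))
	fmt.Println(validateFlagValue("config-file", "x; rm -rf /"))
}
```

Since `exec.Command` does not invoke a shell, this is defense-in-depth rather than a strict necessity, which is also why ignoring the gosec finding could be defensible.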

Member


@mresvanis thanks for the work on this. I used what you have as a starting point and pulled it into the mig-parted project in NVIDIA/mig-parted#216, where I'm slowly adding the missing functionality.

My thinking was that we would create a package: github.com/NVIDIA/mig-parted/pkg/mig/reconfigure where we can expose this API and import it into the vgpu-device-manager. Did you want to "migrate" this PR there, or are you ok with me taking over?

Contributor Author


@elezar that's great, I think this makes total sense and it will reduce the debt we introduced here. I'm perfectly fine with you taking this over and please feel free to let me know if I can help in any way.

Member


See NVIDIA/mig-parted#218 and #101 for early drafts. I think getting these (or at least 218) in before NVIDIA/mig-parted#216 probably makes sense.

imagePullPolicy: IfNotPresent
env:
- name: NAMESPACE
value: "gpu-operator"
Contributor


Question: Since this DaemonSet is configured to run in the default namespace, should this also be default?
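A sketch of the suggested change (assuming the manifest stays in the default namespace):

```yaml
env:
  - name: NAMESPACE
    value: "default"
```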

Contributor Author

@mresvanis mresvanis Jun 24, 2025


Absolutely, great catch! I updated this one, but we should also update the original example here.

I thought a separate PR for the original example would keep this one focused on MIG config support, but if you think it's simpler/easier to have this change here we can do that. WDYT?

Contributor


I am fine with a separate PR to update the samples.

@mresvanis mresvanis force-pushed the add-mig-configuration branch 4 times, most recently from 82479b8 to 84fc0ef Compare June 24, 2025 15:57
@mresvanis mresvanis force-pushed the add-mig-configuration branch 3 times, most recently from 2b96d3e to 05a6ccd Compare June 26, 2025 09:35
Contributor


cc @elezar

@mresvanis mresvanis force-pushed the add-mig-configuration branch 2 times, most recently from e7f4439 to aeb4804 Compare June 27, 2025 07:09
This change adds MIG configuration support via nvidia-mig-parted when
MIG-backed vGPU types are included in the selected vGPU config, before
the vGPU configuration takes place.

Specifically:

- Add CLI flags for MIG configuration options.
- Include the nvidia-mig-parted binary in the container image.
- Parse vGPU config to detect MIG requirements, convert the vGPU config
  to the respective MIG config and configure MIG before vGPUs via
  nvidia-mig-parted.
- nvidia-mig-parted requires NVML. The NVIDIA driver library path is
  searched and used with the `LD_PRELOAD` env var when running
  nvidia-mig-parted commands. When NVML is not available, skip
  MIG configuration and proceed to vGPU configuration. This ensures
  that the vGPU Device Manager is backwards compatible with components
  that do not make NVML available to it.

Signed-off-by: Michail Resvanis <mresvani@redhat.com>
Signed-off-by: Michail Resvanis <mresvani@redhat.com>
@mresvanis mresvanis force-pushed the add-mig-configuration branch from aeb4804 to b1ea9dd Compare June 30, 2025 10:52
Contributor

@cdesiniotis cdesiniotis left a comment


@mresvanis LGTM. Thanks for the hard work and patience on this!

@cdesiniotis cdesiniotis merged commit 6b1dd70 into NVIDIA:main Jul 7, 2025
5 checks passed
@mresvanis mresvanis deleted the add-mig-configuration branch July 8, 2025 09:12
}
migModeChangeRequired = true
}

Member


Should we not set the state label to pending at this point?

Contributor Author


Thanks for catching that! I think the issue with handling the reboot during MIG configuration (when that's enabled) is that we always set the state label to pending before handling MIG configuration, which makes this check invalid: the state label will never equal rebooting.

We could fix this by:

  • checking if the state label is set to rebooting before setting it to pending here
  • setting the state label to pending after assertMIGModeOnly if it's set to rebooting

I can open a PR with this change if you also think that this makes sense. WDYT?
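The proposed fix could be sketched as a small state transition. The label values and the helper name `nextState` are assumptions based on the discussion above, not the project's actual code:

```go
package main

import "fmt"

// nextState sketches the proposed state-label handling: when the node is
// resuming after a reboot triggered by a MIG mode change, keep the
// "rebooting" state until MIG mode has been asserted (i.e. after
// assertMIGModeOnly succeeds), and only then move to "pending";
// otherwise move to "pending" immediately.
func nextState(current string, migModeAsserted bool) string {
	if current == "rebooting" && !migModeAsserted {
		return "rebooting"
	}
	return "pending"
}

func main() {
	fmt.Println(nextState("rebooting", false))
	fmt.Println(nextState("rebooting", true))
	fmt.Println(nextState("", false))
}
```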
