Add MIG config support when MIG-backed vGPU type #93
force-pushed from 65cac5a to 6348f09
/cc @cdesiniotis
force-pushed from 9c10da3 to 18cd4c8
force-pushed from 9198b86 to 0b7ae0f
cdesiniotis left a comment:
Thanks @mresvanis for the work on this! This is a great start.
The main feedback I have concerns the logic which converts vGPU configuration to mig-parted configuration.
cmd/nvidia-k8s-vgpu-dm/main.go (outdated)

```go
}

func determineMIGConfig(selectedConfig string) (string, error) {
	vgpuType, err := types.ParseVGPUType(selectedConfig)
```
This will only work if selectedConfig is a valid vGPU type string, e.g. A40-48Q. However, this is not guaranteed as the keys in the vGPU config file can take on arbitrary values. With that said, I understand why you made this assumption -- all of the configs (except the default config) in the default vGPU ConfigMap included in the gpu-operator are named as valid vGPU type strings: https://github.com/NVIDIA/gpu-operator/blob/6324d2aca562edf46d93cbf9d2a0837ab5c12e59/assets/state-vgpu-device-manager/0500_configmap.yaml
This will definitely return an error when the default configuration is applied. For your reference, the default configuration is applied when a node is not labeled with nvidia.com/vgpu.config:
(see vgpu-device-manager, cmd/nvidia-k8s-vgpu-dm/main.go, lines 195 to 207 at 5f37569)
A "proper" solution would likely entail
- Parse the vGPU config file
- Get the
selectedConfigentry - Convert the vGPU config into an equivalent mig-parted config
- Write the mig-parted config to a file
- Invoke
reconfigureMIGand set theMigPartedConfigFileFlagto the file created in Step 4.
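The steps above could be sketched roughly as follows. This is an illustrative skeleton only: `parseVGPUConfig` and `toMigPartedConfig` are hypothetical stand-ins for the real parsing/conversion code, and the final step only prints what the real code would pass to `reconfigureMIG`.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical stand-ins for the real parsing and conversion helpers.
func parseVGPUConfig(path, selected string) (string, error) { return selected, nil }
func toMigPartedConfig(vgpuSpec string) string {
	return vgpuSpec + ":\n- devices: [0]\n  mig-enabled: false\n"
}

// applyVGPUConfig sketches steps 1-5: parse the vGPU config, pick the
// selected entry, convert it to a mig-parted config, write that config to
// a temporary file, and hand the file to reconfigureMIG.
func applyVGPUConfig(configFile, selectedConfig string) error {
	vgpuSpec, err := parseVGPUConfig(configFile, selectedConfig) // steps 1-2
	if err != nil {
		return err
	}
	migYAML := toMigPartedConfig(vgpuSpec) // step 3

	f, err := os.CreateTemp("", "mig-parted-*.yaml") // step 4
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString(migYAML); err != nil {
		return err
	}
	// Step 5: the real code would invoke reconfigureMIG with
	// MigPartedConfigFileFlag set to f.Name().
	fmt.Println("would invoke reconfigureMIG with MigPartedConfigFileFlag =", f.Name())
	return f.Close()
}

func main() {
	if err := applyVGPUConfig("/etc/vgpu/config.yaml", "custom-config"); err != nil {
		fmt.Println("error:", err)
	}
}
```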
For example, the below vGPU configuration
custom-config:
- devices: [0]
vgpu-devices:
"A100-4C": 10
- devices: [1]
vgpu-devices:
"A100-1-5C": 2
"A100-2-10C": 1
"A100-3-20C": 1
would get converted to the following mig-parted configuration:

```yaml
custom-config:
- devices: [0]
  mig-enabled: false
- devices: [1]
  mig-enabled: true
  mig-devices:
    "1g.5gb": 2
    "2g.10gb": 1
    "3g.20gb": 1
```
My proposed solution has the following benefits when compared to the current implementation:

- We no longer depend on the mig-parted configuration file. This simplifies UX -- for example, if one applied the `custom-config` from my above example, the `custom-config` entry would only need to be present in the vGPU ConfigMap and not also in the mig-parted ConfigMap. Additionally, this simplifies the GPU Operator implementation -- the GPU Operator no longer needs to deploy the mig-parted ConfigMap for this use case.
- No assumptions are made concerning how configurations are named.
- We can naturally support custom / mixed configurations like my example above. The current implementation assumes a single configuration, where a single vGPU type is used across all GPUs on a node.
Thank you for the clear and detailed explanation! That definitely makes sense. Thank you also for validating the proper solution logic, which you first proposed in the design document but which I had skipped until now (apologies).
I think I already made the respective adjustments and I'm now looking through additional edge cases. PTAL and let me know if you think that's what you described above (or at least in the same direction) 🙏
@mresvanis thanks for writing this in Go!
Out of scope: In my opinion, the goal state should be to update the mig-parted repository so that https://github.com/NVIDIA/mig-parted/blob/c0ae956dd7d5414d8a450d478f58f1e17f83ab2e/deployments/container/reconfigure-mig.sh is rewritten in Go. That way, both mig-manager and vgpu-device-manager can leverage the same code. Since the vgpu-device-manager does not need all of the functionality from reconfigure-mig.sh (e.g. vgpu-device-manager does not need reconfigure-mig.sh to stop / restart GPU clients in the operator namespace), we may need to introduce some new options to opt-out of certain functionality.
For now, I am okay if we proceed with introducing reconfigure_mig.go in this PR. Once we rewrite the code in Go in the mig-parted repo, we can update vgpu-device-manager to reuse that code accordingly. I will take the action item to update mig-parted.
> @mresvanis thanks for writing this in Go!

No worries, I saw that you were using Go in this project and assumed the intention is to eventually move the mig-parted reconfigure-mig.sh script to Go as well.

> Out of scope: In my opinion, the goal state should be to update the mig-parted repository so that https://github.com/NVIDIA/mig-parted/blob/c0ae956dd7d5414d8a450d478f58f1e17f83ab2e/deployments/container/reconfigure-mig.sh is rewritten in Go. That way, both mig-manager and vgpu-device-manager can leverage the same code. Since the vgpu-device-manager does not need all of the functionality from `reconfigure-mig.sh` (e.g. vgpu-device-manager does not need `reconfigure-mig.sh` to stop / restart GPU clients in the operator namespace), we may need to introduce some new options to opt-out of certain functionality.
I agree 100%, I also thought I should keep the scope of this change as tight as possible.
> For now, I am okay if we proceed with introducing `reconfigure_mig.go` in this PR. Once we rewrite the code in Go in the mig-parted repo, we can update vgpu-device-manager to reuse that code accordingly. I will take the action item to update mig-parted.
Awesome, please feel free to let me know if I can help with this refactoring. Happy to take this up (since I'm also the one introducing the tech debt here :) )
Driven by the gosec linter error `G204: Audit use of command execution` in reconfigure_mig.go, I added validation for the options (at least for the ones where I think it makes sense).
Please let me know if adding the https://github.com/go-playground/validator package just for this purpose makes sense; otherwise I can roll back these changes, and I think we can safely ignore the respective gosec lint errors.
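For context, the dependency-free alternative would be allow-list validation of any value that ends up interpolated into an `exec.Command` call, which is usually enough to address gosec G204. A minimal sketch, with a hypothetical validator (the regex and function name are assumptions, not the PR's actual code):

```go
package main

import (
	"fmt"
	"regexp"
)

// safeConfigName allow-lists characters that can safely appear in a
// config name passed on to exec.Command, as an alternative to pulling in
// go-playground/validator just to quiet gosec G204. Hypothetical helper.
var safeConfigName = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)

func validateConfigName(name string) error {
	if !safeConfigName.MatchString(name) {
		return fmt.Errorf("invalid config name %q", name)
	}
	return nil
}

func main() {
	fmt.Println(validateConfigName("A100-1-5C") == nil)     // true
	fmt.Println(validateConfigName("foo; rm -rf /") == nil) // false
}
```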
@mresvanis thanks for the work on this. I used what you have as a starting point and pulled it into the mig-parted project in NVIDIA/mig-parted#216, where I'm slowly adding the missing functionality there.
My thinking was that we would create a package: github.com/NVIDIA/mig-parted/pkg/mig/reconfigure where we can expose this API and import it into the vgpu-device-manager. Did you want to "migrate" this PR there, or are you ok with me taking over?
@elezar that's great, I think this makes total sense and it will reduce the debt we introduced here. I'm perfectly fine with you taking this over and please feel free to let me know if I can help in any way.
See NVIDIA/mig-parted#218 and #101 for early drafts. I think getting these (or at least 218) in before NVIDIA/mig-parted#216 probably makes sense.
```yaml
imagePullPolicy: IfNotPresent
env:
- name: NAMESPACE
  value: "gpu-operator"
```
Question: Since this DaemonSet is configured to run in the `default` namespace, should this also be `default`?
I am fine with a separate PR to update the samples.
force-pushed from 82479b8 to 84fc0ef
force-pushed from 2b96d3e to 05a6ccd
force-pushed from e7f4439 to aeb4804
This change adds MIG configuration support via nvidia-mig-parted when MIG-backed vGPU types are included in the selected vGPU config, before the vGPU configuration takes place. Specifically:

- Add CLI flags for MIG configuration options.
- Include the nvidia-mig-parted binary in the container image.
- Parse the vGPU config to detect MIG requirements, convert the vGPU config to the respective MIG config, and configure MIG before vGPUs via nvidia-mig-parted.
- nvidia-mig-parted requires NVML. The NVIDIA driver library path is searched and used with the `LD_PRELOAD` env var when running nvidia-mig-parted commands. When NVML is not available, skip MIG configuration and proceed to vGPU configuration. This ensures that the vGPU Device Manager is backwards compatible with components that do not make NVML available to it.

Signed-off-by: Michail Resvanis <mresvani@redhat.com>
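The NVML discovery and `LD_PRELOAD` handling described in the commit message could look roughly like the sketch below. The candidate search paths, function names, and skip-on-missing-NVML behavior here are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// findNVMLLib searches candidate driver root paths for libnvidia-ml.so.1.
// The candidate paths are illustrative assumptions.
func findNVMLLib(roots []string) (string, bool) {
	for _, root := range roots {
		matches, _ := filepath.Glob(filepath.Join(root, "libnvidia-ml.so.1"))
		if len(matches) > 0 {
			return matches[0], true
		}
	}
	return "", false
}

// runMigParted runs nvidia-mig-parted with LD_PRELOAD pointing at the
// discovered NVML library; without NVML it reports an error so the caller
// can skip MIG configuration and proceed to vGPU configuration.
func runMigParted(args ...string) error {
	lib, ok := findNVMLLib([]string{
		"/run/nvidia/driver/usr/lib64",
		"/usr/lib/x86_64-linux-gnu",
	})
	if !ok {
		return fmt.Errorf("NVML not found, skipping MIG configuration")
	}
	cmd := exec.Command("nvidia-mig-parted", args...)
	cmd.Env = append(os.Environ(), "LD_PRELOAD="+lib)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	if err := runMigParted("apply", "--mode-only"); err != nil {
		fmt.Println("MIG configuration skipped or failed:", err)
	}
}
```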
force-pushed from aeb4804 to b1ea9dd
cdesiniotis left a comment:
@mresvanis LGTM. Thanks for the hard work and patience on this!
```go
	}
	migModeChangeRequired = true
}
```
Should we not set the state label to `pending` at this point?
Thanks for catching that! I think the issue with handling the reboot during MIG configuration (when that's enabled) is that we always set the state label to `pending` before handling MIG configuration. The latter makes this check invalid, as the state label will never equal `rebooting`.
We could fix this by:

1. checking if the state label is set to `rebooting` before setting it to `pending` here
2. setting the state label to `pending` after `assertMIGModeOnly` if it's set to `rebooting`
I can open a PR with this change if you also think that this makes sense. WDYT?
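A toy sketch of that proposed ordering, with every name (`nextState`, the stubbed `assertMIGModeOnly`, the state constants) being an assumption rather than the repository's actual identifiers:

```go
package main

import "fmt"

const (
	statePending   = "pending"
	stateRebooting = "rebooting"
)

// assertMIGModeOnly is a stand-in for the real MIG mode assertion; here it
// reports whether a reboot is (still) required. After a reboot the MIG mode
// change has taken effect, so resuming from "rebooting" needs no new reboot.
func assertMIGModeOnly(state string) (rebootRequired bool) {
	return state != stateRebooting
}

// nextState sketches the proposed ordering: only set the label to "pending"
// up front when we are not resuming after a reboot, so the "rebooting"
// check inside the MIG handling remains meaningful.
func nextState(current string) string {
	if current != stateRebooting {
		current = statePending
	}
	if assertMIGModeOnly(current) {
		return stateRebooting
	}
	// Resuming after a reboot: move back to "pending" and continue.
	return statePending
}

func main() {
	fmt.Println(nextState("idle"))         // fresh run needing a MIG mode change
	fmt.Println(nextState(stateRebooting)) // resuming after the node rebooted
}
```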
This PR adds MIG configuration support via `nvidia-mig-parted` when MIG-backed vGPU types are included in the selected vGPU configuration, before the latter takes place.

The changes include:

- Include the `nvidia-mig-parted` binary and `busybox` in the container image.
- Parse the vGPU config to detect MIG requirements, convert it to the respective MIG config, and configure MIG before vGPUs via `nvidia-mig-parted`.
- `nvidia-mig-parted` requires NVML. The NVIDIA driver library path is searched and used with the `LD_PRELOAD` env var when running `nvidia-mig-parted` commands. When NVML is not available, skip MIG configuration and proceed to vGPU configuration. This keeps the vGPU Device Manager backwards compatible with components that do not make NVML available to it.

Design document
nvidia-mig-partedbinary andbusyboxin the container image.nvidia-mig-parted.nvidia-mig-partedrequires NVML. The NVIDIA driver library path is searched and used with theLD_PRELOADenv var when runningnvidia-mig-partedcommands. When NVML is not available, skip MIG configuration and proceed to vGPU configuration. This keeps the vGPU Device Manager backwards compatible with components that do not make NVML available to it.Design document