Conversation

@yeazelm (Contributor) commented Dec 31, 2025

Issue number:

Related to: bottlerocket-os/bottlerocket#4673

Description of changes:
This builds the mps-control-daemon binary from the device plugin, which enables MPS support. We have to patch the hardcoded paths for Bottlerocket usage, since the device plugin assumes it can write to /, which doesn't work on Bottlerocket.

This change also adds a new service that starts this binary when the settings request it. Otherwise, the service runs `sleep infinity` as a placeholder so that systemd can `try-restart` it when the MPS settings change.

The change should be safe to take without bottlerocket-os/bottlerocket-kernel-kit#347 or the upcoming settings change, but the daemon will not work until the kmod update lands and the settings are properly set.
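The runtime toggle works through a systemd drop-in that overrides ExecStart. As a minimal sketch of that override mechanism (the local path here is illustrative; the real drop-in is rendered under /etc/systemd/system by Bottlerocket's templating):

```shell
# systemd drop-ins replace ExecStart by first clearing it with an empty
# assignment, then supplying the new command. Sketch of the rendered
# drop-in when MPS is enabled:
mkdir -p ./nvidia-mps-control-daemon.service.d
cat > ./nvidia-mps-control-daemon.service.d/exec-start.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/mps-control-daemon --config-file /etc/nvidia-k8s-device-plugin/settings.yaml
EOF
# Both lines matter: without the empty ExecStart=, systemd would reject a
# second ExecStart for a Type=simple service.
grep -c '^ExecStart' ./nvidia-mps-control-daemon.service.d/exec-start.conf
```

Re-rendering just this drop-in avoids a full unit-file change, which is what keeps the `try-restart` flow possible without a daemon reload.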

Testing done:
Built images with the kernel change and the settings changes, validated that a node comes up with MPS working when it is set in user data, and confirmed that the services restart so MPS can be enabled at runtime as well.

Setting it in user data for a g6.2xlarge, which has only one GPU


eksctl config snippet for setting it at the beginning:

    bottlerocket:
      settings:
        kubelet-device-plugins:
          nvidia:
            device-sharing-strategy: "mps"
            mps:
              replicas: 2

Results in a node reporting nvidia.com/gpu.shared:

Capacity:
  cpu:                    8
  ephemeral-storage:      81854Mi
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 31619656Ki
  nvidia.com/gpu.shared:  2
  pods:                   58
Allocatable:
  cpu:                    7910m
  ephemeral-storage:      76173383962
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 30602824Ki
  nvidia.com/gpu.shared:  2
  pods:                   58

Setting MPS after boot


Start with a node with no configuration for MPS:

# apiclient get settings.kubelet-device-plugins.nvidia
{
  "settings": {
    "kubelet-device-plugins": {
      "nvidia": {
        "device-id-strategy": "index",
        "device-list-strategy": "cdi-cri",
        "device-partitioning-strategy": "none",
        "device-sharing-strategy": "none",
        "pass-device-specs": true
      }
    }
  }
}

# systemctl status
● ip-192-168-12-91.us-west-2.compute.internal
    State: running
    Units: 458 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 0 units
    Since: Wed 2025-12-31 22:32:18 UTC; 5min ago
  systemd: 257.9
  Tainted: unmerged-bin
   CGroup: /
....

# systemctl status nvidia-mps-control-daemon
● nvidia-mps-control-daemon.service - NVIDIA MPS Control Daemon
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/nvidia-mps-control-daemon.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf
             /etc/systemd/system/nvidia-mps-control-daemon.service.d
             └─exec-start.conf
     Active: active (running) since Wed 2025-12-31 22:32:32 UTC; 5min ago
 Invocation: d1565c1130dc4d9e87108f540f1178da
   Main PID: 3111 (/usr/bin/sleep)
      Tasks: 1 (limit: 36988)
     Memory: 308K (peak: 1.2M)
        CPU: 5ms
     CGroup: /system.slice/nvidia-mps-control-daemon.service
             └─3111 /usr/bin/sleep infinity

Dec 31 22:32:32 ip-... systemd[1]: Started NVIDIA MPS Control Daemon.

# systemctl cat nvidia-mps-control-daemon
# /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/nvidia-mps-control-daemon.service
[Unit]
Description=NVIDIA MPS Control Daemon
After=nvidia-k8s-device-plugin.service
Requires=nvidia-k8s-device-plugin.service

[Service]
Type=simple
ExecStart=/bin/true
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

# /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d/00-aws-config.conf
[Service]
# Set the AWS_SDK_LOAD_CONFIG system-wide instead of at the individual service
# level, to make sure new system services that use the AWS SDK for Go read the
# shared AWS config
Environment=AWS_SDK_LOAD_CONFIG=true

# /etc/systemd/system/nvidia-mps-control-daemon.service.d/exec-start.conf
[Service]
ExecStart=
ExecStart=/usr/bin/sleep infinity

The node shows one GPU:

Capacity:
  cpu:                8
  ephemeral-storage:  81854Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             31619660Ki
  nvidia.com/gpu:     1
  pods:               58
Allocatable:
  cpu:                7910m
  ephemeral-storage:  76173383962
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             30602828Ki
  nvidia.com/gpu:     1
  pods:               58

Then set MPS:

apiclient set settings.kubelet-device-plugins.nvidia.device-sharing-strategy=mps settings.kubelet-device-plugins.nvidia.mps.replicas=8


bash-5.1# apiclient get settings.kubelet-device-plugins.nvidia
{
  "settings": {
    "kubelet-device-plugins": {
      "nvidia": {
        "device-id-strategy": "index",
        "device-list-strategy": "cdi-cri",
        "device-partitioning-strategy": "none",
        "device-sharing-strategy": "mps",
        "mps": {
          "replicas": 8
        },
        "pass-device-specs": true
      }
    }
  }
}

Now check the rest of the system:

# systemctl status
● ip-192-168-12-91.us-west-2.compute.internal
    State: running
    Units: 458 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 0 units
    Since: Wed 2025-12-31 22:32:18 UTC; 7min ago
  systemd: 257.9
  Tainted: unmerged-bin
   CGroup: /
           ├─default
...
# systemctl status nvidia-mps-control-daemon
● nvidia-mps-control-daemon.service - NVIDIA MPS Control Daemon
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/nvidia-mps-control-daemon.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf
             /etc/systemd/system/nvidia-mps-control-daemon.service.d
             └─exec-start.conf
     Active: active (running) since Wed 2025-12-31 22:39:41 UTC; 36s ago
 Invocation: 7191c5bc120246709e113d50d3ce3c54
   Main PID: 6994 (mps-control-dae)
      Tasks: 12 (limit: 36988)
     Memory: 49.1M (peak: 62M)
        CPU: 227ms
     CGroup: /system.slice/nvidia-mps-control-daemon.service
             ├─6994 /usr/bin/mps-control-daemon --config-file /etc/nvidia-k8s-device-plugin/settings.yaml
             ├─7015 nvidia-cuda-mps-control -d
             └─7021 tail -n +1 -f /run/mps/nvidia.com/gpu.shared/log/control.log

Dec 31 22:39:41 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:39:41.892 Control  7015] Accepting connection...
Dec 31 22:39:41 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:39:41.892 Control  7015] NEW UI
Dec 31 22:39:41 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:39:41.892 Control  7015] Cmd:set_default_active_thread_percentage 12
Dec 31 22:39:41 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:39:41.892 Control  7015] 12.0
Dec 31 22:39:41 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:39:41.892 Control  7015] UI closed
Dec 31 22:40:11 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:40:11.832 Control  7015] Accepting connection...
Dec 31 22:40:11 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:40:11.832 Control  7015] NEW UI
Dec 31 22:40:11 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:40:11.832 Control  7015] Cmd:get_default_active_thread_percentage
Dec 31 22:40:11 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:40:11.832 Control  7015] 12.0
Dec 31 22:40:11 ip-192-168-12-91.us-west-2.compute.internal mps-control-daemon[7021]: [2025-12-31 22:40:11.832 Control  7015] UI closed

# systemctl cat nvidia-mps-control-daemon
# /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/nvidia-mps-control-daemon.service
[Unit]
Description=NVIDIA MPS Control Daemon
After=nvidia-k8s-device-plugin.service
Requires=nvidia-k8s-device-plugin.service

[Service]
Type=simple
ExecStart=/bin/true
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

# /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d/00-aws-config.conf
[Service]
# Set the AWS_SDK_LOAD_CONFIG system-wide instead of at the individual service
# level, to make sure new system services that use the AWS SDK for Go read the
# shared AWS config
Environment=AWS_SDK_LOAD_CONFIG=true

# /etc/systemd/system/nvidia-mps-control-daemon.service.d/exec-start.conf
[Service]
ExecStart=
ExecStart=/usr/bin/mps-control-daemon --config-file /etc/nvidia-k8s-device-plugin/settings.yaml

# cat /etc/nvidia-k8s-device-plugin/settings.yaml
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  mpsRoot: "/run/nvidia/mps"
  plugin:
    passDeviceSpecs: true
    deviceListStrategy: cdi-cri
    deviceIDStrategy: index
    containerDriverRoot: "/"
sharing:
  mps:
    renameByDefault: true
    resources:
    - name: "nvidia.com/gpu"
      replicas: 8

And the node shows the now-empty nvidia.com/gpu resource alongside the new shared one:

Capacity:
  cpu:                    8
  ephemeral-storage:      81854Mi
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 31619660Ki
  nvidia.com/gpu:         1
  nvidia.com/gpu.shared:  8
  pods:                   58
Allocatable:
  cpu:                    7910m
  ephemeral-storage:      76173383962
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 30602828Ki
  nvidia.com/gpu:         0
  nvidia.com/gpu.shared:  8
  pods:                   58

This is a known edge case and is similar to how time-slicing works. To avoid the stale nvidia.com/gpu resource, you'd need to start with the user-data approach.

Shifting to rename-by-default=false (apiclient set settings.kubelet-device-plugins.nvidia.mps.rename-by-default=false) exposes the original nvidia.com/gpu resource instead:

Capacity:
  cpu:                    8
  ephemeral-storage:      81854Mi
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 31619660Ki
  nvidia.com/gpu:         8
  nvidia.com/gpu.shared:  8
  pods:                   58
Allocatable:
  cpu:                    7910m
  ephemeral-storage:      76173383962
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 30602828Ki
  nvidia.com/gpu:         8
  nvidia.com/gpu.shared:  0
  pods:                   58

And finally, setting the sharing strategy to none disables MPS:

# apiclient set settings.kubelet-device-plugins.nvidia.device-sharing-strategy=none
# systemctl status nvidia-mps-control-daemon
● nvidia-mps-control-daemon.service - NVIDIA MPS Control Daemon
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/nvidia-mps-control-daemon.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf
             /etc/systemd/system/nvidia-mps-control-daemon.service.d
             └─exec-start.conf
     Active: active (running) since Wed 2025-12-31 22:44:41 UTC; 2s ago
 Invocation: 82664a64fd044762a81ecef6d1cc0462
   Main PID: 9436 (/usr/bin/sleep)
      Tasks: 1 (limit: 36988)
     Memory: 308K (peak: 1.2M)
        CPU: 4ms
     CGroup: /system.slice/nvidia-mps-control-daemon.service
             └─9436 /usr/bin/sleep infinity

Dec 31 22:44:41 ip-192-168-12-91.us-west-2.compute.internal systemd[1]: Started NVIDIA MPS Control Daemon.

And the resource count goes back down to 1.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Add support for NVIDIA Multi-Process Service (MPS) control daemon,
including service configuration and device plugin updates.

Signed-off-by: Matthew Yeazel <[email protected]>
ExecStart=/usr/bin/mps-control-daemon --config-file /etc/nvidia-k8s-device-plugin/settings.yaml
{{else}}
ExecStart=
ExecStart=/usr/bin/sleep infinity

As discussed, options to remove the sleep are limited, since a combination of our tooling and systemd behaviour prevents us from:

  • leaving the service in an exited state, since systemd will not try-restart it
  • letting the service fail until MPS is available, since the system would then be in a degraded state
  • having the service without an [Install] section, since we don't have logic to conditionally start and stop a service
  • having the entire service file rendered on enabling MPS, since that would require a daemon reload, which is not possible with the existing logic

Without writing new logic to conditionally enable and disable services, this is the way forward to enable MPS support.


[Service]
Type=simple
ExecStart=/bin/true

I'd prefer /usr/bin/false here so that the start fails in a more obvious way.

[Service]
{{#if (eq settings.kubelet-device-plugins.nvidia.device-sharing-strategy "mps")}}
ExecStart=
ExecStart=/usr/bin/mps-control-daemon --config-file /etc/nvidia-k8s-device-plugin/settings.yaml

In the code, there's this log message:

klog.Info("No devices are configured for MPS sharing; Waiting indefinitely.")

That seems like the infinite wait we want. Can we render the config in a way that triggers this?
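One way to exercise that path might be to render an MPS sharing section that matches no resources. Whether the plugin accepts an empty resources list (rather than requiring the mps key to be omitted entirely) is an assumption that would need checking; sketch below writes to a local path for illustration:

```shell
# Hypothetical settings.yaml rendering with MPS selected but no resources
# configured, which may hit the "Waiting indefinitely" branch in the plugin.
cat > ./settings.yaml <<'EOF'
version: v1
sharing:
  mps:
    renameByDefault: true
    resources: []
EOF
```

If that works, the `sleep infinity` placeholder could be dropped and the daemon itself would idle until MPS resources appear in the config.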
