@yeazelm yeazelm commented Nov 15, 2025

Related to: bottlerocket-os/bottlerocket#4218 (comment)

Description of changes:
This change makes the NVIDIA kmod package load its kernel modules before drivers.target instead of preconfigured.target. This lets us separate the state of loading all kernel modules from the rest of configuration.
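
As a rough illustration of the kind of change involved, the fragment below sketches how a module-loading unit could be attached to drivers.target. It is a minimal sketch, not the actual unit file from this PR: the description is taken from the console output in this PR, while the `Before=` ordering and the exact `[Install]` directive are assumptions based on the wording of the commit message ("shifts the Required target"). The `[Service]` section is omitted entirely.

```ini
# Hypothetical fragment of a unit such as load-tesla-kernel-modules.service.
# Only the directives relevant to this change are shown.

[Unit]
Description=Load Tesla kernel modules
# Assumed ordering: finish loading modules before drivers.target is reached.
Before=drivers.target

[Install]
# Assumed previous value: RequiredBy=preconfigured.target
RequiredBy=drivers.target
```

With a `RequiredBy=` relationship, a failure of the module-loading unit also fails the target that requires it, which would be consistent with a "Dependency failed for Driver units" line appearing when no NVIDIA device is present.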

Testing done:
Built NVIDIA variants for both architectures for k8s 1.33 and ECS 2. Validated that all 3 versions of the driver come up on the corresponding hardware, and that loading fails as expected if no NVIDIA devices are present:

Failure on m6i.large:

[  OK  ] Finished Sets the hostname.
[  OK  ] Finished Bootstrap Commands.
[  OK  ] Finished Link Tesla kernel modules.
         Starting Load Tesla kernel modules...
[   28.104268] NVRM: No NVIDIA GPU found.
[   28.426169] NVRM: No NVIDIA GPU found.
[   28.745379] NVRM: No NVIDIA GPU found.
[   28.932771] driverdog[1535]: 02:34:25 [ERROR] '/usr/bin/modprobe' failed - stderr: modprobe: ERROR: could not insert 'nvidia': No such device
[   28.934905] driverdog[1535]: modprobe: ERROR: could not insert 'nvidia_modeset': No such device
[   28.936271] driverdog[1535]: modprobe: ERROR: could not insert 'nvidia_uvm': No such device
[FAILED] Failed to start Load Tesla kernel modules.
See 'systemctl status load-tesla-kernel-modules.service' for details.
[DEPEND] Dependency failed for Driver units.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Activate configured.target.
         Starting NVIDIA Persistence Daemon...
[FAILED] Failed to start NVIDIA Persistence Daemon.
See 'systemctl status nvidia-persistenced.service' for details.
[DEPEND] Dependency failed for Generate CDI specifications.
         Starting NVIDIA Grid Daemon...
	2025-11-15T02:35:39+00:00

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

drivers.target is a more appropriate target for loading the NVIDIA
drivers. This shifts the required target from preconfigured.target
to drivers.target.

Signed-off-by: Matthew Yeazel <[email protected]>