Make NVDRIVER configurable from env file#280
Conversation
|
|
||
| with_retry 3 10s sudo apt-get update | ||
| install_packages_with_retry cuda-drivers | ||
| install_packages_with_retry cuda-drivers{{if .Version}}={{.Version}}{{end}} |
There was a problem hiding this comment.
As a matter of interest, what are valid values for Version here then?
There was a problem hiding this comment.
Looking at the package names these are the "branch numbers": 550, 560, 570, for example?
There was a problem hiding this comment.
I am not checking for validity; if the user sets a wrong version, it will fail.
Valid versions are the ones provided by apt list -a cuda-drivers
There was a problem hiding this comment.
Valid versions are the ones provided by apt list -a cuda-drivers
Which versions are those?
There was a problem hiding this comment.
ubuntu@ip-10-0-0-196:~$ apt list -a cuda-drivers
Listing... Done
cuda-drivers/unknown 570.86.15-0ubuntu1 amd64
cuda-drivers/unknown 570.86.10-0ubuntu1 amd64
cuda-drivers/unknown 565.57.01-0ubuntu1 amd64
cuda-drivers/unknown 560.35.05-0ubuntu1 amd64
cuda-drivers/unknown 560.35.03-1 amd64
cuda-drivers/unknown 560.28.03-1 amd64
cuda-drivers/unknown 555.42.06-1 amd64
cuda-drivers/unknown 555.42.02-1 amd64
cuda-drivers/unknown 550.144.03-0ubuntu1 amd64
cuda-drivers/unknown 550.127.08-0ubuntu1 amd64
cuda-drivers/unknown 550.127.05-0ubuntu1 amd64
cuda-drivers/unknown 550.90.12-0ubuntu1 amd64
cuda-drivers/unknown 550.90.07-1 amd64
cuda-drivers/unknown 550.54.15-1 amd64
cuda-drivers/unknown 550.54.14-1 amd64
cuda-drivers/unknown 545.23.08-1 amd64
cuda-drivers/unknown 545.23.06-1 amd64
cuda-drivers/unknown 535.230.02-0ubuntu1 amd64
cuda-drivers/unknown 535.216.03-0ubuntu1 amd64
cuda-drivers/unknown 535.216.01-0ubuntu1 amd64
cuda-drivers/unknown 535.183.06-1 amd64
cuda-drivers/unknown 535.183.01-1 amd64
cuda-drivers/unknown 535.161.08-1 amd64
cuda-drivers/unknown 535.161.07-1 amd64
cuda-drivers/unknown 535.154.05-1 amd64
cuda-drivers/unknown 535.129.03-1 amd64
cuda-drivers/unknown 535.104.12-1 amd64
cuda-drivers/unknown 535.104.05-1 amd64
cuda-drivers/unknown 535.86.10-1 amd64
cuda-drivers/unknown 535.54.03-1 amd64
cuda-drivers/unknown 530.30.02-1 amd64
cuda-drivers/unknown 525.147.05-1 amd64
cuda-drivers/unknown 525.125.06-1 amd64
cuda-drivers/unknown 525.105.17-1 amd64
cuda-drivers/unknown 525.85.12-1 amd64
cuda-drivers/unknown 525.60.13-1 amd64
cuda-drivers/unknown 520.61.05-1 amd64
...There was a problem hiding this comment.
OK. Let's at least add a string to the Version field that indicates that this is (currently) appended to the cuda-drivers package name and installed using apt?
| ` | ||
|
|
||
| type NvDriver struct { | ||
| Version string |
There was a problem hiding this comment.
| Version string | |
| // Version -- if specified -- indicates the version of the `cuda-drivers` package to install. | |
| Version string |
| if env.Spec.NVIDIADriver.Version == "" { | ||
| return &NvDriver{} | ||
| } | ||
| return &NvDriver{ | ||
| Version: env.Spec.NVIDIADriver.Version, | ||
| } |
There was a problem hiding this comment.
| if env.Spec.NVIDIADriver.Version == "" { | |
| return &NvDriver{} | |
| } | |
| return &NvDriver{ | |
| Version: env.Spec.NVIDIADriver.Version, | |
| } | |
| return &NvDriver{ | |
| Version: env.Spec.NVIDIADriver.Version, | |
| } |
The check for the empty string is redundant since golang uses "" as a zero value for strings anyway.
| install_packages_with_retry cuda-drivers | ||
| install_packages_with_retry cuda-drivers{{if .Version}}={{.Version}}{{end}} | ||
|
|
||
| nvidia-smi -L |
There was a problem hiding this comment.
A nit here, but we should probably drop the -L so that we can also see the driver version being applied.
2a7950a to
571b07a
Compare
README.md
Outdated
| - 192.168.1.0/26 | ||
| image: | ||
| architecture: amd64 | ||
| architecture: |
|
|
||
| ``` | ||
|
|
||
| - 570.86.15-0ubuntu1 |
There was a problem hiding this comment.
| - 570.86.15-0ubuntu1 | |
| Where `<version>` can be a prefix of any package version. The following are example package versions: | |
| - 570.86.15-0ubuntu1 |
README.md
Outdated
| nvidiaDriver: | ||
| install: true | ||
| version: <version> | ||
|
|
There was a problem hiding this comment.
| nvidiaDriver: | |
| install: true | |
| version: <version> | |
| nvidiaDriver: | |
| install: true | |
| version: <version> |
pkg/provisioner/templates/test.bash
Outdated
| @@ -0,0 +1,207 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
Should this file have been committed?
|
|
||
| with_retry 3 10s sudo apt-get update | ||
| install_packages_with_retry cuda-drivers | ||
| install_packages_with_retry cuda-drivers{{if .Version}}={{.Version}}{{end}} |
There was a problem hiding this comment.
Question: do we add --no-install-recommends here somewhere?
There was a problem hiding this comment.
No, here's the install_packages_with_retry function
install_packages_with_retry() {
local packages=("$@")
local max_retries=5
local retry_delay=5
for ((i=1; i<=$max_retries; i++)); do
echo "Attempt $i to install packages: ${packages[@]}"
# Attempt to install packages
sudo apt-get install -y "${packages[@]}"
# Check if the last command failed and the error is related to unsigned repository
if [ $? -ne 0 ] && grep -q 'NO_PUBKEY' <<< "$(tail -n 1 /var/lib/dpkg/status 2>/dev/null)"; then
echo "Error: Unsigned repository. Retrying in $retry_delay seconds..."
sleep $retry_delay
else
# Exit loop if installation is successful or the error is not related to unsigned repository
break
fi
done
}There was a problem hiding this comment.
added to the function
571b07a to
a4bc41d
Compare
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
a4bc41d to
8c347ce
Compare
No description provided.