Add dependency on the GRID license for kubelet #294
base: develop
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,4 @@ | ||
| NVidiaEULAforAWS.pdf | ||
| COPYING | ||
| *.rpm | ||
| NvidiaGridAWSUserLicenseAgreement.DOCX |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| [Unit] | ||
| Description=GRID License Check | ||
| RefuseManualStart=true | ||
| RefuseManualStop=true | ||
| DefaultDependencies=no | ||
| Before=kubelet.service | ||
| After=nvidia-gridd.service | ||
| Requires=nvidia-gridd.service | ||
|
|
||
| [Service] | ||
| Type=oneshot | ||
| ExecCondition=/usr/bin/ghostdog match-nvidia-driver grid | ||
| # Otherwise, attempt to load the module. | ||
| ExecStart=/usr/bin/nvidia-smi -q | ||
| # Ensure that the stderr file exists. Otherwise, grep fails on an empty file. | ||
| ExecStart=-/usr/bin/touch /tmp/.nvidia-gridd-license | ||
|
Comment on lines +15 to +16
Contributor
(would have to move this before the
Contributor
Author
I like that, I'll give it a shot to see if it still gives me the behavior I want with things rearranged to use
||
| # Succeed unless there was a fatal error. | ||
| ExecStart=/usr/bin/grep -Fqvzw Unlicensed /tmp/.nvidia-gridd-license | ||
| RemainAfterExit=true | ||
| StandardOutput=append:/tmp/.nvidia-gridd-license | ||
| Restart=on-failure | ||
| RestartSec=1 | ||
| StartLimitBurst=120 | ||
|
|
||
| [Install] | ||
| RequiredBy=kubelet.service | ||
|
Comment on lines +25 to +26
Contributor
Below it's required by
Contributor
Author
Sorry, this was originally
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| [Unit] | ||
| Description=GRID License Check | ||
| RefuseManualStart=true | ||
| RefuseManualStop=true | ||
| DefaultDependencies=no | ||
| Before=kubelet.service | ||
| After=nvidia-gridd.service | ||
| Requires=nvidia-gridd.service | ||
|
|
||
| [Service] | ||
| Type=oneshot | ||
| ExecCondition=/usr/bin/ghostdog match-nvidia-driver grid | ||
| # Otherwise, attempt to load the module. | ||
| ExecStart=/usr/bin/nvidia-smi -q | ||
| # Ensure that the stderr file exists. Otherwise, grep fails on an empty file. | ||
|
Contributor
It's the STDOUT file that you are creating, right?
Contributor
Author
Yes, this is a forgotten update: I moved to STDOUT but forgot to update the comment.
||
| ExecStart=-/usr/bin/touch /tmp/.nvidia-gridd-license | ||
| # Succeed unless there was a fatal error. | ||
| ExecStart=/usr/bin/grep -Fqvzw Unlicensed /tmp/.nvidia-gridd-license | ||
| RemainAfterExit=true | ||
| StandardOutput=append:/tmp/.nvidia-gridd-license | ||
| Restart=on-failure | ||
| RestartSec=1 | ||
| StartLimitBurst=120 | ||
|
|
||
| [Install] | ||
| RequiredBy=nvidia-k8s-device-plugin.service | ||
|
Comment on lines +25 to +26
Contributor
Does this requirement cause the k8s device plugin to fail if the license check fails? I'm kind of worried about cluttering up the logs with a lot of failures. This could possibly be modeled as:
Contributor
Author
This might be much cleaner in the logs; otherwise the unit is very angry and noisy in the journal when it's failing. I'll play with that as a potential alternative to this. FWIW, though, I haven't seen this fail yet before the next unit runs when we are going to get a license, so it might be a situation where the only time it's noisy is when the node is already in a bad state. Nonetheless, I think making it cleaner is worth it.
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| [Unit] | ||
| Description=GRID License Check | ||
| RefuseManualStart=true | ||
| RefuseManualStop=true | ||
| DefaultDependencies=no | ||
| Before=kubelet.service | ||
| After=nvidia-gridd.service | ||
| Requires=nvidia-gridd.service | ||
|
|
||
| [Service] | ||
| Type=oneshot | ||
| ExecCondition=/usr/bin/ghostdog match-nvidia-driver grid | ||
| # Otherwise, attempt to load the module. | ||
| ExecStart=/usr/bin/nvidia-smi -q | ||
| # Ensure that the stderr file exists. Otherwise, grep fails on an empty file. | ||
| ExecStart=-/usr/bin/touch /tmp/.nvidia-gridd-license | ||
| # Succeed unless there was a fatal error. | ||
| ExecStart=/usr/bin/grep -Fqvzw Unlicensed /tmp/.nvidia-gridd-license | ||
| RemainAfterExit=true | ||
| StandardOutput=append:/tmp/.nvidia-gridd-license | ||
| Restart=on-failure | ||
| RestartSec=1 | ||
| StartLimitBurst=120 | ||
|
|
||
| [Install] | ||
| RequiredBy=kubelet.service |
I'm confused: will this line actually load the module, or are you just generating output that will be grepped later?
This comment might make more sense above the ExecCondition
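For anyone following the thread, here is a minimal sketch of what the final `grep -Fqvzw Unlicensed` line appears to check. It assumes GNU grep (the `-z` flag is a GNU extension), and the sample file contents are made up for illustration; they are not real `nvidia-smi -q` output. The helper names `license_check` and `rc_of` are hypothetical.

```shell
#!/bin/sh
# Sketch only: sample text below is illustrative, not real nvidia-smi output.
workdir=$(mktemp -d)

# Same flags as the unit's last ExecStart line:
#   -F fixed string, -q quiet, -v invert the match,
#   -z treat input as NUL-separated (so the whole file is one record),
#   -w match whole words only.
license_check() {
    grep -Fqvzw Unlicensed "$1"
}

# Print the exit status of license_check without tripping `set -e`.
rc_of() {
    license_check "$1" && echo 0 || echo $?
}

printf 'status: Licensed\n'   > "$workdir/licensed"
printf 'status: Unlicensed\n' > "$workdir/unlicensed"
: > "$workdir/empty"

licensed_rc=$(rc_of "$workdir/licensed")
unlicensed_rc=$(rc_of "$workdir/unlicensed")
empty_rc=$(rc_of "$workdir/empty")

echo "licensed=$licensed_rc unlicensed=$unlicensed_rc empty=$empty_rc"
rm -rf "$workdir"
```

Because `-z` makes the whole file a single record, the inverted match succeeds only when the word `Unlicensed` appears nowhere in the file. An empty file contains no records at all, so the check also fails there, which is consistent with the "grep fails on an empty file" remark in the unit's comments.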