Add flag to disable bootstrap extension and improve error message #5509
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5509      +/-   ##
==========================================
- Coverage   52.87%   52.87%   -0.01%
==========================================
  Files         272      272
  Lines       29473    29481       +8
==========================================
+ Hits        15583    15587       +4
- Misses      13083    13086       +3
- Partials      807      808       +1
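The coverage figures above can be checked by hand. The sketch below assumes Codecov's usual definition (covered lines = hits, total = hits + misses + partials, with partials not counted as covered); it reproduces the 29473/29481 line totals and the 52.87% figure for both branches.

```go
package main

import "fmt"

// coveragePct returns hits as a percentage of all tracked lines, assuming
// lines = hits + misses + partials and that partials do not count as covered.
func coveragePct(hits, misses, partials int) float64 {
	lines := hits + misses + partials
	return 100 * float64(hits) / float64(lines)
}

func main() {
	base := coveragePct(15583, 13083, 807) // main
	head := coveragePct(15587, 13086, 808) // #5509
	fmt.Printf("main:  %.2f%% of %d lines\n", base, 15583+13083+807)
	fmt.Printf("#5509: %.2f%% of %d lines\n", head, 15587+13086+808)
	fmt.Printf("change: %+.4f percentage points\n", head-base)
}
```

The exact difference works out to well under 0.01 percentage points; the -0.01% in the report is presumably a rounded display value.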
azure/defaults.go (outdated)
 var (
     // LinuxBootstrapExtensionCommand is the command the VM bootstrap extension will execute to verify Linux nodes bootstrap completes successfully.
-    LinuxBootstrapExtensionCommand = fmt.Sprintf("for i in $(seq 1 %d); do test -f %s && break; if [ $i -eq %d ]; then exit 1; else sleep %d; fi; done", bootstrapExtensionRetries, bootstrapSentinelFile, bootstrapExtensionRetries, bootstrapExtensionSleep)
+    LinuxBootstrapExtensionCommand = fmt.Sprintf("for i in $(seq 1 %d); do test -f %s && break; if [ $i -eq %d ]; then echo 'Error joining node to cluster: kubeadm init failed. To debug, check the cloud-init, kubelet, or other bootstrap logs: https://capz.sigs.k8s.io/self-managed/troubleshooting.html?highlight=kubeadmcontrolplane#checking-cloud-init-logs-ubuntu.'; exit 1; else sleep %d; fi; done", bootstrapExtensionRetries, bootstrapSentinelFile, bootstrapExtensionRetries, bootstrapExtensionSleep)
We might want to say "kubeadm init or join", as this error could occur on either the first control plane node (init) or any other node that joins the cluster afterwards.
Branch updated from 9cede97 to a6117b3.
/retest
azure/defaults.go (outdated)
 var (
     // LinuxBootstrapExtensionCommand is the command the VM bootstrap extension will execute to verify Linux nodes bootstrap completes successfully.
-    LinuxBootstrapExtensionCommand = fmt.Sprintf("for i in $(seq 1 %d); do test -f %s && break; if [ $i -eq %d ]; then exit 1; else sleep %d; fi; done", bootstrapExtensionRetries, bootstrapSentinelFile, bootstrapExtensionRetries, bootstrapExtensionSleep)
+    LinuxBootstrapExtensionCommand = fmt.Sprintf("for i in $(seq 1 %d); do test -f %s && break; if [ $i -eq %d ]; then echo 'Error joining node to cluster: kubeadm init or join failed. To debug, check the cloud-init, kubelet, or other bootstrap logs: https://capz.sigs.k8s.io/self-managed/troubleshooting.html?highlight=kubeadmcontrolplane#checking-cloud-init-logs-ubuntu.'; exit 1; else sleep %d; fi; done", bootstrapExtensionRetries, bootstrapSentinelFile, bootstrapExtensionRetries, bootstrapExtensionSleep)
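For readers unfamiliar with the shell loop, here is a minimal sketch of how the Sprintf arguments map onto it. The retry count, sleep interval, and sentinel path below are hypothetical placeholders (the real constants are defined elsewhere in azure/defaults.go and are not shown in this thread), and the helper function is illustrative, not CAPZ's code.

```go
package main

import "fmt"

// Hypothetical values for illustration; the real constants are defined
// elsewhere in azure/defaults.go and are not shown in this thread.
const (
	bootstrapExtensionRetries = 100
	bootstrapExtensionSleep   = 3
	bootstrapSentinelFile     = "/run/cluster-api/bootstrap-success.complete"
)

// bootstrapErrorMessage is the text from the revised diff.
const bootstrapErrorMessage = "Error joining node to cluster: kubeadm init or join failed. To debug, check the cloud-init, kubelet, or other bootstrap logs: https://capz.sigs.k8s.io/self-managed/troubleshooting.html?highlight=kubeadmcontrolplane#checking-cloud-init-logs-ubuntu."

// linuxBootstrapExtensionCommand renders the polling loop: test for the
// sentinel file up to `retries` times, sleep `sleep` seconds between tries,
// and on the final failed try echo the message and exit 1.
// The message is passed as a %s argument rather than spliced into the
// format string, so a literal '%' in it could not be misread as a verb.
func linuxBootstrapExtensionCommand(retries, sleep int, sentinel, msg string) string {
	return fmt.Sprintf(
		"for i in $(seq 1 %d); do test -f %s && break; if [ $i -eq %d ]; then echo '%s'; exit 1; else sleep %d; fi; done",
		retries, sentinel, retries, msg, sleep)
}

func main() {
	fmt.Println(linuxBootstrapExtensionCommand(
		bootstrapExtensionRetries, bootstrapExtensionSleep,
		bootstrapSentinelFile, bootstrapErrorMessage))
}
```

Note that `retries` appears twice in the argument list: once as the upper bound of `seq` and once in the "last attempt" check, which is why the original call passes bootstrapExtensionRetries twice.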
I am not sure how helpful this message is.
- Unless the verbosity of kubeadm init/join is increased with --v=<any number greater than 5>, users have no way to debug the issue.
- Where else do we expect to see this message? Is it available in AzureMachine.Status?
Suggestions:
- Should we rename #checking-cloud-init-logs-ubuntu to #checking-cloud-init-logs-linux, since Linux is more generic?
- Can we rephrase the error message from "Error joining node to cluster: kubeadm init or join failed......" to "Error: kubeadm init or join failed. Refer: https://capz.sigs.k8s.io/self-managed/troubleshooting.html?highlight=kubeadmcontrolplane#checking-cloud-init-logs-ubuntu for more guidance on debugging this."?
Great suggestions! I'll work on modifying it now.
> Unless the verbosity of kubeadm init/join is increased by --v=<any number greater than 5> users do not have a way to debug the issue.
Is that true? If so, is there a way to change that from CAPZ? I've previously debugged bootstrap failures with these methods.
> Where else do we expect to see this message ? Is it available on the AzureMachine.Status ?
I think so, but I'm not 100% sure. It should definitely show up in the CAPZ controller logs.
Per discussion with @jackfrancis, I'll look into changing the script to automatically search the cloud-init logs for any kind of error and display it to the user.
Branch updated from a6117b3 to 28a9a15, then to cdfa8d0, then to b0f3689.
/lgtm
LGTM label has been added. Git tree hash: 8e183ab5a3cdde10510a60e336ce6d482e8e8f5c
/lgtm
Can we have an e2e test that checks the CAPZ Linux bootstrap extension in both the enabled and disabled states?
@nawazkh I'm not sure whether an E2E test for this would be overkill. We'd have to create an entirely new cluster with its own template to check it, and verifying the list of extensions in the unit test should be enough imo. It would be easy to implement, but I'm just not sure it's what we should do. wdyt @mboersma @jackfrancis?
Thank you for sharing the context; I agree with your reasoning. I just wanted a source of truth showing that this works.
/lgtm
I will wait for @mboersma or @jackfrancis to weigh in, since you pinged them. Otherwise, it's ready to ship from my end!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mboersma
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Approved yesterday and still hasn't merged? Strange.
Raised a question about Tide regarding this in https://kubernetes.slack.com/archives/C7J9RP96G/p1745342990569219
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds a more descriptive error message for the CAPZ Bootstrapping Extension so that users can better understand what went wrong when the extension fails to install. It also adds a new field, DisableVMBootstrapExtension, which disables only the bootstrap extension rather than every extension, as DisableExtensionOperations does.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5482
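The difference between the two flags can be sketched as follows. Only the field names DisableExtensionOperations and DisableVMBootstrapExtension come from this PR; the struct, the function, and the extension names are hypothetical stand-ins, not CAPZ's actual types.

```go
package main

import "fmt"

// MachineSpec is a hypothetical stand-in: only the two field names come from
// the PR description; the real CAPZ types may be structured differently.
type MachineSpec struct {
	DisableExtensionOperations  bool
	DisableVMBootstrapExtension bool
}

// vmExtensions sketches the intended semantics with made-up extension names:
// DisableExtensionOperations drops every extension, while
// DisableVMBootstrapExtension drops only the bootstrap extension.
func vmExtensions(spec MachineSpec, userExtensions []string) []string {
	if spec.DisableExtensionOperations {
		return nil
	}
	exts := append([]string(nil), userExtensions...) // copy; don't alias the caller's slice
	if !spec.DisableVMBootstrapExtension {
		exts = append(exts, "bootstrap")
	}
	return exts
}

func main() {
	user := []string{"custom-script"}
	for _, spec := range []MachineSpec{
		{},
		{DisableVMBootstrapExtension: true},
		{DisableExtensionOperations: true},
	} {
		fmt.Printf("%+v -> %v\n", spec, vmExtensions(spec, user))
	}
}
```

Checking the returned list for each flag combination is the kind of unit-level verification discussed in the thread as an alternative to standing up a new e2e cluster.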
Special notes for your reviewer:
TODOs:
Release note: