docs: document NoExecute taint risks and add admission warning#120
docs: document NoExecute taint risks and add admission warning#120ketanjani21 wants to merge 1 commit intokubernetes-sigs:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ketanjani21 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @ketanjani21! |
✅ Deploy Preview for node-readiness-controller canceled.
|
docs/book/src/user-guide/concepts.md
Outdated
|
|
||
| This allows you to preview exactly which nodes would be affected and identifying any potential misconfigurations (like a typo in a label selector) before they impact your cluster. | ||
|
|
||
| ## Choosing a Taint Effect |
There was a problem hiding this comment.
IMO, taint introduction doesnt need to be here. Maybe we could link to the official documentation?
And what do you think about creating a separate page under user-guide for creating rules? More like how-to or getting started page?
I'm thinking aspects like #46, prefix restrictions introduced by #106 could also go in there.
I imagine this is generated with some llm help? IMO, I'm generally open to AI generating docs, but it could be concise to give more value to reader from reading less. just my thought.
There was a problem hiding this comment.
@ajaysundark Thank you for the feedback. I noticed there was a getting-started.md already under docs, so I moved it under user-guide, updated it, and re-arranged few sections.
I also updated the information about prefix restrictions.
The maximum limits for the controller are already documented here, so I didn't duplicate the info again.
Happy to make further updates based on your feedback.
There was a problem hiding this comment.
sounds fair. we could leave the limits at the reference spec.
^it looks to me that docs/spec.md is a missed clean up as it is not maintained (crd-ref-docs only updates docs/book/src/reference/api-spec.md).
Would you be able to help on this as a follow-up PR?
There was a problem hiding this comment.
yes, happy to do a follow-up PR to cleanup docs/spec.md 👍
|
/assign @ketanjani21 |
6a87bb5 to
890f970
Compare
| This is the priority class used by other critical cluster components (eg: core-dns). | ||
|
|
||
| **Images**: The official releases use multi-arch images (AMD64, Arm64). | ||
| **Images**: The official releases use multi-arch images (AMD64, Arm64) available in the Kubernetes staging registry: |
There was a problem hiding this comment.
this is not correct. staging registry is different than 'registry.k8s.io/xx' which is where the official images can be consumed.
There was a problem hiding this comment.
Thank you for catching, it got carried over from the root level getting-started.md. updated 👍
| **Images**: The official releases use multi-arch images (AMD64, Arm64) available in the Kubernetes staging registry: | ||
|
|
||
| ### Option 2: Deploy Using Kustomize | ||
| ```sh |
There was a problem hiding this comment.
we already have installation flows captured: simple consume from official release images, and using kustomize for advanced setup.
I think documenting 3 different installation paths would just be confusing the reader. Maybe we could hold on to this for now and come back to updating the installation section for #94.
There was a problem hiding this comment.
makes sense, I've simplified the installation method for now, we can re-visit later as per your suggestion.
restructure docs update taint key naming instructions update registry url, simplify installation methods
6f3eb9e to
8c9de75
Compare
ajaysundark
left a comment
There was a problem hiding this comment.
@ketanjani21 Left some suggestions, ptal.
Thanks for looking into this.
| @@ -36,16 +34,10 @@ This is the priority class used by other critical cluster components (eg: core-d | |||
|
|
|||
| **Images**: The official releases use multi-arch images (AMD64, Arm64). | |||
There was a problem hiding this comment.
| **Images**: The official releases use multi-arch images (AMD64, Arm64). | |
| #### Images | |
| The official releases use multi-arch images (AMD64, Arm64) and are available at `registry.k8s.io/node-readiness-controller/node-readiness-controller` |
|
|
||
| If you have cloned the repository and want to deploy from source, you can use Kustomize. | ||
|
|
||
| ```sh |
There was a problem hiding this comment.
I think we could skip this detail
|
|
||
| ## Deployment Options | ||
|
|
||
| ### Option 1: Install Official Release (Recommended) |
There was a problem hiding this comment.
we could keep this as well
|
|
||
| **Images**: The official releases use multi-arch images (AMD64, Arm64). | ||
|
|
||
| ### Option 2: Deploy Using Kustomize |
There was a problem hiding this comment.
please revert to existing, let us keep the kustomize option
| ```yaml | ||
| apiVersion: readiness.node.x-k8s.io/v1alpha1 | ||
| kind: NodeReadinessRule | ||
| metadata: | ||
| name: storage-readiness-rule | ||
| spec: | ||
| conditions: | ||
| - type: "storage.kubernetes.io/CSIReady" | ||
| requiredStatus: "True" | ||
| - type: "storage.kubernetes.io/VolumePluginReady" | ||
| requiredStatus: "True" | ||
| taint: | ||
| key: "readiness.k8s.io/StorageReady" | ||
| effect: "NoSchedule" | ||
| value: "pending" | ||
| enforcementMode: "bootstrap-only" | ||
| nodeSelector: | ||
| matchLabels: |
There was a problem hiding this comment.
| ```yaml | |
| apiVersion: readiness.node.x-k8s.io/v1alpha1 | |
| kind: NodeReadinessRule | |
| metadata: | |
| name: nfs-storage-readiness-rule | |
| spec: | |
| conditions: | |
| - type: "csi.example.net/NodePluginRegistered" | |
| requiredStatus: "True" | |
| - type: "csi.example.net/BackendReachable" | |
| requiredStatus: "True" | |
| - type: "DiskPressure" | |
| requiredStatus: "False" | |
| taint: | |
| key: "readiness.k8s.io/vendor.com/nfs-unhealthy" | |
| effect: "NoSchedule" | |
| enforcementMode: "bootstrap-only" | |
| nodeSelector: | |
| matchLabels: |
| effect: "NoSchedule" | ||
|
|
||
| # Rule 2 - This will be REJECTED | ||
| spec: |
There was a problem hiding this comment.
| spec: | |
| spec: | |
| conditions: | |
| - type: "cniplugin.example.net/rdma/NetworkReady" | |
| requiredStatus: "True" |
| - Removes bootstrap taint when conditions are first satisfied | ||
| - Marks completion with node annotation | ||
| - Stops monitoring after successful removal (fail-safe) | ||
| - Ideal for one-time setup conditions (storage, installing node daemons e.g: security agent or kernel-module update) |
There was a problem hiding this comment.
| - Ideal for one-time setup conditions (storage, installing node daemons e.g: security agent or kernel-module update) | |
| - Ideal for one-time setup conditions (installing node daemons e.g: security agent or kernel-module update) |
| requiredStatus: "True" | ||
| ``` | ||
|
|
||
| #### With Cluster Autoscaler |
There was a problem hiding this comment.
let us leave Cluster Autoscaler for now
|
|
||
| ## Configuration | ||
|
|
||
| ### Security |
There was a problem hiding this comment.
Let us leave security for now, we'll find a separate page with more details later.
| - Controller removes taints once conditions are satisfied | ||
| - Autoscaler can safely scale knowing nodes are truly ready | ||
|
|
||
| ## Development |
There was a problem hiding this comment.
Let us also remove the Development section
Description
Addresses the risk of using NoExecute taints with continuous enforcement mode, which can cause unintended pod evictions.
Changes:
concepts.mdexplaining the implications of each taint effect and when to use themType of Change
/kind documentation
/kind feature
Testing
Checklist
make testpassesmake lintpassesDoes this PR introduce a user-facing change?
Add admission warning when creating
NodeReadinessRulewithNoExecutetaint effect. UsingNoExecutewith continuous enforcement can cause pod evictions during transient condition failures. The webhook now returns a warning guiding users toward safer configurations.Doc #9