Skip to content

docs: document NoExecute taint risks and add admission warning#120

Open
ketanjani21 wants to merge 1 commit intokubernetes-sigs:mainfrom
ketanjani21:noexecute-1
Open

docs: document NoExecute taint risks and add admission warning#120
ketanjani21 wants to merge 1 commit intokubernetes-sigs:mainfrom
ketanjani21:noexecute-1

Conversation

@ketanjani21
Copy link

Description

Addresses the risk of using NoExecute taints with continuous enforcement mode, which can cause unintended pod evictions.

Changes:

  • Add "Choosing a Taint Effect" section to concepts.md explaining the implications of each taint effect and when to use them
  • Return admission warnings when creating/updating rules with NoExecute effect
  • Add tests for the new warning logic
  • Update taint field godoc with brief caution note

Type of Change

/kind documentation
/kind feature

Testing

Checklist

  • make test passes
  • make lint passes

Does this PR introduce a user-facing change?

Add admission warning when creating NodeReadinessRule with NoExecute taint effect. Using NoExecute with continuous enforcement can cause pod evictions during transient condition failures. The webhook now returns a warning guiding users toward safer configurations.

Doc #9

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. labels Feb 5, 2026
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ketanjani21
Once this PR has been reviewed and has the lgtm label, please assign haircommander for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Welcome @ketanjani21!

It looks like this is your first PR to kubernetes-sigs/node-readiness-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/node-readiness-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 5, 2026
@netlify
Copy link

netlify bot commented Feb 5, 2026

Deploy Preview for node-readiness-controller canceled.

Name Link
🔨 Latest commit 8c9de75
🔍 Latest deploy log https://app.netlify.com/projects/node-readiness-controller/deploys/698c17487d90030009cabc78


This allows you to preview exactly which nodes would be affected and identifying any potential misconfigurations (like a typo in a label selector) before they impact your cluster.

## Choosing a Taint Effect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, taint introduction doesnt need to be here. Maybe we could link to the official documentation?

And what do you think about creating a separate page under user-guide for creating rules? More like how-to or getting started page?

I'm thinking aspects like #46, prefix restrictions introduced by #106 could also go in there.

I imagine this is generated with some llm help? IMO, I'm generally open to AI generating docs, but it could be concise to give more value to reader from reading less. just my thought.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajaysundark Thank you for the feedback. I noticed there was a getting-started.md already under docs, so I moved it under user-guide, updated it, and re-arranged few sections.

I also updated the information about prefix restrictions.

The maximum limits for the controller are already documented here, so I didn't duplicate the info again.

Happy to make further updates based on your feedback.

Copy link
Contributor

@ajaysundark ajaysundark Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds fair. we could leave the limits at the reference spec.

^it looks to me that docs/spec.md is a missed clean up as it is not maintained (crd-ref-docs only updates docs/book/src/reference/api-spec.md).
Would you be able to help on this as a follow-up PR?

cc @sreeram-venkitesh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #124

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, happy to do a follow-up PR to cleanup docs/spec.md 👍

@ajaysundark
Copy link
Contributor

/assign @ketanjani21

This is the priority class used by other critical cluster components (eg: core-dns).

**Images**: The official releases use multi-arch images (AMD64, Arm64).
**Images**: The official releases use multi-arch images (AMD64, Arm64) available in the Kubernetes staging registry:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not correct. staging registry is different than 'registry.k8s.io/xx' which is where the official images can be consumed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching, it got carried over from the root level getting-started.md. updated 👍

**Images**: The official releases use multi-arch images (AMD64, Arm64) available in the Kubernetes staging registry:

### Option 2: Deploy Using Kustomize
```sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have installation flows captured: simple consume from official release images, and using kustomize for advanced setup.

I think documenting 3 different installation paths would just be confusing the reader. Maybe we could hold on to this for now and come back to updating the installation section for #94.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, I've simplified the installation method for now, we can re-visit later as per your suggestion.

restructure docs

update taint key naming instructions

update registry url, simplify installation methods
Copy link
Contributor

@ajaysundark ajaysundark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ketanjani21 Left some suggestions, ptal.
Thanks for looking into this.

@@ -36,16 +34,10 @@ This is the priority class used by other critical cluster components (eg: core-d

**Images**: The official releases use multi-arch images (AMD64, Arm64).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**Images**: The official releases use multi-arch images (AMD64, Arm64).
#### Images
The official releases use multi-arch images (AMD64, Arm64) and are available at `registry.k8s.io/node-readiness-controller/node-readiness-controller`


If you have cloned the repository and want to deploy from source, you can use Kustomize.

```sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could skip this detail


## Deployment Options

### Option 1: Install Official Release (Recommended)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could keep this as well


**Images**: The official releases use multi-arch images (AMD64, Arm64).

### Option 2: Deploy Using Kustomize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please revert to existing, let us keep the kustomize option

Comment on lines 14 to 31
```yaml
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
name: storage-readiness-rule
spec:
conditions:
- type: "storage.kubernetes.io/CSIReady"
requiredStatus: "True"
- type: "storage.kubernetes.io/VolumePluginReady"
requiredStatus: "True"
taint:
key: "readiness.k8s.io/StorageReady"
effect: "NoSchedule"
value: "pending"
enforcementMode: "bootstrap-only"
nodeSelector:
matchLabels:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```yaml
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
name: nfs-storage-readiness-rule
spec:
conditions:
- type: "csi.example.net/NodePluginRegistered"
requiredStatus: "True"
- type: "csi.example.net/BackendReachable"
requiredStatus: "True"
- type: "DiskPressure"
requiredStatus: "False"
taint:
key: "readiness.k8s.io/vendor.com/nfs-unhealthy"
effect: "NoSchedule"
enforcementMode: "bootstrap-only"
nodeSelector:
matchLabels:

effect: "NoSchedule"

# Rule 2 - This will be REJECTED
spec:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
spec:
spec:
conditions:
- type: "cniplugin.example.net/rdma/NetworkReady"
requiredStatus: "True"

- Removes bootstrap taint when conditions are first satisfied
- Marks completion with node annotation
- Stops monitoring after successful removal (fail-safe)
- Ideal for one-time setup conditions (storage, installing node daemons e.g: security agent or kernel-module update)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Ideal for one-time setup conditions (storage, installing node daemons e.g: security agent or kernel-module update)
- Ideal for one-time setup conditions (installing node daemons e.g: security agent or kernel-module update)

requiredStatus: "True"
```

#### With Cluster Autoscaler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us leave Cluster Autoscaler for now


## Configuration

### Security
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us leave security for now, we'll find a separate page with more details later.

- Controller removes taints once conditions are satisfied
- Autoscaler can safely scale knowing nodes are truly ready

## Development
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us also remove the Development section

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants