Kubernetes controller that automatically assigns node roles based on a configurable label value (e.g., `nodeGroup=gpu-worker` becomes `node-role.kubernetes.io/gpu-worker`).
By default, kubeadm enables the NodeRestriction admission controller, which restricts the labels a kubelet can self-apply during node registration. Labels under `node-role.kubernetes.io/` are restricted, so they can't be set in cloud-init scripts or other node bootstrap processes.
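For example, a bootstrap script that tries to pass the role label through the kubelet's `--node-labels` flag is rejected, because self-applied `node-role.kubernetes.io/*` labels are not allowed (illustrative cloud-init fragment; other kubelet flags omitted):

```yaml
# cloud-init fragment (illustrative): this does NOT work.
# Self-applied node-role.kubernetes.io/* labels are rejected,
# so the node never registers with the role label.
runcmd:
  - kubelet --node-labels=node-role.kubernetes.io/gpu-worker= ...
```

This is the gap the controller fills: nodes carry an unrestricted source label (e.g., `nodeGroup`), and the controller, running with cluster credentials, applies the role label on their behalf.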
`helm upgrade --install` works for both fresh installs and upgrades:

```shell
helm upgrade --install node-role-controller \
  oci://ghcr.io/mchmarny/node-role-controller \
  -n node-role-controller --create-namespace
```

To schedule on tainted nodes, add tolerations:
```shell
helm upgrade --install node-role-controller \
  oci://ghcr.io/mchmarny/node-role-controller \
  -n node-role-controller --create-namespace \
  --set-json 'tolerations=[{"key":"dedicated","value":"system-workload","operator":"Equal","effect":"NoExecute"},{"key":"dedicated","value":"system-workload","operator":"Equal","effect":"NoSchedule"}]'
```

Alternatively, install with plain manifests:

```shell
kubectl apply -f https://raw.githubusercontent.com/mchmarny/rolesetter/refs/heads/main/deployment/manifest.yaml
```

Or with Kustomize:

```shell
kubectl apply -k deployment/overlays/prod
```

The controller is configured via environment variables sourced from a ConfigMap. With Helm, set values directly:
```shell
helm upgrade --install node-role-controller \
  oci://ghcr.io/mchmarny/node-role-controller \
  -n node-role-controller --create-namespace \
  --set config.roleLabel=nodeGroup \
  --set config.roleReplace=true \
  --set config.logLevel=debug
```

| Parameter | Default | Description |
|---|---|---|
| `config.roleLabel` | `nodeGroup` | Source label whose value becomes the node role |
| `config.roleReplace` | `false` | Replace existing `node-role.kubernetes.io/*` labels |
| `config.logLevel` | `info` | Log level (debug, info, warn, error) |
| `replicas` | `1` | Number of controller replicas (leader election enabled) |
| `image.tag` | Chart `appVersion` | Override the image tag |
| `resources.requests.cpu` | `50m` | CPU request |
| `resources.requests.memory` | `64Mi` | Memory request |
| `resources.limits.cpu` | `250m` | CPU limit |
| `resources.limits.memory` | `256Mi` | Memory limit |
| `tolerations` | `[]` | Pod tolerations |
| `nodeSelector` | `{}` | Pod node selector |
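The same settings can also be kept in a values file instead of repeated `--set` flags. A sketch mirroring the flags above (pass it with `helm upgrade --install ... -f values.yaml`):

```yaml
# values.yaml — equivalent of the --set flags shown above
config:
  roleLabel: nodeGroup
  roleReplace: true
  logLevel: debug
```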
After changing configuration, restart the deployment to apply:

```shell
kubectl -n node-role-controller rollout restart deployment node-role-controller
```

To uninstall:

```shell
helm uninstall node-role-controller -n node-role-controller
```

How it works:

- Nodes are labeled with a source label (e.g., `nodeGroup=gpu-worker`)
- The controller watches node add/update events via a Kubernetes informer
- When a node has the source label, the controller patches it with `node-role.kubernetes.io/<value>`
- Leader election via a Lease ensures only one replica is active
Example: a node with `nodeGroup=gpu-worker` gets `node-role.kubernetes.io/gpu-worker`.
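The label mapping itself is simple; a minimal Go sketch (not the project's actual code — the function name and blank-value handling are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

const rolePrefix = "node-role.kubernetes.io/"

// roleLabelKey derives the node-role label key from the source
// label's value, e.g. "gpu-worker" -> "node-role.kubernetes.io/gpu-worker".
// It returns false when the source value is empty, so the
// controller has nothing to patch.
func roleLabelKey(value string) (string, bool) {
	value = strings.TrimSpace(value)
	if value == "" {
		return "", false
	}
	return rolePrefix + value, true
}

func main() {
	if key, ok := roleLabelKey("gpu-worker"); ok {
		fmt.Println(key) // prints: node-role.kubernetes.io/gpu-worker
	}
}
```

In the real controller this key would be applied to the Node object via a patch request from the informer's event handler.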
| Metric | Description |
|---|---|
| `node_role_patch_success_total` | Successful patch operations (labeled by role) |
| `node_role_patch_failure_total` | Failed patch operations (labeled by role) |
Metrics are available at `/metrics` on port 8080. Health at `/healthz`, readiness at `/readyz`.
Every release includes SLSA provenance attestation:
```shell
export IMAGE=ghcr.io/mchmarny/node-role-controller:v0.6.0

cosign verify-attestation \
  --output json \
  --type slsaprovenance \
  --certificate-identity-regexp 'https://github.com/.*/.*/.github/workflows/.*' \
  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
  $IMAGE
```

To enforce verification in-cluster with the Sigstore policy controller:
```shell
kubectl label namespace node-labeler policy.sigstore.dev/include=true
kubectl apply -f policy/clusterimagepolicy.yaml
```

See CONTRIBUTING.md. Run `make pre` before submitting PRs.
This is my personal project and does not represent my employer. While I do my best to ensure that everything works, I take no responsibility for issues caused by this code.