DRA: Provide helper for implementing webhooks for opaque config #134792

@nojnhuh

What would you like to be added?

A new webhook package in the k8s.io/dynamic-resource-allocation module should implement an abstraction on top of a generic validating webhook that drivers can use to plug in their own logic to validate the opaque config present in ResourceClaims and ResourceClaimTemplates.

Why is this needed?

DRA drivers currently have to implement validating webhooks from scratch, which involves a non-trivial amount of boilerplate that is essentially the same for most drivers. Starting an HTTPS server, reading the AdmissionReview object, extracting the opaque config from the ResourceClaim or ResourceClaimTemplate within it, and writing the response can all be shared by the vast majority of DRA drivers, leaving only the validation logic to be defined by each driver.
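To make that split concrete, here is a minimal sketch of how the shared parts and the driver-specific part could be separated. It is not the proposed package's API: `ValidateFunc`, `validateObject`, `handler`, `validateMode`, the `gpu.example.com` driver name, and the `mode` parameter are all hypothetical, and the structs are trimmed stand-ins for the real types in `k8s.io/api` so that the example compiles with only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// ValidateFunc is the only piece a driver would supply (hypothetical name).
type ValidateFunc func(parameters []byte) error

// Trimmed stand-ins for the admission.k8s.io/v1 AdmissionReview types.
type admissionReview struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Request    *struct {
		UID      string                                      `json:"uid"`
		Resource struct{ Resource string `json:"resource"` } `json:"resource"`
		Object   json.RawMessage                             `json:"object"`
	} `json:"request,omitempty"`
	Response *admissionResponse `json:"response,omitempty"`
}
type admissionResponse struct {
	UID     string  `json:"uid"`
	Allowed bool    `json:"allowed"`
	Status  *status `json:"status,omitempty"`
}
type status struct{ Message string `json:"message"` }

// claimSpec keeps only the ResourceClaim spec fields that carry opaque config.
type claimSpec struct {
	Devices struct {
		Config []struct {
			Opaque *struct {
				Driver     string          `json:"driver"`
				Parameters json.RawMessage `json:"parameters"`
			} `json:"opaque"`
		} `json:"config"`
	} `json:"devices"`
}

// validateObject runs validate on every opaque config addressed to driver.
// A ResourceClaimTemplate nests the claim spec one level deeper (spec.spec).
func validateObject(resource string, object []byte, driver string, validate ValidateFunc) error {
	var claim struct{ Spec claimSpec `json:"spec"` }
	var tmpl struct{ Spec struct{ Spec claimSpec `json:"spec"` } `json:"spec"` }
	spec, path, target := &claim.Spec, "spec", any(&claim)
	if resource == "resourceclaimtemplates" {
		target, spec, path = &tmpl, &tmpl.Spec.Spec, "spec.spec"
	} else if resource != "resourceclaims" {
		return fmt.Errorf("unsupported resource %q", resource)
	}
	if err := json.Unmarshal(object, target); err != nil {
		return err
	}
	for i, c := range spec.Devices.Config {
		if c.Opaque != nil && c.Opaque.Driver == driver {
			if err := validate(c.Opaque.Parameters); err != nil {
				return fmt.Errorf("%s.devices.config[%d].opaque.parameters: %w", path, i, err)
			}
		}
	}
	return nil
}

// handler is the shared HTTP plumbing: decode the review, validate, respond.
func handler(driver string, validate ValidateFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var review admissionReview
		if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
			http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
			return
		}
		req := review.Request
		resp := &admissionResponse{UID: req.UID, Allowed: true}
		if err := validateObject(req.Resource.Resource, req.Object, driver, validate); err != nil {
			resp.Allowed, resp.Status = false, &status{Message: err.Error()}
		}
		review.Request, review.Response = nil, resp
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(review)
	})
}

// validateMode is a made-up driver validator: parameters must set a known mode.
func validateMode(parameters []byte) error {
	var cfg struct{ Mode string `json:"mode"` }
	if err := json.Unmarshal(parameters, &cfg); err != nil {
		return err
	}
	if cfg.Mode != "shared" && cfg.Mode != "exclusive" {
		return fmt.Errorf("unsupported mode %q", cfg.Mode)
	}
	return nil
}

func main() {
	claim := []byte(`{"spec":{"devices":{"config":[{"opaque":{"driver":"gpu.example.com","parameters":{"mode":"turbo"}}}]}}}`)
	fmt.Println(validateObject("resourceclaims", claim, "gpu.example.com", validateMode))
}
```

In this sketch `main` only exercises the validation path and prints the resulting error for the unsupported mode. A real webhook would instead serve `handler("gpu.example.com", validateMode)` over HTTPS, for example with `http.ListenAndServeTLS`, and would decode into the versioned `k8s.io/api` types rather than these stand-ins.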

To illustrate, this snippet from dra-example-driver's webhook shows the extent of the driver-specific logic amongst the rest of the webhook implementation: https://github.com/kubernetes-sigs/dra-example-driver/blob/ba82bf55dad820297b124b64cb4487958bb17466/cmd/dra-example-webhook/main.go#L275-L283

The NVIDIA DRA driver has a similar amount of nearly identical boilerplate around its specific logic: https://github.com/NVIDIA/k8s-dra-driver-gpu/blob/7f591c28f3853f432d11985ecc533ed63e472b9f/cmd/webhook/main.go#L259-L284

The dra-example-driver and NVIDIA driver have already gone back and forth a few times sharing improvements to the boilerplate to handle things like natively validating different API versions of ResourceClaims. Being able to more easily collaborate that way on a shared implementation would benefit all drivers looking to add webhooks.

With webhooks also being a key reason why KEP-5322 was deferred beyond v1.35, making it easier for drivers to implement webhooks will help improve DRA driver quality and cluster stability overall.

Labels

- kind/feature: Categorizes issue or PR as related to a new feature.
- lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
- needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
- wg/device-management: Categorizes an issue or PR as relevant to WG Device Management.

Projects

Status

✅ Done
