[BUG] Webhook Health Checker Panics on Startup, Crashes Controller #2355

@sagar-h007

What happened:

The kruise-controller webhook server crashes on startup with an unrecoverable panic if:

  1. The CA certificate file cannot be loaded from /kruise/cert-dir/ca-cert.pem, OR
  2. The filesystem watcher (fsnotify.Watcher) fails to initialize, OR
  3. The watcher fails to add the CA cert file path

The panics occur in pkg/webhook/util/health/checker.go at lines 113, 117, and 120, inside a sync.Once block, so initialization is never retried and the controller cannot recover.
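To illustrate why a panic inside sync.Once leaves no path to recovery, here is a minimal standalone sketch. It is not the code from checker.go: the onceInit type and tryInit method are hypothetical names used only for this demonstration. It relies on documented sync.Once behavior: if the function passed to Do panics, Do treats it as having returned, and later calls never run it again.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// onceInit is a hypothetical stand-in for state that is initialized
// inside a sync.Once block. It is not taken from checker.go.
type onceInit struct {
	once  sync.Once
	calls int  // how many times the init function actually ran
	ready bool // true only if initialization succeeded
}

// tryInit runs the guarded initializer. The panic is recovered here only so
// the demo can keep running; in the reported bug nothing recovers it and the
// process exits.
func (o *onceInit) tryInit(fail bool) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("init panicked: %v", r)
		}
	}()
	o.once.Do(func() {
		o.calls++
		if fail {
			panic(errors.New("failed to load CA cert"))
		}
		o.ready = true
	})
	return nil
}

func main() {
	o := &onceInit{}
	fmt.Println("first call: ", o.tryInit(true))  // initializer panics
	fmt.Println("second call:", o.tryInit(false)) // initializer is skipped
	fmt.Println("init ran", o.calls, "time(s); ready =", o.ready)
}
```

Even with the panic recovered, the second call does not re-run the initializer, so the state stays uninitialized. A retry therefore needs something other than sync.Once around the fallible step.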

What you expected to happen:

The health checker should:

  1. Return an error instead of panicking
  2. Allow the webhook server to start with a failing health check
  3. Log the error and retry initialization gracefully
  4. Not crash the entire controller process
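The expected behavior above could be sketched as follows. This is an illustration under stated assumptions, not a proposed patch: lazyChecker, ensure, Check, and errEmptyCA are hypothetical names, the loader is injectable so failures can be simulated, and the fsnotify watcher setup is omitted to keep the example limited to the standard library. A mutex plus a loaded flag replaces sync.Once, so a failed attempt is reported as an error and retried on the next check.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
)

// errEmptyCA is a hypothetical sentinel error for an empty CA file.
var errEmptyCA = errors.New("CA cert file is empty")

// lazyChecker is a hypothetical stand-in for the health checker state.
type lazyChecker struct {
	mu     sync.Mutex
	loaded bool
	caPEM  []byte
	load   func() ([]byte, error) // injectable so failures can be simulated
}

// newLazyChecker builds a checker that reads the CA cert from caPath.
func newLazyChecker(caPath string) *lazyChecker {
	return &lazyChecker{
		load: func() ([]byte, error) { return os.ReadFile(caPath) },
	}
}

// ensure attempts initialization on every call until one attempt succeeds.
// Failures are returned to the caller instead of panicking.
func (c *lazyChecker) ensure() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.loaded {
		return nil
	}
	data, err := c.load()
	if err != nil {
		return fmt.Errorf("load CA cert: %w", err)
	}
	if len(data) == 0 {
		return errEmptyCA
	}
	c.caPEM, c.loaded = data, true
	return nil
}

// Check reports an unhealthy state as an error, so the server can start
// with a failing health check rather than crashing the process.
func (c *lazyChecker) Check() error {
	if err := c.ensure(); err != nil {
		return err
	}
	// A real checker would go on to probe the webhook endpoint here.
	return nil
}

func main() {
	c := newLazyChecker("/kruise/cert-dir/ca-cert.pem")
	if err := c.Check(); err != nil {
		fmt.Println("health check failing:", err)
		return
	}
	fmt.Println("health check passing")
}
```

With this shape, a missing or unreadable cert file makes the health check fail and get logged by the caller, and the check starts passing once the file becomes available, without restarting the controller.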

How to reproduce it (as minimally and precisely as possible):

  1. Deploy kruise-controller in a namespace where the cert directory is misconfigured or missing
  2. OR: Deploy on a filesystem that doesn't support fsnotify (certain network filesystems)
  3. OR: Set incorrect permissions on the certificate directory
  4. Observe the controller pod crash-looping with a panic trace.

Labels: kind/bug