Bug: Scheduler plugin panic: "cannot add Pod that already exists" #1358

@sharnoff

Environment

Prod

Steps to reproduce

Unknown. This happened soon after scheduler startup, and after a restart the scheduler was fine.

There was a panic in the Score plugin while it was calling pkg/plugin/state.(*Node).AddPod(...) inside (*Node).Speculatively(...). Presumably the Score plugin ran after the pod had already been added to the node's state?

Expected result

Probably we should just fail to score, rather than panicking?

Actual result

Scheduler plugin panicked, taking down the entire scheduler, with:

E0410 17:12:36.125777       1 node.go:300] "Observed a panic" panic="cannot add Pod that already exists" stacktrace=<
	goroutine 892 [running]:
	k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2c15fa0, 0x4317600}, {0x227ebe0, 0x2be3e20})
		/go/pkg/mod/k8s.io/apimachinery@v0.31.7/pkg/util/runtime/runtime.go:107 +0xbc
	k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x2c15fa0, 0x4317600}, {0x227ebe0, 0x2be3e20}, {0x4317600, 0x0, 0x10000000043aa45?})
		/go/pkg/mod/k8s.io/apimachinery@v0.31.7/pkg/util/runtime/runtime.go:82 +0x5e
	k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc02a400a80?})
		/go/pkg/mod/k8s.io/apimachinery@v0.31.7/pkg/util/runtime/runtime.go:59 +0x108
	panic({0x227ebe0?, 0x2be3e20?})
		/usr/local/go/src/runtime/panic.go:791 +0x132
	github.com/neondatabase/autoscaling/pkg/plugin/state.(*Node).AddPod(0xc02a912bd0, {{{0xc02a7f6e40, 0x7}, {0xc02a82a020, 0x20}}, {0xc02a741830, 0x24}, {0x0, 0xedf89f302, 0x42f38e0}, ...})
		/workspace/pkg/plugin/state/node.go:300 +0x2cf
	github.com/neondatabase/autoscaling/pkg/plugin.(*AutoscaleEnforcer).Score.func2(0xc02a912bd0)
		/workspace/pkg/plugin/framework_methods.go:327 +0x78
	github.com/neondatabase/autoscaling/pkg/plugin/state.(*Node).Speculatively(0xc01b37d340, 0xc02aea9b30)
		/workspace/pkg/plugin/state/node.go:199 +0x234
	github.com/neondatabase/autoscaling/pkg/plugin.(*AutoscaleEnforcer).Score(0xc016092738, {0x2c16090?, 0xc02a402e60?}, 0xc029dc6740?, 0xc02a824908, {0xc01c2418c0, 0x2e})
		/workspace/pkg/plugin/framework_methods.go:326 +0xb45
	k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*instrumentedScorePlugin).Score(0xc0157708e0, {0x2c16090, 0xc02a402e60}, 0xc029dc6740, 0xc02a824908, {0xc01c2418c0, 0x2e})
		/go/pkg/mod/k8s.io/kubernetes@v1.31.7/pkg/scheduler/framework/runtime/instrumented_plugins.go:82 +0x75
	k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runScorePlugin(0x23d1580?, {0x2c16090?, 0xc02a402e60?}, {0x2c054b0?, 0xc0157708e0?}, 0x656b616c2d646567?, 0x377634773635612d?, {0xc01c2418c0?, 0x657461766972702d?})
		/go/pkg/mod/k8s.io/kubernetes@v1.31.7/pkg/scheduler/framework/runtime/framework.go:1211 +0x2ed
	k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).RunScorePlugins.func2(0x2)
		/go/pkg/mod/k8s.io/kubernetes@v1.31.7/pkg/scheduler/framework/runtime/framework.go:1140 +0x3b4
	k8s.io/kubernetes/pkg/scheduler/framework/parallelize.Parallelizer.Until.func1(0x2)
		/go/pkg/mod/k8s.io/kubernetes@v1.31.7/pkg/scheduler/framework/parallelize/parallelism.go:60 +0x46
	k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
		/go/pkg/mod/k8s.io/client-go@v0.31.7/util/workqueue/parallelizer.go:90 +0xf3
	created by k8s.io/client-go/util/workqueue.ParallelizeUntil in goroutine 729
		/go/pkg/mod/k8s.io/client-go@v0.31.7/util/workqueue/parallelizer.go:76 +0x1fb
 >

Other logs, links

This is semi-new code from the scheduler rewrite in #1163, but this was also the first release with the latest Kubernetes upgrade, so it is potentially related to #1322?
