Prevent autoscaler panic when panicThresholdPercent is unset #731

WHOIM1205 · 2026-01-31T21:37:50Z

Fix nil pointer panic in heterogeneous autoscaler optimizer

Description

This PR fixes a nil pointer dereference in the heterogeneous autoscaler optimizer that could crash the autoscaler controller when an optional field is not configured.

panicThresholdPercent is defined as an optional field in the AutoscalingPolicy API. While the homogeneous scaler already handles this correctly, the heterogeneous optimizer assumed the field was always set and dereferenced it unconditionally.

This mismatch could cause the autoscaler to panic at runtime for valid user configurations.

What’s Fixed

Added a defensive nil check before dereferencing panicThresholdPercent in the heterogeneous optimizer
Aligned heterogeneous autoscaler behavior with the existing homogeneous scaler logic
Prevented controller panics for valid autoscaling policies that omit this optional field

Impact

Eliminates autoscaler controller crashes caused by nil pointer dereference
Prevents silent autoscaling outages for heterogeneous (multi-backend) workloads
Makes panicThresholdPercent truly optional, as intended by the API
No behavior change for users who already configure the field

Code Changes

File: pkg/autoscaler/autoscaler/optimizer.go
Function: (*Optimizer).Optimize
Change: Added a nil guard before accessing PanicThresholdPercent

The fix is intentionally minimal and localized to avoid any unintended side effects.

Test Verification

No new tests were added.

Existing autoscaler tests already cover both scenarios:

Policies with panicThresholdPercent configured
Policies where panicThresholdPercent is omitted

Previously, the latter case could panic at runtime. With this change, the same code paths execute safely without altering behavior.

Signed-off-by: WHOIM1205 <[email protected]>

gemini-code-assist · 2026-01-31T21:38:01Z

Summary of Changes

Hello @WHOIM1205, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical stability issue in the autoscaler controller by resolving a nil pointer dereference. Previously, the heterogeneous autoscaler optimizer could crash if the optional panicThresholdPercent field was unset in the AutoscalingPolicy. The fix introduces a defensive nil check, ensuring the autoscaler operates robustly with all valid user configurations and preventing potential outages for multi-backend workloads.

Highlights

Fix Nil Pointer Panic: Add a defensive nil check for panicThresholdPercent in the heterogeneous autoscaler optimizer to prevent crashes when this optional field is not configured.
Align Autoscaler Behavior: Ensure the heterogeneous autoscaler's handling of the optional panicThresholdPercent field aligns with the existing homogeneous scaler logic.
Prevent Controller Crashes: Eliminate autoscaler controller panics for valid autoscaling policies that omit the panicThresholdPercent field, making it truly optional as intended by the API.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

WHOIM1205 · 2026-01-31T21:38:11Z

/assign @hzxuzhonghu

gemini-code-assist

Code Review

This pull request effectively resolves a potential nil pointer dereference in the heterogeneous autoscaler optimizer. By adding a nil check for the optional panicThresholdPercent field, it prevents the autoscaler controller from crashing with valid user configurations. The fix is minimal and well-targeted. I have one suggestion to improve the readability and robustness of the conditional check by using an idiomatic Go pattern and preventing potential integer overflows.

gemini-code-assist · 2026-01-31T21:38:47Z

pkg/autoscaler/autoscaler/optimizer.go

 		return nil, nil
 	}
-	if recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {
+	if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {


For better readability and to avoid repeating the long expression, you can use a short-lived variable within the if statement. This is a common Go idiom.

Additionally, to prevent potential integer overflow when multiplying int32 values, it's safer to cast the operands to int64 before multiplication. While an overflow is unlikely with typical replica counts, this change makes the code more robust against large numbers of instances.

Suggested change

-	if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {
+	if p := autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent; p != nil && int64(recommendedInstances)*100 >= int64(instancesCountSum)*int64(*p) {
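
The overflow concern raised in this review comment can be illustrated with a small sketch. It assumes, as the comment states, that the operands are int32. The function names and the input values are hypothetical and chosen only to force the wraparound; they are far larger than typical replica counts.

```go
package main

import "fmt"

// reachedInt32 mirrors the shape of the comparison in the diff: all
// arithmetic stays in int32, so large products wrap around.
func reachedInt32(recommended, current, percent int32) bool {
	return recommended*100 >= current*percent
}

// reachedInt64 widens every operand before multiplying, as the suggested
// change does, so the products cannot overflow for int32 inputs.
func reachedInt64(recommended, current, percent int32) bool {
	return int64(recommended)*100 >= int64(current)*int64(percent)
}

func main() {
	// 30,000,000 * 100 = 3,000,000,000, which exceeds the int32 maximum of
	// 2,147,483,647 and wraps to a negative number in int32 arithmetic.
	recommended, current, percent := int32(30_000_000), int32(1_000_000), int32(200)

	fmt.Println(reachedInt32(recommended, current, percent)) // false: left side wrapped negative
	fmt.Println(reachedInt64(recommended, current, percent)) // true: 3,000,000,000 >= 200,000,000
}
```

Signed integer overflow in Go wraps silently rather than panicking, so the int32 version would return a wrong answer instead of crashing.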

Thanks for the suggestion.
For this PR I'd prefer to keep the change strictly scoped to fixing the nil dereference and aligned with the existing pattern used in the homogeneous scaler. Introducing additional variables or type changes would broaden the scope beyond the original issue.
Happy to revisit readability or overflow hardening separately if needed.

Regenerate helm-chart-values.md to match helm-docs output. Local sub-chart dependencies don't have repository URLs. Signed-off-by: WHOIM1205 <[email protected]>

volcano-sh-bot · 2026-01-31T22:58:53Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hzxuzhonghu. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The test/e2e/utils, test/e2e/framework, and test/e2e/router/context packages contain helper code but no _test.go files, causing go test to fail on empty packages. Exclude these from the e2e test execution. Signed-off-by: WHOIM1205 <[email protected]>

FAUST-BENCHOU

Let's wait for the maintainers' review.

FAUST-BENCHOU · 2026-02-01T12:27:59Z

docs/kthena/docs/reference/helm-chart-values.md

+|  | networking | 1.0.0 |
+|  | workload | 1.0.0 |


What do you mean by this?

This was only to fix a gen check failure caused by a markdown table column mismatch. No semantic change intended.

FAUST-BENCHOU · 2026-02-01T12:41:39Z

Makefile

 	@./test/e2e/setup.sh
 	@echo "Running E2E tests sequentially..."
-	@KUBECONFIG=/tmp/kubeconfig-e2e go test -p 1 $$(go list ./... | grep /test/e2e) -v -timeout=15m
+	@KUBECONFIG=/tmp/kubeconfig-e2e go test -p 1 $$(go list ./... | grep /test/e2e | grep -v /utils | grep -v /framework | grep -v /context) -v -timeout=15m


Is this to exclude auxiliary packages, so that go test is not run on packages without test files?
Just optional, I think.

Yes, exactly. These packages only contain helper code and no _test.go files. Excluding them avoids go test failing on empty packages while keeping all actual e2e test packages executed.

FAUST-BENCHOU · 2026-02-01T12:46:08Z

pkg/autoscaler/autoscaler/optimizer.go

 		return nil, nil
 	}
-	if recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {
+	if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {


acceptable. Scaler is more robust

kthena/pkg/autoscaler/autoscaler/scaler.go

Line 89 in a8c9193

if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= currentInstancesCount*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {

Thanks for confirming.

Fix nil pointer dereference in optimizer PanicThresholdPercent check

3dcbd9c

Signed-off-by: WHOIM1205 <[email protected]>

volcano-sh-bot requested review from LiZhenCheng9527 and hzxuzhonghu January 31, 2026 21:37

volcano-sh-bot added the size/XS label Jan 31, 2026

volcano-sh-bot assigned hzxuzhonghu Jan 31, 2026

gemini-code-assist bot reviewed Jan 31, 2026

Sync helm-docs generated output

89cd12f

WHOIM1205 force-pushed the fix/autoscaler-nil-panic-threshold branch from 255552a to ea1f7d2 Compare January 31, 2026 23:01

FAUST-BENCHOU reviewed Feb 1, 2026

FAUST-BENCHOU mentioned this pull request Feb 1, 2026

Fix router controllers dropping tombstone delete events #730

Open
