Skip to content

Conversation

@WHOIM1205
Copy link
Contributor

Fix nil pointer panic in heterogeneous autoscaler optimizer

Description

This PR fixes a nil pointer dereference in the heterogeneous autoscaler optimizer that could crash the autoscaler controller when an optional field is not configured.

panicThresholdPercent is defined as an optional field in the AutoscalingPolicy API. While the homogeneous scaler already handles this correctly, the heterogeneous optimizer assumed the field was always set and dereferenced it unconditionally.

This mismatch could cause the autoscaler to panic at runtime for valid user configurations.


What’s Fixed

  • Added a defensive nil check before dereferencing panicThresholdPercent in the heterogeneous optimizer
  • Aligned heterogeneous autoscaler behavior with the existing homogeneous scaler logic
  • Prevented controller panics for valid autoscaling policies that omit this optional field

Impact

  • Eliminates autoscaler controller crashes caused by nil pointer dereference
  • Prevents silent autoscaling outages for heterogeneous (multi-backend) workloads
  • Makes panicThresholdPercent truly optional, as intended by the API
  • No behavior change for users who already configure the field

Code Changes

  • File: pkg/autoscaler/autoscaler/optimizer.go
  • Function: (*Optimizer).Optimize
  • Change: Added a nil guard before accessing PanicThresholdPercent

The fix is intentionally minimal and localized to avoid any unintended side effects.


Test Verification

No new tests were added.

Existing autoscaler tests already cover both scenarios:

  • Policies with panicThresholdPercent configured
  • Policies where panicThresholdPercent is omitted

Previously, the latter case could panic at runtime. With this change, the same code paths execute safely without altering behavior.


@gemini-code-assist
Copy link

Summary of Changes

Hello @WHOIM1205, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical stability issue in the autoscaler controller by resolving a nil pointer dereference. Previously, the heterogeneous autoscaler optimizer could crash if the optional panicThresholdPercent field was unset in the AutoscalingPolicy. The fix introduces a defensive nil check, ensuring the autoscaler operates robustly with all valid user configurations and preventing potential outages for multi-backend workloads.

Highlights

  • Fix Nil Pointer Panic: Add a defensive nil check for panicThresholdPercent in the heterogeneous autoscaler optimizer to prevent crashes when this optional field is not configured.
  • Align Autoscaler Behavior: Ensure the heterogeneous autoscaler's handling of the optional panicThresholdPercent field aligns with the existing homogeneous scaler logic.
  • Prevent Controller Crashes: Eliminate autoscaler controller panics for valid autoscaling policies that omit the panicThresholdPercent field, making it truly optional as intended by the API.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@WHOIM1205
Copy link
Contributor Author

/assign @hzxuzhonghu

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively resolves a potential nil pointer dereference in the heterogeneous autoscaler optimizer. By adding a nil check for the optional panicThresholdPercent field, it prevents the autoscaler controller from crashing with valid user configurations. The fix is minimal and well-targeted. I have one suggestion to improve the readability and robustness of the conditional check by using an idiomatic Go pattern and preventing potential integer overflows.

return nil, nil
}
if recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {
if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better readability and to avoid repeating the long expression, you can use a short-lived variable within the if statement. This is a common Go idiom.

Additionally, to prevent potential integer overflow when multiplying int32 values, it's safer to cast the operands to int64 before multiplication. While an overflow is unlikely with typical replica counts, this change makes the code more robust against large numbers of instances.

Suggested change
if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {
if p := autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent; p != nil && int64(recommendedInstances)*100 >= int64(instancesCountSum)*int64(*p) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the suggestion
for this pr I’d prefer to keep the change strictly scoped to fixing the nil dereference and align with the existing pattern used in the homogeneous scaler introducing additional variables or type changes would broaden the scope beyond the original issue
happy to revisit readability or overflow hardening separately if needed

Regenerate helm-chart-values.md to match helm-docs output.
Local sub-chart dependencies don't have repository URLs.

Signed-off-by: WHOIM1205 <[email protected]>
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hzxuzhonghu. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The test/e2e/utils, test/e2e/framework, and test/e2e/router/context
packages contain helper code but no _test.go files, causing go test
to fail on empty packages. Exclude these from the e2e test execution.

Signed-off-by: WHOIM1205 <[email protected]>
@WHOIM1205 WHOIM1205 force-pushed the fix/autoscaler-nil-panic-threshold branch from 255552a to ea1f7d2 Compare January 31, 2026 23:01
Copy link
Contributor

@FAUST-BENCHOU FAUST-BENCHOU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets wait for review of maintainers

Comment on lines +11 to +12
| | networking | 1.0.0 |
| | workload | 1.0.0 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdum by this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was only to fix a gen check failure due to a markdown table column mismatch no semantic change intended

@./test/e2e/setup.sh
@echo "Running E2E tests sequentially..."
@KUBECONFIG=/tmp/kubeconfig-e2e go test -p 1 $$(go list ./... | grep /test/e2e) -v -timeout=15m
@KUBECONFIG=/tmp/kubeconfig-e2e go test -p 1 $$(go list ./... | grep /test/e2e | grep -v /utils | grep -v /framework | grep -v /context) -v -timeout=15m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it to exclude auxiliary packages to avoid executing go test on packages without test packages?
just optional i think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly these packages only contain helper code and no test.go files excluding them avoids go test failing on empty packages while keeping all actual e2e test packages executed

return nil, nil
}
if recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {
if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

acceptable. Scaler is more robust

if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= currentInstancesCount*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for confirming

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants