Skip to content

Conversation

@tiffanny29631
Copy link
Contributor

@tiffanny29631 tiffanny29631 commented Nov 10, 2025

ResourceGroup metrics were showing stale values from deleted ResourceGroups because the controller never received deletion events. Since the pod persists, metrics continue to be emitted but with incorrect values.

Example: When a RootSync with 30 resources is deleted, ready_resource_count will still show 30 even though the ResourceGroup no longer exists. Resetting the resource metrics to zero ensures the aggregate metric values across all ResourceGroups remain correct.

Root controller now sends deletion events, and ResourceGroup controller resets resource metrics to zero for deleted ResourceGroups, ensuring aggregate metric values remain correct.

ResourceGroup metrics were showing stale values from deleted ResourceGroups
because the controller never received deletion events. Since the pod persists,
metrics continue to be emitted but with incorrect values. Root controller now
sends deletion events, and ResourceGroup controller resets all metrics to zero
for deleted ResourceGroups, ensuring aggregate metric values remain correct.
@google-oss-prow google-oss-prow bot requested review from Camila-B and mikebz November 10, 2025 22:51
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from tiffanny29631. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist
Copy link

Summary of Changes

Hello @tiffanny29631, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where ResourceGroup metrics would display stale values after a ResourceGroup was deleted. The changes ensure that upon deletion, the root controller propagates a deletion event, prompting the ResourceGroup controller to reset all relevant metrics to zero. This guarantees that aggregate metric values remain accurate and reflect the current state of active ResourceGroups.

Highlights

  • Deletion Event Propagation: The root controller now explicitly sends deletion events for ResourceGroups, ensuring that downstream controllers are notified when a resource is removed from the cluster.
  • Metric Reset on Deletion: The ResourceGroup controller has been updated to reset all associated resource metrics to zero when a ResourceGroup is deleted or not found, preventing stale metric values from persisting.
  • New Metric Reset Function: A new helper function, resetResourceMetrics, was introduced to encapsulate the logic for zeroing out various resource-related metrics upon ResourceGroup deletion.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly addresses the issue of stale metrics for deleted ResourceGroups. The changes ensure that metrics are reset to zero upon ResourceGroup deletion by introducing a resetResourceMetrics function and having the root controller send deletion events. The logic is sound and the test updates reflect the new behavior. I've identified an opportunity for refactoring to reduce code duplication in pkg/resourcegroup/controllers/root/root_controller.go, which would improve maintainability. Overall, this is a good fix.

@mikebz mikebz requested a review from Copilot November 10, 2025 23:34
Copilot finished reviewing on behalf of mikebz November 10, 2025 23:40
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug where ResourceGroup metrics displayed stale values after ResourceGroups were deleted. The root cause was that the Root controller never sent deletion events to the ResourceGroup controller, so metrics continued to show outdated values even after deletion. The fix ensures deletion events are sent and metrics are reset to zero when ResourceGroups are deleted.

Key changes:

  • Root controller now sends deletion events to the ResourceGroup controller for both deletion scenarios (NotFound and DeletionTimestamp)
  • ResourceGroup controller resets all resource metrics to zero when handling deleted ResourceGroups
  • Test expectations updated to verify deletion events are properly sent

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
pkg/resourcegroup/controllers/root/root_controller.go Adds deletion event sending for both NotFound and DeletionTimestamp scenarios, ensuring the ResourceGroup controller receives notifications when ResourceGroups are deleted
pkg/resourcegroup/controllers/resourcegroup/resourcegroup_controller.go Introduces resetResourceMetrics function and calls it when ResourceGroups are deleted, resetting all metrics to zero to prevent stale values
pkg/resourcegroup/controllers/root/root_controller_test.go Updates test to validate that deletion events are sent to the channel instead of verifying no events are sent

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// TestReconcilerFinalizer_ResourceGroupMetricsReset tests that when a ResourceGroup
// is deleted, all resource-related metrics are reset to 0.
func TestReconcilerFinalizer_ResourceGroupMetricsReset(t *testing.T) {
nt := nomostest.New(t, nomostesting.MultiRepos)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: suggest using Unstructured repository for testing instead of hierarchical

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

defer cancel()
go nomostest.TailReconcilerLogs(ctx, nt, nomostest.RootReconcilerObjectKey(rootSyncKey.Name))

// Enable deletion propagation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the deletion propagation relevant to the test case? I believe the ResourceGroup should get cleaned up regardless of the deletion propagation policy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants