-
Notifications
You must be signed in to change notification settings - Fork 51
fix: reset ResourceGroup metrics on deletion #1930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ResourceGroup metrics were showing stale values from deleted ResourceGroups because the controller never received deletion events. Since the pod persists, metrics continue to be emitted but with incorrect values. Root controller now sends deletion events, and ResourceGroup controller resets all metrics to zero for deleted ResourceGroups, ensuring aggregate metric values remain correct.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary of ChangesHello @tiffanny29631, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where ResourceGroup metrics would display stale values after a ResourceGroup was deleted. The changes ensure that upon deletion, the root controller propagates a deletion event, prompting the ResourceGroup controller to reset all relevant metrics to zero. This guarantees that aggregate metric values remain accurate and reflect the current state of active ResourceGroups. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly addresses the issue of stale metrics for deleted ResourceGroups. The changes ensure that metrics are reset to zero upon ResourceGroup deletion by introducing a resetResourceMetrics function and having the root controller send deletion events. The logic is sound and the test updates reflect the new behavior. I've identified an opportunity for refactoring to reduce code duplication in pkg/resourcegroup/controllers/root/root_controller.go, which would improve maintainability. Overall, this is a good fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug where ResourceGroup metrics displayed stale values after ResourceGroups were deleted. The root cause was that the Root controller never sent deletion events to the ResourceGroup controller, so metrics continued to show outdated values even after deletion. The fix ensures deletion events are sent and metrics are reset to zero when ResourceGroups are deleted.
Key changes:
- Root controller now sends deletion events to the ResourceGroup controller for both deletion scenarios (NotFound and DeletionTimestamp)
- ResourceGroup controller resets all resource metrics to zero when handling deleted ResourceGroups
- Test expectations updated to verify deletion events are properly sent
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/resourcegroup/controllers/root/root_controller.go | Adds deletion event sending for both NotFound and DeletionTimestamp scenarios, ensuring the ResourceGroup controller receives notifications when ResourceGroups are deleted |
| pkg/resourcegroup/controllers/resourcegroup/resourcegroup_controller.go | Introduces resetResourceMetrics function and calls it when ResourceGroups are deleted, resetting all metrics to zero to prevent stale values |
| pkg/resourcegroup/controllers/root/root_controller_test.go | Updates test to validate that deletion events are sent to the channel instead of verifying no events are sent |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
55ace24 to
5cd18a2
Compare
| // TestReconcilerFinalizer_ResourceGroupMetricsReset tests that when a ResourceGroup | ||
| // is deleted, all resource-related metrics are reset to 0. | ||
| func TestReconcilerFinalizer_ResourceGroupMetricsReset(t *testing.T) { | ||
| nt := nomostest.New(t, nomostesting.MultiRepos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: suggest using Unstructured repository for testing instead of hierarchical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| defer cancel() | ||
| go nomostest.TailReconcilerLogs(ctx, nt, nomostest.RootReconcilerObjectKey(rootSyncKey.Name)) | ||
|
|
||
| // Enable deletion propagation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the deletion propagation relevant to the test case? I believe the ResourceGroup should get cleaned up regardless of the deletion propagation policy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
5cd18a2 to
1d4ae20
Compare
ResourceGroup metrics were showing stale values from deleted ResourceGroups because the controller never received deletion events. Since the pod persists, metrics continue to be emitted but with incorrect values.
Example: When a RootSync with 30 resources is deleted, ready_resource_count will still show 30 even though the ResourceGroup no longer exists. Resetting the resource metrics to zero ensures the aggregate metric values across all ResourceGroups remain correct.
Root controller now sends deletion events, and ResourceGroup controller resets resource metrics to zero for deleted ResourceGroups, ensuring aggregate metric values remain correct.