feat: Implement metrics for images #1532
Conversation
Walkthrough

Refactors metrics from application-level to ImageUpdater CR-level, adds an `ImageUpdaterConfig.EnableCRMetrics` flag to gate CR-scoped metrics collection and cleanup, enables CR metrics in the controller run, disables them in the webhook-only run, and updates tests and docs to use the new ImageUpdaterCR metrics API.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI (run / webhook)
    participant Controller as ImageUpdater Controller/Reconciler
    participant Metrics as ImageUpdaterCR Metrics
    participant Registry as Prometheus Registry
    CLI->>Controller: start (cfg.EnableCRMetrics = true|false)
    Controller->>Controller: reconcile ImageUpdater CR
    Controller->>Metrics: SetNumberOfApplications(cr, ns, matched)
    Controller->>Metrics: SetNumberOfImagesWatched(cr, ns, watched)
    alt not warmup && not dryRun
        Controller->>Metrics: IncreaseImageUpdate(cr, ns, updated)
    end
    Controller->>Metrics: IncreaseUpdateErrors(cr, ns, errors)
    Metrics->>Registry: expose/update metric series
    Note over Controller,Metrics: on CR deletion (finalizer)
    alt EnableCRMetrics == true
        Controller->>Metrics: RemoveImageUpdaterMetrics(cr, ns)
        Metrics->>Registry: remove metric label series
    end
```
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #1532      +/-   ##
==========================================
+ Coverage   71.48%   73.34%   +1.86%
==========================================
  Files          50       53       +3
  Lines        4667     5087     +420
==========================================
+ Hits         3336     3731     +395
- Misses       1133     1147      +14
- Partials      198      209      +11
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/imageupdater_controller_test.go`:
- Around line 30-41: The nil-check for apm is currently after apm is
dereferenced causing a potential panic; either remove the redundant "if apm !=
nil" guard entirely (since InitMetrics guarantees non-nil from
metrics.ImageUpdaterCR()) or move the nil guard to immediately after apm :=
metrics.ImageUpdaterCR() (before calling apm.SetNumberOfApplications /
SetNumberOfImagesWatched / IncreaseImageUpdate / IncreaseUpdateErrors), or use a
require/assert to fail-fast if apm is nil; update the test to reference apm only
after that early nil-check or removal.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- cmd/common_test.go
- cmd/run.go
- cmd/webhook.go
- docs/install/installation.md
- internal/controller/imageupdater_controller.go
- internal/controller/imageupdater_controller_test.go
- internal/controller/reconcile.go
- internal/controller/reconcile_test.go
- pkg/metrics/metrics.go
- pkg/metrics/metrics_test.go
🧹 Nitpick comments (1)
pkg/metrics/metrics_test.go (1)
133-147: Extend removal test to assert `ApplicationsTotal` cleanup too.

`RemoveImageUpdaterMetrics` deletes `ApplicationsTotal` as well, but the subtest only asserts the image vectors. Adding `ApplicationsTotal` assertions closes a meaningful regression gap.

Proposed test diff

```diff
 assert.Equal(t, 2, testutil.CollectAndCount(apm.ImagesWatchedTotal))
 assert.Equal(t, 2, testutil.CollectAndCount(apm.ImagesUpdatedTotal))
 assert.Equal(t, 2, testutil.CollectAndCount(apm.ImagesUpdatedErrorsTotal))
+assert.Equal(t, 2, testutil.CollectAndCount(apm.ApplicationsTotal))

 apm.RemoveImageUpdaterMetrics("cr1", "ns1")

 assert.Equal(t, 1, testutil.CollectAndCount(apm.ImagesWatchedTotal))
 assert.Equal(t, float64(20), testutil.ToFloat64(apm.ImagesWatchedTotal.WithLabelValues("cr2", "ns2")))
 assert.Equal(t, 1, testutil.CollectAndCount(apm.ImagesUpdatedTotal))
 assert.Equal(t, float64(6), testutil.ToFloat64(apm.ImagesUpdatedTotal.WithLabelValues("cr2", "ns2")))
 assert.Equal(t, 1, testutil.CollectAndCount(apm.ImagesUpdatedErrorsTotal))
 assert.Equal(t, float64(3), testutil.ToFloat64(apm.ImagesUpdatedErrorsTotal.WithLabelValues("cr2", "ns2")))
+assert.Equal(t, 1, testutil.CollectAndCount(apm.ApplicationsTotal))
+assert.Equal(t, float64(1), testutil.ToFloat64(apm.ApplicationsTotal.WithLabelValues("cr2", "ns2")))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/metrics/metrics_test.go` around lines 133 - 147, The test currently verifies ImagesWatchedTotal, ImagesUpdatedTotal, and ImagesUpdatedErrorsTotal after calling RemoveImageUpdaterMetrics("cr1","ns1") but omits asserting that ApplicationsTotal was removed; update the subtest in metrics_test.go to also assert the collector count for apm.ApplicationsTotal decreased (e.g., from 2 to 1) and that the remaining label values for ("cr2","ns2") equals the expected float value, referencing the ApplicationsTotal metric and the RemoveImageUpdaterMetrics function to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/metrics/metrics_test.go`:
- Around line 133-147: The test currently verifies ImagesWatchedTotal,
ImagesUpdatedTotal, and ImagesUpdatedErrorsTotal after calling
RemoveImageUpdaterMetrics("cr1","ns1") but omits asserting that
ApplicationsTotal was removed; update the subtest in metrics_test.go to also
assert the collector count for apm.ApplicationsTotal decreased (e.g., from 2 to
1) and that the remaining label values for ("cr2","ns2") equals the expected
float value, referencing the ApplicationsTotal metric and the
RemoveImageUpdaterMetrics function to locate the change.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- cmd/common_test.go
- cmd/run.go
- cmd/webhook.go
- docs/install/installation.md
- internal/controller/imageupdater_controller.go
- internal/controller/imageupdater_controller_test.go
- internal/controller/reconcile.go
- internal/controller/reconcile_test.go
- pkg/metrics/metrics.go
- pkg/metrics/metrics_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- internal/controller/imageupdater_controller.go
- cmd/webhook.go
- internal/controller/reconcile_test.go
- cmd/run.go
pkg/metrics/metrics.go
Outdated
```go
// RemoveNumberOfApplications removes the applications-watched gauge for a given ImageUpdater CR.
func (iucm *ImageUpdaterCRMetrics) RemoveNumberOfApplications(name, namespace string) {
	iucm.ApplicationsTotal.DeleteLabelValues(name, namespace)
}
```
RemoveNumberOfApplications func is only used in tests. Do we really need it?
pkg/metrics/metrics.go
Outdated
```go
// ResetApplicationsTotal resets the applications-watched gauge (all CRs). Used in tests.
func (iucm *ImageUpdaterCRMetrics) ResetApplicationsTotal() {
	iucm.ApplicationsTotal.Reset()
}
```
ResetApplicationsTotal func is only used in tests. Do we really need it?
* `argocd_image_updater_images_errors_total`
* `argocd_image_updater_k8s_api_requests_total`
* `argocd_image_updater_k8s_api_errors_total`
* `argocd_image_updater_registry_requests_total`
Can we have a sample metrics output for illustration purposes?
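A hedged illustration of what such sample output could look like in the Prometheus exposition format. Only the metric names come from the doc diff above; the HELP text, values, and the absence of labels on these process-level counters are guesses for illustration, not taken from the code.

```text
# HELP argocd_image_updater_images_errors_total Number of errors reported while processing images
# TYPE argocd_image_updater_images_errors_total counter
argocd_image_updater_images_errors_total 2
# TYPE argocd_image_updater_k8s_api_requests_total counter
argocd_image_updater_k8s_api_requests_total 1523
# TYPE argocd_image_updater_k8s_api_errors_total counter
argocd_image_updater_k8s_api_errors_total 4
# TYPE argocd_image_updater_registry_requests_total counter
argocd_image_updater_registry_requests_total 310
```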
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: dkarpele <karpelevich@gmail.com>
- remove functions without usage

Signed-off-by: dkarpele <karpelevich@gmail.com>
🧹 Nitpick comments (3)
pkg/metrics/metrics_test.go (1)
88-93: Add assertions in `TestMetricsOperations` for the new CR metric calls.

Right now this path executes without validating emitted values, so regressions can slip through unnoticed.
Suggested assertions
```diff
 apm := ImageUpdaterCR()
 apm.IncreaseImageUpdate("cr1", "ns1", 1)
 apm.IncreaseUpdateErrors("cr1", "ns1", 2)
 apm.SetNumberOfApplications("cr1", "ns1", 3)
 apm.SetNumberOfImagesWatched("cr1", "ns1", 4)
+
+assert.Equal(t, float64(1), testutil.ToFloat64(apm.ImagesUpdatedTotal.WithLabelValues("cr1", "ns1")))
+assert.Equal(t, float64(2), testutil.ToFloat64(apm.ImagesUpdatedErrorsTotal.WithLabelValues("cr1", "ns1")))
+assert.Equal(t, float64(3), testutil.ToFloat64(apm.ApplicationsTotal.WithLabelValues("cr1", "ns1")))
+assert.Equal(t, float64(4), testutil.ToFloat64(apm.ImagesWatchedTotal.WithLabelValues("cr1", "ns1")))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/metrics/metrics_test.go` around lines 88 - 93, In TestMetricsOperations, after creating apm := ImageUpdaterCR() and calling apm.IncreaseImageUpdate("cr1","ns1",1), apm.IncreaseUpdateErrors("cr1","ns1",2), apm.SetNumberOfApplications("cr1","ns1",3) and apm.SetNumberOfImagesWatched("cr1","ns1",4), add assertions that read the corresponding exported metric values and verify they match the expected totals (e.g. image update counter == 1, update error counter == 2, applications gauge == 3, imagesWatched gauge == 4) using the package's metric retrieval helpers or the Prometheus/testutil utilities used elsewhere in the tests so failures surface in TestMetricsOperations.

cmd/webhook.go (1)

68-70: Add a regression test for webhook-only metrics gating.

This branch is important for metric lifecycle behavior and is easy to regress. Please add a command-level/unit test that asserts `cfg.EnableCRMetrics` is forced to `false` in webhook-only execution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/webhook.go` around lines 68 - 70, Add a unit test that verifies webhook-only execution forces cfg.EnableCRMetrics to false: create a test (e.g., TestWebhookCommand_DisablesCRMetrics) in the cmd package that invokes the webhook command entrypoint (the function/constructor that builds/runs the webhook command) with webhook-only mode enabled and then inspects the resulting cfg object to assert cfg.EnableCRMetrics == false; if the command returns its config indirectly, adapt the test to run the command initialization path and capture the cfg used when webhook-only is set. Ensure the test fails if cfg.EnableCRMetrics is not explicitly set to false.

internal/controller/reconcile_test.go (1)

2312-2328: Strengthen this metric test to avoid false positives.

Using cumulative assertions (`GreaterOrEqual`) with shared labels can pass even if this specific run stops incrementing. Isolate labels for this case (or reset state) so you can assert exact values.

Suggested approach

```diff
-cr: baseCr,
+cr: &argocdimageupdaterv1alpha1.ImageUpdater{
+	ObjectMeta: metav1.ObjectMeta{
+		Name:      "metrics-cr-isolated",
+		Namespace: "argocd",
+	},
+	Spec: baseCr.Spec,
+},
 ...
-assert.GreaterOrEqual(t, updated, float64(res.NumImagesUpdated), "images updated total")
-assert.GreaterOrEqual(t, errors, float64(res.NumErrors), "images errors total")
+assert.Equal(t, float64(res.NumImagesUpdated), updated, "images updated total")
+assert.Equal(t, float64(res.NumErrors), errors, "images errors total")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/reconcile_test.go` around lines 2312 - 2328, This test currently reads cumulative metrics keyed by cr.Name/cr.Namespace causing false positives; make the run isolated by giving the test CR unique labels (e.g., set baseCr.Name to a random/unique value or append a test-specific suffix before calling the reconciler in this test's setup) so metrics.ImageUpdaterCR().ImagesWatchedTotal/ImagesUpdatedTotal/ImagesUpdatedErrorsTotal are scoped to this run, then switch the counter assertions (for updated and errors) from GreaterOrEqual to exact Equals against res.NumImagesUpdated and res.NumErrors; keep referencing the same symbols (baseCr, postCheck closure, ImageUpdaterReconciler via the test harness, and metrics.ImageUpdaterCR()/iucm.Images*Total.WithLabelValues) when making the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cmd/webhook.go`:
- Around line 68-70: Add a unit test that verifies webhook-only execution forces
cfg.EnableCRMetrics to false: create a test (e.g.,
TestWebhookCommand_DisablesCRMetrics) in the cmd package that invokes the
webhook command entrypoint (the function/constructor that builds/runs the
webhook command) with webhook-only mode enabled and then inspects the resulting
cfg object to assert cfg.EnableCRMetrics == false; if the command returns its
config indirectly, adapt the test to run the command initialization path and
capture the cfg used when webhook-only is set. Ensure the test fails if
cfg.EnableCRMetrics is not explicitly set to false.
In `@internal/controller/reconcile_test.go`:
- Around line 2312-2328: This test currently reads cumulative metrics keyed by
cr.Name/cr.Namespace causing false positives; make the run isolated by giving
the test CR unique labels (e.g., set baseCr.Name to a random/unique value or
append a test-specific suffix before calling the reconciler in this test's
setup) so
metrics.ImageUpdaterCR().ImagesWatchedTotal/ImagesUpdatedTotal/ImagesUpdatedErrorsTotal
are scoped to this run, then switch the counter assertions (for updated and
errors) from GreaterOrEqual to exact Equals against res.NumImagesUpdated and
res.NumErrors; keep referencing the same symbols (baseCr, postCheck closure,
ImageUpdaterReconciler via the test harness, and
metrics.ImageUpdaterCR()/iucm.Images*Total.WithLabelValues) when making the
changes.
In `@pkg/metrics/metrics_test.go`:
- Around line 88-93: In TestMetricsOperations, after creating apm :=
ImageUpdaterCR() and calling apm.IncreaseImageUpdate("cr1","ns1",1),
apm.IncreaseUpdateErrors("cr1","ns1",2),
apm.SetNumberOfApplications("cr1","ns1",3) and
apm.SetNumberOfImagesWatched("cr1","ns1",4), add assertions that read the
corresponding exported metric values and verify they match the expected totals
(e.g. image update counter == 1, update error counter == 2, applications gauge
== 3, imagesWatched gauge == 4) using the package’s metric retrieval helpers or
the Prometheus/testutil utilities used elsewhere in the tests so failures
surface in TestMetricsOperations.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- cmd/common_test.go
- cmd/run.go
- cmd/webhook.go
- docs/install/installation.md
- internal/controller/imageupdater_controller.go
- internal/controller/imageupdater_controller_test.go
- internal/controller/reconcile.go
- internal/controller/reconcile_test.go
- pkg/metrics/metrics.go
- pkg/metrics/metrics_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- cmd/run.go
- internal/controller/imageupdater_controller_test.go
- cmd/common_test.go
Closes: #1511
Closes: #1514
Metrics are disabled for webhook without polling mode.
Summary by CodeRabbit
New Features
Documentation