feat: Implement metrics for images#1532

Merged
dkarpele merged 2 commits into argoproj-labs:master from
dkarpele:dk-GITOPS-8068
Mar 4, 2026

Conversation

Collaborator

@dkarpele dkarpele commented Mar 2, 2026

Closes: #1511
Closes: #1514
CR-scoped metrics are disabled in webhook-only mode (webhook without polling).

Summary by CodeRabbit

  • New Features

    • Per-CustomResource Prometheus metrics added: images watched (gauge), images updated (counter), and update errors (counter), aggregated and reported per ImageUpdater CR.
    • CR-scoped metrics collection is configurable: enabled in normal polling runs and disabled in webhook-only mode to avoid orphaned series.
  • Documentation

    • Installation/metrics docs updated with new metrics, label names, sample /metrics output, and mode-specific behavior.


coderabbitai bot commented Mar 2, 2026

Walkthrough

Refactors metrics from application level to ImageUpdater CR level and adds an ImageUpdaterConfig.EnableCRMetrics flag that gates CR-scoped metrics collection and cleanup. The flag is enabled in the controller run and disabled in the webhook-only run. Tests and docs are updated to use the new ImageUpdaterCR metrics API.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Metrics core: pkg/metrics/metrics.go, pkg/metrics/metrics_test.go | Replaced application-scoped metrics with ImageUpdaterCRMetrics; added ImageUpdaterCR accessor and NewImageUpdaterCRMetrics() constructor. Introduced CR-labeled metrics: ImagesWatchedTotal, ImagesUpdatedTotal, ImagesUpdatedErrorsTotal, and removal API RemoveImageUpdaterMetrics. Updated metric labels to image_updater_cr_name / image_updater_cr_namespace. |
| Controller logic & tests: internal/controller/imageupdater_controller.go, internal/controller/reconcile.go, internal/controller/reconcile_test.go, internal/controller/imageupdater_controller_test.go | Added EnableCRMetrics bool to ImageUpdaterConfig. Reconciler now records CR-scoped metrics (SetNumberOfApplications, SetNumberOfImagesWatched, IncreaseImageUpdate, IncreaseUpdateErrors) and sets result.ApplicationsMatched. Finalizer metric cleanup is conditional on EnableCRMetrics. Tests updated to assert CR-scoped metrics. |
| CLI / mode bootstrap & tests: cmd/run.go, cmd/webhook.go, cmd/common_test.go | Controller Run sets cfg.EnableCRMetrics = true; webhook Run sets cfg.EnableCRMetrics = false (disable CR metrics in webhook-only mode). Tests updated to use ImageUpdaterCR() accessor. |
| Documentation: docs/install/installation.md | Expanded metrics documentation: introduced CR-scoped metrics list and sample /metrics output, clarified that CR metrics are populated in polling/controller mode and disabled in webhook-only mode; removed/merged previous metrics entries accordingly. |
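
The per-CR series lifecycle summarized above can be illustrated with a minimal, stdlib-only sketch. It is a simplified model, not the PR's code: plain maps stand in for the Prometheus vector types, the `crMetrics` and `seriesKey` types are hypothetical, and the `int` parameter types are an assumption. Only the method names are taken from the summary.

```go
package main

import "fmt"

// seriesKey identifies one label series by CR name and namespace
// (the image_updater_cr_name / image_updater_cr_namespace labels).
type seriesKey struct{ name, namespace string }

// crMetrics is a simplified, hypothetical stand-in for ImageUpdaterCRMetrics.
// Plain maps replace the Prometheus vector types used by the real code.
type crMetrics struct {
	imagesWatched map[seriesKey]float64 // gauge: overwritten on every set
	imagesUpdated map[seriesKey]float64 // counter: only ever increases
	updateErrors  map[seriesKey]float64 // counter: only ever increases
}

func newCRMetrics() *crMetrics {
	return &crMetrics{
		imagesWatched: map[seriesKey]float64{},
		imagesUpdated: map[seriesKey]float64{},
		updateErrors:  map[seriesKey]float64{},
	}
}

func (m *crMetrics) SetNumberOfImagesWatched(name, ns string, n int) {
	m.imagesWatched[seriesKey{name, ns}] = float64(n)
}

func (m *crMetrics) IncreaseImageUpdate(name, ns string, n int) {
	m.imagesUpdated[seriesKey{name, ns}] += float64(n)
}

func (m *crMetrics) IncreaseUpdateErrors(name, ns string, n int) {
	m.updateErrors[seriesKey{name, ns}] += float64(n)
}

// RemoveImageUpdaterMetrics drops every series that belongs to one CR.
func (m *crMetrics) RemoveImageUpdaterMetrics(name, ns string) {
	k := seriesKey{name, ns}
	delete(m.imagesWatched, k)
	delete(m.imagesUpdated, k)
	delete(m.updateErrors, k)
}

func main() {
	m := newCRMetrics()
	m.SetNumberOfImagesWatched("cr1", "ns1", 10)
	m.SetNumberOfImagesWatched("cr2", "ns2", 20)
	m.IncreaseImageUpdate("cr1", "ns1", 2)
	m.IncreaseImageUpdate("cr1", "ns1", 3) // counter accumulates to 5

	// Deleting cr1 leaves only the cr2 series behind.
	m.RemoveImageUpdaterMetrics("cr1", "ns1")
	fmt.Println(len(m.imagesWatched), m.imagesWatched[seriesKey{"cr2", "ns2"}])
}
```

Keying every series by name and namespace is what makes removal cheap: deleting a CR maps to deleting one key per metric, which is the behaviour the removal API is described as providing.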

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (run / webhook)
    participant Controller as ImageUpdater Controller/Reconciler
    participant Metrics as ImageUpdaterCR Metrics
    participant Registry as Prometheus Registry

    CLI->>Controller: start (cfg.EnableCRMetrics = true|false)
    Controller->>Controller: reconcile ImageUpdater CR
    Controller->>Metrics: SetNumberOfApplications(cr, ns, matched)
    Controller->>Metrics: SetNumberOfImagesWatched(cr, ns, watched)
    alt not warmup && not dryRun
        Controller->>Metrics: IncreaseImageUpdate(cr, ns, updated)
    end
    Controller->>Metrics: IncreaseUpdateErrors(cr, ns, errors)
    Metrics->>Registry: expose/update metric series
    Note over Controller,Metrics: on CR deletion (finalizer)
    alt EnableCRMetrics == true
        Controller->>Metrics: RemoveImageUpdaterMetrics(cr, ns)
        Metrics->>Registry: remove metric label series
    end
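
The call order in the diagram can be sketched as below. This is an illustrative model under stated assumptions: the `recorder` interface, `callLog`, `reconcileResult`, `recordReconcile`, and `cleanupOnDelete` are hypothetical names, and the `int` parameter types are guesses. Only the metric method names and the two conditions (`not warmup && not dryRun`, `EnableCRMetrics == true`) come from the diagram.

```go
package main

import "fmt"

// recorder lists the metric calls named in the sequence diagram.
type recorder interface {
	SetNumberOfApplications(cr, ns string, n int)
	SetNumberOfImagesWatched(cr, ns string, n int)
	IncreaseImageUpdate(cr, ns string, n int)
	IncreaseUpdateErrors(cr, ns string, n int)
	RemoveImageUpdaterMetrics(cr, ns string)
}

// callLog is a hypothetical recorder that only remembers which calls were made.
type callLog struct{ calls []string }

func (l *callLog) SetNumberOfApplications(cr, ns string, n int) {
	l.calls = append(l.calls, "SetNumberOfApplications")
}
func (l *callLog) SetNumberOfImagesWatched(cr, ns string, n int) {
	l.calls = append(l.calls, "SetNumberOfImagesWatched")
}
func (l *callLog) IncreaseImageUpdate(cr, ns string, n int) {
	l.calls = append(l.calls, "IncreaseImageUpdate")
}
func (l *callLog) IncreaseUpdateErrors(cr, ns string, n int) {
	l.calls = append(l.calls, "IncreaseUpdateErrors")
}
func (l *callLog) RemoveImageUpdaterMetrics(cr, ns string) {
	l.calls = append(l.calls, "RemoveImageUpdaterMetrics")
}

// reconcileResult is a hypothetical summary of one reconcile pass.
type reconcileResult struct{ matched, watched, updated, errs int }

// recordReconcile mirrors the diagram: gauges are always set, the update
// counter is skipped during warm-up and dry runs, errors are always counted.
func recordReconcile(r recorder, cr, ns string, res reconcileResult, warmup, dryRun bool) {
	r.SetNumberOfApplications(cr, ns, res.matched)
	r.SetNumberOfImagesWatched(cr, ns, res.watched)
	if !warmup && !dryRun {
		r.IncreaseImageUpdate(cr, ns, res.updated)
	}
	r.IncreaseUpdateErrors(cr, ns, res.errs)
}

// cleanupOnDelete mirrors the finalizer branch: series are removed only
// when CR metrics are enabled.
func cleanupOnDelete(r recorder, cr, ns string, enableCRMetrics bool) {
	if enableCRMetrics {
		r.RemoveImageUpdaterMetrics(cr, ns)
	}
}

func main() {
	rec := &callLog{}
	recordReconcile(rec, "cr1", "ns1", reconcileResult{matched: 1, watched: 4, updated: 2}, false, true)
	cleanupOnDelete(rec, "cr1", "ns1", false) // webhook-only: nothing is removed
	fmt.Println(rec.calls)
}
```

In the dry-run case shown in `main`, the log contains the two gauge calls and the error counter call, but no update increment and no removal.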
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 53.85%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: Implement metrics for images' directly summarizes the main changes, which restore and implement missing Prometheus metrics for image updates and errors. |
| Linked Issues check | ✅ Passed | All objectives from #1511 and #1514 are met: metrics (images_updated_total, images_errors_total) are re-enabled and incremented appropriately, RemoveImageUpdaterMetrics cleanup is gated by the EnableCRMetrics flag (enabled in polling mode, disabled in webhook-only mode), and metric lifecycle is properly managed. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing and enabling image metrics (images_updated_total, images_errors_total, images_watched_total) and to handling metric cleanup correctly in polling and webhook-only modes, with no unrelated modifications detected. |


codecov-commenter commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 95.34884% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.34%. Comparing base (c1674be) to head (90074cd).
⚠️ Report is 47 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/run.go | 0.00% | 1 Missing ⚠️ |
| cmd/webhook.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1532      +/-   ##
==========================================
+ Coverage   71.48%   73.34%   +1.86%     
==========================================
  Files          50       53       +3     
  Lines        4667     5087     +420     
==========================================
+ Hits         3336     3731     +395     
- Misses       1133     1147      +14     
- Partials      198      209      +11     

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controller/imageupdater_controller_test.go`:
- Around line 30-41: The nil-check for apm is currently after apm is
dereferenced causing a potential panic; either remove the redundant "if apm !=
nil" guard entirely (since InitMetrics guarantees non-nil from
metrics.ImageUpdaterCR()) or move the nil guard to immediately after apm :=
metrics.ImageUpdaterCR() (before calling apm.SetNumberOfApplications /
SetNumberOfImagesWatched / IncreaseImageUpdate / IncreaseUpdateErrors), or use a
require/assert to fail-fast if apm is nil; update the test to reference apm only
after that early nil-check or removal.
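
The ordering problem described in this comment can be shown with a minimal sketch. The `appGauge` type, `recordApps` helper, and `errNoMetrics` value are hypothetical and are not part of the PR; the sketch only demonstrates that a nil guard is useful when it runs before the first dereference.

```go
package main

import (
	"errors"
	"fmt"
)

// appGauge is a hypothetical stand-in for the object returned by
// metrics.ImageUpdaterCR() in the test under review.
type appGauge struct{ apps int }

// SetNumberOfApplications writes through the receiver, so calling it on a
// nil *appGauge panics.
func (g *appGauge) SetNumberOfApplications(n int) { g.apps = n }

var errNoMetrics = errors.New("metrics not initialised")

// recordApps checks for nil before the first dereference, which is the
// only position where the guard can prevent a panic.
func recordApps(g *appGauge, n int) error {
	if g == nil {
		return errNoMetrics
	}
	g.SetNumberOfApplications(n)
	return nil
}

func main() {
	var missing *appGauge
	fmt.Println(recordApps(missing, 3)) // guard fires, no panic

	ready := &appGauge{}
	fmt.Println(recordApps(ready, 3), ready.apps)
}
```

If the accessor is guaranteed to return a non-nil value, as the comment suggests, dropping the guard entirely is the simpler option.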

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36868d9 and 12ae7ab.

📒 Files selected for processing (10)
  • cmd/common_test.go
  • cmd/run.go
  • cmd/webhook.go
  • docs/install/installation.md
  • internal/controller/imageupdater_controller.go
  • internal/controller/imageupdater_controller_test.go
  • internal/controller/reconcile.go
  • internal/controller/reconcile_test.go
  • pkg/metrics/metrics.go
  • pkg/metrics/metrics_test.go

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/metrics/metrics_test.go (1)

133-147: Extend removal test to assert ApplicationsTotal cleanup too.

RemoveImageUpdaterMetrics deletes ApplicationsTotal as well, but the subtest only asserts the image vectors. Adding ApplicationsTotal assertions closes a meaningful regression gap.

Proposed test diff
 		assert.Equal(t, 2, testutil.CollectAndCount(apm.ImagesWatchedTotal))
 		assert.Equal(t, 2, testutil.CollectAndCount(apm.ImagesUpdatedTotal))
 		assert.Equal(t, 2, testutil.CollectAndCount(apm.ImagesUpdatedErrorsTotal))
+		assert.Equal(t, 2, testutil.CollectAndCount(apm.ApplicationsTotal))

 		apm.RemoveImageUpdaterMetrics("cr1", "ns1")

 		assert.Equal(t, 1, testutil.CollectAndCount(apm.ImagesWatchedTotal))
 		assert.Equal(t, float64(20), testutil.ToFloat64(apm.ImagesWatchedTotal.WithLabelValues("cr2", "ns2")))

 		assert.Equal(t, 1, testutil.CollectAndCount(apm.ImagesUpdatedTotal))
 		assert.Equal(t, float64(6), testutil.ToFloat64(apm.ImagesUpdatedTotal.WithLabelValues("cr2", "ns2")))

 		assert.Equal(t, 1, testutil.CollectAndCount(apm.ImagesUpdatedErrorsTotal))
 		assert.Equal(t, float64(3), testutil.ToFloat64(apm.ImagesUpdatedErrorsTotal.WithLabelValues("cr2", "ns2")))
+		assert.Equal(t, 1, testutil.CollectAndCount(apm.ApplicationsTotal))
+		assert.Equal(t, float64(1), testutil.ToFloat64(apm.ApplicationsTotal.WithLabelValues("cr2", "ns2")))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/metrics/metrics_test.go` around lines 133 - 147, The test currently
verifies ImagesWatchedTotal, ImagesUpdatedTotal, and ImagesUpdatedErrorsTotal
after calling RemoveImageUpdaterMetrics("cr1","ns1") but omits asserting that
ApplicationsTotal was removed; update the subtest in metrics_test.go to also
assert the collector count for apm.ApplicationsTotal decreased (e.g., from 2 to
1) and that the remaining label values for ("cr2","ns2") equals the expected
float value, referencing the ApplicationsTotal metric and the
RemoveImageUpdaterMetrics function to locate the change.
ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12ae7ab and a88f668.

📒 Files selected for processing (10)
  • cmd/common_test.go
  • cmd/run.go
  • cmd/webhook.go
  • docs/install/installation.md
  • internal/controller/imageupdater_controller.go
  • internal/controller/imageupdater_controller_test.go
  • internal/controller/reconcile.go
  • internal/controller/reconcile_test.go
  • pkg/metrics/metrics.go
  • pkg/metrics/metrics_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/controller/imageupdater_controller.go
  • cmd/webhook.go
  • internal/controller/reconcile_test.go
  • cmd/run.go

@dkarpele dkarpele marked this pull request as draft March 2, 2026 21:42
@dkarpele dkarpele marked this pull request as ready for review March 3, 2026 16:41
// RemoveNumberOfApplications removes the applications-watched gauge for a given ImageUpdater CR.
func (iucm *ImageUpdaterCRMetrics) RemoveNumberOfApplications(name, namespace string) {
	iucm.ApplicationsTotal.DeleteLabelValues(name, namespace)
}
Collaborator

RemoveNumberOfApplications func is only used in tests. Do we really need it?

// ResetApplicationsTotal resets the applications-watched gauge (all CRs). Used in tests.
func (iucm *ImageUpdaterCRMetrics) ResetApplicationsTotal() {
	iucm.ApplicationsTotal.Reset()
}
Collaborator

ResetApplicationsTotal func is only used in tests. Do we really need it?

* `argocd_image_updater_images_errors_total`
* `argocd_image_updater_k8s_api_requests_total`
* `argocd_image_updater_k8s_api_errors_total`
* `argocd_image_updater_registry_requests_total`
Collaborator

Can we have sample metrics output for illustration purposes?
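
As a rough illustration of what one such line could look like, the sketch below formats a single series in the Prometheus text exposition format. The `sampleLine` helper is hypothetical, the CR name, namespace, and value are made up, and pairing this metric name with the CR labels is an assumption based on the label names listed in the change summary.

```go
package main

import "fmt"

// sampleLine renders one series in the Prometheus text exposition format:
// metric_name{label="value",...} value
// The label names follow the PR summary; the helper itself is illustrative.
func sampleLine(metric, crName, crNamespace string, value float64) string {
	return fmt.Sprintf("%s{image_updater_cr_name=%q,image_updater_cr_namespace=%q} %g",
		metric, crName, crNamespace, value)
}

func main() {
	// Illustrative values only, not real output from the controller.
	fmt.Println(sampleLine("argocd_image_updater_images_errors_total", "cr1", "ns1", 3))
}
```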

@chengfang chengfang changed the title from "feat: Implement metrics for images*" to "feat: Implement metrics for images" on Mar 3, 2026
dkarpele and others added 2 commits March 3, 2026 21:34
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: dkarpele <karpelevich@gmail.com>
- remove functions without usage

Signed-off-by: dkarpele <karpelevich@gmail.com>
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
pkg/metrics/metrics_test.go (1)

88-93: Add assertions in TestMetricsOperations for the new CR metric calls.

Right now this path executes without validating emitted values, so regressions can slip through unnoticed.

Suggested assertions
 	apm := ImageUpdaterCR()
 	apm.IncreaseImageUpdate("cr1", "ns1", 1)
 	apm.IncreaseUpdateErrors("cr1", "ns1", 2)
 	apm.SetNumberOfApplications("cr1", "ns1", 3)
 	apm.SetNumberOfImagesWatched("cr1", "ns1", 4)
+
+	assert.Equal(t, float64(1), testutil.ToFloat64(apm.ImagesUpdatedTotal.WithLabelValues("cr1", "ns1")))
+	assert.Equal(t, float64(2), testutil.ToFloat64(apm.ImagesUpdatedErrorsTotal.WithLabelValues("cr1", "ns1")))
+	assert.Equal(t, float64(3), testutil.ToFloat64(apm.ApplicationsTotal.WithLabelValues("cr1", "ns1")))
+	assert.Equal(t, float64(4), testutil.ToFloat64(apm.ImagesWatchedTotal.WithLabelValues("cr1", "ns1")))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/metrics/metrics_test.go` around lines 88 - 93, In TestMetricsOperations,
after creating apm := ImageUpdaterCR() and calling
apm.IncreaseImageUpdate("cr1","ns1",1), apm.IncreaseUpdateErrors("cr1","ns1",2),
apm.SetNumberOfApplications("cr1","ns1",3) and
apm.SetNumberOfImagesWatched("cr1","ns1",4), add assertions that read the
corresponding exported metric values and verify they match the expected totals
(e.g. image update counter == 1, update error counter == 2, applications gauge
== 3, imagesWatched gauge == 4) using the package’s metric retrieval helpers or
the Prometheus/testutil utilities used elsewhere in the tests so failures
surface in TestMetricsOperations.
cmd/webhook.go (1)

68-70: Add a regression test for webhook-only metrics gating.

This branch is important for metric lifecycle behavior and is easy to regress. Please add a command-level/unit test that asserts cfg.EnableCRMetrics is forced to false in webhook-only execution.
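
One way to make that gating testable is to build the config in a small pure function. The sketch below assumes this shape: `runMode`, `updaterConfig`, and `configFor` are hypothetical names, and the real commands may set the flag inline instead. Only the `EnableCRMetrics` field and its two values come from the PR.

```go
package main

import "fmt"

// runMode distinguishes the two entry points discussed in the PR.
type runMode int

const (
	modeRun     runMode = iota // controller / polling mode
	modeWebhook                // webhook-only mode
)

// updaterConfig mirrors only the one field relevant here.
type updaterConfig struct{ EnableCRMetrics bool }

// configFor is a hypothetical helper that centralises the gating so a
// unit test can assert it without starting either command.
func configFor(m runMode) updaterConfig {
	cfg := updaterConfig{}
	switch m {
	case modeRun:
		cfg.EnableCRMetrics = true
	case modeWebhook:
		// Explicitly disabled to avoid orphaned series in webhook-only mode.
		cfg.EnableCRMetrics = false
	}
	return cfg
}

func main() {
	fmt.Println(configFor(modeRun).EnableCRMetrics, configFor(modeWebhook).EnableCRMetrics)
}
```

A regression test then reduces to asserting that `configFor(modeWebhook).EnableCRMetrics` is false.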

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/webhook.go` around lines 68 - 70, Add a unit test that verifies
webhook-only execution forces cfg.EnableCRMetrics to false: create a test (e.g.,
TestWebhookCommand_DisablesCRMetrics) in the cmd package that invokes the
webhook command entrypoint (the function/constructor that builds/runs the
webhook command) with webhook-only mode enabled and then inspects the resulting
cfg object to assert cfg.EnableCRMetrics == false; if the command returns its
config indirectly, adapt the test to run the command initialization path and
capture the cfg used when webhook-only is set. Ensure the test fails if
cfg.EnableCRMetrics is not explicitly set to false.
internal/controller/reconcile_test.go (1)

2312-2328: Strengthen this metric test to avoid false positives.

Using cumulative assertions (GreaterOrEqual) with shared labels can pass even if this specific run stops incrementing. Isolate labels for this case (or reset state) so you can assert exact values.

Suggested approach
-			cr:     baseCr,
+			cr: &argocdimageupdaterv1alpha1.ImageUpdater{
+				ObjectMeta: metav1.ObjectMeta{
+					Name:      "metrics-cr-isolated",
+					Namespace: "argocd",
+				},
+				Spec: baseCr.Spec,
+			},
...
-				assert.GreaterOrEqual(t, updated, float64(res.NumImagesUpdated), "images updated total")
-				assert.GreaterOrEqual(t, errors, float64(res.NumErrors), "images errors total")
+				assert.Equal(t, float64(res.NumImagesUpdated), updated, "images updated total")
+				assert.Equal(t, float64(res.NumErrors), errors, "images errors total")
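
Why the cumulative assertion can hide a regression is easier to see in a small stdlib-only model. The `counterVec` type is a hypothetical map-backed stand-in for a labeled counter, and the "base-cr" name and the numbers are invented for illustration; "metrics-cr-isolated" and "argocd" are taken from the suggestion above.

```go
package main

import "fmt"

// counterVec is a hypothetical stand-in for a labeled counter,
// keyed by "name/namespace".
type counterVec map[string]float64

func (c counterVec) add(name, ns string, n float64) { c[name+"/"+ns] += n }
func (c counterVec) get(name, ns string) float64    { return c[name+"/"+ns] }

func main() {
	updated := counterVec{}

	// An earlier test case already incremented the shared series.
	updated.add("base-cr", "argocd", 5)

	// The run under test should add 2 updates, but suppose a regression
	// means it adds nothing to either series.
	const want = 2.0

	shared := updated.get("base-cr", "argocd")
	isolated := updated.get("metrics-cr-isolated", "argocd")

	fmt.Println("shared >= want:", shared >= want)     // true: regression hidden
	fmt.Println("isolated == want:", isolated == want) // false: regression caught
}
```

With a fresh label pair the counter starts at zero, so an exact comparison is both possible and strict.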
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/reconcile_test.go` around lines 2312 - 2328, This test
currently reads cumulative metrics keyed by cr.Name/cr.Namespace causing false
positives; make the run isolated by giving the test CR unique labels (e.g., set
baseCr.Name to a random/unique value or append a test-specific suffix before
calling the reconciler in this test's setup) so
metrics.ImageUpdaterCR().ImagesWatchedTotal/ImagesUpdatedTotal/ImagesUpdatedErrorsTotal
are scoped to this run, then switch the counter assertions (for updated and
errors) from GreaterOrEqual to exact Equals against res.NumImagesUpdated and
res.NumErrors; keep referencing the same symbols (baseCr, postCheck closure,
ImageUpdaterReconciler via the test harness, and
metrics.ImageUpdaterCR()/iucm.Images*Total.WithLabelValues) when making the
changes.
ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a88f668 and 90074cd.

📒 Files selected for processing (10)
  • cmd/common_test.go
  • cmd/run.go
  • cmd/webhook.go
  • docs/install/installation.md
  • internal/controller/imageupdater_controller.go
  • internal/controller/imageupdater_controller_test.go
  • internal/controller/reconcile.go
  • internal/controller/reconcile_test.go
  • pkg/metrics/metrics.go
  • pkg/metrics/metrics_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • cmd/run.go
  • internal/controller/imageupdater_controller_test.go
  • cmd/common_test.go

@dkarpele dkarpele merged commit 5718b68 into argoproj-labs:master Mar 4, 2026
12 checks passed