
fix: support metrics collection with alternate report storage #2856

Open
rzala wants to merge 2 commits into aquasecurity:main from rzala:fix/alternate-storage-metrics

Conversation

@rzala

@rzala rzala commented Jan 15, 2026

Description

Fixes #2610

This PR adds support for Prometheus metrics collection when alternate report storage (filesystem-based) is enabled. Previously, setting alternateReportStorage.enabled: true caused all metrics, such as trivy_image_vulnerabilities, to stop working because the metrics collector only read from Kubernetes CRDs.

Changes

  • Added StorageReader interface to abstract storage backend operations
  • Implemented CRDStorageReader for reading from Kubernetes CRDs (existing default behavior)
  • Implemented FilesystemStorageReader for reading from alternate storage filesystem
  • Updated ResourcesMetricsCollector to use the StorageReader abstraction
  • Added comprehensive unit tests for both storage backends and edge cases
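
As a rough illustration of the abstraction listed above (not the PR's actual code): the names StorageReader, CRDStorageReader, and FilesystemStorageReader come from the description, while the ListReportNames method, the selectReader helper, the map-backed CRD stand-in, and the `<root>/<kind>/*.json` directory layout are all assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// StorageReader abstracts where reports are read from.
// The single method is hypothetical; the PR's real interface may differ.
type StorageReader interface {
	ListReportNames(kind string) ([]string, error)
}

// CRDStorageReader stands in for the Kubernetes-backed reader. To stay
// self-contained it serves names from a map instead of calling the API server.
type CRDStorageReader struct {
	reports map[string][]string
}

func (r CRDStorageReader) ListReportNames(kind string) ([]string, error) {
	return r.reports[kind], nil
}

// FilesystemStorageReader lists *.json files under <Root>/<kind>.
// This directory layout is an assumption.
type FilesystemStorageReader struct {
	Root string
}

func (r FilesystemStorageReader) ListReportNames(kind string) ([]string, error) {
	entries, err := os.ReadDir(filepath.Join(r.Root, kind))
	if os.IsNotExist(err) {
		return nil, nil // a missing directory simply means no reports yet
	}
	if err != nil {
		return nil, fmt.Errorf("reading %q reports: %w", kind, err)
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() && strings.HasSuffix(e.Name(), ".json") {
			names = append(names, strings.TrimSuffix(e.Name(), ".json"))
		}
	}
	return names, nil
}

// selectReader picks the backend from the alternate-storage setting.
func selectReader(alternateEnabled bool, root string, crd CRDStorageReader) StorageReader {
	if alternateEnabled {
		return FilesystemStorageReader{Root: root}
	}
	return crd
}

func main() {
	crd := CRDStorageReader{reports: map[string][]string{"vulnerabilityreports": {"nginx"}}}
	names, err := selectReader(false, "", crd).ListReportNames("vulnerabilityreports")
	fmt.Println(names, err) // prints "[nginx] <nil>"
}
```

With a shape like this, the collector depends only on the interface, so the default CRD path is untouched when alternate storage is disabled.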

Impact

  • Backward compatible - No breaking changes, CRD-based metrics work exactly as before
  • Dual-mode support - Metrics now work with both CRD and filesystem storage
  • All report types supported - VulnerabilityReport, ExposedSecretReport, ConfigAuditReport, RbacAssessmentReport, InfraAssessmentReport, ClusterComplianceReport
  • Production ready - Robust error handling, logging, and graceful degradation

Testing

Added comprehensive unit tests in pkg/metrics/storage_reader_test.go:

  • Tests for CRD-based collection (validates backward compatibility)
  • Tests for filesystem-based collection
  • Edge case testing (missing directories, corrupt files, permissions)
  • Backend selection tests

How to Test

With CRD Storage (default):

  1. Deploy trivy-operator with default configuration
  2. Verify metrics endpoint returns data

With Alternate Storage:

  1. Enable alternate storage: alternateReportStorage.enabled: true
  2. Configure PVC-based storage
  3. Verify metrics endpoint still returns data
  4. Confirm metrics match report contents
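
Steps 3 and 4 could be checked with a small helper that counts samples of a given metric in the Prometheus text output. This is only a sketch: trivy_image_vulnerabilities is the metric named in the description, but the countSamples helper, the sample payload, and its label names are illustrative assumptions rather than real operator output.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// samplePayload is a made-up scrape result used only for this example.
const samplePayload = `# HELP trivy_image_vulnerabilities Example help text
# TYPE trivy_image_vulnerabilities gauge
trivy_image_vulnerabilities{namespace="default",severity="Critical"} 2
trivy_image_vulnerabilities{namespace="default",severity="High"} 5
some_other_metric 1
`

// countSamples returns how many sample lines in a Prometheus text-format
// payload belong to the metric called name. Comment lines are skipped.
func countSamples(payload, name string) int {
	count := 0
	scanner := bufio.NewScanner(strings.NewReader(payload))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, name) {
			continue
		}
		// The name must be followed by '{' or ' ' so that a longer metric
		// name sharing the same prefix is not counted.
		rest := line[len(name):]
		if strings.HasPrefix(rest, "{") || strings.HasPrefix(rest, " ") {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countSamples(samplePayload, "trivy_image_vulnerabilities")) // prints 2
}
```

A count of zero after enabling alternate storage would reproduce the symptom this PR addresses.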


@rzala rzala requested a review from simar7 as a code owner January 15, 2026 17:04
@CLAassistant

CLAassistant commented Jan 15, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the bug label Jan 15, 2026
@rzala rzala marked this pull request as draft January 15, 2026 19:55
Fixes aquasecurity#2610

When alternate report storage is enabled, reports are saved to
filesystem instead of Kubernetes CRDs. The metrics collector was
only reading from CRDs, causing all metrics to become unavailable.

This commit adds a storage abstraction layer that allows the metrics
collector to read from either CRDs (default) or filesystem (when
alternate storage is enabled), maintaining full backward compatibility.

Additionally, adds validation to skip malformed reports without proper
metadata (name/labels), preventing duplicate metric errors from stale
files in alternate storage directories.

Changes:
- Add StorageReader interface for storage backend abstraction
- Implement CRDStorageReader for reading from Kubernetes CRDs
- Implement FilesystemStorageReader for reading from alternate storage
- Add validation to filter out malformed reports without metadata
- Update ResourcesMetricsCollector to use StorageReader
- Add comprehensive unit tests for both storage backends

All report types are supported: VulnerabilityReport, ExposedSecretReport,
ConfigAuditReport, RbacAssessmentReport, InfraAssessmentReport, and
ClusterComplianceReport.
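
The metadata validation mentioned in the commit message above might look roughly like the following. This is a sketch under assumptions: the reportMeta type and validReport function are hypothetical names, and requiring both a name and at least one label is one reading of "name/labels", not confirmed by the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reportMeta mirrors only the metadata fields the check needs.
type reportMeta struct {
	Metadata struct {
		Name   string            `json:"name"`
		Labels map[string]string `json:"labels"`
	} `json:"metadata"`
}

// validReport reports whether raw is parseable JSON that carries a
// non-empty metadata.name and at least one label (assumed rule).
func validReport(raw []byte) bool {
	var r reportMeta
	if err := json.Unmarshal(raw, &r); err != nil {
		return false // corrupt file: skip instead of failing the whole scrape
	}
	return r.Metadata.Name != "" && len(r.Metadata.Labels) > 0
}

func main() {
	files := []struct {
		name string
		body string
	}{
		{"good.json", `{"metadata":{"name":"nginx","labels":{"app":"web"}}}`},
		{"stale.json", `{"metadata":{}}`},
		{"corrupt.json", `{"metadata":`},
	}
	for _, f := range files {
		fmt.Printf("%s valid=%v\n", f.name, validReport([]byte(f.body)))
	}
}
```

Skipping such files keeps two nameless stale reports from collapsing into the same label set, which is the kind of collision that produces duplicate metric errors.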
@rzala rzala force-pushed the fix/alternate-storage-metrics branch from cbc0a1b to 0452d01 Compare January 15, 2026 20:11
@rzala rzala marked this pull request as ready for review January 16, 2026 08:16
@Sydney-MF

Any idea on when this PR would get merged? Thanks:)

@afdesk
Contributor

afdesk commented Feb 2, 2026

@Sydney-MF thanks for the ping
I'll take a look tomorrow

@Sydney-MF

@afdesk Any updates on this?:) really appreciate you for taking the time to take a look at this by the way

@afdesk
Contributor

afdesk commented Feb 4, 2026

> @afdesk Any updates on this?:) really appreciate you for taking the time to take a look at this by the way

i'm on it right now
could you pls fix linter errors?

@Sydney-MF

> i'm on it right now could you pls fix linter errors?

It's not my PR so I'm not so sure if I should just go about it and modify it 😅

@rzal

rzal commented Feb 4, 2026

Will have a look at the linting errors soon

@afdesk
Contributor

afdesk commented Feb 4, 2026

Hi guys again! thanks for your efforts and contribution! I need more time to investigate it.

I have some concerns about performance here: StorageReader re-reads folders very often.
We have to evaluate it.

@Sydney-MF

Sydney-MF commented Feb 4, 2026

> Hi guys again! thanks for your efforts and contribution! I need more time to investigate it.
>
> I have some concerns about performance here: StorageReader re-reads folders very often. We have to evaluate it.

No worries, please keep us updated. We ran into the etcd limit issue, so the only way for us to make Trivy Operator work was to use alternateStorage, but then we lose the ability to see the metrics, which are a big part of what this tool offers. Once again, thank you for taking the time to look at this PR :)

Also thank you @rzala @rzal for the PR.

@afdesk afdesk added this to the v0.31.0 milestone Feb 9, 2026
- Replace interface{} with any (gofmt)
- Fix file/directory permissions (gosec G301/G306)
- Fix import ordering (gci)
- Use %q for quoted strings in error messages (gocritic)
- Replace assert.Len with assert.Empty (testifylint)
- Mark unused context parameters with underscore (revive)
@rzala rzala requested a review from afdesk as a code owner February 13, 2026 12:29
@rzala
Author

rzala commented Feb 13, 2026

@Sydney-MF thanks for the interest and for nudging this along!

@afdesk linting errors have been fixed in the latest push. Regarding the performance concern around frequent directory re-reads — happy to look into adding a file-watcher or in-memory cache with TTL if that's the preferred direction. Let me know what approach you'd like and I'll update the PR.

@baznikin
Copy link

Maybe this feature could be fine-tuned if disk scanning really becomes a problem? We're evaluating Trivy Operator, and the only way we can run it is with a 1h TTL on CRD resources while limiting targets and scanners to a very narrow set.


Successfully merging this pull request may close these issues.

Alternate Report Storage doesn't provides metrics

6 participants