-
Notifications
You must be signed in to change notification settings - Fork 193
feat: add support for config retention for tracer + fix metrics default #2269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughA new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CRD
participant Controller
participant ResourceTemplate
participant Tempo
User->>CRD: Set traces.storage.retention (e.g., 2160h)
CRD->>Controller: Pass retention value via spec
Controller->>ResourceTemplate: Inject TracesRetention into template data
ResourceTemplate->>Tempo: Configure block_retention/global.traces with retention value
Tempo-->>User: Applies retention policy for trace data
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Note 🔌 MCP (Model Context Protocol) integration is now available in Early Access!Pro users can now connect to remote MCP servers under the Integrations page to get reviews and chat conversations that understand additional development context. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (12)
api/services/v1alpha1/monitoring_types.go (1)
115-117
: Fix typos and correct terminology in Retention commentIt should refer to "trace data" and "globally"; also fix the typo.
- // Retention specifies how long metrics data should be retained gloabally (e.g., "60m", "10h") + // Retention specifies how long trace data should be retained globally (e.g., "60m", "10h")api/services/v1alpha1/zz_generated.deepcopy.go (1)
339-339
: DeepCopyInto: explicit Retention copy is redundant but harmless; ensure codegen source is the authority
*out = *in
already copiesmetav1.Duration
by value. The extraout.Retention = in.Retention
is fine, but remember this file is controller-gen output—avoid hand edits and rely on regenerating from the Go types.internal/controller/services/monitoring/resources/tempo-monolithic.tmpl.yaml (1)
14-19
: Ensure extraConfig shape matches TempoMonolithic CRD; use block string if CRD expects raw YAMLSome Tempo operator versions expect
extraConfig
as a raw YAML string rather than a structured map. If that’s your case, switch to a block scalar:- extraConfig: - tempo: - compactor: - compaction: - block_retention: {{.TracesRetention}} # default 90days + extraConfig: |- + tempo: + compactor: + compaction: + block_retention: {{.TracesRetention}} # default 90daysbundle/manifests/services.platform.opendatahub.io_monitorings.yaml (1)
164-169
: Fix wording and consider validation: it’s “trace data”, and you may add a duration pattern
- The description currently says “metrics data” under
traces.storage.retention
. It should say “trace data”.- Optional: add a pattern validation aligned with Go
time.Duration
units (e.g.,ns|us|µs|ms|s|m|h
) to catch typos early.Note: This CRD is generated. Please update the Go type comments/markers in
api/services/v1alpha1/monitoring_types.go
and re-run codegen instead of editing YAML directly. Example:// TracesStorage defines the storage configuration for tracing. type TracesStorage struct { // ... // +kubebuilder:default:=2160h // +kubebuilder:validation:Pattern="^\\d+(ns|us|µs|ms|s|m|h)$" // Retention specifies how long trace data should be retained (e.g., "60m", "10h"). Retention metav1.Duration `json:"retention,omitempty"` }config/crd/bases/services.platform.opendatahub.io_monitorings.yaml (2)
164-168
: Fix typos and correct terminology in traces retention descriptionThis is under traces.storage, but the text says “metrics data” and has a typo (“gloabally”). Suggest tightening the wording and units note.
- retention: - default: 2160h - description: Retention specifies how long metrics data should - be retained gloabally (e.g., "60m", "10h") - type: string + retention: + default: 2160h + description: Retention specifies how long trace data should + be retained globally. Supports Go time.Duration units (e.g., "60m", "10h"). + type: string
164-168
: Optionally add schema validation for duration formatTo prevent invalid values (e.g., “90d” which Go time.Duration won’t parse), add a regex pattern that matches supported units. This keeps admission consistent with the metav1.Duration backing type.
retention: default: 2160h description: Retention specifies how long trace data should be retained globally. Supports Go time.Duration units (e.g., "60m", "10h"). type: string + pattern: ^\d+(ns|us|µs|ms|s|m|h)$
docs/api-overview.md (1)
3043-3043
: Optional: consider unit consistency with metrics retentionMetrics retention default is documented as “90d” while traces retention uses “2160h” (both equal 90 days). For clarity, consider aligning units across docs.
bundle/manifests/dscinitialization.opendatahub.io_dscinitializations.yaml (1)
216-220
: Fix wording: this is trace retention, not metricsUnder traces.storage.retention, the description says “metrics data.” It should say “trace data” to avoid confusion.
Please update upstream Go type comments (so controller-gen regenerates this text), then regenerate the CRDs/bundle. For illustration, the generated YAML should reflect:
- description: Retention specifies how long metrics data + description: Retention specifies how long trace data should be retained (e.g., "60m", "10h")config/crd/bases/dscinitialization.opendatahub.io_dscinitializations.yaml (1)
216-220
: Correct description wording and typo (“globally”)The description here should reference “trace data,” and there’s a typo “gloabally”.
Please correct the Go source comments (kubebuilder markers/docs) and re-run controller-gen. The generated YAML should read:
- description: Retention specifies how long metrics data - should be retained gloabally (e.g., "60m", "10h") + description: Retention specifies how long trace data + should be retained globally (e.g., "60m", "10h")tests/e2e/monitoring_test.go (3)
296-324
: Simplify setMonitoringTraces by avoiding repeated type assertionsCurrent code repeatedly asserts and checks the "storage" map. Since it’s initialized as a map just above, you can streamline this.
-func setMonitoringTraces(backend, secret, size, retention string) testf.TransformFn { +func setMonitoringTraces(backend, secret, size, retention string) testf.TransformFn { return func(obj *unstructured.Unstructured) error { tracesConfig := map[string]interface{}{ "storage": map[string]interface{}{ "backend": backend, }, } - - if size != "" { - if storage, ok := tracesConfig["storage"].(map[string]interface{}); ok { - storage["size"] = size - } - } - - if secret != "" { - if storage, ok := tracesConfig["storage"].(map[string]interface{}); ok { - storage["secret"] = secret - } - } - - if retention != "" { - if storage, ok := tracesConfig["storage"].(map[string]interface{}); ok { - storage["retention"] = retention - } - } + storage := tracesConfig["storage"].(map[string]interface{}) + if size != "" { + storage["size"] = size + } + if secret != "" { + storage["secret"] = secret + } + if retention != "" { + storage["retention"] = retention + } return unstructured.SetNestedField(obj.Object, tracesConfig, "spec", "monitoring", "traces") } }
356-365
: LGTM: PV backend + retention verification for TempoMonolithicGood coverage: asserting backend, size, and the normalized compactor.retention ("24h0m0s").
Optionally, consider an additional test where retention is omitted to ensure the default ("2160h") is propagated into TempoMonolithic as well, not just the Monitoring CR. I can draft that if helpful.
Also applies to: 386-388
149-152
: Update stale comment to match asserted valueThe comment says “Validate storage retention is set to 1d” but the assertion expects "90d". Update the comment for clarity.
- // Validate storage retention is set to 1d + // Validate storage retention is set to 90d
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
README.md
(1 hunks)api/services/v1alpha1/monitoring_types.go
(1 hunks)api/services/v1alpha1/zz_generated.deepcopy.go
(1 hunks)bundle/manifests/dscinitialization.opendatahub.io_dscinitializations.yaml
(1 hunks)bundle/manifests/opendatahub-operator.clusterserviceversion.yaml
(1 hunks)bundle/manifests/services.platform.opendatahub.io_monitorings.yaml
(1 hunks)config/crd/bases/dscinitialization.opendatahub.io_dscinitializations.yaml
(1 hunks)config/crd/bases/services.platform.opendatahub.io_monitorings.yaml
(1 hunks)docs/api-overview.md
(1 hunks)internal/controller/services/monitoring/monitoring_controller_support.go
(2 hunks)internal/controller/services/monitoring/resources/tempo-monolithic.tmpl.yaml
(1 hunks)internal/controller/services/monitoring/resources/tempo-stack.tmpl.yaml
(1 hunks)tests/e2e/monitoring_test.go
(10 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: mlassak
PR: opendatahub-io/opendatahub-operator#2010
File: config/crd/kustomization.yaml:22-22
Timestamp: 2025-07-22T10:32:09.737Z
Learning: In the opendatahub-operator repository, when FeatureTrackers are being removed or deprecated, the FeatureTracker CRD reference in config/crd/kustomization.yaml should be kept for backward compatibility during the migration period, even if some components no longer use FeatureTrackers.
📚 Learning: 2025-07-22T10:32:09.737Z
Learnt from: mlassak
PR: opendatahub-io/opendatahub-operator#2010
File: config/crd/kustomization.yaml:22-22
Timestamp: 2025-07-22T10:32:09.737Z
Learning: In the opendatahub-operator repository, when FeatureTrackers are being removed or deprecated, the FeatureTracker CRD reference in config/crd/kustomization.yaml should be kept for backward compatibility during the migration period, even if some components no longer use FeatureTrackers.
Applied to files:
config/crd/bases/services.platform.opendatahub.io_monitorings.yaml
config/crd/bases/dscinitialization.opendatahub.io_dscinitializations.yaml
bundle/manifests/opendatahub-operator.clusterserviceversion.yaml
bundle/manifests/services.platform.opendatahub.io_monitorings.yaml
📚 Learning: 2025-07-29T18:44:42.749Z
Learnt from: kahowell
PR: opendatahub-io/opendatahub-operator#2220
File: bundle.rhoai/manifests/rhods-operator.clusterserviceversion.yaml:1548-1550
Timestamp: 2025-07-29T18:44:42.749Z
Learning: In the opendatahub-operator repository's rhoai branch, the CSV manifest intentionally uses `quay.io/opendatahub/opendatahub-operator:latest` as the image reference even though it's for the rhods-operator, and this configuration difference from the main branch is by design.
Applied to files:
bundle/manifests/opendatahub-operator.clusterserviceversion.yaml
📚 Learning: 2025-05-27T11:28:21.229Z
Learnt from: AjayJagan
PR: opendatahub-io/opendatahub-operator#1990
File: Dockerfiles/catalog.Dockerfile:6-6
Timestamp: 2025-05-27T11:28:21.229Z
Learning: The file `Dockerfiles/catalog.Dockerfile` in the opendatahub-operator repository is autogenerated and should not be manually modified.
Applied to files:
bundle/manifests/opendatahub-operator.clusterserviceversion.yaml
📚 Learning: 2025-07-29T18:45:24.518Z
Learnt from: kahowell
PR: opendatahub-io/opendatahub-operator#2220
File: config/crd.rhoai/bases/dscinitialization.opendatahub.io_dscinitializations.yaml:90-107
Timestamp: 2025-07-29T18:45:24.518Z
Learning: In the opendatahub-operator repository, CRD files under config/crd.rhoai/bases/ are generated files (using controller-gen from Go structs), and inconsistencies between enum and pattern validation should not be flagged as they require upstream source code changes rather than direct YAML modifications.
Applied to files:
bundle/manifests/opendatahub-operator.clusterserviceversion.yaml
tests/e2e/monitoring_test.go
📚 Learning: 2025-05-27T08:29:05.312Z
Learnt from: AjayJagan
PR: opendatahub-io/opendatahub-operator#1990
File: hack/update-catalog-template.sh:36-43
Timestamp: 2025-05-27T08:29:05.312Z
Learning: The hack/update-catalog-template.sh script in opendatahub-operator is development tooling where simple regex patterns for version extraction are acceptable, and developers are expected to handle version format requirements themselves.
Applied to files:
bundle/manifests/opendatahub-operator.clusterserviceversion.yaml
📚 Learning: 2025-08-08T07:56:13.134Z
Learnt from: GowthamShanmugam
PR: opendatahub-io/opendatahub-operator#2159
File: tests/e2e/creation_test.go:486-490
Timestamp: 2025-08-08T07:56:13.134Z
Learning: In tests/e2e/creation_test.go, ValidateComponentsDeploymentFailure intentionally uses an extremely restrictive ResourceQuota to simulate component deployment failures; CPU quantities must use valid forms (e.g., "1m" or "0"), since "0.1m" is invalid and causes resource.MustParse to panic.
Applied to files:
tests/e2e/monitoring_test.go
🔇 Additional comments (9)
README.md (1)
346-346
: Docs update aligns with new default retentionAdding
retention: 2160h
undertraces.storage
is consistent with the new API field and defaults.internal/controller/services/monitoring/monitoring_controller_support.go (1)
91-91
: Update default metrics retention to 90d when Storage is nil — LGTMMatches API/docs defaults and keeps behavior consistent.
bundle/manifests/opendatahub-operator.clusterserviceversion.yaml (1)
117-117
: CSV createdAt bump looks fine; confirm bundle regeneration consistencyTimestamp update is expected with a new bundle. Please confirm that all bundle artifacts were regenerated in lockstep (make bundle/manifests) so there’s no drift across CSV, package, and annotations.
internal/controller/services/monitoring/resources/tempo-stack.tmpl.yaml (1)
9-11
: Retention wiring for TempoStack – verify CRD compatibilityLGTM conceptually. Please double-check the Tempo operator version used supports
spec.retention.global.traces
onTempoStack
with tenants.mode=openshift.tests/e2e/monitoring_test.go (5)
257-261
: LGTM: exercise non-default traces retentionPassing "720h" (30d) to setMonitoringTraces is appropriate for validating retention propagation.
269-275
: LGTM: helper accessors for resource namesThe getTempoMonolithicName/getTempoStackName helpers improve consistency/readability.
Also applies to: 273-275
431-444
: LGTM: default traces retention asserted in Monitoring CRVerifying
.spec.traces.storage.retention == "2160h"
when not explicitly set is aligned with the new default.
520-520
: LGTM: use helper for TempoStack nameReadability win; keeps naming consistent across tests.
554-555
: LGTM: verify normalized retention for TempoStackChecking
.spec.retention.global.traces == "1h40m0s"
correctly validates normalization of "100m".
internal/controller/services/monitoring/monitoring_controller_support.go
Show resolved
Hide resolved
/retest |
- default to 90d/2160h - fix missing default retention for metrics Signed-off-by: Wen Zhou <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/e2e/monitoring_test.go (1)
503-510
: Fix function comment: remove stale “Returns …” note (function returns nothing)The function doesn’t return a resource; the return note is misleading (this was also noted previously).
// validateTempoStackCreationWithBackend validates TempoStack creation with the specified backend. // It updates the DSCInitialization to configure monitoring traces, waits for the Monitoring resource // to be updated, and ensures the TempoStack CR is created with the correct backend configuration. // // Parameters: // - backend: The storage backend type (e.g., "s3", "gcs") // - secretName: The name of the secret containing backend credentials // - monitoringCondition: Gomega matcher to validate the Monitoring resource state // - monitoringErrorMsg: Error message to display if Monitoring resource validation fails // -// Returns the created TempoStack resource for additional validation by the caller.
🧹 Nitpick comments (2)
tests/e2e/monitoring_test.go (2)
257-260
: Nit: This test sets a 30d retention but doesn’t assert anything related to retentionTo keep the test focused on OTel Collector pipelines, either drop the retention argument here or add an assertion somewhere that uses/reflects it.
Example (drop retention to avoid confusion):
- setMonitoringTraces("pv", "", "", "720h"), // Use 30 days retention for this test + setMonitoringTraces("pv", "", "", ""),
296-296
: Refactor: Build the storage map once to avoid repeated type assertionsCurrent code repeatedly asserts and updates the "storage" map. Constructing it once improves readability and reduces branching.
func setMonitoringTraces(backend, secret, size, retention string) testf.TransformFn { return func(obj *unstructured.Unstructured) error { - tracesConfig := map[string]interface{}{ - "storage": map[string]interface{}{ - "backend": backend, - }, - } - - if size != "" { - if storage, ok := tracesConfig["storage"].(map[string]interface{}); ok { - storage["size"] = size - } - } - - if secret != "" { - if storage, ok := tracesConfig["storage"].(map[string]interface{}); ok { - storage["secret"] = secret - } - } - - if retention != "" { - if storage, ok := tracesConfig["storage"].(map[string]interface{}); ok { - storage["retention"] = retention - } - } - - return unstructured.SetNestedField(obj.Object, tracesConfig, "spec", "monitoring", "traces") + storage := map[string]interface{}{"backend": backend} + if size != "" { + storage["size"] = size + } + if secret != "" { + storage["secret"] = secret + } + if retention != "" { + storage["retention"] = retention + } + tracesConfig := map[string]interface{}{"storage": storage} + return unstructured.SetNestedField(obj.Object, tracesConfig, "spec", "monitoring", "traces") } }Also applies to: 316-320
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/e2e/monitoring_test.go
(9 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: mlassak
PR: opendatahub-io/opendatahub-operator#2010
File: config/crd/kustomization.yaml:22-22
Timestamp: 2025-07-22T10:32:09.737Z
Learning: In the opendatahub-operator repository, when FeatureTrackers are being removed or deprecated, the FeatureTracker CRD reference in config/crd/kustomization.yaml should be kept for backward compatibility during the migration period, even if some components no longer use FeatureTrackers.
📚 Learning: 2025-08-08T07:56:13.158Z
Learnt from: GowthamShanmugam
PR: opendatahub-io/opendatahub-operator#2159
File: tests/e2e/creation_test.go:486-490
Timestamp: 2025-08-08T07:56:13.158Z
Learning: In tests/e2e/creation_test.go, ValidateComponentsDeploymentFailure intentionally uses an extremely restrictive ResourceQuota to simulate component deployment failures; CPU quantities must use valid forms (e.g., "1m" or "0"), since "0.1m" is invalid and causes resource.MustParse to panic.
Applied to files:
tests/e2e/monitoring_test.go
📚 Learning: 2025-07-29T18:45:24.518Z
Learnt from: kahowell
PR: opendatahub-io/opendatahub-operator#2220
File: config/crd.rhoai/bases/dscinitialization.opendatahub.io_dscinitializations.yaml:90-107
Timestamp: 2025-07-29T18:45:24.518Z
Learning: In the opendatahub-operator repository, CRD files under config/crd.rhoai/bases/ are generated files (using controller-gen from Go structs), and inconsistencies between enum and pattern validation should not be flagged as they require upstream source code changes rather than direct YAML modifications.
Applied to files:
tests/e2e/monitoring_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build/push catalog image
- GitHub Check: Run tests and collect coverage on tests/integration
- GitHub Check: Run tests and collect coverage on internal and pkg
- GitHub Check: golangci-lint
🔇 Additional comments (4)
tests/e2e/monitoring_test.go (4)
351-363
: LGTM: TempoMonolithic retention plumbing validatedAsserting compactor.compaction.block_retention equals "24h0m0s" confirms correct propagation and normalization of "24h".
Also applies to: 386-388
430-432
: LGTM: Default traces retention verified via Monitoring CRGood coverage for the default (90d) rendered as "2160h0m0s" without explicitly setting retention.
Also applies to: 437-442
519-519
: LGTM: TempoStack retention assertion matches 100m → "1h40m0s"The helper usage and assertion on
.spec.retention.global.traces
look correct and aligned with the new retention field.Also applies to: 526-526, 552-554
269-275
: Approve changes after verificationNo call sites were found that pass arguments to the updated helpers—everything now correctly uses the parameterless
getTempoMonolithicName
andgetTempoStackName
. Good cleanup: removing unused parameters reduces noise and avoids misleading signatures.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2269 +/- ##
=======================================
Coverage 41.92% 41.92%
=======================================
Files 144 144
Lines 11387 11387
=======================================
Hits 4774 4774
Misses 6212 6212
Partials 401 401 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: StevenTobin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, we should cherry-pick @zdtsw
/cherry-pick rhoai |
@zdtsw: #2269 failed to apply on top of branch "rhoai":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
…lt (opendatahub-io#2269) * feat: add support for config retention for tracer + fix metrics default - default to 90d/2160h - fix missing default retention for metrics --------- Signed-off-by: Wen Zhou <[email protected]> (cherry picked from commit 8e5e862)
…lt (opendatahub-io#2269) * feat: add support for config retention for tracer + fix metrics default - default to 90d/2160h - fix missing default retention for metrics --------- Signed-off-by: Wen Zhou <[email protected]> (cherry picked from commit 8e5e862) Signed-off-by: Wen Zhou <[email protected]>
follow in #2291 |
…lt (#2269) (#2291) * feat: add support for config retention for tracer + fix metrics default - default to 90d/2160h - fix missing default retention for metrics --------- (cherry picked from commit 8e5e862) Signed-off-by: Wen Zhou <[email protected]>
…#11010) * update: remove duplicated RBAC (opendatahub-io#2296) Signed-off-by: Wen Zhou <[email protected]> (cherry picked from commit 3675c84) * feat: add support for config retention for tracer + fix metrics default (opendatahub-io#2269) (opendatahub-io#2291) * feat: add support for config retention for tracer + fix metrics default - default to 90d/2160h - fix missing default retention for metrics --------- (cherry picked from commit 8e5e862) Signed-off-by: Wen Zhou <[email protected]> (cherry picked from commit 714fe37) --------- Signed-off-by: Wen Zhou <[email protected]>
…lt (opendatahub-io#2269) (opendatahub-io#2291) * feat: add support for config retention for tracer + fix metrics default - default to 90d/2160h - fix missing default retention for metrics --------- (cherry picked from commit 8e5e862) Signed-off-by: Wen Zhou <[email protected]> (cherry picked from commit 714fe37)
Description
ref : https://issues.redhat.com/browse/RHOAIENG-31400
How Has This Been Tested?
Screenshot or short clip
Merge criteria
Summary by CodeRabbit
New Features
Documentation
Bug Fixes