
Fix TargetAllocator reconciliation loop and missing RBAC for events.k8… #4950

Open
IshwarKanse wants to merge 2 commits into open-telemetry:main from IshwarKanse:fix-anyconfig-deepcopy-and-events-rbac

IshwarKanse (Contributor) commented Apr 10, 2026

Summary

Fixes two bugs:

1. AnyConfig.DeepCopyInto shallow copy causes TargetAllocator infinite reconciliation loop

AnyConfig.DeepCopyInto used maps.Copy(), which copies only the top-level map entries and leaves nested maps and slices as shared references. When ApplyDefaults injected TLS profile settings (min_version) into the collector's scrape config via applyTLSProfileToScrapeConfigs, it mutated the informer cache through those shared references.
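
The sharing is easy to reproduce with plain Go maps. This is a hypothetical sketch: shallowCopy, newScrapeConfig and injectMinVersion are illustrative stand-ins (not operator code) for the old maps.Copy-based DeepCopyInto, a simplified scrape config, and applyTLSProfileToScrapeConfigs. It assumes Go 1.21+ for the standard-library maps package.

```go
package main

import "maps"

// shallowCopy mirrors the pre-fix behaviour described above: maps.Copy
// copies only top-level entries, so nested maps and slices are still shared.
func shallowCopy(src map[string]any) map[string]any {
	dst := make(map[string]any, len(src))
	maps.Copy(dst, src)
	return dst
}

// newScrapeConfig builds a hypothetical, simplified config with a nested
// tls_config map; the shape is illustrative, not the operator's schema.
func newScrapeConfig() map[string]any {
	return map[string]any{
		"scrape_configs": []any{
			map[string]any{
				"job_name":   "example",
				"tls_config": map[string]any{},
			},
		},
	}
}

// injectMinVersion is a hypothetical stand-in for applyTLSProfileToScrapeConfigs:
// it writes min_version into every nested tls_config map in place.
func injectMinVersion(cfg map[string]any) {
	for _, sc := range cfg["scrape_configs"].([]any) {
		tlsCfg := sc.(map[string]any)["tls_config"].(map[string]any)
		tlsCfg["min_version"] = "TLS12"
	}
}
```

Calling injectMinVersion on the result of shallowCopy also adds min_version to the original map, because both top-level maps hold the same scrape_configs slice and therefore the same nested tls_config map.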

This caused the TargetAllocator config hash annotation (opentelemetry-targetallocator-config/hash) to alternate between two values on successive reconciliations: one computed with min_version: TLS12 (from the mutated cache) and one without it (from a fresh cache read). The Deployment's generation incremented continuously (about twice per second) as it flip-flopped between two ReplicaSets.
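
Why a leaked nested field turns into churn can be sketched with a content hash. The PR does not show how the operator computes opentelemetry-targetallocator-config/hash, so configHash below is a hypothetical stand-in (SHA-256 over the JSON encoding). Any content hash behaves the same way: two configs that differ only by min_version produce two different values, so an annotation derived from the hash that keeps switching between them updates the Deployment on every switch.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// configHash is a hypothetical stand-in for the config hash annotation:
// SHA-256 over the JSON encoding. encoding/json sorts map keys, so equal
// content always yields the same hash.
func configHash(cfg map[string]any) (string, error) {
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}
```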

Root cause: introduced by #4871, which added PrometheusParser.applyTLSProfileToScrapeConfigs, a function that mutates deeply nested maps in place. The shallow DeepCopy already existed but was harmless until that in-place mutation of nested maps was introduced.

Fix: Changed AnyConfig.DeepCopyInto to perform a true deep copy via a JSON round-trip (json.Marshal followed by json.Unmarshal), ensuring nested maps and slices are fully independent copies.
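
A minimal sketch of the JSON round-trip, assuming the payload is a map[string]any such as AnyConfig.Object; deepCopyJSON is a hypothetical helper, not the code merged in this PR. Generated DeepCopyInto methods have no error return, so a real implementation has to decide how to handle a marshalling failure; the sketch simply returns the error.

```go
package main

import "encoding/json"

// deepCopyJSON returns a fully independent copy of src by encoding it to
// JSON and decoding it into a freshly allocated map (hypothetical helper).
func deepCopyJSON(src map[string]any) (map[string]any, error) {
	if src == nil {
		// Unmarshalling "null" would also leave the map nil; the explicit
		// check just makes the nil-stays-nil intent visible.
		return nil, nil
	}
	b, err := json.Marshal(src)
	if err != nil {
		return nil, err
	}
	var dst map[string]any
	// Unmarshal allocates new maps and slices for every nested value,
	// so nothing in dst aliases src.
	if err := json.Unmarshal(b, &dst); err != nil {
		return nil, err
	}
	return dst, nil
}
```

One property of the round-trip worth keeping in mind: when decoding into any, encoding/json turns every JSON number into float64, so an int stored in the source map comes back as a float64 in the copy. That is the kind of difference a value-preservation test can catch.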

Triggered when: TLS_CONFIGURE_OPERANDS=true (default on OpenShift with TLS profile injection enabled).

2. Missing RBAC for events.k8s.io API group

The operator uses k8s.io/client-go/tools/events (via controller-runtime's mgr.GetEventRecorder()), which targets the events.k8s.io API group. The ClusterRole only granted permission for the core API group (""), causing "Server rejected event (will not retry!)" errors when recording events on managed resources.

Fix: Added +kubebuilder:rbac:groups=events.k8s.io,resources=events,verbs=create;patch markers to all three controllers and regenerated manifests.
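
The rejection follows from how RBAC matches rules: the API group must match exactly (or be "*"), so a rule for the core group "" grants nothing on events.k8s.io. The sketch below is a deliberately simplified, hypothetical model of that matching (PolicyRule here is not the k8s.io/api/rbac/v1 type, and resourceNames, subresources and aggregation are ignored); the verbs on the pre-existing core-group rule are an assumption.

```go
package main

// The marker added to the controllers in this PR:
// +kubebuilder:rbac:groups=events.k8s.io,resources=events,verbs=create;patch

// PolicyRule is a simplified, hypothetical model of an RBAC rule.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// matches reports whether want appears in list, honouring the "*" wildcard.
func matches(list []string, want string) bool {
	for _, v := range list {
		if v == "*" || v == want {
			return true
		}
	}
	return false
}

// allowed reports whether any rule grants verb on resource in apiGroup.
// API groups are compared exactly, so "" does not cover events.k8s.io.
func allowed(rules []PolicyRule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if matches(r.APIGroups, apiGroup) &&
			matches(r.Resources, resource) &&
			matches(r.Verbs, verb) {
			return true
		}
	}
	return false
}
```

With only the core-group rule, allowed(rules, "events.k8s.io", "events", "create") is false; adding the rule that corresponds to the new marker makes it true.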

Testing

  • Ran chainsaw test --skip-delete tests/e2e-targetallocator/targetallocator-kubernetessd: passes (previously failed with observedGeneration: 21 instead of 1)
  • Verified Deployment generation stays at 1 with a single ReplicaSet after the fix
  • Added unit tests for AnyConfig.DeepCopyInto:
    • TestAnyConfigDeepCopyInto_NestedMapIndependence — verifies mutating nested maps in the copy does not affect the source (reproduces the exact bug)
    • TestAnyConfigDeepCopyInto_NilObject — nil Object stays nil
    • TestAnyConfigDeepCopyInto_EmptyObject — empty map is independent
    • TestAnyConfigDeepCopyInto_PreservesValues — all value types survive the deep copy
  • All existing unit tests pass (go test ./apis/v1beta1/..., go test ./internal/manifests/targetallocator/...)

IshwarKanse requested a review from a team as a code owner April 10, 2026 12:17
IshwarKanse force-pushed the fix-anyconfig-deepcopy-and-events-rbac branch from d2cc048 to f63e68f
swiatekm (Contributor) commented:

For the event fix, could you add an e2e test verifying that the operator can emit them? I'd like to avoid this breakage in the future.

IshwarKanse (Contributor, Author) commented:

Added the assertion to check the events.
