fix(event-tracker): prevent duplicate chaos experiment triggers under concurrent reconciles by WHOIM1205 · Pull Request #5409 · litmuschaos/litmus

WHOIM1205 · 2026-01-24T20:54:48Z

Fix race condition causing duplicate chaos experiment triggers in EventTrackerPolicy controller

Summary

This PR fixes a critical race condition in the EventTrackerPolicy controller that could cause the same chaos experiment to be triggered multiple times under concurrent reconciles.

The issue was caused by a combination of:

- A local mutex created per reconcile, which provided no mutual exclusion (effectively a no-op)
- Side effects (SendRequest) executed inside a retry loop
- Concurrent reconciles observing a stale IsTriggered=false state

The fix replaces the broken locking logic with a Kubernetes-idiomatic optimistic concurrency approach that ensures each experiment is triggered at most once, eliminating duplicate triggers.

What was broken

Root issues

- sync.Mutex was instantiated inside Reconcile(), so each reconcile locked its own mutex and nothing was actually serialized
- SendRequest() (the experiment trigger) was executed before the CR status update was safely committed
- On an Update() conflict, the reconcile retried and re-triggered the experiment
- Multiple reconciles could race on the same EventTrackerPolicy, all seeing IsTriggered=false

Impact

- Duplicate chaos experiments running simultaneously
- Multiple chaos-runner pods competing for the same targets
- Unpredictable chaos results
- Resource exhaustion in production clusters
- Silent corruption of CI / GitOps pipeline results

The fix

This PR introduces a two-phase, conflict-safe execution model.

Phase 1 — Atomically claim trigger intent

- Uses retry.RetryOnConflict
- Re-reads the latest EventTrackerPolicy
- Marks IsTriggered = "true" before triggering
- Commits the update atomically

Phase 2 — Execute side effects

- Triggers experiments only after the status update succeeds
- Executes SendRequest() outside the retry loop
- Ensures each experiment is triggered at most once, so an update conflict can no longer cause a duplicate trigger

No mutexes. No shared memory. Fully Kubernetes-native.

Why this is safe

- Works correctly with:
  - Multiple controller replicas
  - Leader election failover
  - Controller restarts
- Uses Kubernetes optimistic locking instead of in-process synchronization
- Avoids side effects inside retry loops
- Preserves existing behavior while eliminating duplicates

How to reproduce (before this fix)

- Deploy the event-tracker controller
- Create an EventTrackerPolicy with Result=ConditionPassed and IsTriggered=false:

apiVersion: eventtracker.litmuschaos.io/v1
kind: EventTrackerPolicy
metadata:
  name: test-policy
  namespace: litmus
status:
  - resourceName: trigger-config
    experimentID: test-experiment-123
    result: ConditionPassed
    isTriggered: "false"

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>

WHOIM1205 · 2026-01-24T20:57:00Z

hey @ispeakc0de

This fixes a race in the EventTrackerPolicy reconciler that could trigger the same chaos experiment multiple times under concurrent reconciles. It does so by using optimistic concurrency and moving side effects outside the retry loop.

Saranya-jena · 2026-03-17T11:18:09Z

@WHOIM1205 could you fix the pipeline failures and see if the changes are still valid in the latest version?

fix(event-tracker): prevent duplicate experiment triggers in reconciler

d42a746

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>

PriteshKiri added 2 commits March 17, 2026 12:17

Merge branch 'master' into fix/eventtracker-race-condition

b057454

Merge branch 'master' into fix/eventtracker-race-condition

7da5390

PriteshKiri requested review from Saranya-jena, amityt and ispeakc0de March 17, 2026 11:12

WHOIM1205 wants to merge 3 commits into litmuschaos:master from WHOIM1205:fix/eventtracker-race-condition

3 participants
