Skip to content

Conversation

@chetan-rns
Copy link
Collaborator

@chetan-rns chetan-rns commented Feb 11, 2026

What does this PR do / why we need it:

The event writer keeps track of sent/unsent events. On reconnection, the agent creates a new event writer, and all the events in the old event writer are dropped. In this PR:

  • Use the existing event writer if it already exists
  • Reset the retry after so that the unsent events don't have to wait after reconnecting with the new stream
  • Don't use backoff Steps to compare with the retry count since it is decremented internally.

Which issue(s) this PR fixes:

Fixes #715

How to test changes / Special notes to the reviewer:

Checklist

  • Documentation update is required by this PR (and has been updated) OR no documentation update is required.

Summary by CodeRabbit

  • Bug Fixes

    • Improved event handling efficiency by reusing existing connections instead of repeatedly creating new ones, reducing unnecessary reinitialization.
  • Refactor

    • Consolidated retry logic with a consistent maximum retry limit for improved reliability and maintainability.
    • Updated test coverage to reflect enhanced retry behavior.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 11, 2026

📝 Walkthrough

Walkthrough

When agents reconnect after losing their principal connection, the code now reuses the existing event writer and updates its target stream instead of creating a new writer, preserving unsent and retriable events. Additionally, retry timers are reset on reconnection, and retry limits are consolidated under a new maxEventRetries constant.

Changes

Cohort / File(s) Summary
Connection Handling
agent/connection.go, go.mod
Modified handleStreamEvents to check for existing event writers and reuse them by updating the target stream, avoiding event loss during reconnection scenarios. Updated module dependencies.
Event Writer Core Logic
internal/event/event_writer.go
Introduced maxEventRetries constant to centralize retry limits; updated UpdateTarget to reset per-event retry timers for immediate retries on new connection; replaced hardcoded backoff steps value with the new constant throughout retry logic.
Event Writer Tests
internal/event/event_writer_test.go
Updated max-retry test loop boundary to use maxEventRetries as an inclusive bound, adjusting iteration count to match new retry semantics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • #724: Introduces or modifies EventWriter methods and logic that are directly affected by the retry timer and backoff constant changes in this PR.

Suggested reviewers

  • jannfis
  • mikeshng

Poem

🐰 Events were lost when connections break—a hop away from despair,
But now the writer stays alive, with timers reset with care,
Retries bloom like clover, through maxEventRetries so fair,
No duplication, just persistence—the agent's answer to reconnect's snare! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely describes the main bug fix: preventing event loss after agent reconnection, which is the core objective of the changeset.
Linked Issues check ✅ Passed The code changes directly address issue #715 objectives: reusing existing EventWriter on reconnection, resetting retry timers, and replacing hardcoded Steps with maxEventRetries constant.
Out of Scope Changes check ✅ Passed All changes in agent/connection.go, internal/event/event_writer.go, and tests are directly scoped to fixing event loss during reconnection with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
internal/event/event_writer.go (2)

77-91: Logger not updated on UpdateTarget — log entries will show the stale client address.

NewEventWriter captures grpcutil.AddressFromContext(target.Context()) into ew.log. After UpdateTarget sets a new stream, all subsequent log entries (retries, sends, etc.) will still display the old connection's address, making debugging reconnection scenarios harder.

♻️ Suggested fix
 func (ew *EventWriter) UpdateTarget(target streamWriter) {
 	ew.mu.Lock()
 	defer ew.mu.Unlock()
 	ew.target = target
+	ew.log = ew.log.WithField(logfields.ClientAddr, grpcutil.AddressFromContext(target.Context()))

 	// Reset retry timers so events in sentEvents are retried immediately on the new connection.

85-89: All sent-event retryAfter pointers alias the same local variable.

&now gives every message a pointer to the same stack-allocated time.Time. This works today because no code mutates through the pointer, but it's fragile. Allocating per-message is trivially cheap here and avoids a subtle aliasing surprise.

♻️ Suggested fix
-	now := time.Now()
 	for _, msg := range ew.sentEvents {
 		msg.mu.Lock()
+		now := time.Now()
 		msg.retryAfter = &now
 		msg.mu.Unlock()
 	}

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 16.66667% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.15%. Comparing base (abce285) to head (5e228d9).

Files with missing lines Patch % Lines
agent/connection.go 0.00% 5 Missing ⚠️
internal/event/event_writer.go 28.57% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #749      +/-   ##
==========================================
- Coverage   46.18%   46.15%   -0.04%     
==========================================
  Files         101      101              
  Lines       12513    12520       +7     
==========================================
- Hits         5779     5778       -1     
- Misses       6179     6187       +8     
  Partials      555      555              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Managed/Autonomous agents lose unsent/retriable events when principal connection is lost

2 participants