Add retry logic with exponential backoff ensureAgentReady #887

danstarns · 2025-06-11T12:40:45Z

This PR adds retry logic with exponential backoff to ensureAgentReady to handle race conditions when agents are starting or resuming.

This fixes issues where SendMessage calls immediately after StartAgent would fail with "agent not found" errors due to asynchronous agent initialization.

Copilot

Pull Request Overview

This PR enhances the ensureAgentReady function by adding retry logic with exponential backoff to handle asynchronous agent initialization.

- Introduces constants for maximum retries, base delay, and maximum delay.
- Wraps getAgentInfo in a retry loop, handling different agent statuses (starting, suspended, terminated) and resuming suspended agents.
- Implements backoff delays and context cancellation between attempts, with a final error when all retries are exhausted.
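The overall shape described in this overview can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the constant values, `retryUntilReady`, `errNotReady`, and `demo` are all hypothetical names chosen for the example, and the readiness check is a plain callback rather than a call to getAgentInfo.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical values; the PR's actual constants are not shown in this thread.
const (
	maxRetries = 5
	baseDelay  = 10 * time.Millisecond
	maxDelay   = 100 * time.Millisecond
)

// errNotReady is a hypothetical sentinel for "agent is still starting".
var errNotReady = errors.New("agent not ready")

// retryUntilReady calls check up to maxRetries times. Between attempts it
// waits for a delay that doubles each time and is capped at maxDelay, and it
// returns early if ctx is cancelled while waiting. It reports how many
// attempts were made.
func retryUntilReady(ctx context.Context, check func(attempt int) error) (int, error) {
	delay := baseDelay
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		lastErr = check(attempt)
		if lastErr == nil {
			return attempt + 1, nil
		}
		if attempt == maxRetries-1 {
			break // no point waiting after the final attempt
		}
		timer := time.NewTimer(delay)
		select {
		case <-ctx.Done():
			timer.Stop()
			return attempt + 1, ctx.Err()
		case <-timer.C:
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return maxRetries, fmt.Errorf("not ready after %d attempts: %w", maxRetries, lastErr)
}

// demo simulates an agent that becomes ready on the third check.
func demo() (int, error) {
	return retryUntilReady(context.Background(), func(attempt int) error {
		if attempt < 2 {
			return errNotReady
		}
		return nil
	})
}

func main() {
	attempts, err := demo()
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Waiting on a timer inside a `select` with `ctx.Done()` is what lets a caller abandon the wait promptly instead of sleeping through the full delay.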

Comments suppressed due to low confidence (3)

runtime/actors/agents.go:180

The error returned from getAgentInfo is not checked, which may hide underlying failures (e.g., database errors). Consider handling err before inspecting info or pid.

info, pid, err := getAgentInfo(ctx, agentId)
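One way to act on this comment is to check the error first and retry only the case that is expected to be transient. The sketch below assumes a sentinel error for "not found"; `errAgentNotFound`, `errStoreDown`, `lookup`, `shouldRetry`, and `checkAgent` are hypothetical names, and the real getAgentInfo may report these conditions differently.

```go
package main

import (
	"errors"
	"fmt"
)

// errAgentNotFound is a hypothetical sentinel for the transient case where
// the agent has not finished registering yet.
var errAgentNotFound = errors.New("agent not found")

// errStoreDown is a hypothetical stand-in for an underlying storage failure.
var errStoreDown = errors.New("store unavailable")

// lookup is a hypothetical stand-in for getAgentInfo. It returns only a
// status string to keep the example small.
func lookup(agentId string) (string, error) {
	switch agentId {
	case "agent-starting":
		return "", errAgentNotFound
	case "agent-db-error":
		return "", fmt.Errorf("query failed: %w", errStoreDown)
	}
	return "running", nil
}

// shouldRetry reports whether a lookup error is worth another attempt.
func shouldRetry(err error) bool {
	return errors.Is(err, errAgentNotFound)
}

// checkAgent inspects err before using the returned status, and tells the
// caller whether the failure is retryable.
func checkAgent(agentId string) (status string, retry bool, err error) {
	status, err = lookup(agentId)
	if err != nil {
		return "", shouldRetry(err), fmt.Errorf("agent %s: %w", agentId, err)
	}
	return status, false, nil
}

func main() {
	for _, id := range []string{"agent-ok", "agent-starting", "agent-db-error"} {
		status, retry, err := checkAgent(id)
		fmt.Printf("%s: status=%q retry=%v err=%v\n", id, status, retry, err)
	}
}
```

With this split, a storage failure surfaces immediately instead of being retried until the attempts run out.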

runtime/actors/agents.go:172

[nitpick] The function now includes retry and backoff logic; please update or add a doc comment to explain the retry strategy, delay parameters, and the conditions under which it returns errors.

func ensureAgentReady(ctx context.Context, agentId string) (*AgentInfo, *goakt.PID, error) {

runtime/actors/agents.go:179

The new retry and resume logic covers several code paths; consider adding unit tests that simulate different agent statuses and context cancellations to verify behavior across attempts.

for attempt := 0; attempt < maxRetries; attempt++ {
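A table-driven layout is one way to cover the cases this comment lists. The sketch below is written as a plain program so it runs on its own; in the repository the same table would more naturally sit in a `_test.go` file using the `testing` package. `waitReady` is a simplified, hypothetical stand-in for the function under test (it does not sleep or resume agents), and the status strings are assumed from the names in the review summary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical status values modeled on those named in the review summary.
const (
	statusStarting   = "starting"
	statusRunning    = "running"
	statusTerminated = "terminated"
)

var (
	errTerminated = errors.New("agent terminated")
	errExhausted  = errors.New("retries exhausted")
)

// waitReady polls status up to maxAttempts times. It performs no sleeping,
// so the cases below run instantly.
func waitReady(ctx context.Context, maxAttempts int, status func() string) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err := ctx.Err(); err != nil {
			return err
		}
		switch status() {
		case statusRunning:
			return nil
		case statusTerminated:
			return errTerminated
		}
	}
	return errExhausted
}

// sequence returns a status func that yields each value in turn and then
// keeps repeating the last one.
func sequence(values ...string) func() string {
	i := 0
	return func() string {
		v := values[i]
		if i < len(values)-1 {
			i++
		}
		return v
	}
}

// runCases executes the table and returns a description of each failure.
func runCases() []string {
	cancelled, cancel := context.WithCancel(context.Background())
	cancel()

	cases := []struct {
		name   string
		ctx    context.Context
		status func() string
		want   error
	}{
		{"ready on third poll", context.Background(), sequence(statusStarting, statusStarting, statusRunning), nil},
		{"terminated", context.Background(), sequence(statusStarting, statusTerminated), errTerminated},
		{"never ready", context.Background(), sequence(statusStarting), errExhausted},
		{"context cancelled", cancelled, sequence(statusRunning), context.Canceled},
	}

	var failures []string
	for _, tc := range cases {
		if got := waitReady(tc.ctx, 3, tc.status); !errors.Is(got, tc.want) {
			failures = append(failures, fmt.Sprintf("%s: got %v, want %v", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	failures := runCases()
	for _, f := range failures {
		fmt.Println("FAIL:", f)
	}
	fmt.Println("failed cases:", len(failures))
}
```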

Copilot · 2025-06-11T12:42:32Z

runtime/actors/agents.go

+		}
+
+		// Wait before retrying with exponential backoff
+		delay := time.Duration(float64(baseDelay) * (1.5*float64(attempt) + 1))


The current formula yields linear backoff rather than true exponential backoff. To match the PR description, consider using a power function (e.g., baseDelay * time.Duration(math.Pow(2, float64(attempt)))).
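To make the difference concrete, the sketch below puts the formula from the diff next to a doubling variant that is capped at a maximum. The values of `baseDelay` and `maxDelay` are assumptions for illustration (the PR's actual constants are not shown in this thread), and `linearDelay` and `exponentialDelay` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Assumed values for illustration only.
const (
	baseDelay = 100 * time.Millisecond
	maxDelay  = 2 * time.Second
)

// linearDelay reproduces the formula from the diff: the delay grows by a
// fixed 1.5*baseDelay per attempt.
func linearDelay(attempt int) time.Duration {
	return time.Duration(float64(baseDelay) * (1.5*float64(attempt) + 1))
}

// exponentialDelay doubles the delay on every attempt and caps it at
// maxDelay. The comparison is done in float64 before converting, so very
// large attempt counts cannot overflow time.Duration.
func exponentialDelay(attempt int) time.Duration {
	d := float64(baseDelay) * math.Pow(2, float64(attempt))
	if d >= float64(maxDelay) {
		return maxDelay
	}
	return time.Duration(d)
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Printf("attempt %d: linear=%v exponential=%v\n",
			attempt, linearDelay(attempt), exponentialDelay(attempt))
	}
}
```

With these assumed constants, attempts 0 through 3 give 100ms, 250ms, 400ms, and 550ms for the formula in the diff, against 100ms, 200ms, 400ms, and 800ms for the doubling variant. Without a cap, the doubling variant grows quickly, which is presumably why the PR also defines a maximum delay.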

Copilot · 2025-06-11T12:42:32Z

runtime/actors/agents.go

+					go func() {
+						if _, err := spawnActorForAgent(host, plugin, agentId, info.Name, false); err != nil {
+							logger.Err(context.Background(), err).Msgf("Failed to resume agent %s", agentId)
+						}
+					}()


[nitpick] Using context.Background() for logging drops metadata (tracing, cancellation) from the original context. Consider deriving a child context or using the existing ctx for richer diagnostics.

Suggested change

-					go func() {
-						if _, err := spawnActorForAgent(host, plugin, agentId, info.Name, false); err != nil {
-							logger.Err(context.Background(), err).Msgf("Failed to resume agent %s", agentId)
-						}
-					}()
+					go func(parentCtx context.Context) {
+						derivedCtx := context.WithValue(parentCtx, "operation", "resumeAgent")
+						if _, err := spawnActorForAgent(host, plugin, agentId, info.Name, false); err != nil {
+							logger.Err(derivedCtx, err).Msgf("Failed to resume agent %s", agentId)
+						}
+					}(ctx)
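A note on the suggested change: it passes a plain string as the key to context.WithValue, which the context package documentation advises against because string keys from different packages can collide. A minimal sketch of the usual alternative, an unexported key type, follows. It also uses context.WithoutCancel (Go 1.21 or later) to show one way a background goroutine could keep the parent's values without inheriting its cancellation. Whether that is appropriate here is a judgment call, and `ctxKey`, `operationKey`, `withOperation`, `operationFrom`, and `demo` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is an unexported key type, so values stored under it cannot
// collide with keys defined in other packages.
type ctxKey string

const operationKey ctxKey = "operation"

// withOperation returns a child context tagged with the operation name.
func withOperation(ctx context.Context, op string) context.Context {
	return context.WithValue(ctx, operationKey, op)
}

// operationFrom returns the operation name stored in ctx, or "" if none.
func operationFrom(ctx context.Context) string {
	op, _ := ctx.Value(operationKey).(string)
	return op
}

// demo tags a cancellable parent context, detaches it from cancellation,
// cancels the parent, and reports what the detached context still carries.
func demo() (op string, stillLive bool) {
	parent, cancel := context.WithCancel(context.Background())
	detached := context.WithoutCancel(withOperation(parent, "resumeAgent"))
	cancel() // simulates the original request finishing
	return operationFrom(detached), detached.Err() == nil
}

func main() {
	op, live := demo()
	fmt.Println("operation:", op, "detached context still live:", live)
}
```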

mattjohnsonpint · 2025-06-11T21:58:44Z

Thanks for pointing out the issue, and for the PR. I decided instead to go with the approach in #890.

Add retry logic with exponential backoff

200b0ea

danstarns requested review from a team and Copilot June 11, 2025 12:40

Copilot AI reviewed Jun 11, 2025

405be97

mattjohnsonpint closed this Jun 11, 2025
