This repository was archived by the owner on Sep 11, 2025. It is now read-only.


@mattjohnsonpint
Contributor

  • Restore suspended agents on demand rather than on startup
  • Simplify the agent suspension process during clean shutdown, and run it in parallel
  • Remove the extra wait after spawning an agent (not needed, since the next message will be handled on the same cluster node)

This should dramatically improve the thrashing and memory consumption we've been seeing whenever we roll out a new version to the cluster.

@mattjohnsonpint mattjohnsonpint requested review from a team and Copilot July 14, 2025 20:17

Copilot AI left a comment


Pull Request Overview

This PR enables on-demand restoration of suspended agents, simplifies suspension during shutdown with parallel execution, and removes unnecessary cluster sync waits.

  • Add debug logging for topic subscriptions
  • Implement retry logic in SendAgentMessage to restart missing agents on demand
  • Remove automatic agent restoration on startup and inline cluster sync waits; parallelize agent suspension during shutdown

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| runtime/actors/subscriber.go | Log a debug message when subscribing to a topic |
| runtime/actors/agents.go | Remove the startup cluster sync wait; add a retry loop with on-demand actor spawning in SendAgentMessage |
| runtime/actors/actorsystem.go | Strip out the restore-on-startup logic and its helper; inline the cluster sync wait; parallelize suspension |
| CHANGELOG.md | Add an entry for v0.18.6 with the “restore agents on demand” feature |
Comments suppressed due to low confidence (2)

runtime/actors/agents.go:205

  • The new retry logic in SendAgentMessage isn't covered by existing tests. Adding unit tests for scenarios like actor restart and eventual success would help ensure this behavior remains reliable.
	const maxRetries = 3

runtime/actors/actorsystem.go:176

  • The shutdown process no longer stops subscription actors after suspending agent actors, which may cause subscription actors to remain running. Consider reintroducing the subscription actor shutdown loop after wg.Wait().
	wg.Wait()

@mattjohnsonpint mattjohnsonpint merged commit 484a7f3 into main Jul 14, 2025
33 checks passed
@mattjohnsonpint mattjohnsonpint deleted the mjp/agents-start-on-demand branch July 14, 2025 20:25
