Stamp last_active in streaming agent loop to prevent heartbeat false-positives by chrisyoung2005 · Pull Request #1090 · RightNow-AI/openfang

chrisyoung2005 · 2026-04-20T04:00:16Z

What

run_agent_loop_streaming was missing the touch_agent() call that the non-streaming run_agent_loop performs before every LLM request. As a result, last_active goes stale during slow streaming generations, and the heartbeat monitor flags the agent as unresponsive mid-stream, which triggers a crash-recovery cycle.

Why

With a slow local backend (e.g. Ollama qwen3.5:35b generating for minutes, especially under contention from multiple agents sharing one Ollama instance), the agent appears "frozen" to the user. Under the hood, the kernel has already marked it unresponsive, killed the loop, and is restarting it. The restart re-queues the request, which makes the problem worse when multiple agents pile on.

The non-streaming path handles this correctly at crates/openfang-runtime/src/agent_loop.rs:446-449:

```rust
// Stamp last_active before the (potentially long) LLM call so the
// heartbeat monitor doesn't flag us as unresponsive mid-iteration.
if let Some(k) = &kernel {
    k.touch_agent(&agent_id_str);
}
```

The streaming path did not have the equivalent, so last_active was only updated between iterations (after streaming finished), not before the long-running call.
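The timing can be illustrated with a toy model. This is a hypothetical sketch, not openfang's actual heartbeat monitor: it assumes the monitor flags an agent once `now - last_active` exceeds a timeout, and it uses integer "seconds" instead of real clocks so the scenario is deterministic. `ActivityRegistry`, its `touch_agent`/`is_unresponsive` methods, and `flagged_mid_stream` are illustrative names only.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the kernel's per-agent activity tracking.
/// Timestamps are logical seconds so the example is deterministic.
struct ActivityRegistry {
    last_active: Mutex<HashMap<String, u64>>,
}

impl ActivityRegistry {
    fn new() -> Self {
        Self { last_active: Mutex::new(HashMap::new()) }
    }

    /// Record that `agent_id` was active at time `now`.
    fn touch_agent(&self, agent_id: &str, now: u64) {
        self.last_active
            .lock()
            .unwrap()
            .insert(agent_id.to_string(), now);
    }

    /// Assumed monitor rule: unresponsive once more than `timeout` seconds
    /// have passed since the last stamp. Unknown agents are not flagged.
    fn is_unresponsive(&self, agent_id: &str, now: u64, timeout: u64) -> bool {
        match self.last_active.lock().unwrap().get(agent_id) {
            Some(&t) => now.saturating_sub(t) > timeout,
            None => false,
        }
    }
}

/// One iteration: the agent was last stamped at `prev_stamp`, the LLM call
/// starts at `call_start`, and the monitor checks at `check_at` (mid-stream).
/// `stamp_before_call` toggles the behavior this PR adds.
fn flagged_mid_stream(
    prev_stamp: u64,
    call_start: u64,
    check_at: u64,
    timeout: u64,
    stamp_before_call: bool,
) -> bool {
    let registry = ActivityRegistry::new();
    registry.touch_agent("agent-1", prev_stamp);
    if stamp_before_call {
        registry.touch_agent("agent-1", call_start);
    }
    registry.is_unresponsive("agent-1", check_at, timeout)
}
```

With a 60 s timeout, a previous stamp at t=0, the call starting at t=40, and a check at t=90, `flagged_mid_stream(0, 40, 90, 60, false)` returns `true` (90 s since the stamp) while `flagged_mid_stream(0, 40, 90, 60, true)` returns `false` (50 s). Under these assumptions, the stamp simply moves the start of the heartbeat window to the start of the call.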

Fix

Mirror the non-streaming behavior in run_agent_loop_streaming, immediately before stream_with_retry:

```diff
+        // Stamp last_active before the (potentially long) LLM call so the
+        // heartbeat monitor doesn't flag us as unresponsive mid-iteration.
+        if let Some(k) = &kernel {
+            k.touch_agent(&agent_id_str);
+        }
+
         // Stream LLM call with retry, error classification, and circuit breaker
         let provider_name = manifest.model.provider.as_str();
         let mut response = stream_with_retry(
```

Minimal 6-line addition. Agents whose LLM calls already finish inside the heartbeat window see no behavior change. agent_id_str is already in scope at this point in the function.
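The ordering the diff enforces (stamp first, then stream) can be checked with a small mock. This sketch is hypothetical: `Kernel`, `RecordingKernel`, `fake_stream_call`, and `run_streaming_iterations` are illustrative stand-ins, not openfang's real types or signatures. It only mirrors the shape of the patch (an optional kernel handle and `touch_agent(&agent_id_str)` immediately before each streamed call) and records events in one shared log so a test can assert their order.

```rust
use std::cell::RefCell;

/// Hypothetical subset of the kernel interface the agent loop needs.
trait Kernel {
    fn touch_agent(&self, agent_id: &str);
}

/// Test double: writes each stamp into a shared, ordered event log.
struct RecordingKernel<'a> {
    log: &'a RefCell<Vec<String>>,
}

impl Kernel for RecordingKernel<'_> {
    fn touch_agent(&self, agent_id: &str) {
        self.log.borrow_mut().push(format!("touch:{agent_id}"));
    }
}

/// Stand-in for stream_with_retry: only records that a streamed call ran.
fn fake_stream_call(log: &RefCell<Vec<String>>) {
    log.borrow_mut().push("stream".to_string());
}

/// Mirrors the shape of the patched loop: stamp, then stream, per iteration.
fn run_streaming_iterations(
    kernel: Option<&dyn Kernel>,
    agent_id_str: &str,
    log: &RefCell<Vec<String>>,
    iterations: usize,
) {
    for _ in 0..iterations {
        // Stamp last_active before the (potentially long) LLM call.
        if let Some(k) = &kernel {
            k.touch_agent(agent_id_str);
        }
        fake_stream_call(log);
    }
}
```

For two iterations with a kernel present, the log reads `touch:agent-1, stream, touch:agent-1, stream`; with `None`, only the `stream` entries appear. A regression test in this style would fail if the stamp were dropped from the streaming path or moved after the call.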

Verification

cargo fmt -p openfang-runtime -- --check — clean
cargo clippy -p openfang-runtime --all-targets -- -D warnings — clean
cargo test -p openfang-runtime — 929 passed, 0 failed

Repro path on local Ollama: with heartbeat.default_timeout_secs below actual per-iteration generation time, streaming agents get killed mid-response and re-spawned in a loop. With this patch applied, the same config runs to completion.

🤖 Generated with Claude Code

…positives

Fixes RightNow-AI#1089

run_agent_loop_streaming skipped the touch_agent() call that the non-streaming run_agent_loop performs before every LLM request. On slow local inference (e.g. Ollama qwen3.5:35b, multi-minute generations), last_active went stale and the heartbeat monitor flagged the agent as unresponsive, triggering crash recovery mid-stream. With multiple agents sharing one Ollama instance, queued agents appeared frozen while the active one generated.

Mirror the non-streaming behavior: stamp last_active immediately before stream_with_retry so the heartbeat window covers the full LLM call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chrisyoung2005 · 2026-04-20T04:16:59Z

Note on failing checks: Format, Clippy, and Security Audit already fail on main at the "bump v0.6.0" commit. See run 24637821654 on main @ e6bab99, where the same three checks are red.

None of the diffs flagged by cargo fmt --check or cargo clippy are in files this PR touches. Locally against this branch:

cargo fmt -p openfang-runtime -- --check — clean
cargo clippy -p openfang-runtime --all-targets -- -D warnings — clean
cargo test -p openfang-runtime — 929 passed

The Test and Check matrices (ubuntu/macos/windows) all pass here.

chrisyoung2005 mentioned this pull request Apr 20, 2026 in #1089: Streaming agent loop missing touch_agent → heartbeat false-positives on selected local LLMs (Open)

chrisyoung2005 wants to merge 1 commit into RightNow-AI:main from chrisyoung2005:fix/streaming-heartbeat-touch