Skip to content

fix(routines): resolve message tool channel/target from per-job metadata#708

Open
ilblackdragon wants to merge 4 commits intomainfrom
fix/message-tool-per-job-notify
Open

fix(routines): resolve message tool channel/target from per-job metadata#708
ilblackdragon wants to merge 4 commits intomainfrom
fix/message-tool-per-job-notify

Conversation

@ilblackdragon
Copy link
Member

Summary

  • When a routine's notify.channel is None, the message tool in full-job workers failed with "No target specified and no active conversation" because set_message_tool_context() was only called when channel was Some
  • The previous approach also mutated shared global state (Arc<RwLock> on MessageTool) which raced with concurrent routine jobs (flagged by an existing TODO)
  • Now the routine's notify config is carried in per-job metadata JSON, and MessageTool::execute uses a 3-tier fallback: explicit params → conversation defaults → JobContext.metadata

Test plan

  • message_tool_falls_back_to_job_metadata — verifies metadata fallback resolves channel/target without conversation context
  • message_tool_no_metadata_still_errors — verifies clean error when nothing is configured
  • All 19 existing message tool tests pass (no regressions)
  • cargo clippy --all --all-features — zero warnings
  • Manual: run a routine with notify.channel = None and verify the message tool no longer errors with "No target specified"

🤖 Generated with Claude Code

When a routine's notify.channel is None, the message tool had no way to
resolve channel/target for full-job workers, causing "No target specified"
errors. The previous approach mutated shared global state via
set_message_tool_context(), which also raced with concurrent jobs.

Now the routine's notify config (channel + user) is carried in the job's
metadata JSON, and MessageTool::execute falls back to ctx.metadata when
neither explicit params nor conversation defaults are available. This
eliminates both the None-channel bug and the concurrent-job race.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 8, 2026 02:11
@gemini-code-assist
Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: tool/builtin Built-in tools size: M 50-199 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 8, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in full-job routine workers where the MessageTool failed at execution time because no channel/target context was available. The old approach mutated shared global state on MessageTool (Arc<RwLock>) via set_message_tool_context(), but this call was guarded behind an if let Some(channel) check — meaning it was skipped entirely when notify.channel was None. The fix moves the notify config into per-job JobContext metadata, which is threadsafe and eliminates the race condition with concurrent routine jobs.

Changes:

  • MessageTool::execute gains a 3-tier fallback for both channel and target: explicit params → conversation defaults (default_channel/default_target) → JobContext.metadata keys.
  • execute_full_job in routine_engine.rs now unconditionally writes notify_user to job metadata, and conditionally writes notify_channel when the routine's notify.channel is Some.
  • Two regression tests are added to message.rs to verify the metadata fallback and the clean error path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/tools/builtin/message.rs Adds JobContext metadata as third tier in channel/target resolution; adds two new tests
src/agent/routine_engine.rs Replaces global-state mutation with per-job metadata population; always writes notify_user, conditionally writes notify_channel

Issues found:

  1. Moderate bug — notify.channel == None case still fails for channel resolution: The PR description states it fixes the case where notify.channel is None, but the fix only unconditionally adds notify_user to metadata (line 500). notify_channel is still guarded by if let Some(channel) = &routine.notify.channel (lines 497–498). As a result, when notify.channel is None and the LLM omits the channel parameter in the message tool call, all three tiers of channel resolution in MessageTool::execute fail and the "No channel specified" error is still returned. The NotifyConfig doc comment says channel: None means "broadcast to all channels", suggesting the tool should work in this case. The fix needs to either always populate a sentinel value (e.g., "default" or "broadcast") in metadata when channel is None, or handle the channel-is-None case explicitly in MessageTool::execute.

  2. Nit — stale "see above" reference in comment: The comment at line 491–492 says "Notify config is carried in job metadata (see above)" but there is nothing above this point explaining this mechanism; the metadata is populated immediately below (lines 494–500). The "see above" reference is a dangling relic and should be removed.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// so the LLM can send results without triggering cross-channel approval.
// TODO: This mutates shared global state and can race with concurrent jobs.
// Move notify config into JobContext metadata and apply per-job instead.
// Notify config is carried in job metadata (see above) and read by
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment "Notify config is carried in job metadata (see above) and read by MessageTool::execute from JobContext" references "see above" but there is nothing above this function explaining this mechanism. The metadata population happens immediately below (lines 494-500), not above. The comment is self-contradictory: it says the config is carried in metadata, then immediately populates the metadata. The "see above" reference should be removed or replaced with "see below".

Suggested change
// Notify config is carried in job metadata (see above) and read by
// Notify config is carried in job metadata (populated below) and read by

Copilot uses AI. Check for mistakes.
.tools()
.set_message_tool_context(Some(channel.clone()), Some(routine.notify.user.clone()))
.await;
metadata["notify_channel"] = serde_json::json!(channel);
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When routine.notify.channel is None, notify_channel is intentionally omitted from the job metadata. However, the PR description states this fix resolves the case where notify.channel is None — yet the message tool's channel resolution (in message.rs lines 115–135) will still fall through all three tiers and return "No channel specified" in that case. Only the notify_user / target resolution is always populated (line 500).

If a full-job routine is expected to work when notify.channel is None (meaning "broadcast to all channels"), the metadata key should still be set unconditionally (e.g., as an empty string or a sentinel value), or the broadcast/default-channel behavior needs to be handled differently in MessageTool::execute. Currently, the channel case is treated symmetrically with target but the population logic is asymmetric: notify_user is always written, notify_channel is only written when non-None.

Suggested change
metadata["notify_channel"] = serde_json::json!(channel);
metadata["notify_channel"] = serde_json::json!(channel);
} else {
// When no explicit channel is configured, write an empty string so
// MessageTool::execute can treat this as "no specific channel" (e.g.,
// broadcast or default-channel behavior) instead of seeing the key
// as entirely absent and reporting "No channel specified".
metadata["notify_channel"] = serde_json::json!("");

Copilot uses AI. Check for mistakes.
Address review feedback:
- Fix stale "see above" comment → "populated below"
- When notify.channel is None, use broadcast_all instead of erroring
  with "No channel specified". This matches NotifyConfig semantics
  where channel=None means "broadcast to all channels"
- Channel resolution is now Option<String>: param → default → metadata → None
- When None, MessageTool uses ChannelManager::broadcast_all(target, response)
  and reports which channels succeeded/failed
- Add regression test for broadcast-all behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 8, 2026 03:04
@github-actions github-actions bot added size: L 200-499 changed lines and removed size: M 50-199 changed lines labels Mar 8, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +227 to +237
let mut failed = Vec::new();
for (ch, result) in &results {
match result {
Ok(()) => succeeded.push(ch.as_str()),
Err(e) => {
tracing::warn!(
channel = %ch,
target = %target,
"broadcast_all: channel failed: {}", e
);
failed.push(ch.as_str());
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failed vector is declared and populated but never read or used anywhere after collection. The error handling only checks whether succeeded.is_empty() and uses channel_manager.channel_names() for the error message, making failed dead code. Either remove failed entirely, or use it to include the names of failed channels in the error message (e.g., "Channels that failed: ...") to give the caller more useful diagnostic information.

Copilot uses AI. Check for mistakes.
Comment on lines +491 to +493
// Notify config is carried in job metadata (populated below) and read by
// MessageTool::execute from JobContext — no global state mutation needed.

Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 491-492 and 495-496 are two redundant comments explaining the same thing (that notify config is carried in job metadata to avoid global state mutation). The floating comment block at 491-492 is now orphaned from the actual code it describes, while the inline comment at 495-496 sits right above the relevant code. The first comment should be removed to avoid repeating the same explanation twice.

Suggested change
// Notify config is carried in job metadata (populated below) and read by
// MessageTool::execute from JobContext — no global state mutation needed.

Copilot uses AI. Check for mistakes.
Address review feedback:
- Use `failed` vec in error message instead of re-querying channel_names
- Remove redundant orphaned comment block in routine_engine.rs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) scope: tool/builtin Built-in tools size: L 200-499 changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants