fix(matrix): @mention detection, sync reliability, and duplicate code cleanup #1057
Open
eldelosdatos wants to merge 1 commit into RightNow-AI:main from
… cleanup

## Problem

Three issues with the Matrix channel adapter:

### 1. @mention detection broken for Element clients (`group_policy=mention_only` unusable)

Element sends @mentions as HTML pills in `formatted_body` and via `m.mentions.user_ids[]`, but NOT in the plain text `body`. The adapter only checked `body`, so all Element @mentions were missed. This made `group_policy=mention_only` (the default) effectively ignore all messages in group rooms when using Element.

### 2. HTTP client has no timeout (sync can hang forever)

`reqwest::Client::new()` creates a client with no timeout. If the homeserver drops the TCP connection without sending RST/FIN (common in containerized deployments behind proxies), the sync request hangs forever, silently killing message reception.

### 3. Duplicate mention/DM detection code

The FIX #2 and FIX #3 blocks (mention detection + DM detection) were duplicated in the sync loop, causing `metadata` to be overwritten by the second pass.

## Fix

### Mention detection (matrix.rs)

Now checks three sources for @mentions:

1. `content.body`: plain text (CLI/API clients)
2. `content.formatted_body`: HTML pills (Element, FluffyChat, etc.)
3. `content.m.mentions.user_ids[]`: intentional mentions as standardized in the Matrix spec (MSC3952)

### HTTP client timeout (matrix.rs)

Added `timeout(90s)` and `connect_timeout(30s)` to the reqwest `Client` builder. The Matrix `/sync` request uses `timeout=30000` (a 30 s long-poll), so a 90 s request timeout leaves 60 s of margin for network latency while still ensuring hung connections are detected and retried.

### Duplicate code removal (matrix.rs)

Removed the duplicated FIX #2 + FIX #3 blocks that overwrote `metadata`.

### Heartbeat (agents/assistant/agent.toml)

Added `heartbeat_interval_secs = 300` to the assistant agent. The default (30 s) creates a timeout of 60 s, causing idle agents to enter a crash-recovery loop every 90 seconds. With 300 s the timeout is 600 s (10 min), eliminating false positives for idle agents.
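The three-source check listed under *Mention detection (matrix.rs)* can be sketched in a few lines of Rust. This is an illustrative sketch, not the adapter's code: `MessageContent` and `is_mentioned` are hypothetical names, the event `content` is assumed to be already parsed into a flat struct, and matching is plain substring search.

```rust
/// Hypothetical, simplified view of an `m.room.message` event's `content`.
/// Field names mirror the Matrix JSON keys; the struct itself is illustrative.
#[derive(Debug, Default)]
struct MessageContent {
    body: String,
    formatted_body: Option<String>,
    /// Contents of `m.mentions.user_ids`, if the client sent it.
    mention_user_ids: Vec<String>,
}

/// Returns true if `user_id` (e.g. "@bot:server") is mentioned in any of the
/// three places clients put mentions.
fn is_mentioned(content: &MessageContent, user_id: &str) -> bool {
    // 1. Intentional mentions (MSC3952): an explicit list of mentioned MXIDs.
    if content.mention_user_ids.iter().any(|id| id == user_id) {
        return true;
    }
    // 2. HTML pills: Element links the MXID through a matrix.to permalink.
    //    Including the closing quote avoids matching "@bot:server2" etc.
    if let Some(html) = &content.formatted_body {
        let pill = format!("https://matrix.to/#/{user_id}\"");
        if html.contains(&pill) {
            return true;
        }
    }
    // 3. Plain text: CLI/API clients often type the raw MXID into `body`.
    //    Naive substring match: "@bot:server" also matches "@bot:server.org".
    content.body.contains(user_id)
}

fn main() {
    // Element-style payload: the plain `body` carries no MXID at all.
    let element = MessageContent {
        body: "hola".to_string(),
        formatted_body: Some(
            r#"<a href="https://matrix.to/#/@bot:server">Bot</a> hola"#.to_string(),
        ),
        mention_user_ids: vec!["@bot:server".to_string()],
    };
    println!("{}", is_mentioned(&element, "@bot:server")); // prints "true"
}
```

Checking `m.mentions.user_ids` first is a design choice: when a client sends it, it is the most explicit signal, and the HTML and plain-text checks act as fallbacks for clients that do not.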
### Observability (matrix.rs + bridge.rs)

Added structured logging at every decision point:

- Sync loop: iteration counter, shutdown channel state, error details
- Bridge loop: message receipt, iteration counter, shutdown handling
- Dispatch: policy check results, agent routing, RBAC, send result

### Tests (matrix.rs)

Added 9 integration tests using wiremock to validate:

- Auth success/failure
- Message reception and dispatch
- Own-message filtering
- Error retry with backoff
- Clean shutdown
- Command parsing
- Event deduplication
- Room allowlist filtering

## Breaking Changes

None. All changes are backward-compatible.

## Test Plan

- [x] `cargo check -p openfang-channels` passes
- [x] `cargo test -p openfang-channels`: 12/12 tests pass
- [x] Tested on Railway with a Synapse homeserver
- [x] Verified Element @mentions detected via `formatted_body`
- [x] Verified sync recovery after connection drops
Thanks for the matrix reliability work, @eldelosdatos; the sync retry + … Ship-blocker in the current head:
This is invalid TOML and will break every deployment that uses the bundled …
Rebase on latest …
Summary
- Element sends @mentions as HTML pills in `formatted_body` and via `m.mentions.user_ids[]`, not in plain text `body`. The adapter only checked `body`, making `group_policy=mention_only` (the default) silently drop all messages in group rooms.
- `reqwest::Client::new()` has no timeout. If the homeserver drops TCP silently, `/sync` hangs forever, killing message reception permanently.
- Duplicate mention/DM detection blocks in the sync loop overwrote `metadata` on the second pass.

Problem Details
@mention detection (critical)
Element sends:
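| Field | Content |
|---|---|
| `body` | `hola` (no MXID) |
| `formatted_body` | `<a href="https://matrix.to/#/@bot:server">Bot</a> hola` |
| `m.mentions.user_ids` | `["@bot:server"]` |

HTTP timeout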
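The PR's fix puts a total deadline on the reqwest client with `ClientBuilder::timeout` (90 s) and `ClientBuilder::connect_timeout` (30 s). To show the failure mode without external crates, the sketch below swaps in the standard library's `TcpStream::set_read_timeout`: a local listener completes the TCP handshake but never sends data, RST, or FIN, which mimics a homeserver connection silently dropped by a proxy. `read_with_deadline` and `silent_peer_times_out` are illustrative names, not part of the adapter.

```rust
use std::io::{self, ErrorKind, Read};
use std::net::{TcpListener, TcpStream};
use std::time::{Duration, Instant};

/// True if an I/O error means "the deadline passed" rather than a real failure.
/// Unix reports an expired socket read timeout as WouldBlock, Windows as TimedOut.
fn is_timeout(err: &io::Error) -> bool {
    matches!(err.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut)
}

/// Reads once from `stream`, giving up after `deadline` instead of blocking
/// forever on a peer that never answers.
fn read_with_deadline(stream: &mut TcpStream, deadline: Duration, buf: &mut [u8]) -> io::Result<usize> {
    stream.set_read_timeout(Some(deadline))?;
    stream.read(buf)
}

/// Connects to a local listener that never accepts or writes, so the kernel
/// holds the connection open with no data, RST, or FIN. Returns Ok(true) if
/// the read ended because the deadline expired.
fn silent_peer_times_out(deadline: Duration) -> io::Result<bool> {
    // The listener must stay alive; dropping it would reset the connection.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let mut client = TcpStream::connect(listener.local_addr()?)?;
    let mut buf = [0u8; 64];
    match read_with_deadline(&mut client, deadline, &mut buf) {
        Ok(_) => Ok(false),
        Err(e) if is_timeout(&e) => Ok(true),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let start = Instant::now();
    let timed_out = silent_peer_times_out(Duration::from_millis(200))?;
    println!("timed out: {timed_out} after ~{:?}", start.elapsed());
    Ok(())
}
```

Without `set_read_timeout`, the `read` call would block indefinitely, which is exactly the hang described for `/sync`. reqwest's `timeout` is stricter than a per-read timeout because it bounds the whole request, including a response body that trickles in, which is why it has to be comfortably larger than the 30 s long-poll.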
Duplicate code
The FIX #2 (mention detection) and FIX #3 (DM detection) blocks were copy-pasted twice in the sync loop. The second copy overwrites `metadata` from the first, losing the DM detection results.

Test Plan
- [x] `cargo check -p openfang-channels` compiles cleanly
- [x] `cargo test -p openfang-channels`: 12/12 tests pass (9 new + 3 existing)
- [x] Verified Element @mentions detected via `formatted_body` and `m.mentions`
- [x] `group_policy=mention_only` works correctly with Element pills

New Tests
- `test_adapter_auth_and_start`
- `test_adapter_auth_failure`
- `test_sync_receives_message`
- `test_sync_skips_own_messages`
- `test_sync_retries_on_error`
- `test_shutdown_stops_sync`
- `test_command_parsing`: `/command args` parsing
- `test_dedup_events`
- `test_allowed_rooms_filter`

Breaking Changes
None. All changes are backward-compatible.
🤖 Generated with Claude Code