fix: pairing approval channel_type mismatch and UnifiedSessionStore message loss#1892
Conversation
|
@coderabbitai review |
|
/review |
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
✅ Action performedReview finished.
|
|
Important Review skippedBot user detected. To trigger a single review, invoke the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe PR fixes the pairing-approval DM to use normalized channel types, and substantially improves ChangesPairing approval DM channel type normalization
Session store concurrent write and file-locking improvements
Sequence Diagram(s)No sequence diagrams generated. The pairing change is a single-line parameter fix, and the session concurrency changes are internal implementation details (file locking, message merging) without a clear multi-component flow that benefits from visualization. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings. Review areas:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/praisonai/tests/unit/cli/test_unified_session.py (1)
278-279: 💤 Low valueRedundant
load()call.Line 278 loads the session but discards the result; line 279 loads again and assigns to
session. The first call appears unnecessary.Proposed fix
- writer.load("shared-session") - session = writer.load("shared-session") + session = writer.load("shared-session")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai/tests/unit/cli/test_unified_session.py` around lines 278 - 279, The redundant call to writer.load("shared-session") should be removed: keep the single call that assigns the result to session (session = writer.load("shared-session")) and delete the preceding standalone writer.load("shared-session") invocation so the session is only loaded once; locate these calls in the test_unified_session.py test where writer.load is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/praisonai/praisonai/cli/session/unified.py`:
- Around line 306-309: The fallback write path that performs f.seek(0),
f.truncate(), json_data = json.dumps(session.to_dict(),
indent=2).encode('utf-8'), and f.write(json_data) must also call f.flush() and
os.fsync(f.fileno()) after the write to match the Windows/Unix branches and
ensure durability; update the fallback block (around the write using variables f
and json_data) to flush and fsync the file descriptor (add an import/use of
os.fsync if not already available).
- Around line 163-172: Windows locking currently passes a length of 1 to
msvcrt.locking, which only locks one byte; update the locking calls in the
with-open blocks used by load() and save() (the blocks that import msvcrt and
call msvcrt.locking) to lock the full file range by computing the file size
before locking (e.g., use f.seek(0, os.SEEK_END) or os.path.getsize(path) to get
size and use max(1, size) as the length), seek back to the start as needed, call
msvcrt.locking(..., msvcrt.LK_RLCK, length) before read/write and use the same
length when unlocking with msvcrt.LK_UNLCK; apply this change to all occurrences
(the load/save methods and any other msvcrt.locking calls).
---
Nitpick comments:
In `@src/praisonai/tests/unit/cli/test_unified_session.py`:
- Around line 278-279: The redundant call to writer.load("shared-session")
should be removed: keep the single call that assigns the result to session
(session = writer.load("shared-session")) and delete the preceding standalone
writer.load("shared-session") invocation so the session is only loaded once;
locate these calls in the test_unified_session.py test where writer.load is
used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 842e654c-510d-4dca-b4c7-03877c5bc327
📒 Files selected for processing (5)
src/praisonai/praisonai/bots/_unknown_user.pysrc/praisonai/praisonai/cli/session/unified.pysrc/praisonai/tests/integration/bots/test_pairing_agent_e2e.pysrc/praisonai/tests/integration/bots/test_pairing_owner_dm.pysrc/praisonai/tests/unit/cli/test_unified_session.py
| with open(path, 'rb') as f: | ||
| if sys.platform == "win32": | ||
| import msvcrt | ||
| f.seek(0) | ||
| msvcrt.locking(f.fileno(), msvcrt.LK_RLCK, 1) | ||
| try: | ||
| json_data = f.read().decode('utf-8') | ||
| finally: | ||
| f.seek(0) | ||
| msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1) |
There was a problem hiding this comment.
msvcrt.locking only locks 1 byte, making Windows file locking ineffective.
The call msvcrt.locking(f.fileno(), msvcrt.LK_RLCK, 1) locks only 1 byte starting from the current file position. Since session JSON files are always larger than 1 byte, concurrent processes can still read/write past the first byte, defeating the locking mechanism entirely on Windows.
To lock the entire file, you need to lock a range covering the expected file size or use a large sentinel value:
Proposed fix
def _read_disk_session(self, path: Path) -> Optional[UnifiedSession]:
"""Read session from disk without updating cache."""
if not path.exists():
return None
try:
with open(path, 'rb') as f:
if sys.platform == "win32":
import msvcrt
+ # Lock entire file by using file size (or large value for empty files)
+ lock_length = max(os.path.getsize(path), 1)
f.seek(0)
- msvcrt.locking(f.fileno(), msvcrt.LK_RLCK, 1)
+ msvcrt.locking(f.fileno(), msvcrt.LK_RLCK, lock_length)
try:
json_data = f.read().decode('utf-8')
finally:
f.seek(0)
- msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
+ msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, lock_length)The same issue exists in save() (lines 215, 246) and load() (lines 339, 345) — all Windows msvcrt.locking calls need to lock the appropriate byte range.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/praisonai/praisonai/cli/session/unified.py` around lines 163 - 172,
Windows locking currently passes a length of 1 to msvcrt.locking, which only
locks one byte; update the locking calls in the with-open blocks used by load()
and save() (the blocks that import msvcrt and call msvcrt.locking) to lock the
full file range by computing the file size before locking (e.g., use f.seek(0,
os.SEEK_END) or os.path.getsize(path) to get size and use max(1, size) as the
length), seek back to the start as needed, call msvcrt.locking(...,
msvcrt.LK_RLCK, length) before read/write and use the same length when unlocking
with msvcrt.LK_UNLCK; apply this change to all occurrences (the load/save
methods and any other msvcrt.locking calls).
| f.seek(0) | ||
| f.truncate() # Clear file after acquiring lock | ||
| f.truncate() | ||
| json_data = json.dumps(session.to_dict(), indent=2).encode('utf-8') | ||
| f.write(json_data) |
There was a problem hiding this comment.
Missing flush() and fsync() in fallback write path.
The Windows and Unix branches both call f.flush() and os.fsync(f.fileno()) to ensure data durability, but the fallback path (when file locking is unavailable) skips these calls. While this path is already degraded, adding flush/fsync would provide consistent durability guarantees.
Proposed fix
f.seek(0)
f.truncate()
json_data = json.dumps(session.to_dict(), indent=2).encode('utf-8')
f.write(json_data)
+ f.flush()
+ os.fsync(f.fileno())📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| f.seek(0) | |
| f.truncate() # Clear file after acquiring lock | |
| f.truncate() | |
| json_data = json.dumps(session.to_dict(), indent=2).encode('utf-8') | |
| f.write(json_data) | |
| f.seek(0) | |
| f.truncate() | |
| json_data = json.dumps(session.to_dict(), indent=2).encode('utf-8') | |
| f.write(json_data) | |
| f.flush() | |
| os.fsync(f.fileno()) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/praisonai/praisonai/cli/session/unified.py` around lines 306 - 309, The
fallback write path that performs f.seek(0), f.truncate(), json_data =
json.dumps(session.to_dict(), indent=2).encode('utf-8'), and f.write(json_data)
must also call f.flush() and os.fsync(f.fileno()) after the write to match the
Windows/Unix branches and ensure durability; update the fallback block (around
the write using variables f and json_data) to flush and fsync the file
descriptor (add an import/use of os.fsync if not already available).
|
@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding. Phase 1: Review per AGENTS.md
Phase 2: FIX Valid Issues Phase 3: Final Verdict |
|
Claude finished @MervinPraison's task —— View job Architecture Review and Issue Fixes ✅Phase 1: Architecture Review per AGENTS.md
Phase 2: Fix Valid Issues
Phase 3: Final Verdict
Phase 1: Architecture Review ✅Protocol-driven compliance: The changes follow AGENTS.md architecture:
Backward compatibility: No breaking API changes, only internal fixes ✅ Performance: No hot-path regressions - only session file I/O improvements ✅ Phase 2: Critical Issues Fixed ✅Fixed Issue 1: Windows file locking ineffective (CRITICAL)
Fixed Issue 2: Missing durability in fallback path
Fixed Issue 3: Redundant test code
Files Modified:
Commit: Phase 3: Final Verdict ✅APPROVED - This PR is ready to merge: ✅ Fixes critical bugs: Pairing approval and session persistence issues The two main fixes address legitimate production issues:
No blocking issues remain. |
|
Merge review — blocked
Fix list:
|
|
Claude finished @MervinPraison's task —— View job Merge Conflict Resolution ✅Task List:
✅ Resolution CompleteSuccessfully resolved all merge conflicts and rebased onto main branch: Conflicts Resolved:
Critical Fixes Preserved:✅ Pairing approval fix: Branch Status:
The PR is now ready for merge with all critical bug fixes intact and properly rebased on the latest main branch. |
…essage loss - Pass platform channel_type (not chat ID) into pairing callback payloads so verify_and_pair keys match is_paired lookups after owner approval - Reload and merge messages under lock in UnifiedSessionStore.save() to prevent stale-cache overwrites losing concurrent CLI/TUI session writes - Add regression tests for both failure modes Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
…edSessionStore - Fix Windows file locking to lock entire file instead of just 1 byte - Add missing flush/fsync in fallback write path for consistency - Remove redundant load() call in test_stale_cache_save_preserves_concurrent_messages Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
ea06446 to
dc95723
Compare
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Greptile SummaryThis PR fixes two correctness bugs: a channel-type mismatch in the bot pairing approval flow that prevented owners' approvals from ever taking effect, and a stale-cache/overwrite race in
Confidence Score: 3/5The pairing fix is correct and safe, but the Windows locking change introduces a lock/unlock range mismatch that will permanently lock session files on Windows after the first save, and the tests guarding the pairing fix are missing async decorators so they may not have actually run. The _release_exclusive_lock path on Windows recomputes lock_length from the post-write file size, which will almost always differ from the pre-write size locked by _acquire_exclusive_lock. msvcrt.locking(LK_UNLCK) requires the exact same byte range; a mismatch raises IOError and leaves the file locked. Additionally, the async tests in test_pairing_owner_dm.py are missing @pytest.mark.asyncio, so the reported pass count likely reflects coroutines that were collected but never awaited. unified.py (Windows lock/unlock mismatch) and test_pairing_owner_dm.py (missing async test decorators) Important Files Changed
Sequence DiagramsequenceDiagram
participant U as Unknown User
participant H as UnknownUserHandler
participant PS as PairingStore
participant A as BotAdapter
participant O as Owner
U->>H: send message
H->>PS: is_paired(user_id, channel_type)?
PS-->>H: false
H->>PS: generate_code(channel_type)
PS-->>H: code
H->>A: send_approval_dm(owner_id, user_name, code, channel_type, user_id)
Note over H,A: Fix: was passing channel_id (e.g. "dm-123"),<br/>now passes channel_type (e.g. "telegram")
A->>O: DM with inline Approve button
O->>A: tap Approve - callback_data contains channel_type
A->>PS: verify_and_pair(code, user_id, channel_type)
PS-->>A: paired
U->>H: send second message
H->>PS: is_paired(user_id, channel_type)?
PS-->>H: true
H-->>U: message allowed through
|
| @@ -198,8 +202,12 @@ def _acquire_exclusive_lock(self, file_obj) -> None: | |||
| def _release_exclusive_lock(self, file_obj) -> None: | |||
| if sys.platform == "win32": | |||
| import msvcrt | |||
| # Use the same lock length as acquisition | |||
| file_obj.seek(0, os.SEEK_END) | |||
| file_size = file_obj.tell() | |||
| lock_length = max(file_size, 1) | |||
| file_obj.seek(0) | |||
| msvcrt.locking(file_obj.fileno(), msvcrt.LK_UNLCK, 1) | |||
| msvcrt.locking(file_obj.fileno(), msvcrt.LK_UNLCK, lock_length) | |||
| elif _HAS_FCNTL: | |||
| fcntl.flock(file_obj.fileno(), fcntl.LOCK_UN) | |||
There was a problem hiding this comment.
Windows unlock range mismatch after file resize
_release_exclusive_lock recomputes lock_length from the file's current (post-write) size, but _acquire_exclusive_lock locked the file's pre-write size. msvcrt.locking(LK_UNLCK, n) must receive the exact same byte-count that was passed to LK_LOCK; any mismatch raises IOError: [Errno 13] Permission denied, leaving the file permanently locked for the lifetime of the process. This will reliably trigger whenever _write_json_locked changes the JSON payload size (virtually every merge/save cycle).
| # Large constant that covers any realistic session-file size; both | |
| # acquire and release must use the identical value so that | |
| # msvcrt.locking(LK_UNLCK) matches the locked region exactly. | |
| _WIN32_LOCK_LENGTH = 1 << 30 # 1 GiB | |
| def _acquire_exclusive_lock(self, file_obj) -> None: | |
| if sys.platform == "win32": | |
| import msvcrt | |
| file_obj.seek(0) | |
| msvcrt.locking(file_obj.fileno(), msvcrt.LK_LOCK, self._WIN32_LOCK_LENGTH) | |
| elif _HAS_FCNTL: | |
| fcntl.flock(file_obj.fileno(), fcntl.LOCK_EX) | |
| else: | |
| global _WARNED_NO_FCNTL | |
| if not _WARNED_NO_FCNTL: | |
| logger.warning( | |
| "File locking unavailable on this platform (fcntl not available); " | |
| "concurrent writers may corrupt session files." | |
| ) | |
| _WARNED_NO_FCNTL = True | |
| def _release_exclusive_lock(self, file_obj) -> None: | |
| if sys.platform == "win32": | |
| import msvcrt | |
| file_obj.seek(0) | |
| msvcrt.locking(file_obj.fileno(), msvcrt.LK_UNLCK, self._WIN32_LOCK_LENGTH) | |
| elif _HAS_FCNTL: | |
| fcntl.flock(file_obj.fileno(), fcntl.LOCK_UN) |
Summary
Critical bug scan found two high-severity correctness issues in recent code paths.
Bug 1: Pairing inline approval never grants access
Impact: Owner taps Approve on Telegram/Discord/Slack inline buttons; UI shows success but the requester remains blocked on subsequent messages.
Root cause:
_unknown_user.pypassed the chat ID (message.channel.channel_id) intosend_approval_dm, which embedded it in callback data as thechannelfield.handle_approval_callbackstored that value aschannel_typeinPairingStore, butis_paired()checks against the platform type (telegram,discord,slack).Fix: Pass
channel_typeinstead of chat ID into the approval DM callback payload.Bug 2: UnifiedSessionStore loses concurrent messages
Impact: Data loss in CLI
--interactive/ TUI sessions when two processes (or stale in-memory cache) write to the same session file.Root cause:
UnifiedSessionStorewas not updated whenDefaultSessionStore/HierarchicalSessionStorereceived locked read-modify-write fixes.load()returned a stale in-process cache;save()overwrote the full file without reloading from disk.Fix: Reload and merge messages under lock in
save(); always read from disk inload().Validation
tests/unit/cli/test_unified_session.py— newtest_stale_cache_save_preserves_concurrent_messagestests/integration/bots/test_pairing_owner_dm.py— approval flow uses real callback payloadtests/integration/bots/test_pairing_agent_e2e.py— updated channel expectationsSummary by CodeRabbit
Release Notes