fix: prevent CLI session message loss on concurrent saves by cursor[bot] · Pull Request #1854 · MervinPraison/PraisonAI

cursor · 2026-06-04T09:09:13Z

Bug and impact

UnifiedSessionStore.save() wrote the full in-memory session to disk without reloading first. When two CLI/TUI processes (or threads) used the same session file—e.g. TUI plus --interactive, or two terminals resuming the same session—the last writer could overwrite newer chat messages from the other process.

Severity: Data loss of user chat history.

Root cause

save() acquired a file lock but performed a blind overwrite of the JSON file. Process A could load messages 1–10, process B could append message 11 and save, then process A could save its stale copy and drop message 11.

This is the same class of bug fixed for DefaultSessionStore in #1781/#1790, but it remained in the CLI layer after #1837 added cross-platform locking without read-modify-write.

Fix

- Reload session JSON under an exclusive lock before writing
- Merge messages from disk and incoming session (dedupe by role/content/timestamp)
- Track file mtimes so load() refreshes cache when another process updates the file
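
The fix steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: `merge_messages` and `save_merged` are hypothetical names, messages are assumed to be plain dicts with `role`, `content`, and `timestamp` keys, and only POSIX `fcntl.flock` is shown (the PR also covers Windows locking).

```python
import fcntl  # POSIX only; Windows locking is omitted from this sketch
import json
import os


def merge_messages(on_disk, incoming):
    """Union of two message lists, deduplicated by (role, content, timestamp)."""
    seen = set()
    merged = []
    for msg in list(on_disk) + list(incoming):
        key = (msg.get("role"), msg.get("content"), msg.get("timestamp"))
        if key not in seen:
            seen.add(key)
            merged.append(msg)
    return merged


def save_merged(path, incoming):
    """Reload the file under an exclusive lock, merge, then write back."""
    # O_CREAT without O_TRUNC: create the file if missing, keep existing data.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            raw = f.read()
            on_disk = json.loads(raw) if raw else []
            merged = merge_messages(on_disk, incoming)
            # Overwrite in place only after the merge has succeeded.
            f.seek(0)
            f.truncate()
            f.write(json.dumps(merged).encode("utf-8"))
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    return merged
```

With this shape, a process that saves a stale copy (messages 1 and 2 plus its own message 4) after another process has appended message 3 ends up writing all four messages, because the on-disk list is re-read while the lock is held.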

Validation

pytest tests/unit/cli/test_unified_session.py — 22 passed (including new stale-save and concurrent-save tests)

Summary by CodeRabbit

Bug Fixes
- Enhanced session persistence to reliably handle concurrent updates across multiple processes
- Improved message deduplication and metadata handling during concurrent session access
- Prevents message loss when sessions are updated simultaneously from multiple sources
Tests
- Added unit tests for concurrent session operations

UnifiedSessionStore.save() overwrote the session file without reloading disk state first. Two CLI/TUI processes sharing a session could lose chat history when one saved from a stale in-memory copy.

- Reload and merge messages under an exclusive file lock before writing
- Invalidate in-process cache when the on-disk file is newer
- Add regression tests for stale-save and concurrent writer scenarios

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>

MervinPraison · 2026-06-04T09:09:23Z

@coderabbitai review

MervinPraison · 2026-06-04T09:09:24Z

/review

coderabbitai · 2026-06-04T09:09:29Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-06-04T09:09:38Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 120438a0-3865-438c-a47e-95d2398e89d3

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


📝 Walkthrough

Walkthrough

UnifiedSessionStore now ensures multi-process safety by maintaining a re-entrant lock and in-memory cache with per-session mtime tracking. When saving, existing on-disk sessions are merged with incoming changes by deduplicating messages and resolving counters. When loading, cache freshness is validated via mtime; stale cache is discarded. Session deletion evicts cache under lock. Two new tests verify stale-copy and parallel-writer scenarios.
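
As a rough illustration of the mtime-validated cache described in the walkthrough, the sketch below keeps a per-key mtime next to each cached value and re-reads the file when the on-disk mtime is newer. `MtimeCache` is a hypothetical class, not the PR's `UnifiedSessionStore`, and it uses `st_mtime_ns` as an assumption where the PR compares `st_mtime`.

```python
import json
import os
import threading


class MtimeCache:
    """Hypothetical stand-in for an in-memory cache validated by file mtime."""

    def __init__(self):
        self._lock = threading.RLock()  # re-entrant, as in the walkthrough
        self._cache = {}
        self._mtimes = {}

    def _is_fresh(self, key, path):
        """Cached entry is fresh if the file is not newer than what we recorded."""
        if key not in self._cache:
            return False
        try:
            return os.stat(path).st_mtime_ns <= self._mtimes[key]
        except OSError:
            return False  # missing or unreadable file: treat the cache as stale

    def load(self, key, path):
        with self._lock:
            if self._is_fresh(key, path):
                return self._cache[key]
        # Stale or missing entry: read from disk and record the mtime we saw.
        with open(path, "rb") as f:
            data = json.loads(f.read())
            mtime = os.fstat(f.fileno()).st_mtime_ns
        with self._lock:
            self._cache[key] = data
            self._mtimes[key] = mtime
        return data
```

The sketch leaves out file locking on the read path to keep the freshness check in focus.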

Changes

Concurrent-safe session store with merge conflict handling

| Layer / File(s) | Summary |
|---|---|
| Threading support and merge conflict resolution `src/praisonai/praisonai/cli/session/unified.py` | Adds `threading` import for synchronization. Introduces re-entrant lock, mtime-tracking cache dict, and `_merge_sessions()` helper that deduplicates messages by role/content/timestamp and resolves token/cost/request counters via `max`. |
| Exclusive-lock and JSON I/O helpers `src/praisonai/praisonai/cli/session/unified.py` | Centralizes exclusive-lock acquisition/release and JSON read/write operations with truncation, flushing, and `fsync` to ensure atomicity and durability. |
| Concurrent-safe save and load with cache validation `src/praisonai/praisonai/cli/session/unified.py` | `save()` uses exclusive-lock helpers and `_merge_sessions()` to combine concurrent on-disk and incoming sessions, then updates cache/mtimes under lock. `load()` validates cache freshness via `_is_cache_fresh()` mtime comparison; returns cached only if fresh; otherwise reads JSON under lock and refreshes cache. |
| Thread-safe session deletion `src/praisonai/praisonai/cli/session/unified.py` | Updates `delete()` to evict session and mtime cache entries under the re-entrant lock. |
| Concurrency behavior tests `src/praisonai/tests/unit/cli/test_unified_session.py` | Adds `ThreadPoolExecutor` import. Two new tests: stale-copy preservation (verifies concurrent saves merge messages without loss) and concurrent writers (verifies parallel writers all save distinct messages). |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

MervinPraison/PraisonAI#1837: Both PRs modify UnifiedSessionStore file locking in save/load paths with exclusive locks and centralized lock/I/O helpers.
MervinPraison/PraisonAI#1759: Both PRs add mtime-based cache staleness detection to the session read path to avoid returning stale in-memory cached history.
MervinPraison/PraisonAI#1745: Both PRs prevent concurrent-write message loss via locked read-modify-write with merge/deduplication semantics in session stores.

Suggested reviewers

MervinPraison

Poem

🐰 Locks and merges, caches bright,
Threads no longer race through night,
Messages preserved, not lost in time,
Sessions sync in perfect rhyme! 🔐✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 76.47% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main fix: preventing message loss during concurrent session saves in the CLI layer. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch cursor/critical-bug-investigation-7b44


MervinPraison · 2026-06-04T09:09:47Z

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:

Bloat check: Are changes minimal and focused? Any unnecessary code or scope creep?
Security: Any hardcoded secrets, unsafe eval/exec, missing input validation?
Performance: Any module-level heavy imports? Hot-path regressions?
Tests: Are tests included? Do they cover the changes adequately?
Backward compat: Any public API changes without deprecation?
Code quality: DRY violations, naming conventions, error handling?
Address reviewer feedback: If Qodo, Coderabbit, or Gemini flagged valid issues, include them in your review
Suggest specific improvements with code examples where possible

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/praisonai/praisonai/cli/session/unified.py`:
- Around line 356-360: The deletion block around path.exists() / path.unlink()
is susceptible to a race where unlink() raises FileNotFoundError; modify the
code in the session cleanup logic (the branch that currently checks
path.exists() then calls path.unlink()) to either call
path.unlink(missing_ok=True) or wrap path.unlink() in a try/except
FileNotFoundError and ignore it, then acquire self._lock and pop from
self._cache and self._cache_mtimes as before; ensure the same unique symbols are
used (path.unlink, self._lock, self._cache.pop, self._cache_mtimes.pop) so the
change is localized and thread/process-safe.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 857221f2-d06e-486a-aaf3-402b846f416c

📥 Commits

Reviewing files that changed from the base of the PR and between a9f4bd5 and 2d7f67a.

📒 Files selected for processing (2)

src/praisonai/praisonai/cli/session/unified.py
src/praisonai/tests/unit/cli/test_unified_session.py

coderabbitai · 2026-06-04T09:14:44Z

        if path.exists():
            path.unlink()
-            self._cache.pop(session_id, None)
+            with self._lock:
+                self._cache.pop(session_id, None)
+                self._cache_mtimes.pop(session_id, None)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Concurrent delete could raise FileNotFoundError.

Between path.exists() check and path.unlink(), another process could delete the file, causing unlink() to raise. Consider using missing_ok=True (Python 3.8+) for robustness.

Proposed fix

         if path.exists():
-            path.unlink()
+            path.unlink(missing_ok=True)
             with self._lock:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

         if path.exists():
-            path.unlink()
+            path.unlink(missing_ok=True)
             with self._lock:
                 self._cache.pop(session_id, None)
                 self._cache_mtimes.pop(session_id, None)


MervinPraison · 2026-06-04T09:19:42Z

@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding.

Phase 1: Review per AGENTS.md

1. Protocol-driven: check heavy implementations vs core SDK
2. Backward compatible: ensure zero feature regressions
3. Performance: no hot-path regressions

Phase 2: FIX Valid Issues
4. For any VALID bugs or architectural flaws found by Gemini, CodeRabbit, Qodo, Copilot, or any other reviewer: implement the fix
5. Push all code fixes directly to THIS branch (do NOT create a new PR)
6. Comment a summary of exact files modified and what you skipped

Phase 3: Final Verdict
7. If all issues are resolved, approve the PR / close the Issue
8. If blocking issues remain, request changes / leave clear action items

praisonai-triage-agent · 2026-06-04T09:20:05Z

Use missing_ok=True to handle concurrent file deletion gracefully. Fixes potential race where another process deletes file between exists() check and unlink() call.

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
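
For reference, the two race-tolerant forms CodeRabbit mentioned behave the same when the file is already gone. The sketch below uses a hypothetical helper, `remove_session_file`, and is not the PR's `delete()` method. `Path.unlink(missing_ok=True)` requires Python 3.8 or later.

```python
from pathlib import Path


def remove_session_file(path: Path) -> bool:
    """Delete the file if present; return True only if this call removed it."""
    try:
        path.unlink()
        return True
    except FileNotFoundError:
        # Another process or thread deleted it first; nothing left to do.
        return False


def remove_session_file_quiet(path: Path) -> None:
    """Equivalent one-liner when the caller does not need to know who deleted it."""
    path.unlink(missing_ok=True)
```

The try/except form is slightly more informative, since it reports whether this caller performed the deletion. Neither form needs a preceding `path.exists()` check.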

qodo-code-review · 2026-06-12T09:56:01Z

Qodo reviews are paused for this user.


greptile-apps · 2026-06-12T10:01:12Z

Greptile Summary

This PR adds read-modify-write semantics to UnifiedSessionStore.save(): it reloads the on-disk session under an exclusive file lock, merges messages by deduplicating on (role, content, timestamp), and uses mtime-based cache invalidation in load() to detect concurrent updates. The approach correctly prevents message loss for the primary case described.

Message merge: concurrent writers now each read the latest disk state before writing, so chat history from other processes is preserved; deduplication relies on exact timestamp equality, meaning the same user message sent at different milliseconds is never collapsed.
Token/cost stats: _merge_sessions takes max() of cumulative counters (total_input_tokens, total_output_tokens, total_cost, request_count), which under-counts when both processes accumulate against the same baseline — this is a residual correctness gap not addressed by the PR.
Read path regression: load() was changed to open files with "r+b" (read-write) rather than "rb" (read-only), silently turning a permission-denied error into a missing-session result for read-only files.

Confidence Score: 3/5

The core message-preservation fix is sound, but two issues in the changed code warrant attention before merging: a read path that now requires write file permissions (silently surfacing as a missing session), and cumulative stats that are silently under-counted when two processes work from the same baseline.

The read-modify-write locking strategy and message deduplication logic are correct for their primary goal. However, load() now opens session files in read-write mode when only a read is needed — a permission-denied failure is swallowed as None, making a valid session appear missing. Additionally, _merge_sessions uses max() for additive counters (total_input_tokens, total_cost, request_count), meaning concurrent use will produce lower-than-actual totals for billing and usage tracking. These two issues affect observable correctness on the changed paths and should be resolved before the PR lands.

src/praisonai/praisonai/cli/session/unified.py — specifically the load() open-mode regression and the stat-after-unlock window in save(); the test file is clean.

Important Files Changed

Filename	Overview
src/praisonai/praisonai/cli/session/unified.py	Adds read-modify-write under file lock plus mtime-based cache invalidation; two logic issues remain: load() opens files r+b (needs write permission for a read-only operation) and _merge_sessions uses max() for additive token/cost counters, causing under-counting in concurrent scenarios.
src/praisonai/tests/unit/cli/test_unified_session.py	Adds two well-designed concurrency tests: a stale-save scenario and an 8-writer thread-pool test; tests cover message preservation but do not exercise the token/cost counting paths under concurrent saves.

Sequence Diagram

sequenceDiagram
    participant A as Process A
    participant F as Session File
    participant B as Process B

    Note over A,B: Concurrent save scenario (new behaviour)

    A->>F: open("r+b") + LOCK_EX
    B->>F: open("r+b") [blocks on lock]
    A->>F: read existing JSON
    A->>A: merge(on_disk, incoming_A)
    A->>F: write merged JSON + fsync
    A->>F: release lock
    Note over A: stat().st_mtime → cache_A
    B->>F: acquire LOCK_EX
    B->>F: read merged JSON (includes A's messages)
    B->>B: merge(on_disk, incoming_B)
    B->>F: write merged JSON + fsync
    B->>F: release lock
    Note over B: stat().st_mtime → cache_B

    Note over A,B: load() cache-freshness check
    A->>F: stat().st_mtime
    alt "mtime <= cached_mtime"
        A->>A: return in-memory cache
    else "mtime > cached_mtime"
        A->>F: open("r+b") + LOCK_EX
        A->>F: read latest JSON
        A->>F: release lock
        A->>A: update cache + mtime
    end

Reviews (1): Last reviewed commit: "fix: prevent FileNotFoundError race cond..."

greptile-apps · 2026-06-12T10:01:16Z

+        merged.total_input_tokens = max(
+            on_disk.total_input_tokens, incoming.total_input_tokens
+        )
+        merged.total_output_tokens = max(
+            on_disk.total_output_tokens, incoming.total_output_tokens
+        )
+        merged.total_cost = max(on_disk.total_cost, incoming.total_cost)
+        merged.request_count = max(on_disk.request_count, incoming.request_count)


total_input_tokens, total_output_tokens, total_cost, and request_count are cumulative accumulators, not instantaneous values. Using max() silently discards the contribution from whichever process has the smaller running total. For example, if both processes started from a baseline of 100 tokens and Process A added 50 while Process B added 70, max(150, 170) = 170 drops Process A's 50-token contribution entirely. The correct total would require tracking each process's delta from the shared baseline, which the current data model doesn't support. At minimum this should be noted as a known limitation; if exact accounting matters, consider storing per-process deltas or using an append-only log.

Suggested change

+        # NOTE: cumulative counters cannot be merged correctly without knowing the
+        # shared baseline each process started from; max() avoids a total reset but
+        # will under-count when both processes accumulate from the same starting
+        # point. Tracked as a known limitation of the current data model.
         merged.total_input_tokens = max(
             on_disk.total_input_tokens, incoming.total_input_tokens
         )
         merged.total_output_tokens = max(
             on_disk.total_output_tokens, incoming.total_output_tokens
         )
         merged.total_cost = max(on_disk.total_cost, incoming.total_cost)
         merged.request_count = max(on_disk.request_count, incoming.request_count)
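
The arithmetic in the comment above can be worked through in a few lines. This sketch contrasts `max()` with a delta-based merge; `merge_delta` and its `baseline` argument are hypothetical, since (as the review notes) the current data model does not record the baseline each process loaded.

```python
def merge_max(on_disk: int, incoming: int) -> int:
    """Merge rule used in the PR: keep the larger running total."""
    return max(on_disk, incoming)


def merge_delta(on_disk: int, incoming: int, baseline: int) -> int:
    """Hypothetical alternative: add only what the saver accumulated since load."""
    return on_disk + (incoming - baseline)


baseline = 100            # both processes load a session with 100 tokens
a_total = baseline + 50   # Process A uses 50 more tokens
b_total = baseline + 70   # Process B uses 70 more tokens

# Process B saves first, then Process A saves against B's on-disk result.
max_result = merge_max(merge_max(baseline, b_total), a_total)
delta_result = merge_delta(merge_delta(baseline, b_total, baseline), a_total, baseline)

print(max_result)    # 170: Process A's 50 tokens are dropped
print(delta_result)  # 220: both contributions are counted
```

The delta approach only works if each process remembers the totals it loaded, which would be a data-model change beyond this PR.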

greptile-apps · 2026-06-12T10:01:17Z

+            with open(path, "r+b") as f:
+                self._acquire_exclusive_lock(f)
+                try:
+                    data = self._read_json_locked(f)
+                finally:
+                    self._release_exclusive_lock(f)


load() now opens the session file with "r+b" (read-write) mode even though it never writes anything. This silently regresses read-only access: if a session file has been chmod'd to 0444 (or any other state where the process has read but not write permission), open(path, "r+b") raises a PermissionError that is caught and turned into None, making a perfectly valid session appear missing. The original code correctly used "rb" for reads. A shared (LOCK_SH) lock is also sufficient here and avoids serialising concurrent readers.

Suggested change

-            with open(path, "r+b") as f:
-                self._acquire_exclusive_lock(f)
-                try:
-                    data = self._read_json_locked(f)
-                finally:
-                    self._release_exclusive_lock(f)
+            with open(path, "rb") as f:
+                if sys.platform == "win32":
+                    import msvcrt
+                    f.seek(0)
+                    msvcrt.locking(f.fileno(), msvcrt.LK_RLCK, 1)
+                    try:
+                        data = self._read_json_locked(f)
+                    finally:
+                        f.seek(0)
+                        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
+                elif _HAS_FCNTL:
+                    fcntl.flock(f.fileno(), fcntl.LOCK_SH)
+                    try:
+                        data = self._read_json_locked(f)
+                    finally:
+                        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
+                else:
+                    data = self._read_json_locked(f)
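
A stripped-down version of the POSIX branch of this suggestion, assuming `fcntl` is available, might look like the following. `read_json_shared` is a hypothetical standalone function and not part of `UnifiedSessionStore`. The point is that `"rb"` plus a shared lock only needs read permission on the file.

```python
import fcntl  # POSIX only
import json


def read_json_shared(path):
    """Read a JSON file under a shared lock, opening it read-only."""
    with open(path, "rb") as f:
        # LOCK_SH lets several readers in at once but excludes LOCK_EX writers.
        fcntl.flock(f.fileno(), fcntl.LOCK_SH)
        try:
            return json.loads(f.read())
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

Because the file is never opened for writing, a session file with mode 0444 still loads, and no PermissionError can be mistaken for a missing session.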

greptile-apps · 2026-06-12T10:01:18Z

+            try:
+                mtime = path.stat().st_mtime
+            except OSError:
+                mtime = datetime.now().timestamp()
+
+            with self._lock:
+                self._cache[session.session_id] = to_save
+                self._cache_mtimes[session.session_id] = mtime


mtime race between write and stat

The file lock is released when the with open(...) block exits (line 248), but path.stat().st_mtime is read after that point. Another process could write the file in the gap, giving us their later mtime. We then store (our merged content, their mtime) in the cache. On the next load(), _is_cache_fresh compares current_mtime <= cached_mtime and may return True, serving our stale in-memory copy instead of re-reading from disk. The caller would see incomplete message history until the next invalidating event. Reading st_mtime while still holding the file lock (i.e., before releasing it) would close the window.
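
One way to close that window, sketched under the assumption of POSIX `fcntl` locking, is to take the mtime from the open file descriptor with `os.fstat` before the lock is released. `write_and_stamp` is a hypothetical helper, not the PR's `save()`.

```python
import fcntl  # POSIX only
import json
import os


def write_and_stamp(path, payload):
    """Write payload under an exclusive lock; return the mtime seen under that lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            f.seek(0)
            f.truncate()
            f.write(json.dumps(payload).encode("utf-8"))
            f.flush()
            os.fsync(f.fileno())
            # Still holding the lock: no cooperating writer can have changed
            # the file since our write, so this mtime matches our content.
            mtime_ns = os.fstat(f.fileno()).st_mtime_ns
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    return mtime_ns
```

The returned value can then be stored next to the cached session, so a later freshness check compares against the mtime of the content that was actually written.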

Copilot AI requested a review from MervinPraison June 4, 2026 09:09

coderabbitai Bot reviewed Jun 4, 2026


MervinPraison marked this pull request as ready for review June 12, 2026 09:55

MervinPraison merged commit 7427367 into main Jun 12, 2026
5 of 6 checks passed

MervinPraison mentioned this pull request Jun 12, 2026

docs: Document concurrent-save safety for CLI UnifiedSessionStore (PR #1854) MervinPraison/PraisonAIDocs#526

Open

8 tasks

greptile-apps Bot reviewed Jun 12, 2026


praisonai-triage-agent Bot commented Jun 4, 2026

Final Architecture Review ✅

✅ APPROVED - Architecture Review Complete

Architecture Compliance (AGENTS.md)

Code Quality Assessment

Issues Addressed

Summary
