Skip to content

Fix push/pull mutex lifecycle in LeaderSyncProcessor#1029

Open
slashv wants to merge 2 commits intolivestorejs:devfrom
slashv:codex/fix-pushpull-mutex-leak
Open

Fix push/pull mutex lifecycle in LeaderSyncProcessor#1029
slashv wants to merge 2 commits intolivestorejs:devfrom
slashv:codex/fix-pushpull-mutex-leak

Conversation

@slashv
Copy link
Contributor

@slashv slashv commented Feb 15, 2026

Preface

Full disclosure, Codex 5.3 was used to both identify and solve this bug. I was experiencing a sync dead-lock in my LiveStore app when doing heavy bursts of events at startup. After implementing this and running my app against LiveStore with this fix it solved the problem.

I have a conceptual understanding of why this occurred as well as the solution but I have not written any code by hand nor have I examined every line in detail.

Problem

What problem does this pull request address?

  • pushPullMutex could remain held on some failure/exit paths.
  • Paginated pull could re-acquire before release, causing stalls/deadlock symptoms.

Solution

How does this change resolve the problem? Mention trade-offs and alternatives considered.

  • Refactor local push critical section to scoped semaphore usage (withPermits(1)).
  • In pull path, track lock ownership across pages and guarantee release on:
    • terminal page (NoMore)
    • chunk failure
    • interruption/stream teardown (ensuring)
  • Add regression coverage for paginated non-live pull + post-pull local push progress.

Validation

What did you do to verify the change?

  • direnv exec . vitest run tests/package-common/src/leader-thread/LeaderSyncProcessor.test.ts
  • direnv exec . mono ts
  • direnv exec . mono lint --fix

Demo (optional)

  • Before: pending changes could stall indefinitely.
  • After: paginated pull completes and local push resumes; regression test passes.

…utex-leak

# Conflicts:
#	packages/@livestore/common/src/leader-thread/LeaderSyncProcessor.ts
@schickling-assistant
Copy link
Contributor

Thanks for tackling this - the mutex lifecycle changes look solid and the paginated non-live regression test is very helpful.

One coverage gap I still see: could we add tests for lock release on non-happy paths during paginated pull?

  • pull chunk failure while the mutex is already held (mid-pagination)
  • interruption/cancellation of the pull stream while pagination is in progress

Those seem like the most subtle regressions for this area and would give stronger confidence that + always prevent stalls.

Also, could you please rebase this branch onto latest and fix the current lint issue before merge?

Thanks again for driving this fix.

@schickling-assistant
Copy link
Contributor

Small follow-up/correction (my previous comment had formatting glitches from CLI escaping):

The two symbols I referenced are releasePullMutexIfHeld and Effect.ensuring.

Request remains the same:

  • please add coverage for mid-pagination pull failure and interruption/cancellation while the mutex is held
  • please rebase on latest dev and fix the lint issue

Thanks again for the fix and for following up on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants