Skip to content

Conversation

@jackyzha0
Copy link
Member

@jackyzha0 jackyzha0 commented Mar 20, 2025

Why

deadlock in the server-side reconnect case:

  1. establish handshake acquires session lock:
    async with self._session_lock:
  2. if there is a session mismatch, we close the old session:
    await old_session.close()
  3. session.close calls _close_session_callback
    await self._close_session_callback(self)
  4. _delete_session also tries to acquire the session lock
    async with self._session_lock:

What changed

  1. dont need to call _delete_session as .close will already do that
  2. lift out session close outside of session lock
  3. in the handshake case, let the final call to get_or_create_session replace the session and close the old one

Test plan

added a test

Notes

I have a draft of a more in-depth approach on this branch which uses a lock-ownership-transfer based approach that should catch it more generically but ran into ownership problems lol

Seeing as we are planning on migrating chat service to Node anyways so we only have one River server implementation, we hopefully don't have to maintain this surface for much longer 🤞

@jackyzha0 jackyzha0 requested a review from a team as a code owner March 20, 2025 01:05
@jackyzha0 jackyzha0 requested review from blast-hardcheese and removed request for a team March 20, 2025 01:05
@jackyzha0 jackyzha0 requested a review from lhchavez March 20, 2025 01:48
@jackyzha0 jackyzha0 merged commit 2983732 into main Mar 20, 2025
3 checks passed
@jackyzha0 jackyzha0 deleted the jacky-deadlock-patch branch March 20, 2025 02:59
@blast-hardcheese blast-hardcheese added the bug Something isn't working label Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants