-
Notifications
You must be signed in to change notification settings - Fork 2
patch reconnect deadlock #145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5cd0d32 to
0d6f2c0
Compare
| async with self._session_lock: | ||
| self._set_session(new_session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for safety, one property that the old code had that this one doesn't is that there could be two concurrent calls to this function (with the same id?) and then we would close the old session twice and leak one of the new sessions!
to prevent that what we can do is to rename _set_session to _set_session_locked (and add a docstring comment that it needs the session_lock to operate). and have _set_session_locked detect this mid-air collision and ... do something with the preexisting one (close it???)
with that out of the way, this relaxing of the locks' critical sections seems to be safe, because the Session by itself is safe to create twice (as long as we close it, otherwise we leak tasks!!!)
| # If the session status is mismatched, we should close the old session | ||
| # and let the retry logic to create a new session. | ||
| await old_session.close() | ||
| await self._delete_session(old_session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol dupes
|
@jackyzha0 I applied this delta overtop #147, available at |
Why
deadlock in the server-side reconnect case:
What changed
split up critical sections in server transport similar to client side
_get_sessionacquire lock, dont need to lock for creating the session until we call_set_sessionTest plan
added a test
Notes
I have a draft of a more in-depth approach on this branch which uses a lock-ownership-transfer based approach that should catch it more generically but ran into ownership problems lol
Seeing as we are planning on migrating chat service to Node anyways so we only have one River server implementation, we hopefully don't have to maintain this surface for much longer 🤞