Implement automatic room cleanup #114
Closed
Description
YRooms #60
This PR introduces automatic room cleanup. The YRoomManager now has a _watch_rooms() background task that runs every 10 seconds, checking each room.
When the room is deleted depends on whether it provides a notebook:
For notebook rooms: the room is deleted when there are no connected clients AND when the kernel is either 'idle' or 'dead' for >10 seconds (2 iterations of inactivity).
For all other rooms: the room is deleted when there are no connected clients for >10 seconds.
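The two rules above can be sketched as a small polling routine. This is a minimal illustration, not the code in this PR: the Room dataclass, its field names, the inactive flag, and the poll_once() and watch_rooms() helpers are all hypothetical stand-ins, and the real YRoom and YRoomManager APIs may differ.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Room:
    """Hypothetical stand-in for a YRoom; not the real class."""
    room_id: str
    is_notebook: bool = False
    client_count: int = 0
    kernel_state: str = "idle"  # only consulted for notebook rooms
    inactive: bool = False      # set after the first idle poll


def is_idle(room: Room) -> bool:
    """Return True when the room currently qualifies for cleanup."""
    if room.client_count > 0:
        return False
    if room.is_notebook:
        return room.kernel_state in ("idle", "dead")
    return True


def poll_once(rooms: dict[str, Room]) -> list[str]:
    """Run one watcher iteration and return the IDs of deleted rooms."""
    deleted = []
    for room_id, room in list(rooms.items()):
        if not is_idle(room):
            # Any activity resets the countdown.
            room.inactive = False
        elif room.inactive:
            # Second consecutive idle poll: delete the room.
            del rooms[room_id]
            deleted.append(room_id)
        else:
            # First idle poll: mark the room, delete it next time.
            room.inactive = True
    return deleted


async def watch_rooms(rooms: dict[str, Room], poll_interval: float = 10.0) -> None:
    """Background loop; cancel the enclosing task to stop it."""
    while True:
        await asyncio.sleep(poll_interval)
        poll_once(rooms)
```

With a 10-second interval, a room that goes idle is deleted on the second idle poll, which is where the 10-20 second window in the testing notes comes from.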
With these changes alone, however, notebook kernels stopped working after their notebook room had been stopped once. This is because the DocumentAwareKernelClient previously stored direct references to its YRoom instances, and these references became stale once the room was deleted.
Therefore, I updated DocumentAwareKernelClient to store a list of room IDs and a reference to the YRoomManager singleton. Now, when handling a kernel message, it calls yroom_manager.get_room(room_id) for each room_id it is connected to. This ensures the kernel client always uses the latest YRoom reference.
I also fixed #111 because it was helpful in testing this branch.
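As a rough illustration of the ID-based lookup, the sketch below uses hypothetical Room, RoomManager, and KernelClient classes in place of YRoom, YRoomManager, and DocumentAwareKernelClient. Only the get_room(room_id) call mirrors a name from this PR; every other method and attribute is an assumption made for the example.

```python
class Room:
    """Hypothetical stand-in for a YRoom."""

    def __init__(self, room_id):
        self.room_id = room_id
        self.stopped = False
        self.messages = []

    def handle(self, msg):
        if self.stopped:
            raise RuntimeError(f"room {self.room_id} was stopped")
        self.messages.append(msg)


class RoomManager:
    """Hypothetical stand-in for YRoomManager."""

    def __init__(self):
        self._rooms = {}

    def get_room(self, room_id):
        # Create the room on demand if it is missing or was deleted.
        if room_id not in self._rooms:
            self._rooms[room_id] = Room(room_id)
        return self._rooms[room_id]

    def delete_room(self, room_id):
        # Drop the room from the registry before marking it stopped,
        # so later get_room() calls never return a stopped room.
        room = self._rooms.pop(room_id, None)
        if room is not None:
            room.stopped = True


class KernelClient:
    """Stores room IDs and resolves them through the manager per message."""

    def __init__(self, manager):
        self._manager = manager
        self._room_ids = []

    def add_room(self, room_id):
        if room_id not in self._room_ids:
            self._room_ids.append(room_id)

    def handle_message(self, msg):
        for room_id in self._room_ids:
            # Look the room up every time instead of caching a reference.
            self._manager.get_room(room_id).handle(msg)
```

Because the client holds only IDs, a room deleted between two messages is transparently replaced by a fresh instance on the next lookup, whereas a cached reference would still point at the stopped room.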
Technical details
Keen reviewers will note that there are 2 possible race conditions.
What happens if a room gets deleted right before the kernel client calls get_room()? Will that raise an exception or return a room that's stopping and soon to become unusable?
What happens if a room has been inactive for 9 seconds before the kernel client calls get_room()? Will the room get deleted right after it is returned to the kernel client?
These race conditions are safely mitigated.
The room is now removed from the YRoomManager as soon as it starts stopping. That way, if the room was deleted right before get_room() was called, get_room() will create a new YRoom instance instead of returning the room that's stopping.
YRoomManager.get_room() clears the inactive status on the room it returns. Therefore, the room returned by this method is guaranteed to stay alive for at least 10 seconds.
Therefore, the kernel client can always safely access the rooms connected to it, as long as it doesn't spend >10 seconds processing a single message inside of handle_document_related_message(). I think this is a safe constraint, given that most clients will have only a single connected room.
Guidance on testing this PR
Non-notebook rooms
Open a text file then close it. Watch the server logs, and you should see these log statements after 10-20 seconds:
Notebook rooms
Create a new notebook with this code cell:
Run this cell and close the notebook immediately. Watch the logs and wait until the room is stopped. After the room is stopped, open the notebook, and you should see that "hello" is correctly printed in the output. This asserts that the room manager does not stop a notebook room while its kernel is still running (i.e. its execution state is not "idle" or "dead").
Then, change "hello" to "world", re-run the cell, and repeat the steps above. After the room is stopped and the notebook is re-opened, you should see that "world" is correctly printed in the output. This asserts that the kernel client is correctly calling yroom_manager.get_room() to perform operations on the latest YRoom instance, not the previous one that was deleted.
BTW, if you're impatient and want to see rooms get deleted more quickly, you can lower the sleep time in _watch_rooms() to poll faster.