Skip to content

Conversation

@dlqqq
Copy link
Collaborator

@dlqqq dlqqq commented Jun 12, 2025

Description

This PR introduces automatic room cleanup. The YRoomManager now has a _watch_rooms() background task that runs every 10 seconds, checking each room.

When the room is deleted depends on whether it provides a notebook:

  • For notebook rooms: the room is deleted when there are no connected clients AND when the kernel is either 'idle' or 'dead' for >10 seconds (2 iterations of inactivity).

  • For all other rooms: the room is deleted when there are no connected clients for >10 seconds.

With just these changes alone however, notebook kernels stopped working once the notebook room is stopped once. This is because the DocumentAwareKernelClient previously stored direct references to its YRoom instances, but these references stopped working after the room was deleted.

Therefore, I had to make updates to DocumentAwareKernelClient to store a list room IDs & a reference to the YRoomManager singleton. Now, it calls yroom_manager.get_room(room_id) for each room_id it is connected to when handling a kernel message. This ensures the kernel client always uses the latest YRoom reference it needs.

I also fixed #111 because it was helpful in testing this branch.

Technical details

Keen reviewers will note that there are 2 possible race conditions.

  1. What happens if a room gets deleted right before the kernel client calls get_room()? Will that raise an exception or return a room that's stopping and soon to become unusable?

  2. What happens if a room has been inactive for 9 seconds before the kernel client calls get_room()? Will the room get deleted right after it is returned to the kernel client?

These race conditions are safely mitigated.

  1. The room is removed from the YRoomManager immediately as soon as it starts stopping now. That way, if the room was deleted right before get_room() was called, get_room() will create a new YRoom instance instead of returning the room that's stopping.

  2. YRoomManager.get_room() clears the inactive status on the room it returns. Therefore, the room returned by this method is guaranteed to stay alive for at least 10 seconds.

Therefore, the kernel client can always safely access the rooms connected to it, as long as it doesn't spend >10 seconds processing a single message inside of handle_document_related_message(). I think this is a safe constraint given that most clients will only have a single connected room in almost all cases.

Guidance on testing this PR

Non-notebook rooms

Open a text file then close it. Watch the server logs, and you should see these log statements after 10-20 seconds:

[I 2025-06-12 13:33:13.705 ServerDocsApp] YRoom 'text:file:311e2546-98cb-47c0-8d54-eb5f9ad8ae37' has been inactive for >10 seconds.
[I 2025-06-12 13:33:13.744 ServerDocsApp] Stopping YRoom 'text:file:311e2546-98cb-47c0-8d54-eb5f9ad8ae37'.
[I 2025-06-12 13:33:13.762 ServerDocsApp] Stopped `self._process_message_queue()` background task for YRoom 'text:file:311e2546-98cb-47c0-8d54-eb5f9ad8ae37'.
[I 2025-06-12 13:33:13.762 ServerDocsApp] Stopped `self._watch_file()` background task for YRoom 'text:file:311e2546-98cb-47c0-8d54-eb5f9ad8ae37'.

Notebook rooms

Create a new notebook with this code cell:

import asyncio
await asyncio.sleep(21) # < can be any value greater than 2x poll interval
print("hello")

Run this cell and close the notebook immediately. Watch the logs and wait until the room is stopped. After the room is stopped, open the notebook, and you should see that "hello" is correctly printed in the output. This asserts that the room manager does not stop a notebook room while its kernel is still running (i.e. its execution state is not "idle" or "dead").

Then, change "hello" to "world", re-run the cell, and repeat the steps above. After the room is stopped and the notebook is re-opened, you should see that "world" is correctly printed in the output. This asserts that the kernel client is correctly calling yroom_manager(get_room) to perform operations on the latest YRoom instance, not the previous one that was deleted.

BTW, if you're impatient and want to see rooms get deleted more quickly, you can lower the sleep time in _watch_rooms() to poll faster.

@dlqqq dlqqq added the enhancement New feature or request label Jun 12, 2025
@dlqqq dlqqq marked this pull request as draft June 13, 2025 18:42
@dlqqq
Copy link
Collaborator Author

dlqqq commented Jun 13, 2025

Converting this to draft status because consumers likely will need a stable reference to the YRoom. Looking into how YRoom instances can be stopped & restarted such that each YRoom reference stays valid forever.

@dlqqq
Copy link
Collaborator Author

dlqqq commented Jun 20, 2025

Superseded by #117. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auto-delete inactive YRooms Cannot attach >1 notebooks to same kernel

1 participant