dlqqq commented May 8, 2025

Description

  • Updated YRoom to create a JupyterYDoc using jupyter_ydoc.

  • Introduces YRoomFileAPI, a class that provides an API to a single file for a single YRoom via the Jupyter Server ContentsManager.

  • The YRoomFileAPI accepts the room's ID and JupyterYDoc and loads the content when file_api.load_ydoc_content() is called.

  • Provides the file_api.ydoc_content_loaded awaitable, which resolves only once the content is loaded into the YDoc, so consumers can wait for the load to finish.

  • Provides the schedule_save() method to allow consumers to schedule saving the YDoc to disk. The save automatically waits for the content to be loaded first.

  • Added async get_jupyter_ydoc(), async get_ydoc(), and get_awareness() methods on YRoom. Awaiting these methods ensures that the content was loaded before you receive a reference to the JupyterYDoc/YDoc.

  • Adds a unit test that verifies YRoomFileAPI for plaintext files!
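
The load/await/save flow described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: FakeYDoc, FileAPISketch, the disk dict (standing in for the ContentsManager), and demo() are all hypothetical names, and only the method names load_ydoc_content(), ydoc_content_loaded, and schedule_save() come from the description.

```python
import asyncio


class FakeYDoc:
    """Hypothetical stand-in for a JupyterYDoc; it only holds a source string."""

    def __init__(self):
        self.source = ""


class FileAPISketch:
    """Illustrative only: mirrors the names in the description, not the real class."""

    def __init__(self, ydoc, disk, path):
        self._ydoc = ydoc
        self._disk = disk  # assumption: a dict standing in for the ContentsManager
        self._path = path
        self._loaded = asyncio.Event()
        self._loading = False
        self.tasks = set()  # keep references so tasks are not garbage collected

    @property
    def ydoc_content_loaded(self):
        # Returns a fresh awaitable that resolves once loading has finished.
        return self._loaded.wait()

    def load_ydoc_content(self):
        # Idempotent: calling again while loading, or after loading, is a no-op.
        if self._loaded.is_set() or self._loading:
            return
        self._loading = True
        self._spawn(self._load())

    def schedule_save(self):
        self._spawn(self._save())

    def _spawn(self, coro):
        task = asyncio.get_running_loop().create_task(coro)
        self.tasks.add(task)

    async def _load(self):
        await asyncio.sleep(0)  # pretend to do file I/O
        self._ydoc.source = self._disk[self._path]
        self._loaded.set()

    async def _save(self):
        await self.ydoc_content_loaded  # never save before the load has finished
        self._disk[self._path] = self._ydoc.source


async def demo():
    disk = {"a.txt": "hello"}
    ydoc = FakeYDoc()
    api = FileAPISketch(ydoc, disk, "a.txt")
    api.schedule_save()      # scheduled before loading: it has to wait
    api.load_ydoc_content()
    api.load_ydoc_content()  # second call does nothing
    await api.ydoc_content_loaded
    ydoc.source += " world"
    api.schedule_save()
    await asyncio.gather(*api.tasks)
    return disk


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the sketch is that a save scheduled before the load completes cannot overwrite the file with an empty document, because the save task waits on the same event that the load task sets.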

Future work

  • Add a unit test that verifies YRoomFileAPI for notebook files.

  • Add unit tests for updating the JupyterYDoc & saving via schedule_save().

dlqqq marked this pull request as draft May 8, 2025 22:59
dlqqq changed the title from "WIP: Introduce YRoomFileAPI" to "Introduce YRoomFileAPI" May 9, 2025
dlqqq marked this pull request as ready for review May 9, 2025 18:15
dlqqq force-pushed the yroom-file-api branch from 4b80d31 to 32d6d8a May 9, 2025 20:08
Comment on lines 70 to 71
self.awareness.observe(self.send_server_awareness)
self.ydoc.observe(lambda event: self.write_sync_update(event.update))
Collaborator Author

We may only want to start these observers after the YDoc content is loaded. So we would want to do something like:

async def _start_observers(self):
    await self.file_api.ydoc_content_loaded
    self.ydoc.observe(...)
    self.awareness.observe(...)

and start this as a separate task in __init__().

Collaborator

Why do we need to wait for the YDoc content to be loaded? We should add the subscribers first. Until then there is simply no action or update on the YDoc, so no subscription is triggered. We need to make sure the subscribers are in place before we make any update to the YDoc. I think loading the YDoc content will itself update the YDoc and generate the first few updates.
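
The ordering argument in this comment can be illustrated with a toy document. This is only a sketch: ObservableDoc, load_content(), and run() are hypothetical stand-ins, not pycrdt or jupyter_ydoc APIs, and the "updates" are plain strings.

```python
class ObservableDoc:
    """Hypothetical stand-in for a YDoc: notifies observers on every update."""

    def __init__(self):
        self._observers = []
        self.updates = []

    def observe(self, callback):
        self._observers.append(callback)

    def apply(self, update):
        self.updates.append(update)
        for callback in list(self._observers):
            callback(update)


def load_content(doc):
    # Loading file content is itself an update to the document.
    doc.apply("initial file content")


def run(observe_first):
    """Return the updates that would have been broadcast to clients."""
    doc = ObservableDoc()
    broadcast = []
    if observe_first:
        doc.observe(broadcast.append)
        load_content(doc)
    else:
        load_content(doc)
        doc.observe(broadcast.append)
    doc.apply("user edit")
    return broadcast
```

With observe_first=True the broadcast list contains both the load update and the later edit; with observe_first=False the load update is never seen by the observer.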

Collaborator

We need a queue here, at self.ydoc.observe(lambda event: self.write_sync_update(event.update)), to capture those updates before the clients are added and the WebSocket is established.

Collaborator

On second thought, it might make sense to call the file content load method separately and later (not in the init method), once the connection is established and the client is added. Once the YDoc loads content, it has server updates to broadcast to all clients, but the client WebSocket might not yet be added when we initialize YRoom. We are going to initialize YRoom in the prepare method of YWebsocketHandler, before the WebSocket connection happens, right?

Collaborator Author

I had posted a comment earlier but realized I was wrong. Yeah, you're right: we should start the observers immediately, otherwise the newly loaded content doesn't get broadcast. 😂

I was trying to stop YRoom consumers from making updates to the YDoc before the content is loaded, but as you've called out, this isn't the right way to do it. I think I'll add an async API for getting the YDoc, Awareness, and JupyterYDoc that ensures the content is loaded. Will work on this now since you're busy with another meeting.
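
A rough sketch of the async getter idea, under stated assumptions: RoomSketch and demo() are hypothetical, the YDoc is replaced by a plain dict, and only the method name get_ydoc() is taken from this PR.

```python
import asyncio


class RoomSketch:
    """Illustrative only; the real YRoom delegates loading to its file API."""

    def __init__(self):
        self._content_loaded = asyncio.Event()
        self._ydoc = {"source": ""}  # assumption: a dict standing in for the YDoc

    async def load(self):
        await asyncio.sleep(0)  # pretend to read the file
        self._ydoc["source"] = "print('hi')"
        self._content_loaded.set()

    async def get_ydoc(self):
        # Callers only receive the reference once the content is loaded.
        await self._content_loaded.wait()
        return self._ydoc


async def demo():
    room = RoomSketch()
    getter = asyncio.create_task(room.get_ydoc())
    await asyncio.sleep(0)  # give the getter a chance to run
    pending_before_load = not getter.done()
    await room.load()
    ydoc = await getter
    return pending_before_load, ydoc["source"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Gating the reference rather than the observers keeps the broadcast path active from the start while still preventing consumers from editing an empty document.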

Collaborator Author

Done!

Collaborator Author

> Because once ydoc loads content, it has server updates to broadcast to all clients but the client websocket might not yet be added when we initialize YRoom. Are we going to call YRoom initialization in prepare method in YWebsocketHandler before websocket connection happens, right?

This should be fine!

  • If clients join before the doc is loaded: they get a SyncUpdate containing the new content.

  • If clients join after: they get the content after completing the first client SS1 + server SS2 handshake.

Collaborator Author

Actually, clients will always join after, since we don't process the message queue until after the ydoc content is loaded.

Comment on lines +106 to +117
def load_ydoc_content(self) -> None:
    """
    Loads the file from disk asynchronously into `self.jupyter_ydoc`.
    Consumers should `await file_api.ydoc_content_loaded` before performing
    any operations on the YDoc.
    """
    # If already loaded/loading, return immediately.
    # Otherwise, set loading to `True` and start the loading task.
    if self._ydoc_content_loaded.is_set() or self._ydoc_content_loading:
        return
    self._ydoc_content_loading = True
    self._loop.create_task(self._load_ydoc_content())
Collaborator Author

BTW, we can technically call this in YRoomFileAPI.__init__(), but I think as a design principle, instantiating classes shouldn't produce side-effects. Operations should be made explicit for readability.

fileid_manager=fileid_manager,
contents_manager=contents_manager
)
self.file_api.load_ydoc_content()
Collaborator

If load_ydoc_content() is called in this initialize method, it needs to be moved after the observer setup on lines 77 and 78, because creating the loading task means the load could happen at any time from now on.

Collaborator

We discussed using a separate message queue for awareness messages, so that awareness message handling is not blocked regardless of whether the initial file load has finished.
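
One way the two-queue idea could look, as a sketch only: the queue names, the worker functions, and the message payloads are assumptions made for illustration and are not taken from this PR.

```python
import asyncio


async def demo():
    loaded = asyncio.Event()           # set once the initial file load finishes
    sync_queue = asyncio.Queue()       # document sync messages, gated on the load
    awareness_queue = asyncio.Queue()  # awareness messages, never gated
    handled = []

    async def sync_worker():
        await loaded.wait()  # do not process sync messages before the load
        while True:
            msg = await sync_queue.get()
            handled.append(("sync", msg))
            sync_queue.task_done()

    async def awareness_worker():
        while True:
            msg = await awareness_queue.get()
            handled.append(("awareness", msg))
            awareness_queue.task_done()

    workers = [
        asyncio.create_task(sync_worker()),
        asyncio.create_task(awareness_worker()),
    ]

    sync_queue.put_nowait("SS1")
    awareness_queue.put_nowait("cursor moved")

    await awareness_queue.join()  # awareness is handled even though nothing is loaded
    before_load = list(handled)

    loaded.set()
    await sync_queue.join()       # sync messages drain only after the load

    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return before_load, handled


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Before the load completes only the awareness message has been handled; the sync message is processed afterwards.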

dlqqq commented May 9, 2025

@jzhang20133 Thank you for leaving so much helpful feedback! Merging.

dlqqq merged commit 44b5024 into jupyter-ai-contrib:main May 9, 2025