Skip to content

Conversation

@dlqqq
Copy link
Collaborator

@dlqqq dlqqq commented May 15, 2025

Description

This PR fixes the flood of SyncUpdate messages broadcast to clients and ContentsManager.save() calls.

PR #50 should be merged first before this PR is merged.

Demo

Screen.Recording.2025-05-15.at.4.44.52.PM.mov

Explanation

The dirty attribute on the Jupyter YDoc (provided by jupyter_ydoc) is used to indicate whether there are any unsaved changes in the UI. We set dirty = False whenever we save the file.

The bug reported in #51 was caused by a combination of three issues:

  1. The dirty attribute should be set in the awareness, as it is a transient & non-persistent state. However, it is being saved in the YDoc under the state: pycrdt.Map() shared type.
  2. The observers passed to pycrdt.Doc.observe() fire even when the YDoc state doesn't change! If dirty == False, setting dirty = False triggers the observers, even though the value did not change. This leads to the infinite loop of saves.
  3. Finally, pycrdt.Doc.observe() is different from the observe() method provided on Jupyter YDocs from jupyter_ydoc. The jupyter_ydoc observers include information on which key in the YDoc was updated; the pycrdt observers do not.

The fix is to add a 3rd observer on self._jupyter_ydoc in YRoom. The file saving logic has been moved to self._on_jupyter_ydoc_update(). This simply saves the file only if the update did not apply to the state key (a dictionary storing transient, non-persistent data which really belongs in awareness).

With this approach, we need an observer on both self._ydoc and self._jupyter_ydoc. Only the observer on self._ydoc gets the binary update to broadcast, and only the observer on self._jupyter_ydoc gets the key that was updated. Ideally we would have access to both in a single observer, but this isn't possible with our dependencies on jupyter_ydoc and pycrdt currently.

Alternatives considered

In jupyter_collaboration, the room has an _update_lock: asyncio.Lock attribute that is acquired whenever dirty is being set.

  • When this lock is acquired: the YDoc observer does nothing and ignores any document updates.

  • Otherwise: the YDoc observer triggers a save through the ContentsManager.

This likely one source of data loss bugs in jupyter_collaboration, since sometimes jupyter_collaboration also sets the source of the YDoc while _update_lock is held, preventing that change from being persisted via CM.

References: https://github.com/search?q=repo%3Ajupyterlab%2Fjupyter-collaboration%20update_lock&type=code

@dlqqq dlqqq force-pushed the fix-sync-update-flood branch from 73fc244 to e992ad3 Compare May 15, 2025 23:51
@dlqqq dlqqq marked this pull request as ready for review May 15, 2025 23:54
@Zsailer
Copy link
Collaborator

Zsailer commented May 16, 2025

Nice work, @dlqqq! Merging.

@Zsailer Zsailer merged commit 6f81f13 into jupyter-ai-contrib:main May 16, 2025
2 of 6 checks passed
@dlqqq dlqqq mentioned this pull request May 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Debug YRoomWebsocket & ContentsManager getting flooded

3 participants