Fix flood of SyncUpdate messages and ContentsManager calls
#52
Description
This PR fixes the flood of `SyncUpdate` messages broadcast to clients and `ContentsManager.save()` calls.

PR #50 should be merged first before this PR is merged.
Demo
Screen.Recording.2025-05-15.at.4.44.52.PM.mov
Explanation
The `dirty` attribute on the Jupyter YDoc (provided by `jupyter_ydoc`) is used to indicate whether there are any unsaved changes in the UI. We set `dirty = False` whenever we save the file.

The bug reported in #51 was caused by a combination of three issues:

1. The `dirty` attribute should be set in the awareness, as it is a transient & non-persistent state. However, it is being saved in the YDoc under the `state: pycrdt.Map()` shared type.
2. Observers registered via `pycrdt.Doc.observe()` fire even when the YDoc state doesn't change! If `dirty == False`, setting `dirty = False` triggers the observers, even though the value did not change. This leads to the infinite loop of saves.
3. `pycrdt.Doc.observe()` is different from the `observe()` method provided on Jupyter YDocs from `jupyter_ydoc`. The `jupyter_ydoc` observers include information on which key in the YDoc was updated; the `pycrdt` observers do not.

The fix is to add a 3rd observer on `self._jupyter_ydoc` in `YRoom`. The file saving logic has been moved to `self._on_jupyter_ydoc_update()`. This saves the file only if the update did not apply to the `state` key (a dictionary storing transient, non-persistent data which really belongs in awareness).

With this approach, we need an observer on both `self._ydoc` and `self._jupyter_ydoc`. Only the observer on `self._ydoc` gets the binary update to broadcast, and only the observer on `self._jupyter_ydoc` gets the key that was updated. Ideally we would have access to both in a single observer, but this isn't possible with our dependencies on `jupyter_ydoc` and `pycrdt` currently.

Alternatives considered
In `jupyter_collaboration`, the room has an `_update_lock: asyncio.Lock` attribute that is acquired whenever `dirty` is being set.

- When this lock is acquired: the YDoc observer does nothing and ignores any document updates.
- Otherwise: the YDoc observer triggers a save through the `ContentsManager`.

This is likely one source of data loss bugs in `jupyter_collaboration`, since sometimes `jupyter_collaboration` also sets the source of the YDoc while `_update_lock` is held, preventing that change from being persisted via CM.

References: https://github.com/search?q=repo%3Ajupyterlab%2Fjupyter-collaboration%20update_lock&type=code
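
To make the contrast between the two approaches concrete, here is a rough toy model. It is only a sketch: it uses plain Python stand-ins rather than the real `pycrdt` or `jupyter_ydoc` APIs, and every class and method name in it (`ToyDoc`, `KeyFilterRoom`, `LockGatedRoom`, `set_source_while_locked`) is hypothetical. It assumes, as described above, that observers fire on every write and that the observer is told which top-level key was written.

```python
# Toy model: plain-Python stand-ins, not the pycrdt or jupyter_ydoc APIs.
# All class and method names here are hypothetical.


class ToyDoc:
    """Notifies observers on every write, even if the value is unchanged,
    and tells them which top-level key was written."""

    def __init__(self):
        self.source = ""
        self.state = {"dirty": False}
        self.observers = []

    def _notify(self, key):
        for callback in list(self.observers):
            callback(key)

    def set_source(self, text):
        self.source = text
        self._notify("source")

    def set_dirty(self, value):
        self.state["dirty"] = value
        self._notify("state")  # fires even when the value did not change


class KeyFilterRoom:
    """Approach taken in this PR: skip the save when only `state` changed."""

    def __init__(self, doc):
        self.doc = doc
        self.saved = []  # records what a save would have persisted
        doc.observers.append(self.on_update)

    def on_update(self, key):
        if key == "state":
            return  # transient state, nothing to persist
        self.saved.append(self.doc.source)
        self.doc.set_dirty(False)  # re-enters on_update with key "state"


class LockGatedRoom:
    """Alternative: ignore every update while a lock-like flag is held."""

    def __init__(self, doc):
        self.doc = doc
        self.saved = []
        self.locked = False  # stands in for an asyncio.Lock
        doc.observers.append(self.on_update)

    def on_update(self, key):
        if self.locked:
            return  # all updates are dropped, including content changes
        self.saved.append(self.doc.source)
        self.locked = True
        try:
            self.doc.set_dirty(False)
        finally:
            self.locked = False

    def set_source_while_locked(self, text):
        self.locked = True
        try:
            self.doc.set_source(text)  # observer sees the lock, skips the save
        finally:
            self.locked = False


filtered = KeyFilterRoom(ToyDoc())
filtered.doc.set_source("a")
filtered.doc.set_dirty(True)  # state-only update: no save
filtered.doc.set_source("b")

gated = LockGatedRoom(ToyDoc())
gated.doc.set_source("a")
gated.set_source_while_locked("b")

print(filtered.saved)  # ['a', 'b']
print(gated.saved)     # ['a']
```

In this model the key filter breaks the save loop without dropping any content change, while the lock-gated room ends with `gated.doc.source == "b"` but only `"a"` saved, which mirrors the data-loss scenario described above.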