Skip to content

Conversation

@ellisonbg
Copy link
Collaborator

@ellisonbg ellisonbg commented Jun 25, 2025

This PR implements notebook output loading functionality that allows jupyter-server-documents to process and manage notebook outputs stored on disk rather than in the notebook file itself.

• Process outputs in notebooks on disk - Added core functionality to load notebook outputs from disk storage when notebooks are opened
• Enhanced OutputsManager - Extended manager with methods to process loaded and saving notebooks, including placeholder output handling and metadata management
• Improved placeholder handling - Better handling of placeholder outputs with support for both dict and Map formats
• Integration with YRoom system - Connected OutputsManager throughout the room management system for seamless output processing

Key Changes

  • OutputsManager now processes notebooks on load/save with process_loaded_notebook() and process_saving_notebook() methods
  • Added placeholder_outputs metadata flag to control output processing behavior
  • Enhanced placeholder output creation with flexible return types (dict vs Map)
  • Integrated OutputsManager across YRoom, YRoomFileAPI, and YRoomManager classes
  • Fixed missing parent argument initialization in app.py

Test Plan

  • Verify notebooks with placeholder_outputs: true load outputs from disk correctly
  • Test notebook saving clears outputs and sets placeholder metadata
  • Confirm stream output limiting works as expected
  • Validate output URLs are generated correctly
  • Test both new and existing notebooks handle outputs appropriately

@ellisonbg
Copy link
Collaborator Author

@dlqqq This mostly works, but I am running into an exception by pycrdt when I try to make the changes to the YDoc. The transaction if failing saying the Ydoc isn't part of a room yet. My sense is that there is a race condition between the YRoom creation/initialization and document loading in this case. No hurry on this, but you are probably the right person to dive into this.

@dlqqq dlqqq added the enhancement New feature or request label Jun 25, 2025
Copy link
Collaborator

@dlqqq dlqqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellisonbg This PR looks great! Left some suggestions below.

I wasn't able to reproduce the race condition you're describing on 6 different notebooks. I don't see how this PR could introduce a race condition. The OutputsManager.load_notebook() method is synchronous, so adding a call to that method shouldn't change anything regarding concurrency.

@ellisonbg
Copy link
Collaborator Author

Here is the exception I am seeing:

Task exception was never retrieved
future: <Task finished name='Task-2746' coro=<YRoomFileAPI._load_content() done, defined at /Users/brgrange/git/jupyter-ai-contrib/jupyter-server-documents/jupyter_server_documents/rooms/yroom_file_api.py:187> exception=RuntimeError('Not integrated in a document yet')>
Traceback (most recent call last):
  File "/Users/brgrange/git/jupyter-ai-contrib/jupyter-server-documents/jupyter_server_documents/rooms/yroom_file_api.py", line 209, in _load_content
    jupyter_ydoc.source = file_data['content']
    ^^^^^^^^^^^^^^^^^^^
  File "/Users/brgrange/micromamba/envs/serverdocs/lib/python3.13/site-packages/jupyter_ydoc/ybasedoc.py", line 100, in source
    return self.set(value)
           ~~~~~~~~^^^^^^^
  File "/Users/brgrange/micromamba/envs/serverdocs/lib/python3.13/site-packages/jupyter_ydoc/ynotebook.py", line 257, in set
    self._ycells.extend([self.create_ycell(cell) for cell in cells])
                         ~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/brgrange/micromamba/envs/serverdocs/lib/python3.13/site-packages/jupyter_ydoc/ynotebook.py", line 168, in create_ycell
    if output.get("output_type") == "stream":
       ~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/Users/brgrange/micromamba/envs/serverdocs/lib/python3.13/site-packages/pycrdt/_map.py", line 226, in get
    with self.doc.transaction():
         ^^^^^^^^
  File "/Users/brgrange/micromamba/envs/serverdocs/lib/python3.13/site-packages/pycrdt/_base.py", line 158, in doc
    raise RuntimeError("Not integrated in a document yet")
RuntimeError: Not integrated in a document yet

The notebook I am trying is this one:

https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/04.00-Introduction-To-Matplotlib.ipynb

@ellisonbg ellisonbg changed the title Handle notebooks with outputs on disk Improved outputs handling when reading/writing from/to disk Jun 25, 2025
@ellisonbg
Copy link
Collaborator Author

OK, this is updated and ready for review.

Copy link
Collaborator

@dlqqq dlqqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellisonbg Thank you for fixing that bug you found!

I've approved to unblock, but it would be great if you could downgrade the critical log & remove the asyncio.Event added in yroom_file_api.py before merging. 👍


def process_saving_notebook(self, nb: dict) -> dict:
"""
Process a notebook before saving to disk.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about: Mutate notebook dict in place, return updated value.

As for the "before saving" usage info, how about renaming the method to process_notebook_for_saving_to_disk ?

Copy link
Collaborator

@3coins 3coins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellisonbg
This looks great!, outputs work like magic now, I spent 5 mins looking for them in the notebook file 😆 .

Some minor comments to improve error handling and readability, but good overall.

Comment on lines 238 to 244
placeholder = self.write(
file_id,
cell_id,
output,
display_id,
asdict=True,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add error handling here?

Suggested change
placeholder = self.write(
file_id,
cell_id,
output,
display_id,
asdict=True,
)
try:
placeholder = self.write(
file_id,
cell_id,
output,
display_id,
asdict=True,
)
except Exception e:
self.log.warning(f"Failed to write output for cell {cell_id}: {e}")
placeholder = output

@ellisonbg
Copy link
Collaborator Author

OK, this is ready for final review.

@3coins
Copy link
Collaborator

3coins commented Jun 26, 2025

Looks good!

@ellisonbg
Copy link
Collaborator Author

I found a few more edge cases that I am fixing.

@ellisonbg
Copy link
Collaborator Author

@3coins can you have a look at my latest commit, review and merge (if ready)?

@3coins 3coins merged commit 19b692f into jupyter-ai-contrib:main Jun 27, 2025
4 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants