Conversation

@ellisonbg (Collaborator)

Summary

This PR refactors the notebook output handling system to align with standard Jupyter behavior while maintaining the performance benefits of the collaborative server architecture.

Key Improvements

OutputsManager Now Follows Default Jupyter Behavior

The refactored OutputsManager now works exactly like standard Jupyter:

  • Full outputs saved to notebook files: All cell outputs are written to .ipynb files on disk, ensuring compatibility with standard Jupyter workflows
  • Fast autosaving preserved: Outputs are still kept out of the YDoc using lightweight placeholders (sketched below), maintaining the performance advantage of server-side collaboration
  • No metadata flags required: Works transparently without special notebook metadata settings

This means users get the best of both worlds:

  • ✅ Standard Jupyter file format with full outputs on disk
  • ✅ Fast collaborative editing with minimal YDoc synchronization overhead
  • ✅ No breaking changes to existing workflows
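
To make the placeholder idea concrete, here is a hypothetical sketch of the two utility functions this PR extracts (_create_output_url, _create_output_placeholder). The URL scheme and the placeholder's exact shape are assumptions made for illustration, not the actual implementation:

def _create_output_url(file_id: str, cell_id: str, output_index: int) -> str:
    # Assumed route under which the full output can be fetched
    return f"/api/outputs/{file_id}/{cell_id}/{output_index}"

def _create_output_placeholder(file_id: str, cell_id: str, output_index: int) -> dict:
    # Lightweight stand-in kept in the YDoc; the full output lives on disk
    url = _create_output_url(file_id, cell_id, output_index)
    return {
        "output_type": "display_data",
        "data": {"text/html": f'<a href="{url}">Output</a>'},
        "metadata": {"url": url},
    }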

Dynamic Autosave Intervals

The YRoomFileAPI now implements adaptive autosave timing to optimize performance across different file sizes and I/O environments:

  • Adaptive timing: Poll interval automatically adjusts based on how long saves take
    • Small files with fast I/O → shorter intervals → more responsive autosaving
    • Large files or slower I/O → longer intervals → reduces unnecessary polling overhead
  • Configurable parameters:
    • min_poll_interval (default: 0.5s) - minimum autosave interval
    • poll_interval_multiplier (default: 5.0) - multiplier applied to save duration to calculate next interval
  • Example: If a save takes 2 seconds, the next poll interval becomes 2s × 5 = 10s, scaling with file size and I/O performance (see the sketch below)

This ensures the autosave system adapts to realistic file I/O conditions, providing optimal responsiveness for small files while avoiding excessive polling overhead for large files or slower storage systems.
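
As a minimal sketch of the calculation described above (compute_next_interval is a name invented here; only the two trait names come from this PR):

def compute_next_interval(save_duration: float,
                          min_poll_interval: float = 0.5,
                          poll_interval_multiplier: float = 5.0) -> float:
    """Scale the next autosave poll interval by how long the last save took,
    never dropping below the configured minimum."""
    return max(min_poll_interval, save_duration * poll_interval_multiplier)

# A 2-second save with the default multiplier yields a 10-second interval:
assert compute_next_interval(2.0) == 10.0
# A fast 50 ms save clamps to the 0.5 s minimum:
assert compute_next_interval(0.05) == 0.5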

Cleaner, Better-Documented Code

  • Comprehensive documentation: All methods now have detailed docstrings explaining purpose, parameters, and return values
  • Better code organization: Extracted utility functions (_create_output_url, _create_output_placeholder) and helper methods (_upgrade_notebook_format, _ensure_cell_id, _process_outputs_from_cell)
  • Simplified logic: Removed complex conditional paths, making the code easier to understand and maintain
  • Improved error handling: Graceful fallbacks when outputs aren't found

Expanded Test Coverage

  • Comprehensive test suite (857 lines) covering all output types
  • Tests for display ID tracking, notebook loading/saving workflows
  • Edge case and error handling validation

Experimental Features

⚠️ Note: This PR also includes an experimental OptimizedOutputsManager that supports excluding outputs from saved notebook files via an exclude_outputs metadata flag. This feature is disabled by default and not recommended for production use.
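
For illustration only, opting a notebook into this experimental behavior might look like the following; where exactly the exclude_outputs flag lives in the notebook metadata is an assumption based on its name:

import nbformat

nb = nbformat.read("example.ipynb", as_version=4)
nb.metadata["exclude_outputs"] = True  # keep outputs in runtime only; omit from the saved file
nbformat.write(nb, "example.ipynb")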

Migration Impact

No migration needed - existing notebooks and workflows continue to work unchanged. The refactored OutputsManager is a drop-in replacement with improved code quality and documentation.

This PR moves the stream_limit logic to the writing of outputs, so get_outputs can simply return all outputs.

Commit message:

This commit introduces a cleaner architecture for handling notebook outputs
and adds an experimental optimized version that supports excluding outputs
from saved notebook files.

Core changes to OutputsManager:
- Extract private utility functions (_create_output_url, _create_output_placeholder)
- Add comprehensive docstrings to all methods
- Simplify write() method by removing stream_limit logic
- Improve error handling in get_outputs() to return an empty list instead of raising (sketched below)
- Consolidate output processing logic into _process_outputs_from_cell()
- Add helper methods: _upgrade_notebook_format(), _ensure_cell_id()
- Always write full outputs to notebook files on save (traditional Jupyter behavior)
- Remove stream-specific handling and StreamAPIHandler route
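
For instance, the graceful fallback in get_outputs() might look roughly like this (only the method name and the return-empty-list behavior come from the list above; the storage helper is an assumption):

def get_outputs(self, file_id: str, cell_id: str) -> list:
    """Return all outputs for a cell, or an empty list if none were found."""
    try:
        return self._read_outputs_from_disk(file_id, cell_id)  # assumed helper
    except FileNotFoundError:
        # Graceful fallback instead of raising when outputs aren't on disk yet
        return []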

New OptimizedOutputsManager:
- Extends base OutputsManager with exclude_outputs metadata flag support
- When exclude_outputs=True: outputs stored only in runtime, not in saved files
- When exclude_outputs=False/unset: full outputs included in saved files (default)
- Implements stream_limit (500) for large stream outputs with link placeholders
- Provides _append_to_stream_file() for efficient stream handling
- Stream API handler for accessing accumulated stream outputs
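
A rough, hypothetical sketch of this stream handling; only stream_limit (500) and _append_to_stream_file come from this PR, while the method name, counter, and route below are assumptions:

def write_stream_output(self, file_id: str, cell_id: str, output: dict) -> dict:
    """Persist stream text to a per-cell file and, past the stream limit,
    return a link placeholder instead of the raw text."""
    self._append_to_stream_file(file_id, cell_id, output["text"])
    count = self._stream_counts.get(cell_id, 0) + 1  # assumed counter dict
    self._stream_counts[cell_id] = count
    if count <= 500:  # stream_limit
        return output  # below the limit: keep the stream output inline
    # above the limit: link to the accumulated stream via the stream API
    url = f"/api/outputs/{file_id}/stream/{cell_id}"  # assumed route
    return {
        "output_type": "display_data",
        "data": {"text/html": f'<a href="{url}">Show full stream output</a>'},
        "metadata": {},
    }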

Other improvements:
- Add __all__ to outputs/__init__.py for cleaner exports
- Expand test coverage with comprehensive test suite
- Rename private methods for clarity (_process_loaded_excluded_outputs, etc.)
- Update yroom_file_api to use process_saving_notebook correctly

The OptimizedOutputsManager is currently experimental and disabled by default.
StreamAPIHandler route is commented out until the feature is ready for production.

@dlqqq (Collaborator) left a comment

@ellisonbg Thank you for making these changes! I've reviewed the portion of changes concerning the new YRoomFileAPI auto-save behavior. Left some (non-blocking) feedback below.

@3coins can help review the changes related to outputs.

Comment on lines +129 to +133
_last_save_duration: float | None
"""
The duration in seconds of the last save operation. Used to calculate the
adaptive poll interval.
"""

This instance attribute doesn't seem necessary, since it is only bound to the local variable save_duration in the save() method.

Comment on lines 48 to +54
  poll_interval = Float(
      default_value=0.5,
-     help="Sets how frequently this class saves the YDoc & checks the file "
-     "for changes. Defaults to every 0.5 seconds.",
+     help="Sets the initial interval for saving the YDoc & checking the file "
+     "for changes. This serves as the starting value before adaptive timing "
+     "takes effect. Defaults to 0.5 seconds.",
      config=True,
  )

The poll_interval configurable trait is now only used to set the initial value of self._adaptive_poll_interval, which is reset after the very first auto-save.

I recommend removing poll_interval to avoid leading developers to confuse this with min_poll_interval. The initial value of self._adaptive_poll_interval can be set to self.min_poll_interval.

Comment on lines +56 to 70
min_poll_interval = Float(
    default_value=0.5,
    help="Minimum autosave interval in seconds. The adaptive timing will "
    "never go below this value. Defaults to 0.5 seconds.",
    config=True,
)

poll_interval_multiplier = Float(
    default_value=5.0,
    help="Multiplier applied to save duration to calculate the next poll "
    "interval. For example, if a save takes 1 second and the multiplier is "
    "5.0, the next poll interval will be 5 seconds (bounded by min/max). "
    "Defaults to 5.0.",
    config=True,
)

Minor suggestion: It may be worth adding validation to ensure that min_poll_interval > 0 and poll_interval_multiplier > 0. traitlets provides the @validate decorator for this: https://traitlets.readthedocs.io/en/stable/using_traitlets.html#basic-example-validating-the-parity-of-a-trait

For example:

from traitlets import validate

# outside YRoomFileAPI
DEFAULT_MIN_POLL_INTERVAL = 0.5

# within YRoomFileAPI
@validate('min_poll_interval')
def _validate_min_poll_interval(self, proposal):
    if proposal['value'] <= 0:
        self.log.warning("The configured min_poll_interval cannot be <= 0. Using the default value instead.")
        return DEFAULT_MIN_POLL_INTERVAL
    return proposal['value']

# ... similarly for poll_interval_multiplier

@3coins (Collaborator) commented Oct 22, 2025

@ellisonbg
Tested the outputs behavior, which works as expected, but I have some concerns about the adaptive saves.

  1. If a save takes 10s with the default multiplier of 5.0, the next interval becomes 50s. For slow saves, this could result in intervals of minutes or longer. Should we introduce a max_poll_interval to avoid this? (A bounded sketch follows item 3 below.)
  2. The algorithm assumes the last save time linearly predicts the next one, which may not hold across notebook size changes (adding/removing large outputs), network conditions, system load variations, or different file types. Should we smooth with an exponential moving average?
    # Instead of using only last save duration
    if self._last_save_duration is not None:
        # Smooth with exponential moving average
        alpha = 0.3  # Smoothing factor
        smoothed_duration = (alpha * save_duration +
                            (1 - alpha) * self._last_save_duration)
    else:
        smoothed_duration = save_duration
    
    new_interval = smoothed_duration * self.poll_interval_multiplier
  3. If saves fail, the adaptive interval isn't updated. Should we add failure backoff?
    # Exponential backoff on save failures
    self._adaptive_poll_interval = min(
        self.max_poll_interval,
        self._adaptive_poll_interval * 2
    )
    self.log.error("Save failed, backing off poll interval")
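
For reference, a minimal sketch of the bound suggested in item 1, assuming a hypothetical max_poll_interval trait alongside the existing min_poll_interval and poll_interval_multiplier:

# Hypothetical: clamp the adaptive interval from both sides
new_interval = min(
    self.max_poll_interval,  # proposed cap (not in this PR)
    max(self.min_poll_interval, save_duration * self.poll_interval_multiplier),
)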

@3coins (Collaborator) commented Oct 22, 2025

Here is a comparison of the current algorithm with an exponential moving average for some hypothetical notebook save times.

[Image: chart comparing the current algorithm with the exponential moving average]

@3coins (Collaborator) left a comment


After speaking with @ellisonbg, it seems he has run a few experiments, and the save times for very large notebooks (with hundreds of plots) on modern hardware are <1s. The configurable multiplier provides a lever to adjust this, which should help contain very high save intervals. Some of the feedback @dlqqq provided still makes sense, but overall this looks good.

@Zsailer (Collaborator) commented Oct 23, 2025

I'd like to get this merged today, if possible. @ellisonbg are you able to make @dlqqq's changes?

I have some changes coming, and I want to avoid a painful rebase 😅

@dlqqq (Collaborator) left a comment


Suggestions implemented in #169, which can be merged on top of this.

Approving to unblock. Proceeding to merge as we have 3 approvals.

@dlqqq merged commit 63e25bb into main on Oct 23, 2025 (10 of 11 checks passed).
@dlqqq deleted the outputs-on-disk branch on October 23, 2025 at 21:20.
@dlqqq (Collaborator) commented Oct 23, 2025

@Zsailer This PR is now merged, so you're clear to open your PR! 🎉

@ellisonbg (Collaborator, Author)

Thanks everyone!


Labels: enhancement (New feature or request)