Refactor OutputsManager to align with default Jupyter behavior #163
Conversation
This PR moves the stream_limit logic into the writing of outputs, so get_outputs can simply return all outputs.
This commit introduces a cleaner architecture for handling notebook outputs and adds an experimental optimized version that supports excluding outputs from saved notebook files.

Core changes to OutputsManager:
- Extract private utility functions (_create_output_url, _create_output_placeholder)
- Add comprehensive docstrings to all methods
- Simplify write() method by removing stream_limit logic
- Improve error handling in get_outputs() to return an empty list instead of raising
- Consolidate output processing logic into _process_outputs_from_cell()
- Add helper methods: _upgrade_notebook_format(), _ensure_cell_id()
- Always write full outputs to notebook files on save (traditional Jupyter behavior)
- Remove stream-specific handling and the StreamAPIHandler route

New OptimizedOutputsManager:
- Extends the base OutputsManager with exclude_outputs metadata flag support
- When exclude_outputs=True: outputs stored only in the runtime, not in saved files
- When exclude_outputs=False/unset: full outputs included in saved files (default)
- Implements stream_limit (500) for large stream outputs with link placeholders
- Provides _append_to_stream_file() for efficient stream handling
- Stream API handler for accessing accumulated stream outputs

Other improvements:
- Add __all__ to outputs/__init__.py for cleaner exports
- Expand test coverage with a comprehensive test suite
- Rename private methods for clarity (_process_loaded_excluded_outputs, etc.)
- Update yroom_file_api to use process_saving_notebook correctly

The OptimizedOutputsManager is currently experimental and disabled by default. The StreamAPIHandler route is commented out until the feature is ready for production.
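To make the exclude_outputs behavior concrete, here is a minimal sketch of a save-time gate; the function name, the notebook-level placement of the flag, and the dict-based notebook shape are assumptions for illustration, not this PR's actual code:

```python
import copy
from typing import Any

def strip_outputs_if_excluded(nb: dict[str, Any]) -> dict[str, Any]:
    """If the (assumed) exclude_outputs metadata flag is truthy, return a copy
    with code-cell outputs cleared so they live only in the runtime output
    store; otherwise return the notebook unchanged."""
    if not nb.get("metadata", {}).get("exclude_outputs", False):
        return nb  # default path: full outputs are saved to the .ipynb file
    stripped = copy.deepcopy(nb)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped
```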
dlqqq left a comment
@ellisonbg Thank you for making these changes! I've reviewed the portion of changes concerning the new YRoomFileAPI auto-save behavior. Left some (non-blocking) feedback below.
@3coins can help review the changes related to outputs.
```python
_last_save_duration: float | None
"""
The duration in seconds of the last save operation. Used to calculate the
adaptive poll interval.
"""
```
This instance attribute doesn't seem necessary, since it is only bound to the local variable save_duration in the save() method.
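A minimal sketch of what that suggestion could look like, with _write_ydoc_to_disk standing in as an assumed helper name:

```python
import time

# within YRoomFileAPI (sketch only)
async def save(self) -> None:
    start = time.monotonic()
    await self._write_ydoc_to_disk()
    save_duration = time.monotonic() - start  # a local is enough; no instance attribute needed
    self._adaptive_poll_interval = min(
        self.max_poll_interval,
        max(self.min_poll_interval, save_duration * self.poll_interval_multiplier),
    )
```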
```diff
 poll_interval = Float(
     default_value=0.5,
-    help="Sets how frequently this class saves the YDoc & checks the file "
-    "for changes. Defaults to every 0.5 seconds.",
+    help="Sets the initial interval for saving the YDoc & checking the file "
+    "for changes. This serves as the starting value before adaptive timing "
+    "takes effect. Defaults to 0.5 seconds.",
     config=True,
 )
```
The poll_interval configurable trait is now only used to set the initial value of self._adaptive_poll_interval, which is reset after the very first auto-save.
I recommend removing poll_interval to avoid leading developers to confuse this with min_poll_interval. The initial value of self._adaptive_poll_interval can be set to self.min_poll_interval.
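For instance, a minimal sketch of that suggestion:

```python
# within YRoomFileAPI.__init__ (sketch only)
self._adaptive_poll_interval: float = self.min_poll_interval
```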
```python
min_poll_interval = Float(
    default_value=0.5,
    help="Minimum autosave interval in seconds. The adaptive timing will "
    "never go below this value. Defaults to 0.5 seconds.",
    config=True,
)

poll_interval_multiplier = Float(
    default_value=5.0,
    help="Multiplier applied to save duration to calculate the next poll "
    "interval. For example, if a save takes 1 second and the multiplier is "
    "5.0, the next poll interval will be 5 seconds (bounded by min/max). "
    "Defaults to 5.0.",
    config=True,
)
```
Minor suggestion: It may be worth adding validation to ensure that min_poll_interval > 0 and poll_interval_multiplier > 0. traitlets provides the @validate decorator for this: https://traitlets.readthedocs.io/en/stable/using_traitlets.html#basic-example-validating-the-parity-of-a-trait
For example:

```python
from traitlets import validate

# outside YRoomFileAPI
DEFAULT_MIN_POLL_INTERVAL = 0.5

# within YRoomFileAPI
@validate('min_poll_interval')
def _validate_min_poll_interval(self, proposal):
    if proposal['value'] <= 0:
        self.log.warning("The configured min_poll_interval cannot be <= 0. Using default value instead.")
        return DEFAULT_MIN_POLL_INTERVAL
    return proposal['value']

# ... similarly for poll_interval_multiplier
```
@ellisonbg
```python
# Exponential backoff on save failures
self._adaptive_poll_interval = min(
    self.max_poll_interval,
    self._adaptive_poll_interval * 2
)
self.log.error("Save failed, backing off poll interval")
```
3coins left a comment
After speaking with @ellisonbg, it seems he has run a few experiments, and the save time for very large notebooks (with hundreds of plots) on modern hardware is under 1 second. The configurable multiplier provides a lever to adjust this, which should help keep save intervals from growing too high. Some of the feedback that @dlqqq provided still makes sense, but overall this looks good.
I'd like to get this merged today, if possible. @ellisonbg are you able to make @dlqqq's changes? I have some changes coming, and I want to avoid a painful rebase 😅
Suggestions implemented in #169, which can be merged on top of this.
Approving to unblock. Proceeding to merge as we have 3 approvals.
@Zsailer This PR is now merged, so you're clear to open your PR! 🎉
Thanks everyone!

Summary
This PR refactors the notebook output handling system to align with standard Jupyter behavior while maintaining the performance benefits of the collaborative server architecture.
Key Improvements
OutputsManager Now Follows Default Jupyter Behavior
The refactored OutputsManager now works exactly like standard Jupyter: full outputs are always written to the .ipynb files on disk, ensuring compatibility with standard Jupyter workflows. This means users get the best of both worlds: standard Jupyter file compatibility alongside the performance benefits of the collaborative server architecture.
Dynamic Autosave Intervals
The YRoomFileAPI now implements adaptive autosave timing to optimize performance across different file sizes and I/O environments:
- min_poll_interval (default: 0.5s) - minimum autosave interval
- poll_interval_multiplier (default: 5.0) - multiplier applied to the save duration to calculate the next interval
- For example, a save that takes 2s yields a next poll interval of 2s × 5 = 10s, scaling appropriately to the file size and I/O performance

This ensures the autosave system adapts to realistic file I/O conditions, providing optimal responsiveness for small files while avoiding excessive polling overhead for large files or slower storage systems.
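A rough sketch of that timing rule (the maximum cap below is an illustrative stand-in for the configurable upper bound, not the project's actual default):

```python
def next_poll_interval(
    save_duration: float,
    min_interval: float = 0.5,   # min_poll_interval default
    multiplier: float = 5.0,     # poll_interval_multiplier default
    max_interval: float = 60.0,  # illustrative stand-in for max_poll_interval
) -> float:
    """Clamp save_duration * multiplier between the minimum and maximum intervals."""
    return min(max_interval, max(min_interval, save_duration * multiplier))

# A save that takes 2 s yields a 10 s interval with the default multiplier:
assert next_poll_interval(2.0) == 10.0
# Very fast saves never fall below the 0.5 s minimum:
assert next_poll_interval(0.01) == 0.5
```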
Cleaner, Better-Documented Code
- Extracted private utility functions (_create_output_url, _create_output_placeholder) and helper methods (_upgrade_notebook_format, _ensure_cell_id, _process_outputs_from_cell)
- Added comprehensive docstrings to all methods

Expanded Test Coverage
- The test suite has been expanded with comprehensive coverage of the refactored output handling
Experimental Features
Added an experimental OptimizedOutputsManager that supports excluding outputs from saved notebook files via an exclude_outputs metadata flag. This feature is disabled by default and not recommended for production use.
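As a hedged illustration only, and assuming the flag lives in notebook-level metadata (the PR text does not pin down its exact placement), a notebook could opt in like this:

```python
import nbformat

# "analysis.ipynb" is a placeholder filename; the "exclude_outputs" key placement is assumed.
nb = nbformat.read("analysis.ipynb", as_version=4)
nb.metadata["exclude_outputs"] = True
nbformat.write(nb, "analysis.ipynb")
```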
Migration Impact
No migration needed - existing notebooks and workflows continue to work unchanged. The refactored OutputsManager is a drop-in replacement with improved code quality and documentation.