Description
Problem
When executing notebooks through the Jupyter Scheduler, users have no visibility into the real-time progress of long-running jobs. This creates frustration because:
- Users cannot tell if a job is actively progressing or stuck on a particular cell
- There's no way to estimate how much work remains for a running job
- Users resort to checking logs or stopping jobs unnecessarily due to lack of progress feedback
Proposed solution
I propose adding a completed_cells field to the Job model to track the number of code cells executed. The field would be updated after every cell execution by reading the code_cells_executed counter on the nbclient NotebookClient (ref) and writing it to the job's database row. We can do this by adapting the ExecutePreprocessor used today (which inherits from NotebookClient):
    from jupyter_scheduler.orm import Job
    from nbconvert.preprocessors import ExecutePreprocessor


    class TrackingExecutePreprocessor(ExecutePreprocessor):
        """Custom ExecutePreprocessor that tracks completed cells and updates the database."""

        def __init__(self, db_session, job_id, **kwargs):
            super().__init__(**kwargs)
            # db_session is a session factory (callable returning a context manager)
            self.db_session = db_session
            self.job_id = job_id

        def preprocess_cell(self, cell, resources, index):
            """
            Override to track completed cells in the database.
            Calls the superclass implementation and then updates the database.
            """
            # Run the cell through the standard execution path first
            cell, resources = super().preprocess_cell(cell, resources, index)

            # Persist the running count of executed code cells for this job
            with self.db_session() as session:
                session.query(Job).filter(Job.job_id == self.job_id).update(
                    {"completed_cells": self.code_cells_executed}
                )
                session.commit()

            return cell, resources
In total, we'd make the following changes:
- Update the Job model to add the completed_cells field
- Implement the TrackingExecutePreprocessor as described above
- Update the GET jobs/{job_id} endpoint to expose the completed cells in the response body
- Update the PATCH jobs/{job_id} endpoint to allow manual patching of the completed cells value if needed
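The data flow behind these changes can be sketched with the standard library alone. This is a stand-in under stated assumptions, not the scheduler's implementation: it uses sqlite3 in place of the scheduler's SQLAlchemy models, and the table layout, the status string, and the helper names (create_jobs_table, record_progress, describe_job) are all hypothetical. It shows the new column defaulting to 0, the per-cell update, and a read that mirrors what the GET response might expose.

```python
import sqlite3


def create_jobs_table(conn):
    """Create a simplified, hypothetical jobs table with the proposed column."""
    conn.execute(
        "CREATE TABLE jobs ("
        "job_id TEXT PRIMARY KEY, "
        "status TEXT NOT NULL, "
        "completed_cells INTEGER NOT NULL DEFAULT 0)"
    )


def record_progress(conn, job_id, completed_cells):
    """Write the latest executed-cell count, as the preprocessor would after each cell."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE jobs SET completed_cells = ? WHERE job_id = ?",
            (completed_cells, job_id),
        )


def describe_job(conn, job_id):
    """Return a dict shaped like a possible GET jobs/{job_id} response body."""
    row = conn.execute(
        "SELECT job_id, status, completed_cells FROM jobs WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        raise KeyError(job_id)
    return {"job_id": row[0], "status": row[1], "completed_cells": row[2]}


conn = sqlite3.connect(":memory:")
create_jobs_table(conn)
with conn:
    conn.execute(
        "INSERT INTO jobs (job_id, status) VALUES (?, ?)",
        ("job-1", "IN_PROGRESS"),  # illustrative status value
    )

# Simulate three code cells finishing one after another
for executed in (1, 2, 3):
    record_progress(conn, "job-1", executed)

print(describe_job(conn, "job-1"))
```

Defaulting the column to 0 means existing rows and jobs that have not started yet report a sensible value without any special casing in the response model.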
This enables users to:
- Monitor job progress in real-time through the API. We can build this into the front-end component as a separate task.
- Make informed decisions about stopping or continuing long-running jobs
As a bonus, stopped or failed jobs will retain the number of code cells completed before they ended, which helps users narrow down where a job stopped or failed for faster debugging.
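To turn completed_cells into a progress estimate, a client also needs a total to compare against. The sketch below is one possible approach, not part of the proposal: it counts the non-empty code cells in a notebook's JSON and derives a fraction. The helper names (count_executable_cells, progress_fraction) are hypothetical, and the decision to ignore empty code cells rests on the assumption that nbclient skips them without incrementing code_cells_executed.

```python
def _source_text(cell):
    """Notebook JSON stores a cell's source as either a string or a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def count_executable_cells(notebook):
    """Count code cells with non-blank source in a notebook dict."""
    return sum(
        1
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code" and _source_text(cell).strip()
    )


def progress_fraction(completed_cells, total_cells):
    """Return progress in [0.0, 1.0]; a notebook with no code cells counts as done."""
    if total_cells <= 0:
        return 1.0
    return min(completed_cells / total_cells, 1.0)


notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": "x = 1"},
        {"cell_type": "code", "source": ""},  # empty, not counted
        {"cell_type": "code", "source": ["print(x)\n"]},
    ]
}

total = count_executable_cells(notebook)
print(total, progress_fraction(1, total))
```

The total could equally be computed once on the server when the job is created and stored next to completed_cells; computing it client-side is shown here only because it needs no further schema changes.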
Additional context
This feature addresses a common need in notebook execution systems. Similar implementations exist in:
- Papermill: Tracks notebook execution progress with cell-level granularity. Also includes a progress tracker in stdout (although we don't have to do this)
- Google Colab: Shows real-time cell execution progress in the UI