Track completed cell progress during notebook execution #586

@agupta01

Problem

When executing notebooks through the Jupyter Scheduler, users have no visibility into the real-time progress of long-running jobs. This creates frustration because:

  • Users cannot tell if a job is actively progressing or stuck on a particular cell
  • There's no way to estimate how much work remains for a running job
  • Users resort to checking logs or stopping jobs unnecessarily due to lack of progress feedback

Proposed solution

I propose adding a completed_cells field to the Job model to track the number of code cells executed so far. The field would be updated after every cell execution by reading the code_cells_executed counter on the nbclient NotebookClient (ref) and writing that value to the job's database row. We can do this by subclassing the ExecutePreprocessor used today (which inherits from NotebookClient):

from nbconvert.preprocessors import ExecutePreprocessor

from jupyter_scheduler.orm import Job


class TrackingExecutePreprocessor(ExecutePreprocessor):
    """Custom ExecutePreprocessor that tracks completed cells and updates the database"""

    def __init__(self, db_session, job_id, **kwargs):
        super().__init__(**kwargs)
        # db_session is a session factory: calling it returns a new session
        self.db_session = db_session
        self.job_id = job_id

    def preprocess_cell(self, cell, resources, index):
        """
        Override to track completed cells in the database.
        Calls the superclass implementation and then updates the database.
        """
        # Call the superclass implementation; if the cell raises, the update below is skipped
        cell, resources = super().preprocess_cell(cell, resources, index)

        # Update the database with the current count of completed cells
        with self.db_session() as session:
            session.query(Job).filter(Job.job_id == self.job_id).update(
                {"completed_cells": self.code_cells_executed}
            )
            session.commit()

        return cell, resources

In total, we'd make the following changes:

  1. Update the Job model to add the completed_cells field
  2. Implement the TrackingExecutePreprocessor as described above
  3. Update the GET jobs/{job_id} endpoint to expose completed_cells in the response body
  4. Update the PATCH jobs/{job_id} endpoint to allow manually patching the completed_cells value if needed
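
To make the data flow in the list above concrete, here is a minimal sketch that uses only the standard-library sqlite3 module as a stand-in for the scheduler's SQLAlchemy-backed Job model. The jobs table layout and the helper names (create_jobs_table, record_completed_cells, describe_job) are hypothetical and exist only for illustration. The sketch assumes completed_cells is nullable, so a job that has not finished any cell yet reports null.

```python
import sqlite3


def create_jobs_table(conn):
    # Hypothetical, simplified stand-in for the Job model with the new column.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, "
        "status TEXT NOT NULL, "
        "completed_cells INTEGER)"  # nullable: NULL until the first cell finishes
    )


def record_completed_cells(conn, job_id, completed_cells):
    """Write the latest count for one job; returns True if a row was updated."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE jobs SET completed_cells = ? WHERE job_id = ?",
            (completed_cells, job_id),
        )
    return cursor.rowcount == 1


def describe_job(conn, job_id):
    """Build the kind of dict a GET jobs/{job_id} response body could contain."""
    row = conn.execute(
        "SELECT job_id, status, completed_cells FROM jobs WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    return {"job_id": row[0], "status": row[1], "completed_cells": row[2]}
```

In this sketch, record_completed_cells plays the role of the per-cell update in the preprocessor (and of a manual PATCH), while describe_job plays the role of the GET response.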

This enables users to:

  • Monitor job progress in real time through the API. Front-end support can be built as a separate task.
  • Make informed decisions about stopping or continuing long-running jobs

As a bonus, stopped or failed jobs will retain the last completed_cells count in the field, allowing users to identify which cell a job stopped or failed on for faster debugging.
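
As a rough illustration of that debugging use, the sketch below maps a stored completed_cells count back to a position in the notebook. The helper name next_code_cell_index is hypothetical. It assumes the counter only counts non-empty code cells (my reading of nbclient's behavior, worth verifying), and it ignores cells skipped by tag.

```python
from typing import Optional


def next_code_cell_index(notebook: dict, completed_cells: int) -> Optional[int]:
    """Return the index in notebook["cells"] of the first non-empty code cell
    that has not completed yet, or None if every such cell has completed.

    `notebook` is the parsed .ipynb JSON. Assumption: markdown cells and
    empty code cells do not contribute to the completed_cells count.
    """
    seen = 0
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files store source as a list of lines
            source = "".join(source)
        if cell.get("cell_type") != "code" or not source.strip():
            continue
        if seen == completed_cells:
            return index
        seen += 1
    return None
```

For a failed job with completed_cells equal to 1, the returned index would point at the second non-empty code cell, which is the likely failure location under the assumptions above.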

Additional context

This feature addresses a common need in notebook execution systems. Similar implementations exist in:

  • Papermill: Tracks notebook execution progress with cell-level granularity. Also includes a progress tracker in stdout (although we don't have to do this)
  • Google Colab: Shows real-time cell execution progress in the UI

Labels: enhancement (New feature or request)
