Skip to content

Conversation

@kevineger
Copy link

It's my first time looking at any of this code and the lifecycle of everything is a little challenging to discover at a glance. Sorry if this is overlooking anything obvious! This passes unit tests and I verified manually it solves the issue I reported.

Problem

If a server/kernel is removed, then another was created for the active notebook editor, cell executions against the new kernel would fail. The issue with clear repro steps and the corresponding logs: #17094.

Why

The root cause is improper resource cleanup during the disposal of a VSCodeNotebookController. When a JupyterServerProvider is removed, the extension correctly disposes of the associated controllers. However, the dispose method for the controller did not explicitly dispose of the underlying IKernel instance that it had created and managed.

This left the kernel in an orphaned or "zombie" state. When a new server and controller were subsequently created for the same notebook, it'd try to use the old, still-lingering kernel, which was already in a started state, leading to the "Cannot call start again" error.

Fix

When a controller is disposed, clean up any kernels it was responsible for. This is done by iterating through all currently open NB docs and disposing any kernels who's connection metadata IDs match.


Fixes #17094

@kevineger
Copy link
Author

@microsoft-github-policy-service agree company="Google"

Copy link
Contributor

@DonJayamanne DonJayamanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the right solution as this will kill all kernels
E.g. assume you have a kernel for Python 3.14 and the id is 314
Now one can have two different notebooks running code against 314.
Both are two different instances and processes running against Python 3.14 (i.e. Kernel Id = 314).

Just because we close the first notebook we souldn't be killing the kernel instance running against the other notebook.

);
this.isDisposed = true;
// Dispose all kernels associated with this controller
workspace.notebookDocuments.forEach((doc) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Controller instances are not shared across notebooks.
We cannot dispose all other kernels associated with the same kernel.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit filters to the associatedDocuments. Is that sufficient? If there's a better way, please let me know!

Fixes microsoft#17094

It's my first time looking at any of this code and the lifecycle of everything is a little challenging to discover at a glance. This change passed unit tests and manually verifying, it solves the issue. I'll lay out the problem and solution independently below to make reviewing easier.

If a server/kernel is removed, then another was created for the active notebook editor, cell executions against the new kernel would fail. The issue with clear repro steps and the corresponding logs: microsoft#17094.

The root cause is improper resource cleanup during the disposal of a `VSCodeNotebookController`. When a `JupyterServerProvider` is removed, the extension correctly disposes of the associated controllers. However, the `dispose` method for the controller did not explicitly dispose of the underlying `IKernel` instance that it had created and managed.

This left the kernel in an orphaned or "zombie" state. When a new server and controller were subsequently created for the same notebook, it'd try to use the old, still-lingering kernel, which was already in a started state, leading to the "Cannot call start again" error.

The fix enhances the dispose method within the `VSCodeNotebookController`. The new logic ensures that when a controller is disposed, it actively cleans up any kernels it was responsible for.

It iterates through all currently open NB docs and siposes any kernels who's connection metadata IDs match.
@kevineger
Copy link
Author

Thanks for taking a look and sorry I was off-base @DonJayamanne. I'm having a bit of a hard time understanding the life-cycle of everything, with all the wrapping and indirection.

Using the same repro as listed in the issue, if I create a 3rd party Jupyter server, execute code, delete it (provider returns []) and then create another:

  • this.controller here when the initial cell execution is kicked off uses the new/updated controller.
  • But kernel.controller.id !== controller.id and so a second execution is started here using the old, disposed controller.

I tried taking out that line but all the CellExecution logic still maintains a tie to the disposed controller.


I'll try and spend some more time building an understanding of all this, but if you have any pointers or recommendations for a fix that'll save me time 👍.

This change modifies the `VSCodeNotebookController`'s `dispose` method to ensure that when a controller is disposed, it disposes the kernels that are directly associated with that specific controller instance.
@kevineger kevineger force-pushed the nb-kernel-lifecycle branch from cf6f4ab to 45a2704 Compare October 22, 2025 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Jupyter API: A resolved server cannot be used on a notebook which had a server that was removed. NB controller disposed.

2 participants