-
Couldn't load subscription status.
- Fork 346
fix: dispose a torn down NB controller's kernel #17097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@microsoft-github-policy-service agree company="Google" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the right solution as this will kill all kernels
E.g. assume you have a kernel for Python 3.14 and the id is 314
Now one can have two different notebooks running code against 314.
Both are two different instances and processes running against Python 3.14 (i.e. Kernel Id = 314).
Just because we close the first notebook we souldn't be killing the kernel instance running against the other notebook.
| ); | ||
| this.isDisposed = true; | ||
| // Dispose all kernels associated with this controller | ||
| workspace.notebookDocuments.forEach((doc) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Controller instances are not shared across notebooks.
We cannot dispose all other kernels associated with the same kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest commit filters to the associatedDocuments. Is that sufficient? If there's a better way, please let me know!
Fixes microsoft#17094 It's my first time looking at any of this code and the lifecycle of everything is a little challenging to discover at a glance. This change passed unit tests and manually verifying, it solves the issue. I'll lay out the problem and solution independently below to make reviewing easier. If a server/kernel is removed, then another was created for the active notebook editor, cell executions against the new kernel would fail. The issue with clear repro steps and the corresponding logs: microsoft#17094. The root cause is improper resource cleanup during the disposal of a `VSCodeNotebookController`. When a `JupyterServerProvider` is removed, the extension correctly disposes of the associated controllers. However, the `dispose` method for the controller did not explicitly dispose of the underlying `IKernel` instance that it had created and managed. This left the kernel in an orphaned or "zombie" state. When a new server and controller were subsequently created for the same notebook, it'd try to use the old, still-lingering kernel, which was already in a started state, leading to the "Cannot call start again" error. The fix enhances the dispose method within the `VSCodeNotebookController`. The new logic ensures that when a controller is disposed, it actively cleans up any kernels it was responsible for. It iterates through all currently open NB docs and siposes any kernels who's connection metadata IDs match.
|
Thanks for taking a look and sorry I was off-base @DonJayamanne. I'm having a bit of a hard time understanding the life-cycle of everything, with all the wrapping and indirection. Using the same repro as listed in the issue, if I create a 3rd party Jupyter server, execute code, delete it (provider returns
I tried taking out that line but all the I'll try and spend some more time building an understanding of all this, but if you have any pointers or recommendations for a fix that'll save me time 👍. |
This change modifies the `VSCodeNotebookController`'s `dispose` method to ensure that when a controller is disposed, it disposes the kernels that are directly associated with that specific controller instance.
cf6f4ab to
45a2704
Compare
It's my first time looking at any of this code and the lifecycle of everything is a little challenging to discover at a glance. Sorry if this is overlooking anything obvious! This passes unit tests and I verified manually it solves the issue I reported.
Problem
If a server/kernel is removed, then another was created for the active notebook editor, cell executions against the new kernel would fail. The issue with clear repro steps and the corresponding logs: #17094.
Why
The root cause is improper resource cleanup during the disposal of a
VSCodeNotebookController. When aJupyterServerProvideris removed, the extension correctly disposes of the associated controllers. However, thedisposemethod for the controller did not explicitly dispose of the underlyingIKernelinstance that it had created and managed.This left the kernel in an orphaned or "zombie" state. When a new server and controller were subsequently created for the same notebook, it'd try to use the old, still-lingering kernel, which was already in a started state, leading to the "Cannot call start again" error.
Fix
When a controller is disposed, clean up any kernels it was responsible for. This is done by iterating through all currently open NB docs and disposing any kernels who's connection metadata IDs match.
Fixes #17094