Description
It would be interesting to experiment with the following workflow:
- start with a lite kernel in the browser
- if the user executes code that uses a library not available in the browser, or that requires more resources, automatically start a server kernel and switch the current session to it
- have some dialog / UI elements to inform the user about this change
Such a mechanism would be based on heuristics to decide when to promote the kernel.
This approach would be most useful in environments where server kernels can take a couple of minutes to start: users could start working in the browser right away and switch to a server kernel later, if needed.
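As a rough illustration of what such a heuristic could look like, here is a minimal sketch that inspects the content of a Jupyter `error` message (which carries `ename`, `evalue` and `traceback`) and decides whether the failure suggests the code cannot run in the browser. The function name `should_promote` and the default trigger set are hypothetical, and how an out-of-memory condition actually surfaces in a browser kernel is an assumption that would need checking.

```python
from typing import Mapping

# Hypothetical default triggers. The exception names are standard Python,
# but treating them as promotion triggers is an assumption of this sketch.
PROMOTION_TRIGGERS = frozenset({"ModuleNotFoundError", "MemoryError"})


def should_promote(
    error_content: Mapping[str, object],
    triggers: frozenset = PROMOTION_TRIGGERS,
) -> bool:
    """Return True if a kernel error hints that a server kernel is needed.

    `error_content` is the `content` dict of a Jupyter `error` message,
    which carries the keys `ename`, `evalue` and `traceback`.
    """
    return error_content.get("ename") in triggers


# Example: an import failing in the lite kernel would trigger a promotion.
missing_module = {
    "ename": "ModuleNotFoundError",
    "evalue": "No module named 'torch'",
    "traceback": [],
}
print(should_promote(missing_module))
```

Matching on the exception name keeps the check cheap, but it will also fire on a plain typo in an import statement, so a real implementation would probably want to confirm with the user before switching kernels.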
In terms of UX:
- this could happen automatically, based on user-configurable heuristics
- it should also be possible to promote a kernel manually, for example via a toolbar button
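The two UX options above could be combined in a single decision step. The sketch below is only one possible shape for it: `PromotionPolicy`, `Action` and `decide` are hypothetical names, not part of any existing JupyterLite API, and the `ASK` outcome assumes a confirmation dialog is shown when automatic promotion is turned off.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    STAY = "stay"        # keep running in the browser kernel
    ASK = "ask"          # show a dialog and let the user decide
    PROMOTE = "promote"  # start a server kernel and switch the session


@dataclass
class PromotionPolicy:
    """User-configurable settings (hypothetical names and defaults)."""

    auto_promote: bool = False
    trigger_errors: frozenset = frozenset({"ModuleNotFoundError", "MemoryError"})


def decide(
    policy: PromotionPolicy,
    ename: Optional[str] = None,
    manual: bool = False,
) -> Action:
    """Choose what to do after a cell runs or the toolbar button is clicked."""
    if manual:
        # An explicit request from the user always wins.
        return Action.PROMOTE
    if ename in policy.trigger_errors:
        return Action.PROMOTE if policy.auto_promote else Action.ASK
    return Action.STAY
```

With the default policy, `decide(PromotionPolicy(), ename="ModuleNotFoundError")` returns `Action.ASK`, while setting `auto_promote=True` turns the same error into `Action.PROMOTE`.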
For reference, an episode of the Meta Tech Podcast recently covered this:
- https://engineering.fb.com/2024/06/10/data-infrastructure/serverless-jupyter-notebooks-bento-meta/
- https://engineering.fb.com/2024/09/17/data-infrastructure/inside-bento-jupyter-notebooks-at-meta/
Posting some relevant parts of the transcript here for reference:
So what does it actually look like if I underestimate how much compute is actually required to do a certain workload, some data transformation? I accidentally start training an ML model in my browser. So what happens in this case? Because I think Chrome won't let me allocate the 64 gigabytes of RAM that I have in my laptop here.

That is correct. And also some of the other heavy and beefy libraries that we actually do all of that are not going to be available down there. So we built a pared-down version of the Python REPL with all of the standard libraries that you get out of the box and of the stuff that we've built internally within our Python environments, obviously not going to work out there.

So one of the things that we then did was come up with a bunch of heuristics to actually use that as the fork in the road, if you may, that will then move you from executing directly within the serverless world. So think about one being like a module not found error. So anytime that exception gets raised, it means this thing will never work within the serverless architecture. So that's the trigger that then kicks off the process that will convert, promote your existing notebook so that it will move away from the serverless stuff and actually start using the server-based architecture directly. And all of this is very, very seamless and integrated together very nicely. Another thing would be running out of the memory, that throws a different exception as well. So coming up with all of these exception classes and then using these as the triggers that then move you from one execution paradigm into the other is sort of like the approach that we took.

That sounds really smooth. And I guess what helps is that you're not heading towards a dead end. You don't need to rewrite all your code now in a different environment, but it's literally like, okay, I need to reserve a machine now.