Keep model in VRAM? #516
ScissorSnips started this conversation in General
I'm calling whisper from PowerShell with --device cuda. It takes something like 18 seconds to load the Large model into VRAM. Is there a way the model can stay loaded, or is that just how it is when you call something from the command line/PowerShell? Would I be better off writing a Python app? Could it load the model once and then sit in memory until it's called externally to do the transcription?

Replies: 1 comment
In a Python app you have more control, since you explicitly load the model. For instance, I use a little Flask service with a simple API that keeps the model loaded between calls.
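A minimal sketch of such a service, assuming the openai-whisper and Flask packages are installed; the route name, port, and request format are illustrative, not the commenter's actual code:

```python
# minimal sketch: keep a Whisper model resident in VRAM behind a tiny HTTP API
import whisper
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load once at startup; the model stays on the GPU for the life of the process,
# so only the first start pays the ~18 s load cost.
model = whisper.load_model("large", device="cuda")

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Expect a JSON body like {"path": "C:/audio/clip.wav"} (hypothetical schema).
    audio_path = request.get_json()["path"]
    result = model.transcribe(audio_path)
    return jsonify({"text": result["text"]})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```

Once the service is running, a PowerShell caller avoids reloading the model on every transcription, e.g. with something like `Invoke-RestMethod -Uri http://127.0.0.1:5000/transcribe -Method Post -ContentType 'application/json' -Body '{"path": "C:/audio/clip.wav"}'`.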