Replies: 2 comments 1 reply
-
Unfortunately, the CLI is essentially one-time use: it loads the model, transcribes, and exits, so the initialisation cost is paid on every invocation. You will have to find a way to use Python as your backend, keeping one long-lived process in which the loaded model persists between requests.
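A minimal sketch of that approach, assuming the openai-whisper and Flask packages; the route name, port, and "base" model size are illustrative choices, and the JavaScript container is assumed to share a filesystem (e.g. a mounted volume) with this process:

```python
# Minimal sketch: a long-lived Python process that loads Whisper once
# and serves transcriptions over HTTP. Assumes `pip install openai-whisper flask`.
import whisper
from flask import Flask, jsonify, request

app = Flask(__name__)

# Paid once at startup; every request after this reuses the cached model.
model = whisper.load_model("base")

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Expects JSON like {"path": "/audio/sample.mp3"} pointing to a file
    # this process can read (e.g. a volume shared with the JS container).
    path = request.get_json()["path"]
    result = model.transcribe(path)
    return jsonify({"text": result["text"]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The JavaScript side then swaps its CLI invocation for an HTTP request (e.g. a `fetch` against `http://localhost:5000/transcribe`), so the model load cost is paid only once, when the server starts.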
-
Since I am not familiar with Python, I would appreciate any advice you might have on memory management (RAM and CUDA-based GPU) for a first-timer.
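For a first-timer, the usual PyTorch memory hygiene applies, since Whisper is a PyTorch model. A minimal sketch, assuming the openai-whisper and torch packages; the "small" model size is an illustrative choice:

```python
# Common PyTorch memory hygiene as it applies to Whisper; not an
# authoritative answer, just the standard torch/whisper calls.
import gc
import torch
import whisper

# Smaller checkpoints ("tiny", "base", "small") need far less RAM/VRAM
# than "medium" or "large"; pick the smallest that is accurate enough.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("small", device=device)

if torch.cuda.is_available():
    # How much VRAM the loaded model is currently occupying.
    print(f"VRAM in use: {torch.cuda.memory_allocated() / 1e6:.0f} MB")

# When the model is no longer needed, release it explicitly:
del model
gc.collect()
torch.cuda.empty_cache()  # returns cached blocks to the GPU driver
```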
-
Hi,
I am new to the world of AI and models, so apologies in advance if this question sounds stupid.
I have noticed that on a GPU-enabled system, Whisper performs reasonably well in the actual transcription step. However, most of the time is spent in the model initialisation step, where the model is loaded into memory (GPU and RAM). I have heard about batch processing, but it is not suitable for my architecture right now, as it would require fundamental changes, which is not ideal in the bigger picture.
I was wondering if there is a way to keep the model cached in memory so that I can avoid the initialisation cost on every run. For context, I am calling the model from a JavaScript container via its CLI.
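To make the cost split concrete, here is a minimal sketch (assuming the openai-whisper Python package; the file names and "base" model size are placeholders) showing that the checkpoint load is paid once per process, which is exactly what a long-lived process amortises across files:

```python
# Sketch: loading the checkpoint dominates, but it is a one-time,
# per-process cost; subsequent transcriptions reuse the resident model.
import time
import whisper

t0 = time.perf_counter()
model = whisper.load_model("base")  # the expensive one-time step
print(f"load: {time.perf_counter() - t0:.1f}s")

for path in ["clip1.mp3", "clip2.mp3"]:  # placeholder audio files
    t0 = time.perf_counter()
    text = model.transcribe(path)["text"]
    print(f"{path}: {time.perf_counter() - t0:.1f}s")
```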