Hi, I'm working on a project that parallelizes tasks with Whisper, using CUDA with the model loaded in VRAM. Is there a way to parallelize tasks with a single large-v3 model loaded in memory (~12 GB of VRAM), instead of loading the model once for every task that needs concurrency? My current method uses Python's multiprocessing to load the model twice in memory for at least 2x throughput. I've also tried threading with the same model shared across multiple transcriptions, but that raises an exception about tensor weights. Thanks in advance!
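For reference, a minimal sketch of the multiprocessing approach described above, assuming the openai-whisper package and a CUDA device; `NUM_WORKERS`, `init_worker`, and the file list are illustrative names, not from the original post. Each worker process loads its own copy of the model once, then transcribes files handed out by the pool:

```python
import multiprocessing as mp

import whisper

NUM_WORKERS = 2  # each worker holds its own ~12 GB model copy in VRAM

_model = None  # per-process model, set once by the pool initializer


def init_worker() -> None:
    # Runs once in each worker process; loads a private model copy.
    global _model
    _model = whisper.load_model("large-v3", device="cuda")


def transcribe(path: str) -> str:
    # Runs inside a worker; _model was loaded by init_worker.
    return _model.transcribe(path)["text"]


if __name__ == "__main__":
    # "spawn" avoids forking a parent process that has touched CUDA.
    ctx = mp.get_context("spawn")
    files = ["a.wav", "b.wav", "c.wav", "d.wav"]  # placeholder inputs
    with ctx.Pool(NUM_WORKERS, initializer=init_worker) as pool:
        for text in pool.map(transcribe, files):
            print(text)
```

Each extra worker costs another ~12 GB of VRAM. The single-shared-model alternative would be guarding `model.transcribe` with a `threading.Lock`; that likely avoids the tensor-weights exception (transcribe appears to install per-call caching state on the model object, so concurrent calls on one instance clash), but the lock serializes the GPU work, so it buys safety rather than parallelism.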
Replies: 1 comment

Is this possible in this Python implementation, or not? Or does it depend on the implementation? Thanks.