Closed
Description
Hey all, after a nice conversation with @MahmoudAshraf97 on a different repo I wanted to share some of my benchmark data. It was collected on an RTX 4090 on Windows, without flash attention, using a beam size of 5. I'd love to include data for whisper.cpp as well as Hugging Face's implementation, but unfortunately VRAM usage skyrockets when the HF implementation uses any beam size above 1, and I'm not aware of any Python bindings for whisper.cpp that can use CUDA acceleration. Hope y'all find it as interesting to read as it was for me to test!
