Add support for VLLM as whisper execution engine #50
base: main
Conversation
* Refactor data and normalizer
* Update transformers
* Update requirements
* Update requirements
* revert datasets for HF
* Update eval script for Fast Conformer NeMo models to support write and post-scoring
* Add evaluate helper
* Alias manifest utils in data utils
* Update eval script for HF models to support write and post-scoring
* Add comments
* Fix detection of dataset id
* Add checks for empty string in model filtering for eval script

Signed-off-by: smajumdar <[email protected]>
* Add XL and XXL RNNT and CTC models
* update max samples
* use single batch size

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
* speechbrain initial get_model fn
* wav2vec / run_eval.py working
* conformer.sh
* add .sh
* remove pycache
* fix batch size
* docstring
* docstring
* updt
* speechbrain requirements
* speechbrain requirements
* fix wer?
* manifest
* gitignore / remove savedir arg
* remove speechbrain/ path
* gitignore
* update wav2vec
* cv
* update scripts
* fix issue composite wer
…ers_models inference: Loop over transformers models
Tip-of-tree transformers seems to fix the accuracy issue.
Signed-off-by: Kunal Dhawan <[email protected]>
Implement batching for Useful Sensors Moonshine
Hey - any updates here? @Vaibhavs10 @Deep-unlearning
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Add parakeet v2
ping?
Hi @p88h and @nithinraok, thanks for the ping! I'll be running the VLLM benchmarks on the datasets shortly and will share the verified WER / RTFx numbers here. Once that's done I'll go ahead and add the results to the Whisper leaderboard.
I ran the evaluation. Breakdown per dataset: [results table not captured in this export]
The WERs are in line with the [reference not captured]. I also tested [model name not captured]. Breakdown per dataset: [results table not captured in this export]
This looks like something went wrong, possibly a decoding issue or mismatch in config. Could you help investigate?
Interesting - I haven't played much with the tiny model, but it does look like there is some issue indeed.
Runs VLLM in greedy decoding mode with high batch parallelism. Tested up to batch 128 on an RTX 4080.
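A minimal sketch of that kind of setup, assuming a vLLM version with Whisper (encoder-decoder) multimodal support; the engine arguments, prompt format, and the `waveform` variable are illustrative and may differ between vLLM releases:

```python
from vllm import LLM, SamplingParams

# Illustrative engine setup: wide batching, mirroring the
# "up to batch 128" configuration described above.
llm = LLM(
    model="openai/whisper-large-v3",
    max_num_seqs=128,  # upper bound on concurrently decoded sequences
)

# Whisper in vLLM takes audio as multimodal input: a 1-D float
# waveform plus its sample rate (16 kHz for Whisper).
prompt = {
    "prompt": "<|startoftranscript|>",
    "multi_modal_data": {"audio": (waveform, 16000)},  # `waveform`: hypothetical array
}

# temperature=0 gives greedy decoding.
outputs = llm.generate([prompt], SamplingParams(temperature=0, max_tokens=200))
print(outputs[0].outputs[0].text)
```

This is an engine-configuration sketch, not the PR's actual harness; it needs a GPU and a vLLM install to run.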
For the AMI dataset with the large-v3 model, this configuration achieves:
WER: 16.0%, RTFx: 63.56
It seems a bit faster than the transformers backend, mostly thanks to the wider possible batch size (which maxes out at 32 on the same GPU, achieving an RTFx of 53.76).
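For reference, RTFx is the inverse real-time factor: seconds of audio processed per second of wall-clock time. A tiny helper (hypothetical, not part of the benchmark harness) makes the 63.56 vs 53.76 comparison concrete:

```python
def rtfx(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Inverse real-time factor: audio duration / processing time.
    Higher is faster; 1.0 means exactly real time."""
    return audio_seconds / wall_clock_seconds

# e.g. one hour of audio transcribed in ~56.6 s of compute
# gives an RTFx close to the 63.56 reported above.
print(round(rtfx(3600.0, 56.64), 2))  # → 63.56
```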
Should scale proportionally on better hardware, and allow even wider batch sizes with more GPU memory.
It achieves slightly higher WER at the moment (16 vs 15.94), at least for this model.
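For clarity, WER as compared here is word-level edit distance divided by the reference length. A self-contained sketch (not the leaderboard's actual scoring code, which also applies text normalization first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count.
    Assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table over hypothesis positions.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution or match
            prev = cur
    return d[-1] / len(ref)

print(wer("the cat sat", "the bat sat"))  # one substitution out of three words
```

A difference like 16.0 vs 15.94 on AMI corresponds to a handful of extra word errors per thousand reference words.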
For distil-large-v2, the results were almost identical for WER, again with some performance advantage.
BTW, this particular dataset is likely not very representative of AMI in general, and the (very low) WER results don't translate well to the whole original dataset (with its very long recordings). When testing on 30s chunks, most models perform at ~25-ish WER rather than ~15.