flash attention 2, batch size, etc.? #2126
Replies: 2 comments 1 reply
-
https://github.com/Vaibhavs10/insanely-fast-whisper — the author works at Hugging Face, so he should be very well informed.
-
I checked that. I am not entirely sure what technique can be used to speed up Whisper in general. https://github.com/igorcosta/insanely-fast-whisper-cli simply shows something like `import torch`, `pipe = pipeline("automatic-speech-recognition", ...)`, `outputs = pipe("<FILE_NAME>", ...)`, `outputs["text"]`. I was looking for examples that call Whisper directly rather than going through `pipeline`.
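For what it's worth, here is a minimal sketch of calling Whisper directly through the `transformers` model classes instead of `pipeline()`. The helper names are mine, not from any repo linked above; the `attn_implementation="flash_attention_2"` flag is part of the `transformers` API but requires a CUDA GPU and the `flash-attn` package, so the heavy imports are kept inside the functions.

```python
# Hedged sketch, not from this thread: direct model-class usage instead of
# pipeline(). Model id, dtype, and attn_implementation follow the
# transformers API; flash attention 2 needs a CUDA GPU + flash-attn installed.
def load_whisper_direct(model_id="openai/whisper-large-v3"):
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        attn_implementation="flash_attention_2",  # fallbacks: "sdpa", "eager"
    ).to("cuda")
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor


def transcribe(model, processor, audio_array, sampling_rate=16000):
    # audio_array: 1-D float array of 16 kHz mono audio
    import torch

    inputs = processor(audio_array, sampling_rate=sampling_rate,
                       return_tensors="pt")
    features = inputs.input_features.to("cuda", torch.float16)
    predicted_ids = model.generate(features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Calling the model this way gives you control over `generate()` arguments and dtype/attention settings that `pipeline()` otherwise chooses for you.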
-
Does anyone have experience using flash attention 2, different batch sizes, or other parameters like that to make Whisper as fast as possible?
carl
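On the batch-size part of the question: much of the speedup in the "insanely fast" setups comes from splitting long audio into fixed-length chunks and transcribing them as one batch. A framework-free sketch of that chunking step (the 30 s chunk length and 5 s overlap are illustrative values, not from this thread):

```python
def chunk_audio(samples, sr=16000, chunk_s=30, overlap_s=5):
    """Split a 1-D sequence of audio samples into overlapping fixed-length
    chunks, mirroring what batched ASR pipelines do before feeding the
    chunks to the model as a single batch."""
    chunk = chunk_s * sr
    stride = (chunk_s - overlap_s) * sr  # advance less than a chunk -> overlap
    chunks = []
    start = 0
    while start < len(samples):
        chunks.append(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break  # this chunk reached the end of the audio
        start += stride
    return chunks


# 65 s of silent dummy audio at 16 kHz -> three chunks (30 s, 30 s, 15 s)
audio = [0.0] * (65 * 16000)
batches = chunk_audio(audio)
```

A larger batch of chunks raises GPU utilization until memory runs out, which is why batch size and flash attention 2 (which cuts attention memory) tend to be tuned together.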