Inference using int8 or other methods? #1411
-
Has anyone tried doing Whisper inference using either int8 or 4-bit quantization?
-
see #454
-
From that post, the key takeaways that seem correct:
Yup. On shorter audio, the speedup is much smaller.
Haven't tried TensorRT vs. PyTorch yet, but I'm not sure it is worth pursuing anyway, because the payback might not be there.
Yup. So the only part affected is the Transformer layers, and again the performance gains on shorter audio are not very significant. (A sketch of quantizing just those layers follows below.)
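For reference, here is a minimal, untested sketch of what that looks like with PyTorch's dynamic quantization, which rewrites only the `nn.Linear` layers (i.e. the Transformer matmuls, matching the takeaway above). The model size and audio path are placeholders; this assumes the openai-whisper package and a recent PyTorch.

```python
import torch
import whisper  # pip install openai-whisper

# Dynamic quantization is CPU-only, so load the model on CPU.
model = whisper.load_model("base", device="cpu")

# Replace every nn.Linear with an int8 dynamically-quantized equivalent:
# weights are stored as int8, activations stay float. Conv layers and
# everything outside the Transformer blocks are left untouched.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# fp16=False since we are running on CPU.
result = quantized.transcribe("audio.wav", fp16=False)
print(result["text"])
```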
-
OK, so this is cool, but it did not make a huge speed difference. However, if it makes a difference to the GPU memory footprint, that would be great. Has anyone seen any examples that can reduce the memory footprint of Whisper models?
https://github.com/guillaumekln/faster-whisper
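The linked faster-whisper project (a CTranslate2 port) supports int8 weights out of the box, which speaks directly to the memory-footprint question: int8 storage is roughly a quarter of fp32 (half of fp16). A minimal sketch, with the model size and audio path as placeholders:

```python
from faster_whisper import WhisperModel  # pip install faster-whisper

# compute_type="int8" loads the weights in 8-bit, shrinking the memory
# footprint relative to the default fp32/fp16 weights.
model = WhisperModel("small", device="cpu", compute_type="int8")

# transcribe() returns a generator of segments plus a metadata object.
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```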