Inference using int8 or other methods? #1411
-
Has anyone tried doing Whisper inference using either int8 or 4-bit quantization?
-
see #454
-
From that post, the key takeaways that seem correct:
Yup. On shorter audio, the speedup is much smaller.
Haven't tried TensorRT vs. PyTorch yet, but I'm not sure it is worth pursuing anyway, because the payback might not be there.
Yup. So the only part affected is the Transformer layers, and again the performance gains on shorter audio are not very significant. (A sketch of quantizing just those layers follows below.)
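For reference, here is a minimal, untested sketch of what that looks like with PyTorch's dynamic quantization, which rewrites only the `nn.Linear` layers (i.e. the Transformer matmuls, matching the takeaway above). The model size and audio path are placeholders; this assumes the openai-whisper package and a recent PyTorch.

```python
import torch
import whisper  # pip install openai-whisper

# Dynamic quantization is CPU-only, so load the model on CPU.
model = whisper.load_model("base", device="cpu")

# Replace every nn.Linear with an int8 dynamically-quantized equivalent:
# weights are stored as int8, activations stay float. Conv layers and
# everything outside the Transformer blocks are left untouched.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# fp16=False since we are running on CPU.
result = quantized.transcribe("audio.wav", fp16=False)
print(result["text"])
```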
-
OK, so this is cool, but it did not make a huge speed difference. However, if it makes a difference to the GPU memory footprint, that would be great. Has anyone seen any examples that can reduce the memory footprint of Whisper models?
https://github.com/guillaumekln/faster-whisper
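The linked faster-whisper project (a CTranslate2 port) supports int8 weights out of the box, which speaks directly to the memory-footprint question: int8 storage is roughly a quarter of fp32 (half of fp16). A minimal sketch, with the model size and audio path as placeholders:

```python
from faster_whisper import WhisperModel  # pip install faster-whisper

# compute_type="int8" loads the weights in 8-bit, shrinking the memory
# footprint relative to the default fp32/fp16 weights.
model = WhisperModel("small", device="cpu", compute_type="int8")

# transcribe() returns a generator of segments plus a metadata object.
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```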