It appears that your audio is in int16 dtype, whereas Whisper expects float32 or float16. Try converting it to a float32 array and dividing it by 32768, similar to what's done in audio.py:

return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
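
If your audio is already loaded as a NumPy array rather than a raw byte buffer, the same scaling can be applied directly. Here is a minimal sketch, assuming mono int16 PCM samples; the helper name `int16_to_float32` and the sample values are made up for illustration and are not part of Whisper:

```python
import numpy as np


def int16_to_float32(audio: np.ndarray) -> np.ndarray:
    """Scale int16 PCM samples to float32 in the range [-1.0, 1.0)."""
    # Hypothetical helper, not part of Whisper. Guard against scaling
    # data that is already float or has some other integer width.
    if audio.dtype != np.int16:
        raise TypeError(f"expected int16 samples, got {audio.dtype}")
    # Same arithmetic as the audio.py line quoted above.
    return audio.flatten().astype(np.float32) / 32768.0


# Made-up sample values covering zero, half scale, and both extremes.
pcm = np.array([0, 16384, -32768, 32767], dtype=np.int16)
samples = int16_to_float32(pcm)
print(samples.dtype, samples)
```

The resulting array should then be usable wherever Whisper accepts a NumPy array as audio input. Note that `flatten()` assumes a single channel; stereo data would need to be mixed down to mono first, and this sketch does no resampling.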

Answer selected by jongwook