How to send audio to Whisper in a numpy array? #450
I want to send speech to Whisper as a numpy array. The documentation says this is possible, but I do not get correct transcriptions. There is probably something I don't understand about the required format. I am sending PCM audio in 5-second chunks as a 32-bit float numpy array. Each chunk should be padded to 30 seconds and passed to `transcribe`. Any help would be appreciated. Here is my test code:
Replies: 1 comment 4 replies
Looks like you're missing a `/ 32768.0`, and make sure `audio` has only 1 dimension. See whisper/whisper/audio.py, line 49 in 9f70a35.

Note that the results are expected to be disjointed and possibly have missing words because of the 5-second chunking.
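
For illustration, here is a minimal sketch of the conversion described above. It assumes the incoming chunks are 16-bit signed PCM, mono, sampled at 16 kHz; the original test code is not shown, so that is a guess. The helper name `pcm16_to_whisper_input` and its `pad_to` parameter are hypothetical and not part of Whisper.

```python
import numpy as np

SAMPLE_RATE = 16000            # Whisper works on 16 kHz mono audio
N_SAMPLES = 30 * SAMPLE_RATE   # number of samples in a 30-second window


def pcm16_to_whisper_input(pcm, pad_to=N_SAMPLES):
    """Hypothetical helper: int16 PCM -> 1-D float32 array scaled to [-1.0, 1.0)."""
    audio = np.asarray(pcm)
    if audio.dtype != np.int16:
        # Assumption: the source is 16-bit PCM. Other sample widths need a different divisor.
        raise TypeError(f"expected int16 PCM samples, got {audio.dtype}")

    # Collapse to one dimension, then scale by 2**15 (the missing "/ 32768.0").
    # Note: flatten() does not downmix; multi-channel audio must be mixed to mono first.
    audio = audio.flatten().astype(np.float32) / 32768.0

    # Zero-pad short chunks at the end, and trim long ones, to exactly `pad_to` samples.
    if audio.shape[0] < pad_to:
        audio = np.pad(audio, (0, pad_to - audio.shape[0]))
    return audio[:pad_to]


# Example with a fake 5-second chunk of silence
chunk = np.zeros(5 * SAMPLE_RATE, dtype=np.int16)
audio = pcm16_to_whisper_input(chunk)
print(audio.dtype, audio.shape)
```

The resulting array can then be passed to `model.transcribe(audio)`. Casting int16 samples to float32 without the division leaves values in the range of plus or minus 32768 instead of plus or minus 1, which is a likely cause of the incorrect transcriptions.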