Using raw PCM 16 bit signed integer audio instead of file #1705
Hi folks, first of all, this is a very cool library! Kudos! I've been playing around with converting some raw 8 kHz PCM 16-bit signed integer audio into a transcript. I am reading through the audio and calling the whisper model's transcribe function (just base.en) on each segment. I store each segment as raw bytes in a plain bytes buffer, which I then convert to a numpy ndarray via the following:
I then pass this numpy ndarray into the model, but unfortunately the transcript is nowhere close to what is in the audio. However, it works when I write this data out in wav format (through scipy.io.wavfile) and pass that path into the model. As a test, I am using the load_audio utility provided by whisper to get the numpy ndarray and comparing it against the converted data (from above) using numpy.array_equal, which returns True (so I can assume the data used to create the original wav file and the data loaded through whisper load_audio are the same). To me it seems like this should work, but I think I am missing a step to get this into the right format (unless I am misunderstanding something, which, let's be honest, is very likely :-) ). How do I convert raw audio data to the format expected by whisper transcribe, so I can avoid writing it out to disk and then reading it back in through whisper load_audio?
Replies: 1 comment 3 replies
OK, so I did misunderstand something. After some more digging, I see that whisper transcribe expects a sample rate of 16 kHz, not 8 kHz. I resampled my raw data to 16 kHz and now transcribe appears to be working.
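For anyone landing here with the same problem, here is a minimal sketch of that conversion done entirely in memory. It is not necessarily the code used above: the helper name `pcm16_to_whisper_input` is made up for illustration, the input is assumed to be mono little-endian 16-bit PCM, and `scipy.signal.resample_poly` is just one reasonable way to resample. The division by 32768 puts the samples in the float32 range of roughly -1 to 1.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

# Sample rate that whisper's transcribe expects (16 kHz, as noted above).
WHISPER_SAMPLE_RATE = 16000


def pcm16_to_whisper_input(raw: bytes, source_rate: int = 8000) -> np.ndarray:
    """Hypothetical helper: raw mono 16-bit PCM bytes -> float32 array at 16 kHz."""
    # Assumption: samples are little-endian signed 16-bit integers, one channel.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0

    if source_rate != WHISPER_SAMPLE_RATE:
        # Reduce the rate ratio to the smallest integers, e.g. 8000 -> 16000 is 2/1.
        divisor = gcd(WHISPER_SAMPLE_RATE, source_rate)
        samples = resample_poly(
            samples,
            WHISPER_SAMPLE_RATE // divisor,  # upsampling factor
            source_rate // divisor,          # downsampling factor
        )

    # Make sure the result is float32 regardless of what the resampler returned.
    return samples.astype(np.float32)
```

The returned array can then be passed straight to the model, for example `model.transcribe(pcm16_to_whisper_input(segment_bytes))` where `segment_bytes` is your own buffer, so nothing has to be written to disk.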