Using raw PCM 16 bit signed integer audio instead of file #1705

samcasacio · 2023-10-11T19:54:29Z

Hi Folks,

First of all, this is a very cool library! Kudos!

I've been playing around with converting some raw 8 kHz, 16-bit signed integer PCM audio into a transcript. I read through the audio and call the Whisper model's transcribe function (just base.en) on each segment. I store each segment as plain bytes in a buffer, which I then convert to a NumPy ndarray like this:

    data = numpy.frombuffer(segment, numpy.int16).flatten().astype(numpy.float32) / 32768.0

I then pass this ndarray to the model, but unfortunately the transcript is nowhere close to what is in the audio. However, it works when I write the same data out in WAV format (through scipy.io.wavfile) and pass that path to the model. As a test, I am using the load_audio utility provided by Whisper to get an ndarray and comparing it against the converted data (from above) using numpy.array_equal, which returns True (so I assume the data used to create the original WAV file and the data loaded through whisper load_audio are the same). It seems like this should work, so I think I am missing a step to get the data into the right format (unless I am misunderstanding something, which, let's be honest, is very likely :-) ).

How do I convert raw audio data to the format expected by Whisper's transcribe, so I can avoid writing it out to disk and then reading it back in through whisper load_audio?

Answered by samcasacio

samcasacio · 2023-10-11T20:50:14Z

Author

Ok, so I did misunderstand something. After some more digging, I see that Whisper's transcribe expects a sample rate of 16 kHz, not 8 kHz. I resampled my raw data to 16 kHz and now transcribe appears to be working.
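To make the resampling step concrete, here is a minimal sketch that turns raw 8 kHz PCM16 bytes into the 16 kHz float32 array Whisper works on. It assumes little-endian mono samples and uses plain linear interpolation via `numpy.interp`, which is a simple stand-in and not what the thread used (the author mentions audioop later); a band-limited resampler such as `scipy.signal.resample_poly` will usually give cleaner audio. The function name `pcm16_bytes_to_whisper_audio` is hypothetical.

```python
import numpy as np

WHISPER_SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono float32 audio


def pcm16_bytes_to_whisper_audio(raw: bytes, source_rate: int = 8_000) -> np.ndarray:
    """Convert raw little-endian 16-bit signed mono PCM into 16 kHz float32 samples.

    Hypothetical helper: resampling is simple linear interpolation, not a
    band-limited resampler.
    """
    # Scale int16 values into the [-1.0, 1.0) range, as in the original post.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if source_rate == WHISPER_SAMPLE_RATE or samples.size == 0:
        return samples
    # Number of output samples covering the same duration at 16 kHz.
    n_out = samples.size * WHISPER_SAMPLE_RATE // source_rate
    # Timestamps (in seconds) of the input samples and of the desired output samples.
    t_in = np.arange(samples.size) / source_rate
    t_out = np.arange(n_out) / WHISPER_SAMPLE_RATE
    return np.interp(t_out, t_in, samples).astype(np.float32)
```

With openai-whisper installed, the resulting array can be passed directly to `model.transcribe(audio)` after `model = whisper.load_model("base.en")`, with no temporary file in between.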

3 replies

trappedinspacetime Feb 3, 2024

How did you do that? I want to record audio from the mic into a buffer and feed it to Whisper.
Would you please tell me how to do that?

samcasacio Feb 4, 2024
Author

Hi @trappedinspacetime,

So I am getting data in 8 kHz PCM16 format from a network socket, which makes it a straightforward read until the transmission is complete. I am using audioop (many other Python libraries can do this too) to convert the sample rate to 16 kHz, then assembling the NumPy ndarray as shown in my original post. Whisper can operate on a NumPy ndarray directly (you don't need to create a file first).
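The "read until the transmission is complete" part might look like the sketch below. It assumes the sender closes (or half-closes) its side of the connection when it is done, and that the stream carries little-endian mono PCM16; `pcm16_from_socket` is a hypothetical helper name. The returned int16 samples are still at 8 kHz, so they would still need the 16 kHz conversion and float32 scaling described in this thread before going to Whisper.

```python
import socket

import numpy as np


def pcm16_from_socket(sock: socket.socket, chunk_size: int = 4096) -> np.ndarray:
    """Read PCM16 bytes from a connected socket until EOF and return int16 samples."""
    buffer = bytearray()
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # recv() returns b"" once the sender has finished sending
            break
        buffer.extend(chunk)
    # Each 16-bit sample is 2 bytes; drop a dangling byte if the stream ended mid-sample.
    usable = len(buffer) - (len(buffer) % 2)
    return np.frombuffer(bytes(buffer[:usable]), dtype="<i2")
```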

For your case, I would recommend reading the audio data from your microphone into a buffer and using that for Whisper. You will still have to get your data into the right format for Whisper before you can run the model, similar to how I'm doing it above. Otherwise, you can record a file from the microphone and then follow the example provided in the source code.
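For the microphone route, a hypothetical helper like the one below converts an int16 recording into the mono float32 layout Whisper accepts. It assumes the audio was already captured at 16 kHz (so no resampling happens here) and that it arrives as a `(frames, channels)` array, which is the shape `sounddevice.rec()` returns.

```python
import numpy as np


def mic_recording_to_whisper_audio(recording: np.ndarray) -> np.ndarray:
    """Turn an int16 recording of shape (frames, channels) into mono float32 samples.

    Hypothetical helper; assumes the recording was made at 16 kHz.
    """
    if recording.dtype != np.int16:
        raise TypeError(f"expected int16 samples, got {recording.dtype}")
    # Scale int16 values into the [-1.0, 1.0) range.
    samples = recording.astype(np.float32) / 32768.0
    if samples.ndim == 2:
        # Average the channels to get mono; with channels=1 this just drops the axis.
        samples = samples.mean(axis=1)
    return samples.astype(np.float32)
```

For example, record with `rec = sd.rec(16000 * 3, samplerate=16000, channels=1, dtype="int16")`, call `sd.wait()` so the buffer is filled, then pass `mic_recording_to_whisper_audio(rec)` to `model.transcribe(...)`.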

trappedinspacetime Feb 5, 2024

@samcasacio thank you for responding.
I'm trying to code a voice assistant. After the hotword is detected, I record 3 seconds of 16000 Hz (16 kHz) mono audio:

    import io

    import numpy as np
    import sounddevice as sd

    sample_rate = 16000
    duration = 5  # Adjust the duration as needed

    # Record audio (blocks until the recording is finished after sd.wait())
    audio_data = sd.rec(int(sample_rate * duration), samplerate=sample_rate, channels=1, dtype=np.int16)
    sd.wait()

    # Convert audio data to a BytesIO object
    # (note: this holds raw PCM samples only, with no WAV header)
    audio_bytesio = io.BytesIO()
    audio_bytesio.write(audio_data.tobytes())
    audio_bytesio.seek(0)

    # Define the whispercpp command ("-f -" reads the input from stdin)
    whispercpp_command = [
        "whispercpp",
        "-m", "/home/kenn/Desktop/2022-10/whisper.cpp/models/ggml-base.bin",
        "-nt",
        "-l", "tr",
        "-f", "-",
        "-otxt"
    ]

I get the error: `failed to open WAV file from stdin`
I don't know how to pass the WAV data held in a variable to `whispercpp`.
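One likely cause: the BytesIO above holds bare PCM samples with no WAV header, while whisper.cpp's `-f -` option appears to parse stdin as a complete WAV file (16-bit, 16 kHz). A minimal sketch, assuming that behaviour, wraps the samples in a RIFF/WAV container with the standard-library `wave` module and pipes the bytes to the binary with `subprocess.run`. Both function names are hypothetical, and the `whispercpp` executable name comes from the command list above; your whisper.cpp build may name its binary differently.

```python
import io
import subprocess
import wave

import numpy as np


def pcm16_to_wav_bytes(samples: np.ndarray, sample_rate: int = 16_000) -> bytes:
    """Wrap mono int16 samples in a WAV (RIFF) container entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(np.ascontiguousarray(samples, dtype="<i2").tobytes())
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()


def transcribe_with_whisper_cpp(wav_bytes: bytes, command: list[str]) -> str:
    """Pipe WAV bytes to a whisper.cpp command that reads from stdin ("-f -").

    Returns whatever the command prints to stdout; raises CalledProcessError on failure.
    """
    result = subprocess.run(command, input=wav_bytes, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

With the recording above, the call would be `transcribe_with_whisper_cpp(pcm16_to_wav_bytes(audio_data), whispercpp_command)`, and the raw-bytes BytesIO is no longer needed.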
