Use a third-party voice activity detection package with a programmatic audio chunk feed in Next.js v15? #82730

chris-opendata · 2025-08-18T06:05:08Z

Summary

I am working in a Next.js v15 and TypeScript environment. My use case involves:

a MediaStreamAudioSourceNode object,
an AudioWorkletNode object,
a Web Worker that runs the @ricky0123/vad-web VAD.

Audio from the local mic (or a remote peer) is streamed into the MediaStreamAudioSourceNode, which is connected to the AudioWorkletNode for preprocessing. The worklet then posts audio chunks to the Web Worker, which runs the VAD in the browser. The code segments are as follows.

In the main-thread UI component:

......
// After constructing the worker, initialise it.
this.ricky123VadWorker.postMessage({ type: 'init' });
// Data feed from the audio Worklet.
audioWorkletNode.port.onmessage = (event) => {
    console.log("Receives audio data from the Worklet.");
    // Post to the VAD for detection.
    this.ricky123VadWorker.postMessage({ type: "data", data: event.data });
};
// The VAD worker responds back.
this.ricky123VadWorker.onmessage = (event) => {
    console.log("Receives event from the VAD worker.");
    if (event.data.type === 'initComplete') {
        console.log("VAD worker: initComplete.");
    }
    ......
};
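
The handler above forwards whatever the worklet emits. An AudioWorkletProcessor typically receives 128 samples per render quantum, while a VAD usually expects larger fixed-size frames, so some regrouping step may be needed between the two. A minimal sketch of such a step follows, assuming event.data is a mono Float32Array; FrameAccumulator is a hypothetical helper and is not part of @ricky0123/vad-web.

```typescript
// Hypothetical helper (not from any library): regroups arbitrarily sized
// audio chunks into fixed-size frames.
class FrameAccumulator {
  private readonly frameSize: number;
  private buffer: Float32Array;
  private filled = 0;

  constructor(frameSize: number) {
    if (!Number.isInteger(frameSize) || frameSize <= 0) {
      throw new RangeError("frameSize must be a positive integer");
    }
    this.frameSize = frameSize;
    this.buffer = new Float32Array(frameSize);
  }

  // Appends a chunk and returns every frame it completes (possibly none).
  push(chunk: Float32Array): Float32Array[] {
    const frames: Float32Array[] = [];
    let offset = 0;
    while (offset < chunk.length) {
      const take = Math.min(this.frameSize - this.filled, chunk.length - offset);
      this.buffer.set(chunk.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === this.frameSize) {
        frames.push(this.buffer); // hand off the completed frame
        this.buffer = new Float32Array(this.frameSize); // start a fresh one
        this.filled = 0;
      }
    }
    return frames;
  }
}
```

Inside audioWorkletNode.port.onmessage, each frame returned by push(event.data) could then be posted to the worker one at a time. Samples that do not yet fill a frame stay in the accumulator until the next chunk arrives.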

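In the VAD Web Worker:

import { MicVAD } from '@ricky0123/vad-web';
// Holds the VAD instance once "init" has completed.
let vad;
self.onmessage = async (event) => {
    // Read the message type sent by the main thread.
    const { type } = event.data;
    switch (type) {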
        case "init":
            vad = await MicVAD.new({
                            baseAssetPath: '/vad-models/',
                            stream: undefined, // we're feeding chunks manually
                            // Setup VAD event callbacks
                            // onSpeechStart: () => {
                            //     self.postMessage({ type: 'speech-start' })
                            // },
                            onSpeechEnd: (audio) => {
                                // The VAD model determines that a segment of speech has ended,
                                // and triggers onSpeechEnd and provides the captured audio data for that segment.
                                self.postMessage({ type: 'speech', payload: { data: audio } });
                            },
                            // onFrameProcessed: (probabilities) => {
                            //     // You can also get a probability for each audio frame of a segment.
                            //     self.postMessage({ type: 'frameProcessed', probabilities });
                            // },
                        });
  
            console.log("Initialize VAD object.");
            vad.start();
            self.postMessage({ type: "initComplete" });
    ........

When I run the code, it hangs on await MicVAD.new, so console.log("Initialize VAD object.") is never reached. I have copied silero_vad_v5.onnx into public/vad-models.
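
An exception thrown inside an async onmessage handler becomes a rejected promise that nothing awaits, so a failed initialisation can look like a hang from the main thread's point of view. The sketch below shows one way to make such a failure visible by always posting a reply. The "error" message type, toErrorReply and runInit are hypothetical names; only "initComplete" and "speech" appear in the code above. It does not help when the awaited promise never settles at all.

```typescript
// Hypothetical reply shapes; "error" is an addition for illustration.
type WorkerReply =
  | { type: "initComplete" }
  | { type: "speech"; payload: { data: Float32Array } }
  | { type: "error"; message: string };

// Converts any thrown value into a reply the main thread can log.
function toErrorReply(err: unknown): WorkerReply {
  const message =
    err instanceof Error ? `${err.name}: ${err.message}` : String(err);
  return { type: "error", message };
}

// Runs an async init step and posts exactly one reply, success or failure.
async function runInit(
  init: () => Promise<void>,
  post: (reply: WorkerReply) => void,
): Promise<void> {
  try {
    await init();
    post({ type: "initComplete" });
  } catch (err) {
    post(toErrorReply(err));
  }
}
```

In the worker, the "init" case could call runInit with a function that awaits the VAD construction and with self.postMessage as the post callback; the main thread would then log event.data.message whenever event.data.type is "error".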

My questions:

Could anyone who has used the @ricky0123/vad-web VAD successfully with this approach point out what I am doing wrong?
Or could anyone who has used another VAD package in Next.js v15 for client-side VAD share a working example?

Thank you

Additional information

No response

Example

No response

icyJoseph · 2025-08-18T08:00:08Z

Maintainer

Looks like they've got it here?

Look at the next.config.js files, and usage of the vad-web library.

1 reply

chris-opendata Aug 18, 2025
Author

Thank you, Joseph, for the working examples.

chris-opendata · 2025-08-20T05:18:09Z

Author

After many painstaking attempts, I finally got it working. Here are the code segments for creating the VAD:

import { NonRealTimeVAD } from "@ricky0123/vad-web";
import * as ort from 'onnxruntime-web';

// Load the ONNX Runtime WASM assets from public/vad-models.
ort.env.wasm.wasmPaths = {
    wasm: '/vad-models/ort-wasm-simd-threaded.wasm', // Relative to /public
    mjs: '/vad-models/ort-wasm-simd-threaded.mjs' // Relative to /public
};

const vadOptions = {
    positiveSpeechThreshold: 0.5,
    negativeSpeechThreshold: 0.35,
    redemptionFrames: 8,
    frameSamples: 1536,
    minSpeechFrames: 3,
    preSpeechPadFrames: 1,
    submitUserSpeechOnPause: false,
};

// Method of the class that owns this.vad.
async initVAD(baseAssetPath: string) {
    const modelPath = baseAssetPath + 'silero_vad_v5.onnx';
    // Fetches the model file as raw bytes for the VAD.
    const modelFetcher = async (): Promise<ArrayBuffer> => {
        const response = await fetch(modelPath);
        return await response.arrayBuffer();
    };
    this.vad = new NonRealTimeVAD(
        modelFetcher,
        ort,
        vadOptions, // Or customize thresholds, frameSamples, etc.
    );
}
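
The snippet above only constructs the VAD. Assuming the detector is then given one contiguous buffer per analysis pass rather than a live stream, the chunks coming from the worklet would first need to be joined. A minimal sketch of that joining step follows; concatChunks is a hypothetical helper, and how the resulting buffer is passed to the detector depends on the installed library version and is not shown here.

```typescript
// Hypothetical helper: joins buffered audio chunks into one contiguous
// Float32Array, preserving their order.
function concatChunks(chunks: readonly Float32Array[]): Float32Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk right after the previous one
    offset += chunk.length;
  }
  return out;
}
```

Chunks received in the worker could be pushed onto an array and joined with concatChunks once enough audio has accumulated, after which the array is cleared.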
