I'm testing the new "nvidia/multitalker-parakeet-streaming-0.6b-v1" on example audio files. Is there a built-in way for the utterances of different speakers to be interleaved in the seglst output? With an audio file of two people speaking back and forth, the output .json has two entries (one per speaker), while I would like the entries to be in utterance order, split at each speaker change. My current workaround is to modify … Thanks for any help.
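For concreteness, the ordering I'm after could be produced by post-processing along the lines of the sketch below. The field names (`start_time`, `speaker`, `words`) are my assumption based on the usual SegLST convention and may not match the actual model output exactly:

```python
import json

# Rough sketch of the desired interleaving: sort seglst entries
# chronologically so each speaker turn appears in utterance order.
# (This only helps once the output contains one entry per utterance
# rather than one entry per speaker.)
with open("output.seglst.json") as f:
    segments = json.load(f)

segments.sort(key=lambda seg: float(seg["start_time"]))

for seg in segments:
    print(f"[{float(seg['start_time']):.2f}s] {seg['speaker']}: {seg['words']}")
```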
Replies: 1 comment
Hi.
If you want to see frequent speaker changes in the transcription, check out the following line:
NeMo/examples/asr/asr_cache_aware_streaming/speech_to_text_multitalker_streaming_infer.py (line 97 at commit 66ffb38)
Setting a very small value there, e.g. `sent_break_sec=0.1`, will break the sentences very often, and you will then see more interleaved transcriptions. However, often there is no "speaker change" at all, because real-life conversations contain lots of overlapped speech; that is why the multitalker ASR model does not use the concept of a "speaker change".
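To illustrate what this reply describes, here is a minimal sketch (not the actual NeMo implementation) of how a sentence-break threshold like `sent_break_sec` behaves: a new segment starts whenever the speaker changes or the silence between consecutive words exceeds the threshold, so a small value yields many short, interleavable segments. The word-tuple layout here is purely hypothetical:

```python
# Minimal sketch, NOT the actual NeMo code: split a timestamped word
# stream into segments at speaker changes or at pauses longer than
# the sent_break_sec threshold.
SENT_BREAK_SEC = 0.1  # very small value, as suggested above

def split_into_segments(words, sent_break_sec=SENT_BREAK_SEC):
    """words: list of (start, end, speaker, token), sorted by start time.
    The tuple layout is hypothetical, for illustration only."""
    segments = []
    current = None
    for start, end, speaker, token in words:
        new_segment = (
            current is None
            or speaker != current["speaker"]
            or start - current["end_time"] > sent_break_sec
        )
        if new_segment:
            current = {"speaker": speaker, "start_time": start,
                       "end_time": end, "words": token}
            segments.append(current)
        else:
            current["end_time"] = end
            current["words"] += " " + token
    return segments

# Example: two alternating speakers; a 0.1 s threshold produces one
# segment per turn instead of one blob per speaker.
words = [
    (0.0, 0.4, "spk0", "hello"), (0.5, 0.9, "spk0", "there"),
    (1.2, 1.6, "spk1", "hi"), (2.0, 2.3, "spk0", "how's"),
    (2.35, 2.7, "spk0", "it"), (2.75, 3.0, "spk0", "going"),
]
for seg in split_into_segments(words):
    print(seg)
```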