Parakeet TDT v3 is the best transcription model I've used to date, but its inability to easily synchronize with a diarization model makes it hard to use for real-world scenarios where speaker identification is important. Hacking together different models to achieve both transcription and diarization of the same audio is possible, but it is time-consuming and requires sacrifices in accuracy and code readability.
I currently combine the outputs of Parakeet TDT v3 and the Multi-Scale Diarization Decoder by greedily aligning segment timestamps to find the best speaker match for each segment, but this approach leaves something to be desired in both granularity and simplicity. The same can be done with the Sortformer Diarizer and its streaming variant.
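For context, here is a minimal sketch of the greedy overlap alignment I do today. The segment and turn structures and the helper function are illustrative, not NeMo APIs; they only assume both models expose (start, end) timestamps.

```python
def assign_speakers(asr_segments, diar_turns):
    """Greedily label each ASR segment with the speaker whose
    diarization turn overlaps it the most.

    asr_segments: [{"start": 0.0, "end": 2.1, "text": "..."}, ...]
    diar_turns:   [{"start": 0.0, "end": 3.0, "speaker": "speaker_0"}, ...]
    """
    labeled = []
    for seg in asr_segments:
        best_speaker, best_overlap = None, 0.0
        for turn in diar_turns:
            # Temporal overlap between this ASR segment and the speaker turn.
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_overlap, best_speaker = overlap, turn["speaker"]
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

This works, but it operates only at whatever segment granularity Parakeet produces and silently returns `None` when a segment overlaps no speaker turn, which is exactly the kind of glue code I would rather not maintain.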
It would be great if the model path of the Multi-Scale Diarization Decoder or Sortformer could be passed as a parameter to Parakeet TDT v3 to enable diarization driven by the Parakeet timestamps. I think that with this change, the model would see massive adoption in real-world scenarios where audio needs to be both transcribed and diarized.
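To make the proposal concrete, usage might look something like the snippet below. The `diarizer_model_path` argument is purely hypothetical (it is the feature being requested), and the surrounding calls are only approximate.

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# Hypothetical: pass a diarizer so the transcription comes back speaker-labeled.
output = asr_model.transcribe(
    ["meeting.wav"],
    timestamps=True,
    diarizer_model_path="nvidia/diar_sortformer_4spk-v1",  # proposed parameter, does not exist today
)
```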