Parakeet TDT v3 is the best transcription model I've used to date, but its inability to easily synchronize with a diarization model makes it hard to use for real-world scenarios where speaker identification is important. Hacking together different models to achieve both transcription and diarization of the same audio is possible, but it is time-consuming and requires sacrifices in accuracy and code readability.
I currently combine the outputs of Parakeet TDT v3 and the Multi-Scale Diarization Decoder by greedily aligning segment timestamps to find the best speaker match for each segment, but this approach leaves something to be desired in both granularity and simplicity. The same can be done with the Sortformer Diarizer and its streaming variant.
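For context, here is a minimal sketch of the greedy overlap alignment I do today. The segment and turn structures and the helper function are illustrative, not NeMo APIs; they only assume both models expose (start, end) timestamps.

```python
def assign_speakers(asr_segments, diar_turns):
    """Greedily label each ASR segment with the speaker whose
    diarization turn overlaps it the most.

    asr_segments: [{"start": 0.0, "end": 2.1, "text": "..."}, ...]
    diar_turns:   [{"start": 0.0, "end": 3.0, "speaker": "speaker_0"}, ...]
    """
    labeled = []
    for seg in asr_segments:
        best_speaker, best_overlap = None, 0.0
        for turn in diar_turns:
            # Temporal overlap between this ASR segment and the speaker turn.
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_overlap, best_speaker = overlap, turn["speaker"]
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

This works, but it operates only at whatever segment granularity Parakeet produces and silently returns `None` when a segment overlaps no speaker turn, which is exactly the kind of glue code I would rather not maintain.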
It would be great if the model path of the Multi-Scale Diarization Decoder or Sortformer could be passed as a parameter to Parakeet TDT v3 to enable diarization driven by the Parakeet timestamps. I think that with this change, the model would see massive adoption in real-world scenarios where audio needs to be both transcribed and diarized.
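To make the proposal concrete, usage might look something like the snippet below. The `diarizer_model_path` argument is purely hypothetical (it is the feature being requested), and the surrounding calls are only approximate.

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# Hypothetical: pass a diarizer so the transcription comes back speaker-labeled.
output = asr_model.transcribe(
    ["meeting.wav"],
    timestamps=True,
    diarizer_model_path="nvidia/diar_sortformer_4spk-v1",  # proposed parameter, does not exist today
)
```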