Possible solution for Speaker Diarization #763
Replies: 7 comments 12 replies
-
Hey @mu4farooqi, I will try your solution on my examples and come back with results. I also tried forced alignment, but it breaks after a certain point.
Still, that approach has shown quite good results, and I'm looking forward to comparing them with your approach.
-
Hey @mu4farooqi,

[NeMo I 2023-01-13 16:30:54 features:267] PADDING: 16
TypeError Traceback (most recent call last)
8 frames
TypeError: new() missing 1 required positional argument: 'task'

Have you seen this, and do you have any suggestions? Thanks!
-
Wow, thanks!
On Sat, Jan 14, 2023 at 5:31 PM Umar Farooqi ***@***.***> wrote:
As I'm installing NeMo from the master branch, they may have made a breaking
change. I'll try to fix it later tonight.
-
The error above occurred when I ran it in Colab.
On Sat, Jan 14, 2023, 11:20 PM Umar Farooqi ***@***.***> wrote:
I'm not sure which OS/environment you are running my code in, because I
just ran it in the Colab notebook
<https://colab.research.google.com/drive/1X5XTiob6irFq8NJM831S0ADwz5_wIS-r>
and it worked without any problems.
Can you please provide more details?
-
Fascinating stuff! I have been trying to take a practical approach to solving diarization for specific podcast-style scenarios. There are some paid services for podcast recording and transcription, but I haven't found much that is open source and useful for this scenario. I found, however, that there are Discord bots that will easily record every speaker to a separate audio file, so I have been working on my project (https://github.com/jh-modjeski/trys) to transcribe each recording and stitch the transcripts together. Because each speaker's recording is separate, we know exactly who is saying what. I have also been using word timestamps to embed interjections into another speaker's line of transcription, while recognizing cross talk as something that should be printed on a separate line.

This doesn't solve diarization for a single audio source, but if people find diarization useful from my project, maybe there will be more interest in solving the harder problems that you're working on! My project is pretty messy and pales in comparison to what you're doing with diarization, TBH. I'm not a Python developer, and I've literally thrown it together with ChatGPT over a weekend plus some free time this week. I'm pretty happy that it works as well as it does, though, and now that it's functional, I'm considering how best to rewrite the code properly. Let me know if you check it out; I'd really like to get more eyes on it.
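The stitching step described above can be sketched roughly as follows. This is a simplified illustration, not the code in the trys repository: it assumes each speaker's recording has already been transcribed into hypothetical `(start, end, text)` segments measured in seconds from a shared start time, merges them into one chronological list, and marks any segment that begins before earlier speech has ended as cross talk on its own line. The `Segment`, `merge_tracks`, and `render` names are illustrative, and the sketch does not attempt the word-level embedding of interjections.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the shared start of the recording session
    end: float
    text: str


def merge_tracks(tracks: dict[str, list[tuple[float, float, str]]]) -> list[Segment]:
    """Merge per-speaker transcripts into one chronologically ordered list.

    `tracks` maps a speaker name to that speaker's (start, end, text) segments,
    e.g. from transcribing each speaker's separate audio file (assumed format).
    """
    merged = [
        Segment(speaker, start, end, text)
        for speaker, segments in tracks.items()
        for start, end, text in segments
    ]
    # Order by start time; equal start times fall back to end time.
    merged.sort(key=lambda s: (s.start, s.end))
    return merged


def render(segments: list[Segment]) -> list[str]:
    """Return one line per segment, flagging overlapping speech as cross talk."""
    lines = []
    latest_end = float("-inf")  # furthest end time seen so far
    for seg in segments:
        # A segment that starts before earlier speech has ended overlaps it.
        prefix = "[cross talk] " if seg.start < latest_end else ""
        lines.append(f"{prefix}{seg.speaker}: {seg.text}")
        latest_end = max(latest_end, seg.end)
    return lines


# Hypothetical example data for two speakers recorded on separate tracks.
tracks = {
    "Alice": [(0.0, 4.0, "Welcome to the show."), (6.0, 9.0, "Let's get started.")],
    "Bob": [(3.5, 4.5, "Thanks!")],
}
for line in render(merge_tracks(tracks)):
    print(line)
# Alice: Welcome to the show.
# [cross talk] Bob: Thanks!
# Alice: Let's get started.
```

Keeping merging and rendering separate means a smarter renderer (for example, one that splices short interjections into the surrounding line using word timestamps) could replace `render` without touching the merge step.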
-
Thank you for the colab notebook.
-
www.lexicaps.com seamlessly adds diarization to Whisper's transcriptions. No third-party packages.
-
Over the weekend, I tried to come up with a consistent approach to diarizing Whisper transcripts predictably. I have written a post about it. Alternatively, you can check the Colab notebook.
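The post and notebook are not reproduced in this thread, so the following is not the author's method. As a point of reference, one common way to combine Whisper output with a separate diarization model (such as NeMo, mentioned earlier in this thread) is to label each transcript segment with the speaker whose diarization turns overlap it the most. This minimal sketch assumes transcript segments as `(start, end, text)` tuples, which could be built from the `start`, `end`, and `text` fields of Whisper's segments, and diarization turns as hypothetical `(start, end, speaker)` tuples; `overlap` and `assign_speakers` are illustrative names.

```python
from __future__ import annotations


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: list[tuple[float, float, str]],
    turns: list[tuple[float, float, str]],
) -> list[tuple[str, str]]:
    """Label each transcript segment with the speaker whose turns overlap it most.

    segments: (start, end, text) from the ASR model (assumed format).
    turns: (start, end, speaker) from a diarization model (assumed format).
    Returns (speaker, text) pairs; "UNKNOWN" if no turn overlaps a segment.
    """
    labelled = []
    for seg_start, seg_end, text in segments:
        # Sum overlap per speaker, since one speaker may have several turns.
        totals: dict[str, float] = {}
        for turn_start, turn_end, speaker in turns:
            ov = overlap(seg_start, seg_end, turn_start, turn_end)
            if ov > 0:
                totals[speaker] = totals.get(speaker, 0.0) + ov
        best = max(totals, key=totals.get) if totals else "UNKNOWN"
        labelled.append((best, text))
    return labelled


# Hypothetical example data.
segments = [(0.0, 3.0, "Hi there."), (3.0, 6.0, "Hello!"), (10.0, 11.0, "Bye.")]
turns = [(0.0, 2.8, "SPEAKER_00"), (2.8, 6.2, "SPEAKER_01")]
for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
# SPEAKER_00: Hi there.
# SPEAKER_01: Hello!
# UNKNOWN: Bye.
```

Summing overlap per speaker handles a segment that spans several turns by the same speaker, but a segment that straddles a speaker change is still assigned to a single speaker; splitting on word-level timestamps is one way to refine that.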