What is the best way to transcribe a conversation? #1191
-
I have a recording of a therapy session, a roughly 30-minute conversation between a patient and her therapist. I used Whisper to transcribe it, but the result is one long blob of text rather than a dialog. I also tried a prompt such as "This is a conversation between ...", but it did not change the result much. Is Whisper capable of transcribing a conversation into dialog format? By dialog format I mean something like
Replies: 3 comments 3 replies
-
No. Whisper can't currently do this. The technical term for this is speaker diarization.
You can find discussion on this in other topics:
Currently, the closest answer was given by jongwook: you can "hack" `initial_prompt` with hyphens to nudge Whisper into potentially outputting dashes between speakers.
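A minimal sketch of the hyphen-prompt idea, assuming the `openai-whisper` package and its `transcribe(..., initial_prompt=...)` parameter. The helper names, the example turns, and the one-turn-per-line prompt layout are illustrative assumptions, not jongwook's exact wording, and Whisper may ignore the hint entirely.

```python
import re


def build_dialog_prompt(example_turns):
    """Format example turns as hyphen-led lines to hint at a dialog layout."""
    return "\n".join(f"- {turn}" for turn in example_turns)


def split_on_dashes(text):
    """Split a transcript wherever a standalone hyphen starts a new turn.

    Hyphens inside words (e.g. "well-known") are left alone because the
    pattern requires whitespace (or start of text) before the hyphen and
    whitespace after it.
    """
    parts = re.split(r"(?:^|\s)-\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]


def transcribe_with_hyphen_prompt(audio_path, model_name="base"):
    """Hypothetical wrapper: run Whisper with a hyphenated initial_prompt."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    prompt = build_dialog_prompt(
        ["Hello, how have you been?", "I've been okay, thanks."]
    )
    result = model.transcribe(audio_path, initial_prompt=prompt)
    # Any dashes in the output are only a hint, not reliable speaker labels.
    return split_on_dashes(result["text"])
```

Calling `transcribe_with_hyphen_prompt("session.mp3")` (a placeholder file name) would return a list of turns only if Whisper actually emits the dashes; it does not tell you who is speaking.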
Besides that, you may need to use or write external tools, such as Pyannote, to generate speaker timings, which would help tag certain lines as Speaker A or Speaker B. (See the previously linked topics for much more detail.)
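One way to combine the two sets of timings, sketched under assumptions: Whisper's `transcribe` result provides `segments` with `start`, `end`, and `text`, and the diarization tool's output has been flattened into `(start, end, speaker)` tuples. The function names and the "largest overlap wins" rule are illustrative choices, not something either library prescribes.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Assign each transcript segment the speaker whose turn overlaps it most.

    segments: list of dicts with "start", "end", "text" (Whisper-style).
    turns: list of (start, end, speaker) tuples from a diarization tool.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, seg["text"].strip()))
    return labeled


def format_dialog(labeled):
    """Merge consecutive segments by the same speaker into dialog lines."""
    merged = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return "\n".join(f"{speaker}: {text}" for speaker, text in merged)


# Made-up example data, for illustration only.
segments = [
    {"start": 0.0, "end": 2.0, "text": " How are you feeling today?"},
    {"start": 2.2, "end": 4.0, "text": " A bit tired."},
    {"start": 4.0, "end": 5.5, "text": " I did not sleep well."},
]
turns = [(0.0, 2.1, "Speaker A"), (2.1, 5.6, "Speaker B")]
dialog = format_dialog(label_segments(segments, turns))
```

Segments that straddle a speaker change will be assigned wholly to one speaker, so word-level timestamps would be needed for finer splits.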
-
Yes, this can be done with speaker diarization on top of Whisper:
https://github.com/lablab-ai/Whisper-transcription_and_diarization-speaker-identification-
On Tue, 4 Apr 2023, 11:54, Fan Yang wrote:
Thanks so much for the information. I'll look into them.
-
Interested to see if this has changed with the release of Whisper 3.