What is the best way to transcribe a conversation? #1191

fanyangxyz · 2023-04-04T00:26:29Z

I have a recording of a therapy session: about 30 minutes of conversation between a patient and her therapist. I used Whisper to transcribe it, but the result is one long blob of text, not in dialog format. I also tried the prompt "This is a conversation between ...", but it did not change the result much. Is Whisper capable of transcribing a conversation into dialog format? By dialog format I mean something like
"Person A: ..."
"Person B: ..."

Answered by Tex2002ans

Tex2002ans · 2023-04-04T03:20:49Z

Tex2002ans
Apr 4, 2023

I used whisper to transcribe but the result is a long blob of text, not in the dialog format. [...] Is whisper capable of transcribing conversation into dialog format?

"Person A: ..."
"Person B: ..."

No. Whisper can't currently do this.

The technical term for this is:

Speaker Diarization.

You can find discussion on this in other topics:

Currently, the closest answer was given by jongwook:

prompt vs prefix in DecodingOptions #117 (comment)

where you can "hack" initial_prompt with hyphens, in order to nudge Whisper to potentially output dashes between speakers.

[...] You can also use this for "prompt engineering", to inform the model to become more likely to output certain jargon (" So we were just talking about DALL·E") or do a crude form of speaker turn tracking (e.g. " - Hey how are you doing? - I'm doing good. How are you?", note that the token for " -" is suppressed by default and will need to be enabled manually.)
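A minimal sketch of that prompt trick with the openai-whisper Python package, assuming a recent release whose `whisper.tokenizer.get_tokenizer` accepts `num_languages` and whose `transcribe()` forwards `initial_prompt` and `suppress_tokens` to the decoder. Because `suppress_tokens` defaults to `"-1"` (the tokenizer's non-speech tokens, which include `" -"`), the sketch passes an explicit list with the `" -"` token removed. The helper names (`build_suppress_tokens`, `split_dash_turns`, `transcribe_with_dashes`) and the file name `session.mp3` are hypothetical. Labeling the pieces as alternating Person A / Person B assumes exactly two speakers who strictly take turns; the dashes carry no speaker identity, so this stays a crude guess, not diarization.

```python
import re
from typing import Iterable, List, Sequence


def build_suppress_tokens(non_speech_tokens: Iterable[int], dash_token: int) -> List[int]:
    """Return the non-speech suppression list with the " -" token allowed again."""
    # Whisper's decoder expects an explicit suppress_tokens value to be a list of ints.
    return sorted(t for t in set(non_speech_tokens) if t != dash_token)


def split_dash_turns(
    text: str, speakers: Sequence[str] = ("Person A", "Person B")
) -> List[str]:
    """Split dash-separated output into lines with alternating speaker labels.

    Assumption (hypothetical helper): every dash that starts the text or follows
    whitespace marks a speaker change, and speakers strictly alternate.
    """
    # Hyphens inside words (e.g. "well-known") are not preceded by whitespace, so they survive.
    parts = [p.strip() for p in re.split(r"(?:^|\s)-\s+", text) if p.strip()]
    return [f"{speakers[i % len(speakers)]}: {p}" for i, p in enumerate(parts)]


def transcribe_with_dashes(path: str, model_name: str = "small") -> str:
    """Transcribe with a dash-style prompt and the " -" token un-suppressed."""
    import whisper  # openai-whisper; imported lazily so the helpers above need no install
    from whisper.tokenizer import get_tokenizer

    model = whisper.load_model(model_name)
    tokenizer = get_tokenizer(model.is_multilingual, num_languages=model.num_languages)
    dash_token = tokenizer.encode(" -")[0]
    result = model.transcribe(
        path,
        initial_prompt=" - Hey how are you doing? - I'm doing good. How are you?",
        suppress_tokens=build_suppress_tokens(tokenizer.non_speech_tokens, dash_token),
    )
    return result["text"]
```

Usage would be something like `print("\n".join(split_dash_turns(transcribe_with_dashes("session.mp3"))))`. Whether the model actually emits dashes at speaker changes depends on the audio and the prompt; nothing forces it to.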

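Beyond that, you may need to use (or write) external tools, such as pyannote, that take:

Original audio + Whisper's transcripts

The diarization step produces timestamped speaker turns from the audio, and those timings can then be matched against the timestamps of Whisper's segments to tag each line as Speaker A or Speaker B.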
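A minimal sketch of that matching step, assuming the diarization tool gives speaker turns as `(start, end, label)` in seconds and using the `segments` list that Whisper's `transcribe()` returns (each entry has `start`, `end`, and `text`). Each Whisper segment gets the speaker whose turns overlap it the longest, and consecutive segments from the same speaker are merged into one line. `turns_from_pyannote` assumes a pyannote.core `Annotation` (such as the output of a pyannote.audio diarization pipeline) and its `itertracks(yield_label=True)` method; all function names and the sample data are hypothetical. Longest-overlap assignment is one simple choice, not something Whisper or pyannote do for you, and a segment that straddles a speaker change is still credited to a single speaker.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Turn = Tuple[float, float, str]  # (start seconds, end seconds, speaker label)


def turns_from_pyannote(annotation) -> List[Turn]:
    """Convert a pyannote.core Annotation into plain (start, end, label) tuples."""
    return [
        (seg.start, seg.end, label)
        for seg, _track, label in annotation.itertracks(yield_label=True)
    ]


def assign_speakers(segments: Iterable[Dict], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Label each Whisper segment with the speaker that overlaps it the longest."""
    labeled = []
    for seg in segments:
        overlap: Dict[str, float] = defaultdict(float)
        for start, end, label in turns:
            # Length of the intersection of the segment and the speaker turn.
            shared = min(seg["end"], end) - max(seg["start"], start)
            if shared > 0:
                overlap[label] += shared
        speaker = max(overlap, key=overlap.get) if overlap else "Unknown"
        labeled.append((speaker, seg["text"].strip()))
    return labeled


def to_dialog(labeled: List[Tuple[str, str]]) -> str:
    """Merge consecutive segments from the same speaker into "Speaker: text" lines."""
    lines: List[List[str]] = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1] += " " + text
        else:
            lines.append([speaker, text])
    return "\n".join(f"{speaker}: {text}" for speaker, text in lines)


if __name__ == "__main__":
    # Made-up sample data in the shapes described above.
    segments = [
        {"start": 0.0, "end": 2.0, "text": " Hey, how are you doing?"},
        {"start": 2.1, "end": 4.0, "text": " I'm doing good."},
        {"start": 4.0, "end": 5.5, "text": " How are you?"},
    ]
    turns = [(0.0, 2.05, "SPEAKER_00"), (2.05, 6.0, "SPEAKER_01")]
    print(to_dialog(assign_speakers(segments, turns)))
```

With the sample data this prints one `SPEAKER_00` line followed by one merged `SPEAKER_01` line; renaming the labels to "Person A" / "Person B" is then a simple dictionary lookup.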

(See those previously linked topics for more details.)

2 replies

fanyangxyz Apr 4, 2023
Author

Thanks so much for the information. I'll look into them.

Majdoddin Jul 20, 2023

@fanyangxyz @Tex2002ans
www.lexicaps.com adds diarization to Whisper's transcriptions, with no third-party packages.
Announcement: #1537
Repo: https://github.com/Majdoddin/lexicaps

Frosti7 · 2023-04-04T04:53:18Z

Frosti7
Apr 4, 2023

Yes, it does support speaker diarization: https://github.com/lablab-ai/Whisper-transcription_and_diarization-speaker-identification-


0 replies

Joshfindit · 2023-11-07T04:03:16Z

Joshfindit
Nov 7, 2023

I'm interested to see whether this has changed with the release of Whisper 3.

1 reply

envious Nov 7, 2023

Same
