When I compare Whisper and other STT models on the TED dataset, I get weird results #1127

SeunghyunSEO · 2023-03-21T07:01:15Z

Hello, I am an ASR researcher and I attempted to compare the Whisper model with other models on the TED dataset.

While transcribing some samples using Whisper, I encountered some unexpected results, such as the following transcription:

our first topic is family. and my question is, who do you think that you're the most like? dan short who am i the most like? in appearance? vanessa karnes yes. dan short both appearance and character? vanessa karnes yeah, both. okay, so appearance, i look mostly like my mom, i think. okay. i have more of her skin tone, i have her eyes, and on her side of the family, most of the people are pretty skinny. and i'm a rather skinny guy. my dad's side is german and they tend to be a little bit bigger. so, yeah, i definitely got my mom's side.

It appears that Whisper output the names of both speakers: "Dan Short" is the male interviewee and "Vanessa Karnes" is the female interviewer.

However, this did not occur with another model, which produced the following transcription:

our first topic is family and my question is who do you think that you're the most like whom i the most like an appearance yes both appearance and character yeah okay so appearance i look mostly like my mom i think i have more of her like skin tone i have her eyes and on her side of the family most of the people are pretty skinny and i'm a rather skinny guy my dad my dad's side is german and they tend to be a little bit bigger so yeah i definitely got my mom's side
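For a fair comparison, inserted speaker names like these would need to be filtered out before scoring, since each name word otherwise counts as an insertion error. Here is a minimal sketch of that idea, assuming the speaker names are known in advance. The helper names (`normalize`, `strip_speaker_names`, `word_errors`) are hypothetical and not part of Whisper or any library, and the normalization is deliberately simple and probably not what a real evaluation would use:

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def strip_speaker_names(text: str, names: List[str]) -> str:
    """Remove each known speaker name (case-insensitive, whole words only).

    Assumes the names are supplied by hand; nothing here detects them.
    """
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    return text


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


# A short excerpt of the Whisper output quoted above.
snippet = "dan short who am i the most like? in appearance? vanessa karnes yes."
speakers = ["dan short", "vanessa karnes"]

raw_words = normalize(snippet)
clean_words = normalize(strip_speaker_names(snippet, speakers))

# Number of word edits attributable to the speaker names alone.
extra = word_errors(clean_words, raw_words)
```

On this excerpt, the four name words are the only difference between the raw and the cleaned word sequences, so they would add four insertion errors to the score if left in.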

I am curious whether Whisper relies on some other feature to infer the speakers in a dialogue, or whether this happens simply because the labels were not preprocessed during training. Could you please provide some insight on this matter?

Thank you.

Replies: 0 comments