The Large V2 model of Whisper incorrectly predicts YouTube-related phrases, such as "don't forget to subscribe and like the video" or "subscribe to our channel", in some audio recordings, especially those with silence at the end. These sentences are not present in the actual audio content. #1101
Replies: 5 comments
- I can confirm this was also an issue with v1; see, for example: #29 (comment)
- See also these related discussions, just FYI.
- Had a similar issue where one of my recordings started with completely indiscernible background chatter in a conference room, and Whisper inserted "@2013 Mooji Media Lt. All Rights Reserved".
- Confirmed. I've seen it in Polish as well.
- Same for Russian. During silence it produces random words, not necessarily in Russian: they can be Korean, English, or strings of random letters. Mostly it outputs something like "Don't forget to subscribe" or "Subtitles were made by... (some random name)". It also labels music, e.g. "Calm music".
- I have been working with the Whisper Large V2 model for transcribing audio recordings. In several cases, particularly when there is silence at the end of the audio, the model produces YouTube-related phrases that are not actually part of the content; the problem seems to occur only at the end of the transcript.
Examples of incorrectly predicted phrases include "don't forget to subscribe and like the video" and "subscribe to our channel".
This might be related to the large amount of YouTube-sourced captioned audio in the training data.
Steps to reproduce
Expected behavior
The model should not predict YouTube-related phrases when they are not present in the audio content, even when there is silence at the end of the recording.
Actual behavior
The model incorrectly predicts YouTube-related phrases that are not part of the audio content, particularly when there is silence at the end of the recording.
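In practice this behavior can often be detected after the fact: each segment returned by openai-whisper's model.transcribe() carries a no_speech_prob and an avg_logprob field, and hallucinated tails tend to combine a high no-speech probability with low decoding confidence. Below is a minimal post-filter sketch; the function name drop_probable_hallucinations and the threshold values are my own illustrative choices, loosely mirroring Whisper's default heuristic.

```python
# Post-processing sketch: drop segments that Whisper itself flags as
# likely non-speech. Assumes the segment dicts produced by
# openai-whisper's model.transcribe(), which include "no_speech_prob"
# and "avg_logprob" fields. Thresholds are illustrative guesses.

def drop_probable_hallucinations(segments,
                                 no_speech_threshold=0.6,
                                 logprob_threshold=-1.0):
    """Keep only segments the decoder is reasonably confident contain speech."""
    kept = []
    for seg in segments:
        likely_silence = seg.get("no_speech_prob", 0.0) > no_speech_threshold
        low_confidence = seg.get("avg_logprob", 0.0) < logprob_threshold
        # Whisper's own heuristic treats "probably silence AND
        # low-confidence text" as a no-speech segment; mirror that here,
        # so confident text over quiet audio is still kept.
        if likely_silence and low_confidence:
            continue
        kept.append(seg)
    return kept
```

Note the AND: a segment is only dropped when both signals agree, which avoids discarding genuine quiet speech that the decoder is still confident about.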
Possible solution
A potential solution could involve re-examining the training data to reduce the influence of such YouTube-related phrases, or refining the model to be more context-aware when predicting sentences in silent sections of the audio.
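As a workaround on the input side, trimming the silent tail before transcription avoids feeding the model the silence that triggers these hallucinations. Here is a pure-Python sketch over 16-bit mono PCM samples; the frame size and RMS threshold are illustrative assumptions, and a proper VAD would be more robust in production.

```python
# Illustrative sketch: trim trailing silence from 16-bit mono PCM samples
# before handing the audio to Whisper, so the model never sees a long
# silent tail. The sample rate, frame length, and RMS threshold below
# are assumptions, not values from Whisper itself.

def trim_trailing_silence(samples, sample_rate=16000,
                          frame_ms=30, rms_threshold=200):
    """Return samples with any silent tail removed.

    samples: list of signed 16-bit ints (mono PCM).
    A frame counts as silent when its RMS falls below rms_threshold.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    end = len(samples)
    # Walk backwards frame by frame until we hit a non-silent frame.
    while end > 0:
        frame = samples[max(0, end - frame_len):end]
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms >= rms_threshold:
            break
        end -= frame_len
    return samples[:max(0, end)]
```

The trimmed sample list can then be converted to float32 in [-1, 1] and passed to transcription as usual; an entirely silent recording trims down to an empty list, which is a useful signal to skip transcription altogether.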
Additional context
This issue has been observed only in the Large V2 model and primarily occurs at the end of the audio content.