Testing music, it seems to hallucinate audio during times with no words, but with noise. #1192

sfxworks · 2023-04-04T07:53:02Z

Aside from a few transcription errors, there are stretches where the audio contains only background noise or music, yet Whisper generates whole chunks of words, apparently repeating the preceding segments. This is most apparent from 02:51.000 to 03:00.000 and in other similar areas. The video is a good example of noisy audio with music, used here in an attempt to transcribe lyrics. It really put the medium model to the test.

whisper celldweller-nfs.opus --model medium
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:05.000]  We are unrestricted by rules.
[00:05.000 --> 00:10.000]  We are challenged by the unexpected.
[00:30.000 --> 00:35.000]  Push it through my soul.
[00:46.000 --> 00:49.000]  I'm pacing inside this empty room.
[00:49.000 --> 00:53.000]  Don't you wait till my life's withdrawn.
[00:53.000 --> 00:57.000]  Uneasy, I'm waiting here anxiously.
[00:57.000 --> 01:01.000]  It's a waste and I won't pay it for the Aeon.
[01:02.000 --> 01:04.000]  Endless night.
[01:05.000 --> 01:07.000]  No more night.
[01:07.000 --> 01:09.000]  My head against this tomb.
[01:10.000 --> 01:12.000]  And pushing through.
[01:15.000 --> 01:17.000]  I'm pushing through.
[01:20.000 --> 01:22.000]  Face down and push it through.
[01:23.000 --> 01:25.000]  Trapped and isolated.
[01:28.000 --> 01:31.000]  Time is dislocated.
[01:32.000 --> 01:35.000]  Eternity is what a moment seems.
[01:35.000 --> 01:39.000]  When I can't feel anything.
[01:47.000 --> 01:50.000]  Erasing my final memories.
[01:50.000 --> 01:54.000]  They won't stop till my whole life's gone.
[01:55.000 --> 01:58.000]  Uneasy, I wait for normality.
[01:59.000 --> 02:02.000]  It's a waste and I won't pay it for the Aeon.
[02:03.000 --> 02:05.000]  Endless night.
[02:06.000 --> 02:08.000]  No more night.
[02:08.000 --> 02:10.000]  My head against this tomb.
[02:11.000 --> 02:13.000]  And pushing through.
[02:17.000 --> 02:19.000]  I'm pushing through.
[02:20.000 --> 02:22.000]  Face down and push it through.
[02:23.000 --> 02:25.000]  Trapped and isolated.
[02:28.000 --> 02:30.000]  Time is dislocated.
[02:32.000 --> 02:35.000]  Eternity is what a moment seems.
[02:35.000 --> 02:38.000]  When I can't feel anything.
[02:49.000 --> 02:51.000]  It's a waste and I won't pay it for the Aeon.
[02:51.000 --> 02:53.000]  Endless night.
[02:53.000 --> 02:55.000]  No more night.
[02:55.000 --> 02:57.000]  My head against this tomb.
[02:57.000 --> 02:59.000]  And pushing through.
[03:01.000 --> 03:03.000]  The night is gone without a trace.
[03:03.000 --> 03:06.000]  And I'm lost two times in the race.
[03:06.000 --> 03:09.000]  I'm lonely, where's the chain?
[03:09.000 --> 03:11.000]  Cause I'll push you through.
[03:11.000 --> 03:13.000]  Face down.
[03:19.000 --> 03:21.000]  No more night.
[03:21.000 --> 03:23.000]  My head against this tomb.
[03:23.000 --> 03:25.000]  And pushing through.
[03:25.000 --> 03:27.000]  Trapped and isolated.
[03:27.000 --> 03:29.000]  Eternity is what a moment seems.
[03:29.000 --> 03:31.000]  When I can't feel anything.
[03:31.000 --> 03:33.000]  It's a waste and I won't pay it for the Aeon.
[03:33.000 --> 03:35.000]  Endless night.
[03:35.000 --> 03:37.000]  No more night.
[03:37.000 --> 03:39.000]  My head against this tomb.
[03:39.000 --> 03:41.000]  And pushing through.
[03:41.000 --> 03:43.000]  Trapped and isolated.
[03:43.000 --> 03:45.000]  And I'm lost two times in the race.
[03:45.000 --> 03:47.000]  I'm lonely, where's the chain?
[03:47.000 --> 03:49.000]  Cause I'll push you through.
[03:49.000 --> 03:51.000]  Face down.
[03:51.000 --> 03:53.000]  No more night.
[03:53.000 --> 03:55.000]  My head against this tomb.
[03:55.000 --> 03:57.000]  And pushing through.
[03:57.000 --> 03:59.000]  Trapped and isolated.
[03:59.000 --> 04:01.000]  Eternity is what a moment seems.
[04:01.000 --> 04:03.000]  When I can't feel anything.
[04:03.000 --> 04:05.000]  It's a waste and I won't pay it for the Aeon.
[04:05.000 --> 04:07.000]  Endless night.
[04:07.000 --> 04:09.000]  No more night.
[04:09.000 --> 04:11.000]  My head against this tomb.
[04:11.000 --> 04:13.000]  And pushing through.
[04:13.000 --> 04:15.000]  Trapped and isolated.
[04:15.000 --> 04:17.000]  Eternity is what a moment seems.
[04:17.000 --> 04:19.000]  When I can't feel anything.
[04:19.000 --> 04:21.000]  It's a waste and I won't pay it for the Aeon.
[04:21.000 --> 04:23.000]  Endless night.
[04:23.000 --> 04:25.000]  No more night.
[04:25.000 --> 04:27.000]  My head against this tomb.
[04:27.000 --> 04:29.000]  And pushing through.
[04:29.000 --> 04:31.000]  Trapped and isolated.
[04:31.000 --> 04:33.000]  And I'm lost two times in the race.
[04:33.000 --> 04:35.000]  I'm lonely, where's the chain?
[04:35.000 --> 04:37.000]  Cause I'll push you through.
[04:37.000 --> 04:39.000]  Trapped and isolated.
[04:39.000 --> 04:41.000]  Eternity is what a moment seems.
[04:41.000 --> 04:43.000]  When I can't feel anything.
[04:43.000 --> 04:45.000]  But all the things I've been missing.
[04:45.000 --> 04:47.000]  In that lost Aeon.
[04:59.000 --> 05:01.000]  Endless night.
[05:01.000 --> 05:03.000]  No more night.
[05:05.000 --> 05:07.000]  My head against this tomb.
[05:07.000 --> 05:09.000]  And pushing through.
[05:09.000 --> 05:11.000]  Trapped and isolated.
[05:13.000 --> 05:15.000]  And I'm lost two times in the race.
[05:15.000 --> 05:17.000]  I'm lonely, where's the chain?
[05:17.000 --> 05:19.000]  Cause I'll push you through.
[05:19.000 --> 05:21.000]  Trapped and isolated.
[05:21.000 --> 05:23.000]  Eternity is what a moment seems.
[05:23.000 --> 05:25.000]  When I am lost inside this dream.
[05:25.000 --> 05:27.000]  When I can't speak and I can't scream.
[05:27.000 --> 05:29.000]  When I can't feel anything.
[05:43.000 --> 05:46.000]  Face down and pushing through.
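
One rough way to spot stretches like the one around 02:51 is to look for long runs of segments that all have the same duration and follow each other with no gap, since that pattern shows up in the suspicious parts of the output above. The sketch below is only a heuristic and not part of Whisper: the function names `parse_segments` and `uniform_runs`, the `min_len` threshold, and the assumption that timestamps are always in `mm:ss.mmm` form are all made up for illustration.

```python
import re

# Matches lines such as "[02:51.000 --> 02:53.000]  Endless night."
# Assumption: timestamps are mm:ss.mmm (no hour field).
SEGMENT_RE = re.compile(r"\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\]\s+(.*)")


def parse_segments(text):
    """Turn CLI transcript lines into (start_seconds, end_seconds, text) tuples."""
    segments = []
    for line in text.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match is None:
            continue  # skip non-segment lines such as "Detected language: English"
        start = int(match.group(1)) * 60 + float(match.group(2))
        end = int(match.group(3)) * 60 + float(match.group(4))
        segments.append((start, end, match.group(5).strip()))
    return segments


def uniform_runs(segments, min_len=4, tol=1e-3):
    """Return (run_start, run_end, count) for runs of back-to-back segments
    that all share the same duration. min_len is an arbitrary threshold."""
    runs = []
    run_start = 0
    for i in range(1, len(segments) + 1):
        continues = False
        if i < len(segments):
            prev_start, prev_end, _ = segments[i - 1]
            start, end, _ = segments[i]
            same_duration = abs((end - start) - (prev_end - prev_start)) < tol
            back_to_back = abs(start - prev_end) < tol
            continues = same_duration and back_to_back
        if not continues:
            if i - run_start >= min_len:
                runs.append((segments[run_start][0], segments[i - 1][1], i - run_start))
            run_start = i
    return runs


# A few lines copied from the transcript above.
SAMPLE = """\
[02:35.000 --> 02:38.000]  When I can't feel anything.
[02:49.000 --> 02:51.000]  It's a waste and I won't pay it for the Aeon.
[02:51.000 --> 02:53.000]  Endless night.
[02:53.000 --> 02:55.000]  No more night.
[02:55.000 --> 02:57.000]  My head against this tomb.
[02:57.000 --> 02:59.000]  And pushing through.
[03:01.000 --> 03:03.000]  The night is gone without a trace.
"""

print(uniform_runs(parse_segments(SAMPLE)))  # [(169.0, 179.0, 5)]
```

A flagged run is only a hint that the timing looks mechanical; genuine lyrics sung at a steady pace could trigger it too, so the audio still needs to be checked by ear.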

brendan-jarvis · 2023-04-04T09:13:00Z

Running Whisper with --word_timestamps True and --condition_on_previous_text False usually helps here, but this is quite a difficult track to caption.
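
For anyone calling Whisper from Python rather than the command line, the same two settings can be passed as keyword arguments to `transcribe`. This is a minimal sketch, assuming the `openai-whisper` package is installed and recent enough to support word timestamps; the helper name `transcribe_without_context` and the `DECODE_OPTIONS` dict are made up for illustration.

```python
# Sketch only: the two settings from the CLI flags, passed via the Python API.
DECODE_OPTIONS = {
    "condition_on_previous_text": False,  # do not feed earlier output back in as a prompt
    "word_timestamps": True,              # word-level timing, used to refine segment bounds
}


def transcribe_without_context(audio_path, model_name="medium.en"):
    """Return (start, end, text) tuples for each segment Whisper produces."""
    import whisper  # imported lazily so this file loads even without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **DECODE_OPTIONS)
    return [
        (seg["start"], seg["end"], seg["text"].strip())
        for seg in result["segments"]
    ]
```

Calling `transcribe_without_context("celldweller-nfs.opus")` would then give a list of segments that can be inspected or filtered further. It will not necessarily match the CLI output line for line, and it does not guarantee that hallucinated text disappears.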

Whisper seemed to do okay until hallucinating some profane self-hatred from 4:03 - 4:18 😥

My results:

python -m whisper 'NFS： Most Wanted - Music Video [L3ZguNkBbao].webm' --model medium.en --condition_on_previous_text False --word_timestamps True
[00:02.180 --> 00:04.640]  We are unrestricted by rules.
[00:07.620 --> 00:10.020]  We are challenged by the unexpected.
[00:30.000 --> 00:33.860]  Push it through my soul.
[00:38.520 --> 00:49.420]  I'm pacing inside this empty room.
[00:49.760 --> 00:53.120]  Don't you wait till my light's withdrawn.
[00:53.640 --> 00:57.240]  Uneasy, I'm waiting here anxiously.
[00:57.240 --> 01:01.580]  It's a waste and I won't pay you for a yard.
[01:02.440 --> 01:06.860]  Endless night, no full night.
[01:07.020 --> 01:09.280]  My head against this doom.
[01:10.320 --> 01:12.020]  I'm pushing through.
[01:14.580 --> 01:17.420]  I'm pushing through.
[01:19.280 --> 01:22.580]  Face down and push it through.
[01:23.760 --> 01:24.960]  Trapped and isolated.
[01:28.040 --> 01:31.260]  Time is dislocated.
[01:32.380 --> 01:39.060]  Eternity is what the moment seems when I can't feel anything.
[01:42.800 --> 01:50.760]  Erasing my final memories.
[01:51.160 --> 01:54.540]  They won't stop till my whole life's gone.
[01:54.980 --> 01:58.680]  Uneasy I wait for non-melody.
[01:59.240 --> 02:02.760]  It's so wasteful, waiting for A.I.
[02:04.200 --> 02:05.300]  Endless night.
[02:06.520 --> 02:08.200]  No full night.
[02:08.960 --> 02:10.620]  My head against the stone.
[02:11.400 --> 02:13.320]  And pushing through.
[02:16.520 --> 02:18.760]  Ah, pushing through.
[02:20.700 --> 02:23.860]  Face down and pushing through.
[02:24.540 --> 02:26.580]  Trapped and isolated.
[02:29.460 --> 02:31.700]  Time is dislocated.
[02:33.620 --> 02:36.980]  Eternity is what a moment seems.
[02:37.340 --> 02:39.660]  When I can't feel anything.
[03:00.480 --> 03:02.280]  The light is gone.
[03:02.800 --> 03:03.600]  We've got a chase.
[03:04.060 --> 03:04.920]  Enough is lost.
[03:05.160 --> 03:06.320]  It's time to embrace.
[03:07.140 --> 03:08.900]  But we will answer, see.
[03:09.380 --> 03:10.980]  Cause I'll push you through.
[03:11.460 --> 03:12.780]  Night time.
[03:27.300 --> 03:28.980]  Where the fuck are we?
[03:29.080 --> 03:30.140]  We're hesitate.
[03:30.920 --> 03:32.680]  Cause I'll push you through.
[03:33.120 --> 03:36.040]  If time's the song I want.
[03:36.260 --> 03:38.820]  Wait for its reprise.
[03:39.200 --> 03:44.380]  I am done wishing farewells and goodbyes.
[03:44.380 --> 03:50.580]  I won't let this place overshadow me.
[03:51.400 --> 03:55.640]  But it's right, I'm awake with another ai-ag.
[03:59.900 --> 04:02.220]  Walk off the street and it's racist and freakin' out.
[04:02.440 --> 04:03.320]  Stay on the one you're with.
[04:03.760 --> 04:05.220]  I'm a fucking idiot.
[04:06.520 --> 04:06.620]  I'm a fucking idiot.
[04:06.840 --> 04:08.360]  I'm a fucking idiot.
[04:10.440 --> 04:14.460]  I'm a fucking idiot.
[04:16.380 --> 04:18.700]  I'm a fucking idiot.
[04:22.420 --> 04:23.080]  Initially.
[04:22.680 --> 04:25.120]  I'm going on a dirty flight.
[04:26.380 --> 04:28.600]  All the stone.
[04:29.240 --> 04:30.680]  Blood and sweat marry.
[04:32.380 --> 04:34.220]  You hear the voice that whispers fears.
[04:34.500 --> 04:36.380]  But my heart is pounding through my ears.
[04:37.240 --> 04:39.060]  All I see in my mind.
[04:39.760 --> 04:41.520]  All I've left behind.
[04:42.800 --> 04:45.620]  But all the things I've been missing.
[04:45.920 --> 04:47.920]  In that last day.
[04:48.700 --> 04:52.400]  All I see in my mind.
[04:52.780 --> 04:53.640]  In that last day.
[04:53.640 --> 04:55.380]  All I see in my mind.
[04:59.200 --> 05:01.300]  Endless night.
[05:02.020 --> 05:04.220]  Long food life.
[05:04.960 --> 05:06.660]  My head against this doom.
[05:07.380 --> 05:09.320]  Empty view.
[05:10.400 --> 05:11.680]  Drifting isolated.
[05:14.640 --> 05:17.000]  Time dislocated.
[05:19.420 --> 05:22.300]  Eternity is what a moment seems.
[05:22.660 --> 05:24.980]  When I am lost inside this dream.
[05:25.340 --> 05:27.600]  When I can't speak and I can't scream.
[05:28.000 --> 05:30.380]  When I can't feel anything.
[05:35.260 --> 05:46.440]  Face down and pushing through.
[05:48.700 --> 06:03.080]  When I am lost inside this dream.
