Does Whisper need preprocessing, enhancing, or denoising audio before transcribing? #2125

FLabbaf97 · 2024-04-11T17:03:28Z

Hello.
I want to transcribe short voice recordings with Whisper large-v3. It works fine for some of the files, but in some cases the transcription is not very accurate. I realized my audio files have a high noise level (-30 dB on average). Some of this noise comes from the background and some from the microphone. Do you think enhancing the audio quality and denoising will help? Does Whisper basically need denoising?

The problem is that I couldn't find a single method that improves the quality of all of my files. Every method works for some files but reduces the speech quality of others. I would be happy to hear of any method that enhances speech quality more intelligently.

Thank you for your support.
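
For readers who want to check the noise level of their own files before deciding on any preprocessing, here is a minimal sketch. It assumes the -30 dB figure above is measured relative to full scale (dBFS) and that the quietest frames of a recording contain only noise. The function names, the 480-sample frame size, and the 10th-percentile heuristic are illustrative assumptions, not something taken from the original post.

```python
import array
import math
import sys
import wave


def frame_levels_dbfs(samples, frame_len):
    """Return the RMS level of each full frame in dBFS (0 dBFS = full scale)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Clamp to avoid log10(0) on digital silence.
        levels.append(20.0 * math.log10(max(rms, 1e-10)))
    return levels


def noise_floor_dbfs(samples, frame_len=480, percentile=10):
    """Estimate the noise floor as a low percentile of the frame levels.

    Assumption: roughly the quietest 10% of frames contain no speech.
    480 samples is 30 ms at a 16 kHz sample rate.
    """
    levels = sorted(frame_levels_dbfs(samples, frame_len))
    if not levels:
        raise ValueError("audio is shorter than one frame")
    index = min(len(levels) - 1, len(levels) * percentile // 100)
    return levels[index]


def read_mono_16bit_wav(path):
    """Read a 16-bit mono PCM WAV file as floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in pcm]
```

Calling `noise_floor_dbfs(read_mono_16bit_wav("clip.wav"))` on each file (the file name is a placeholder) gives one number per recording. That number can help group files by how noisy they are, but it does not say whether denoising will help transcription.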

itaipee · 2024-04-14T08:11:35Z

Generally speaking, denoising rarely helps in speech recognition. First, many methods work for some kinds of noise but not for others. Second, denoising modifies the spectral representation of the audio. Damping the noise usually means that certain frequency bands are distorted, so even when the speech sounds much clearer to a human, its spectral representation looks less like the patterns the ASR expects.

An exception is when the ASR was trained on denoised audio, so that it becomes "familiar" with the spectral representations of denoised audio.
This can be considered a form of augmentation.
However, I doubt Whisper was trained that way.

15 replies

EtienneAb3d Apr 15, 2024

@itaipee
Rather than an ordinary denoiser, you can use a neural voice extraction tool such as Facebook's Demucs.
It works very well!
You can see several combinations of processing tools (Demucs, silence/noise removal, voice compression, etc.) in this experimental code:
https://github.com/EtienneAb3d/WhisperHallu
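
As a rough illustration of the idea above (extract the voice with Demucs, then transcribe), here is a minimal sketch. It is not the WhisperHallu code. It assumes the `demucs` and `openai-whisper` packages are installed, that Demucs is run through its command-line interface with the `htdemucs` model, and that the vocals stem is written to `<out_dir>/<model>/<track name>/vocals.wav`, which is the default layout as far as I know. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def demucs_command(audio_path, out_dir, model="htdemucs"):
    """Build the Demucs CLI call that separates vocals from everything else."""
    return [
        sys.executable, "-m", "demucs",
        "--two-stems=vocals",  # write only vocals.wav and no_vocals.wav
        "-n", model,
        "-o", str(out_dir),
        str(audio_path),
    ]


def vocals_path(audio_path, out_dir, model="htdemucs"):
    """Where Demucs is assumed to write the vocals stem for this input."""
    return Path(out_dir) / model / Path(audio_path).stem / "vocals.wav"


def extract_voice(audio_path, out_dir="separated"):
    """Run Demucs and return the path of the extracted voice track."""
    subprocess.run(demucs_command(audio_path, out_dir), check=True)
    return vocals_path(audio_path, out_dir)


def transcribe(audio_path, model_name="large-v3"):
    """Transcribe a file with Whisper and return the recognized text."""
    import whisper  # imported lazily: needs the openai-whisper package

    model = whisper.load_model(model_name)
    return model.transcribe(str(audio_path))["text"]
```

With both packages installed, `transcribe(extract_voice("clip.wav"))` would run the two steps in sequence (the file name is a placeholder). Loading the model once and reusing it is preferable when processing many files.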

Answer selected by FLabbaf97

itaipee Apr 16, 2024

Thanks @EtienneAb3d, I will certainly look at WhisperHallu. It looks interesting, and I wonder whether it works well on real audio.

P.S. Isn't Demucs used for music? Does it work well for "cleaning" speech without introducing any distortion for the speech recognition system?

gongouveia Apr 16, 2024

@itaipee Yes, Demucs was originally built for music separation. However, in this version it was adapted for voice enhancement and background noise removal (https://github.com/facebookresearch/denoiser).
The version I cited is based on Demucs v2, which uses an LSTM, and the version above is based on Demucs v4, which uses a transformer.

EtienneAb3d Apr 16, 2024

@itaipee

> P.S. isn't Demucs used for music? Does it work well for "cleaning" speech, without any distortion for the speech recognition system?

Demucs splits the sound into 4 stems: Voice, Bass, Drums, and Others. The last one very effectively collects everything that does not properly belong in the first three. In my tests, it works very well for denoising.

FLabbaf97 Apr 22, 2024
Author

Thank you. Demucs worked very well for denoising my audio in most cases.
However, in some cases, using Demucs degrades the speech quality. Do you have any suggestions on how to approach this problem? I want to automate everything so that I don't have to check the audio quality manually.
Thanks.
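
One possible way to automate this, offered only as an untested heuristic and not something proposed in the thread: transcribe both the original and the Demucs-processed file, then keep whichever result Whisper itself was more confident about. The sketch below assumes result dictionaries shaped like the output of `model.transcribe(...)` in the `openai-whisper` package, with a `segments` list whose entries carry `start`, `end`, and `avg_logprob`. The function names and the `margin` value are hypothetical.

```python
def mean_logprob(result):
    """Duration-weighted mean of the per-segment avg_logprob values."""
    total, weight = 0.0, 0.0
    for seg in result.get("segments", []):
        duration = max(seg["end"] - seg["start"], 0.0)
        total += seg["avg_logprob"] * duration
        weight += duration
    # No usable segments: treat the result as the worst possible score.
    return total / weight if weight > 0 else float("-inf")


def pick_transcript(raw_result, cleaned_result, margin=0.05):
    """Prefer the cleaned audio only when it scores clearly higher."""
    if mean_logprob(cleaned_result) > mean_logprob(raw_result) + margin:
        return "cleaned", cleaned_result["text"]
    return "raw", raw_result["text"]
```

The margin biases the choice toward the unprocessed audio, so the processed version has to score clearly higher before it is used. Model confidence is only a proxy for accuracy, so it would be worth validating this on a few files with known transcripts before relying on it.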
