Does Whisper need preprocessing, enhancing, or denoising of audio before transcribing? #2125
-
Hello. The problem is that I couldn't find a single method that enhances the quality of all of my files. Every method works for some files but reduces the speech quality of others. I would be happy to hear of any method that enhances speech quality intelligently. Thank you for your support.
Replies: 1 comment 15 replies
-
Generally speaking, de-noising rarely helps in speech recognition. First, many methods work for some noises but not for others. Second, de-noising modifies the spectral representation of the audio. Damping the noise usually means that certain frequency bands are distorted, and even when the speech sounds much clearer to a human, its spectral representation resembles the patterns the ASR expects less closely. An exception is when the ASR is trained on de-noised audio, so that it becomes "familiar" with the spectral representations of de-noised audio.
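To make the second point concrete, here is a toy sketch (my own illustration, not Whisper's feature pipeline and not a real de-noiser). It assumes a crude hard spectral gate, a hypothetical helper named `spectral_gate`, and a synthetic "speech" signal made of a strong 200 Hz tone plus a weak 3 kHz component. The gate removes the noise floor, but it also wipes out the weak component, so the spectrum of the result no longer matches the clean one.

```python
import numpy as np

def spectral_gate(signal, threshold_ratio=0.1):
    """Hypothetical toy de-noiser: zero every frequency bin whose magnitude
    is below threshold_ratio times the peak magnitude."""
    spectrum = np.fft.rfft(signal)
    magnitude = np.abs(spectrum)
    mask = magnitude >= threshold_ratio * magnitude.max()
    return np.fft.irfft(spectrum * mask, n=len(signal)), mask

rng = np.random.default_rng(0)
sr = 16000                      # one second of audio at 16 kHz
t = np.arange(sr) / sr

# Synthetic stand-in for speech: strong low tone plus a weak high component.
clean = np.sin(2 * np.pi * 200 * t) + 0.05 * np.sin(2 * np.pi * 3000 * t)
noisy = clean + 0.05 * rng.standard_normal(sr)

denoised, mask = spectral_gate(noisy)

# With a one-second signal the bin spacing is 1 Hz, so bin index == frequency.
freqs = np.fft.rfftfreq(sr, d=1 / sr)
bin_200 = int(np.argmin(np.abs(freqs - 200)))
bin_3k = int(np.argmin(np.abs(freqs - 3000)))

print("200 Hz kept:", bool(mask[bin_200]))
print("3 kHz kept:", bool(mask[bin_3k]))
print("bins zeroed:", int((~mask).sum()), "of", mask.size)
```

Real de-noisers are far more selective than this gate, but the same trade-off applies: whatever is removed or attenuated changes the spectrum that the recognizer sees.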
@itaipee
Rather than an ordinary de-noiser, you can use neural voice extraction tools, like Facebook Demucs.
It works very well!
See several combinations of processing tools (Demucs, silence/noise removal, voice compression, etc.) in this experimental code:
https://github.com/EtienneAb3d/WhisperHallu
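For anyone who wants to try the idea without the full repository, here is a minimal sketch of the two steps: extract the voice with the Demucs command line, then transcribe the extracted stem with Whisper. This is not the WhisperHallu code. It assumes the `demucs` and `openai-whisper` packages are installed, that the Demucs model is `htdemucs`, and that Demucs writes its output to `<out_dir>/<model>/<track name>/vocals.wav`. The function names and the `"base"` Whisper model are my own choices for illustration.

```python
import subprocess
import sys
from pathlib import Path

def demucs_command(audio_path, out_dir="separated", model_name="htdemucs"):
    """Build the Demucs CLI call that splits a file into vocals / no_vocals."""
    return [
        sys.executable, "-m", "demucs",
        "--two-stems=vocals",      # only separate the voice from everything else
        "-n", model_name,          # assumed model name
        "-o", str(out_dir),
        str(audio_path),
    ]

def vocals_path(audio_path, out_dir="separated", model_name="htdemucs"):
    """Where Demucs is assumed to write the extracted voice stem."""
    return Path(out_dir) / model_name / Path(audio_path).stem / "vocals.wav"

def transcribe_with_voice_extraction(audio_path, whisper_model="base"):
    """Run Demucs, then transcribe the vocals stem with Whisper."""
    import whisper  # imported lazily so the helpers above work without it

    subprocess.run(demucs_command(audio_path), check=True)
    model = whisper.load_model(whisper_model)
    result = model.transcribe(str(vocals_path(audio_path)))
    return result["text"]

if __name__ == "__main__":
    # Hypothetical input file; replace with your own recording.
    print(demucs_command("recordings/talk.mp3"))
    print(vocals_path("recordings/talk.mp3"))
```

It is worth transcribing both the original file and the extracted vocals and comparing the results, since, as noted above, no single preprocessing step helps on every file.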