Hello. I'm trying to improve my fine-tuning process.
Unlike standard corpora such as WSJ, audiobooks, or podcasts, my audio data is quite "dirty": it contains a lot of noise, crosstalk, and other fun things that ruin the audio quality.
Anyway, with fine-tuning I was able to improve Whisper's accuracy on my audio, but when I tried to gain further improvement with standard audio augmentation, it did not help. That makes some sense: if the audio is already noisy, I don't need to make it noisier.
I want to try SpecAugment, so I added these lines (the only reference I found):
from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained(model_path)
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
# these 3 lines are supposed to enable SpecAugment
model.config.apply_spec_augment = True
model.config.mask_time_prob = 0.05
model.config.mask_feature_prob = 0.05
It did not improve accuracy, and the changes in WER and loss over training were almost identical with and without these lines.
So my questions:
1: Am I using the SpecAugment code correctly?
2: How can I probe its effects?
3: I think I read that SpecAugment works best with lots of steps and epochs. In fine-tuning, by nature, I have fewer steps: I'm doing 2 epochs on 200 hours of audio. So far, in all my tests, I never saw a gain from using more than 1-2 epochs. Maybe with SpecAugment I do need to go wild and use 10 epochs?
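For question 2, one rough way to get a feel for how much of the input such probabilities would actually hide is a stand-alone sketch like the one below. It is not the transformers implementation, just generic SpecAugment-style span masking in plain Python. The function names (`apply_span_masks`, `masked_fraction`), the span length of 10, and the 80 x 3000 spectrogram shape are all assumptions for illustration.

```python
import random


def apply_span_masks(spec, mask_time_prob, mask_time_length,
                     mask_feature_prob, mask_feature_length, rng):
    """Return a masked copy of spec, a [n_mels][n_frames] list of lists.

    Hypothetical helper: the number of spans is chosen so that at most
    roughly prob * axis_length positions are covered on each axis.
    Masked cells are set to 0.0; the input is left untouched.
    """
    n_mels = len(spec)
    n_frames = len(spec[0])
    out = [row[:] for row in spec]

    # Assumed rule: spans = floor(prob * axis_length / span_length).
    n_time_spans = int(mask_time_prob * n_frames / mask_time_length)
    n_feat_spans = int(mask_feature_prob * n_mels / mask_feature_length)

    # Time masking: zero a block of consecutive frames across all mel bins.
    for _ in range(n_time_spans):
        start = rng.randrange(0, n_frames - mask_time_length + 1)
        for m in range(n_mels):
            for t in range(start, start + mask_time_length):
                out[m][t] = 0.0

    # Feature masking: zero a block of consecutive mel bins across all frames.
    for _ in range(n_feat_spans):
        start = rng.randrange(0, n_mels - mask_feature_length + 1)
        for m in range(start, start + mask_feature_length):
            for t in range(n_frames):
                out[m][t] = 0.0
    return out


def masked_fraction(original, masked):
    """Fraction of cells whose value differs between original and masked."""
    total = sum(len(row) for row in original)
    changed = sum(
        1
        for row_o, row_m in zip(original, masked)
        for a, b in zip(row_o, row_m)
        if a != b
    )
    return changed / total


# Dummy all-ones "spectrogram" so every masked cell is detectable.
rng = random.Random(0)
spec = [[1.0] * 3000 for _ in range(80)]
masked = apply_span_masks(spec, 0.05, 10, 0.05, 10, rng)
print(f"masked fraction: {masked_fraction(spec, masked):.4f}")
```

Under these assumptions, a feature probability of 0.05 on 80 mel bins with spans of 10 bins rounds down to zero feature spans, and the time masks cover at most 5% of the frames. If the real settings behave anything like this, near-identical WER and loss would not be very surprising, so comparing a few masked and unmasked inputs from the actual training pipeline seems like the next thing to check.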
P.S.
With standard augmentation (see below) I did not gain an overall improvement, but I did see changes in accuracy: in some cases WER dropped by 0.3%, in others it increased by 0.5%, and so on (relative to fine-tuning without augmentation).