Replies: 1 comment
-
I think if you use the tokenizer with predict_timestamps=True during training, it should be like this:
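For reference, a minimal sketch of how the decoder prefix changes with predict_timestamps, assuming the standard transformers WhisperTokenizer (the checkpoint name is only illustrative):

```python
from transformers import WhisperTokenizer

tok = WhisperTokenizer.from_pretrained(
    "openai/whisper-large-v2", language="english", task="transcribe"
)

# Default (predict_timestamps=False): four prefix tokens, ending in <|notimestamps|>
print(tok.convert_ids_to_tokens(tok.prefix_tokens))
# ['<|startoftranscript|>', '<|en|>', '<|transcribe|>', '<|notimestamps|>']

# With predict_timestamps=True the <|notimestamps|> token is dropped, so only
# three special tokens precede the transcript text.
tok.set_prefix_tokens(predict_timestamps=True)
print(tok.convert_ids_to_tokens(tok.prefix_tokens))
# ['<|startoftranscript|>', '<|en|>', '<|transcribe|>']
```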
-
I have been experimenting with the helpful tutorial on tuning Whisper with PEFT:
peft/examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb
I have come across something suspect: when I use the evaluation code in the notebook to evaluate my tuned model, I get much better results than when I build a pipeline. I get a WER of 0.34 for the former and 0.47 for the latter.
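For context, a rough sketch of the pipeline path in that comparison (model, processor, and sample stand in for the objects from the notebook; my exact pipeline setup may differ):

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
# sample["audio"] is a datasets audio dict with "array" and "sampling_rate".
out = asr(sample["audio"], generate_kwargs={"language": "english", "task": "transcribe"})
print(out["text"])
```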
I have tried all sorts of things, but I have narrowed it down to the following argument in the call to model.generate():
decoder_input_ids=batch["labels"][:, :4].to("cuda")
I believe the first 3 elements of the labels are the usual special IDs specifying task, language, etc., and these are the same for all inputs. However, the 4th element differs for each input, and I believe it is actually the first token of the ground-truth transcript. If so, the model is cheating by being given the start of the reference as a prompt.
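One way to check this (a sketch only; model, processor, and batch are the notebook's objects, and the generate() arguments here are assumptions rather than the notebook's exact code):

```python
import torch

# Decode the first few label ids: if the 4th token is ordinary text rather than
# a special token such as <|notimestamps|>, then labels[:, :4] really does hand
# the model the first ground-truth token.
print(processor.tokenizer.convert_ids_to_tokens(batch["labels"][0, :5].tolist()))

# For comparison, generate without any tokens taken from the labels; the forced
# decoder ids supply only the language/task special tokens.
forced_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")
with torch.no_grad():
    generated = model.generate(
        input_features=batch["input_features"].to("cuda"),
        forced_decoder_ids=forced_ids,
        max_new_tokens=255,
    )
print(processor.batch_decode(generated, skip_special_tokens=True))
```

If the WER gap closes under the unprompted call, that would point to the label prompt, rather than the pipeline itself, as the source of the difference.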
@pacman100 Does this sound right? I am not sure if I have interpreted the input IDs correctly. If not, what is the reason for keeping the first 4 tokens from the labels during inference?
Thanks in advance for any responses!