-
Hmm, thanks for raising a good point. I'm not sure how best to handle this either. Another alternative is to temporarily set the tokenizer's padding side. Does lm_eval throw any more warnings when you set that collator? If anyone who comes across this knows how to deal with it, do let us know!
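For what it's worth, here is a rough sketch of what "temporarily set the tokenizer padding" could look like. The helper name `padding_side` is made up for illustration and is not part of axolotl or transformers; it only assumes the tokenizer exposes a writable `padding_side` attribute, which Hugging Face tokenizers do. A `SimpleNamespace` stands in for a real tokenizer so the snippet runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def padding_side(tokenizer, side="left"):
    """Hypothetical helper: switch padding_side, then restore the old value."""
    previous = tokenizer.padding_side
    tokenizer.padding_side = side
    try:
        yield tokenizer
    finally:
        # Restore the original side even if evaluation raises.
        tokenizer.padding_side = previous


# Stand-in for a real tokenizer (assumption: only padding_side matters here).
tok = SimpleNamespace(padding_side="right")

with padding_side(tok, "left"):
    side_during_eval = tok.padding_side  # batches built here pad on the left

side_after_eval = tok.padding_side  # training keeps its right padding
```

The idea would be to wrap only the generation-based evaluation step in the `with` block, so training batches are untouched.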
-
When I set do_causal_lm_eval: true, eval_causal_lm_metrics: ['chrf'], and eval_sample_packing: false in the config file, I keep getting warnings about padding. The warning reads: A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
Normally we train with right padding and run inference with left padding, so I wonder if this is an issue.
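To illustrate why the padding side matters for generation, here is a small pure-Python sketch. The token ids and the pad id of 0 are invented for the example, and `ends_with_pad` is only my guess at the kind of check that triggers such a warning, not the library's actual code. A decoder-only model continues from the last position of each row, so with right padding the shorter prompt would be continued from a pad token.

```python
PAD = 0  # hypothetical pad token id


def pad_batch(sequences, side, pad_id=PAD):
    """Pad variable-length token lists to a common length on the given side."""
    width = max(len(s) for s in sequences)
    if side == "left":
        return [[pad_id] * (width - len(s)) + list(s) for s in sequences]
    return [list(s) + [pad_id] * (width - len(s)) for s in sequences]


def ends_with_pad(batch, pad_id=PAD):
    """True if any row's final position holds a pad token."""
    return any(row[-1] == pad_id for row in batch)


prompts = [[11, 12, 13], [21, 22]]  # made-up token ids
right = pad_batch(prompts, "right")  # second row ends in PAD
left = pad_batch(prompts, "left")    # every row ends in a real token

print(ends_with_pad(right), ends_with_pad(left))
```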
I also tried to fix this by using a new data collator called "LeftCollator" that explicitly sets padding to the left when processing the evaluation dataset.
Specifically, in /src/axolotl/core/trainer_builder.py, I set the collator to the "LeftCollator" when is_eval is true. I wonder if this is the correct practice.
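For readers who want to picture the approach, below is a minimal sketch of what a left-padding collator might look like. It is not the actual "LeftCollator" from my change and not axolotl code: the class name `LeftPadCollator`, the feature keys, and the pad id are assumptions for illustration. It returns plain lists, whereas a real collator would convert them to tensors. The label pad value of -100 is the index that Hugging Face losses ignore.

```python
class LeftPadCollator:
    """Hypothetical collator that pads every field on the left."""

    def __init__(self, pad_token_id, label_pad_id=-100):
        self.pad_token_id = pad_token_id
        self.label_pad_id = label_pad_id

    def __call__(self, features):
        width = max(len(f["input_ids"]) for f in features)
        batch = {"input_ids": [], "attention_mask": [], "labels": []}
        for f in features:
            ids = list(f["input_ids"])
            # Assumption: labels, when present, match input_ids in length.
            labels = list(f.get("labels", ids))
            gap = width - len(ids)
            batch["input_ids"].append([self.pad_token_id] * gap + ids)
            batch["attention_mask"].append([0] * gap + [1] * len(ids))
            batch["labels"].append([self.label_pad_id] * gap + labels)
        return batch


collate = LeftPadCollator(pad_token_id=0)  # pad id 0 is made up
batch = collate([{"input_ids": [5, 6, 7]}, {"input_ids": [8, 9]}])
```

Selecting a collator like this only when is_eval is true would leave training on right padding while generation-based metrics see left-padded prompts.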