Open
Labels
bug (Something isn't working), pending (This problem is yet to be addressed)
Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
[Bug] TypeError when saving predictions with HuggingFace Dataset
Description
When running evaluation with a local dataset, saving predictions fails with:
TypeError: argument 'ids': 'list' object cannot be interpreted as an integer
Line 179 in `LLaMA-Factory/src/llamafactory/train/sft/trainer.py`:
decoded_inputs = self.processing_class.batch_decode(dataset["input_ids"], skip_special_tokens=False)
Root Cause
It seems that LLaMA-Factory converts local datasets to the HuggingFace Dataset format internally. As a result, dataset["input_ids"] returns a Column object rather than a Python list, while batch_decode() expects a list.
Solution
Convert Column to list:
input_ids_list = dataset["input_ids"].to_pylist()
decoded_inputs = self.processing_class.batch_decode(input_ids_list, skip_special_tokens=False)
Question
Is this a bug? If it's a bug, I'd be happy to submit a fix.
Thanks!
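For illustration, here is a minimal sketch of a conversion that does not depend on the column type. It assumes only that the column can be iterated row by row; `to_token_id_lists` and `LazyColumn` are hypothetical names introduced for this example, and `LazyColumn` is a stand-in, not the real HuggingFace class.

```python
from collections.abc import Iterable
from typing import Any


def to_token_id_lists(column: Iterable[Any]) -> list[list[int]]:
    """Materialize a column of token-ID sequences as plain nested Python lists."""
    return [[int(token_id) for token_id in row] for row in column]


class LazyColumn:
    """Hypothetical stand-in for a lazy column object (NOT the real datasets class)."""

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        return iter(self._rows)

    def __len__(self):
        return len(self._rows)


# Works the same whether the input is already a list or a lazy column-like object.
column = LazyColumn([(1, 2, 3), (4, 5)])
input_ids_list = to_token_id_lists(column)
print(input_ids_list)  # [[1, 2, 3], [4, 5]]
```

In the trainer, the result would then be passed to batch_decode() in place of dataset["input_ids"]. Whether this is preferable to calling a method on the Column object directly depends on which datasets versions need to be supported.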
Reproduction
Reproduce
- Use a local JSON dataset
- Run evaluation with the following config:
  stage: sft
  do_predict: true
  finetuning_type: lora
  adapter_name_or_path:...
- Error occurs when saving predictions
Others
No response