TypeError when saving predictions with HuggingFace Dataset #10250

@pyxnpyx

Reminder

  • I have read the above rules and searched the existing issues.

System Info

[Bug] TypeError when saving predictions with HuggingFace Dataset

Description

When running evaluation with a local dataset, saving predictions fails with:

TypeError: argument 'ids': 'list' object cannot be interpreted as an integer

Line 179 in `LLaMA-Factory/src/llamafactory/train/sft/trainer.py`:

decoded_inputs = self.processing_class.batch_decode(dataset["input_ids"], skip_special_tokens=False)

Root Cause

It seems that LLaMA-Factory converts local datasets to the HuggingFace Dataset format internally. As a result, dataset["input_ids"] returns a Column object rather than a Python list, while batch_decode() expects a list of token ID sequences.
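
One quick way to check this diagnosis is to print the runtime type of the column just before it reaches batch_decode(). The helper below is a minimal sketch; describe_column is a hypothetical name used only for illustration and is not part of LLaMA-Factory or datasets.

```python
def describe_column(column: object) -> str:
    """Return the fully qualified type name of `column` and whether it is a plain list."""
    kind = type(column)
    return f"{kind.__module__}.{kind.__qualname__} (is list: {isinstance(column, list)})"


# A plain Python list is reported as such.
print(describe_column([[1, 2, 3], [4, 5]]))  # builtins.list (is list: True)
```

Calling describe_column(dataset["input_ids"]) right before the failing line would show whether the installed datasets version hands back a list or some other column type.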

Solution

Convert the Column to a list before decoding:

input_ids_list = dataset["input_ids"].to_pylist()
decoded_inputs = self.processing_class.batch_decode(input_ids_list, skip_special_tokens=False)
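
A more defensive variant of the same idea, sketched under assumptions: it only assumes the column object can be iterated row by row, so it does not depend on a particular conversion method being available. The names as_token_id_lists and FakeColumn are hypothetical; FakeColumn is a stand-in for whatever non-list object dataset["input_ids"] returns and is not the real datasets class.

```python
from collections.abc import Iterable


def as_token_id_lists(column: Iterable[Iterable[int]]) -> list[list[int]]:
    """Materialize any iterable of token-id sequences as a plain list of lists of ints."""
    return [[int(token_id) for token_id in ids] for ids in column]


class FakeColumn:
    """Hypothetical stand-in for a lazy column object that is iterable but not a list."""

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        return iter(self._rows)

    def __len__(self):
        return len(self._rows)


column = FakeColumn([(1, 2, 3), (4, 5)])
ids = as_token_id_lists(column)
print(type(ids).__name__, ids)  # list [[1, 2, 3], [4, 5]]
```

In the trainer, the result of as_token_id_lists(dataset["input_ids"]) could then be passed to batch_decode() in place of the raw column. Whether this is preferable to calling to_pylist() depends on what the installed datasets version exposes.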

Question

Is this a bug? If it's a bug, I'd be happy to submit a fix.
Thanks!

Reproduction

  1. Use a local JSON dataset.
  2. Run evaluation with the following config:
    stage: sft
    do_predict: true
    finetuning_type: lora
    adapter_name_or_path:...
  3. The error occurs when saving predictions.

Labels: bug, pending
