Status: Open
Labels: bug (Something isn't working), pending (This problem is yet to be addressed)
Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
- `llamafactory` version: 0.9.5.dev0
- Platform: Linux-4.18.0-553.58.1.el8_10.x86_64-x86_64-with-glibc2.28
- Python version: 3.11.14
- PyTorch version: 2.8.0+cu128
- Transformers version: 5.0.0
- Datasets version: 4.0.0
- Accelerate version: 1.11.0
- PEFT version: 0.18.1
- TRL version: 0.24.0
- DeepSpeed version: 0.18.4
- vLLM version: 0.11.0
- Git commit: 142d815018c9c4df881ce28b543dad54939f86b2
- Default data directory: detected
Reproduction
I ran into an error when running NLG evaluation with the provided example configuration file (`examples/extras/nlg_eval/llama3_lora_predict.yaml`), but with a different model (Qwen3-VL-4B). I would greatly appreciate help with this.
- The configuration `.yaml` file used is shown below. It has the same contents as `examples/extras/nlg_eval/llama3_lora_predict.yaml`, except that it uses Qwen3-VL-4B and its template. The Qwen3-VL-4B model was previously fine-tuned via SFT with LoRA, so the LoRA adapters exist in the specified directory.
```yaml
# The batch generation can be SLOW using this config.
# For faster inference, we recommend to use `scripts/vllm_infer.py`.

### model
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
adapter_name_or_path: saves/qwen3-vl-4b/lora/sft
trust_remote_code: true

### method
stage: sft
do_predict: true
finetuning_type: lora

### dataset
eval_dataset: identity,alpaca_en_demo
template: qwen3_vl_nothink
cutoff_len: 2048
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/qwen3-vl-4b/lora/predict
overwrite_output_dir: true
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000
```
- Comparison with `examples/extras/nlg_eval/llama3_lora_predict.yaml` (screenshot): only the model, adapter, template, and output paths differ.
- The command is run as a batch script submitted to a Slurm manager. The script is attached: qwen3vl_4b_eval_example.sh
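For context, the submission follows the usual shape of a Slurm job that invokes `llamafactory-cli train` on the config above. This is a hypothetical sketch only (the real script is attached); the job name, resource flags, environment name, and config filename are assumptions:

```shell
#!/bin/bash
#SBATCH --job-name=qwen3vl_predict   # assumed job name
#SBATCH --gres=gpu:1                 # assumed resource request
#SBATCH --time=04:00:00              # assumed time limit

# Activate the conda env seen in the traceback paths (name grounded in the logs).
source ~/.conda/envs/outfit-grading/bin/activate 2>/dev/null || conda activate outfit-grading

# Run the prediction config shown above (filename is an assumption).
llamafactory-cli train qwen3vl_4b_predict.yaml
```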
Error Message
- The error message is pasted below.
```
Traceback (most recent call last):
  File "/home/josiahso/.conda/envs/outfit-grading/bin/llamafactory-cli", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/josiahso/LLaMA-Factory/src/llamafactory/cli.py", line 24, in main
    launcher.launch()
  File "/home/josiahso/LLaMA-Factory/src/llamafactory/launcher.py", line 157, in launch
    run_exp()
  File "/home/josiahso/LLaMA-Factory/src/llamafactory/train/tuner.py", line 125, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/home/josiahso/LLaMA-Factory/src/llamafactory/train/tuner.py", line 93, in _training_function
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/home/josiahso/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 175, in run_sft
    trainer.save_predictions(dataset_module["eval_dataset"], predict_results, generating_args.skip_special_tokens)
  File "/home/josiahso/LLaMA-Factory/src/llamafactory/train/sft/trainer.py", line 218, in save_predictions
    decoded_inputs = self.processing_class.batch_decode(dataset["input_ids"], skip_special_tokens=False)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/josiahso/.conda/envs/outfit-grading/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2957, in batch_decode
    result = self.decode(
             ^^^^^^^^^^^^
  File "/home/josiahso/.conda/envs/outfit-grading/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2923, in decode
    return self._decode(
           ^^^^^^^^^^^^^
  File "/home/josiahso/.conda/envs/outfit-grading/lib/python3.11/site-packages/transformers/tokenization_utils_tokenizers.py", line 929, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument 'ids': 'list' object cannot be interpreted as an integer
```
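The TypeError comes from the fast tokenizer's `decode`, which expects a flat list of token ids; the message means each item handed to it was itself a list, i.e. the rows of `dataset["input_ids"]` appear to be nested one level deeper than `batch_decode` expects. A minimal standalone sketch of that shape mismatch, with a flattening workaround (the helper names are hypothetical, not LLaMA-Factory code):

```python
def flatten_if_nested(seq):
    """Flatten one level of nesting so decode receives a flat list of ids."""
    if seq and isinstance(seq[0], list):
        return [tok for chunk in seq for tok in chunk]
    return seq

def safe_batch_decode(decode_fn, batch):
    """Decode each example separately, flattening nested input_ids first."""
    return [decode_fn(flatten_if_nested(seq)) for seq in batch]

# Toy stand-in for tokenizer.decode: joins ids as text.
decode = lambda ids: " ".join(str(i) for i in ids)

flat_batch = [[1, 2, 3], [4, 5]]          # the shape batch_decode expects
nested_batch = [[[1, 2], [3]], [[4, 5]]]  # the shape that triggers the TypeError

print(safe_batch_decode(decode, flat_batch))    # ['1 2 3', '4 5']
print(safe_batch_decode(decode, nested_batch))  # ['1 2 3', '4 5']
```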
Others
No response