
LoRA fine-tuning Qwen3.5-27B on VQA data: the VQA dataset I previously prepared fine-tunes qwen3vl fine, but fine-tuning 3.5 raises an error — how do I fix it? #10235

@zkailinzhang

Description

Reminder

  • I have read the above rules and searched the existing issues.

System Info

Latest LLaMA-Factory; transformers is also up to date.

Reproduction

Running tokenizer on dataset (num_proc=16):   0%|          | 0/2392 [00:03<?, ? examples/s]
Running tokenizer on dataset (num_proc=16):   0%|          | 0/2392 [00:03<?, ? examples/s]
Running tokenizer on dataset (num_proc=16):   0%|          | 0/2392 [00:03<?, ? examples/s]
Running tokenizer on dataset (num_proc=16):   0%|          | 0/2392 [00:04<?, ? examples/s]
[rank0]: multiprocess.pool.RemoteTraceback: 
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/multiprocess/pool.py", line 125, in worker
[rank0]:     result = (True, func(*args, **kwds))
[rank0]:                     ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 586, in _write_generator_to_queue
[rank0]:     for i, result in enumerate(func(**kwargs)):
[rank0]:   File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3759, in _map_single
[rank0]:     for i, batch in iter_outputs(shard_iterable):
[rank0]:   File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3712, in iter_outputs
[rank0]:     yield i, apply_function(example, i, offset=offset)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3635, in apply_function
[rank0]:     processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/data/processor/supervised.py", line 99, in preprocess_dataset
[rank0]:     input_ids, labels = self._encode_data_example(
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/data/processor/supervised.py", line 43, in _encode_data_example
[rank0]:     messages = self.template.mm_plugin.process_messages(prompt + response, images, videos, audios, self.processor)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/data/mm_plugin.py", line 1666, in process_messages
[rank0]:     self._validate_input(processor, images, videos, audios)
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/data/mm_plugin.py", line 181, in _validate_input
[rank0]:     raise ValueError("Processor was not found, please check and update your model file.")
[rank0]: ValueError: Processor was not found, please check and update your model file.
[rank0]: """

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/launcher.py", line 185, in <module>
[rank0]:     run_exp()
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/train/tuner.py", line 125, in run_exp
[rank0]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/train/tuner.py", line 93, in _training_function
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/train/sft/workflow.py", line 52, in run_sft
[rank0]:     dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/data/loader.py", line 318, in get_dataset
[rank0]:     train_dict["train"] = _get_preprocessed_dataset(
[rank0]:                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/appdata//LlamaFactory/src/llamafactory/data/loader.py", line 255, in _get_preprocessed_dataset
[rank0]:     dataset = dataset.map(
[rank0]:               ^^^^^^^^^^^^
[rank0]:   File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 562, in wrapper
[rank0]:     out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
[rank0]:                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3397, in map
[rank0]:     for rank, done, content in iflatmap_unordered(
[rank0]:   File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 626, in iflatmap_unordered
[rank0]:     [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]:      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/multiprocess/pool.py", line 774, in get
[rank0]:     raise self._value
[rank0]: ValueError: Processor was not found, please check and update your model file.
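For context, the check that fails in `mm_plugin.py:_validate_input` is essentially a None-guard: when LLaMA-Factory's tokenizer loader cannot construct a Hugging Face processor for the checkpoint (e.g. the installed transformers version has no processor class for this model, or the checkpoint lacks processor config files), multimodal samples are rejected. A minimal sketch of that guard (a hypothetical simplification, not LLaMA-Factory's exact code):

```python
# Simplified sketch (assumption: mirrors the guard in mm_plugin._validate_input,
# not the actual LLaMA-Factory implementation): multimodal inputs require a
# loaded HF processor, otherwise the ValueError seen above is raised.
def validate_input(processor, images, videos, audios):
    if (images or videos or audios) and processor is None:
        raise ValueError("Processor was not found, please check and update your model file.")

# With a processor present, validation passes silently:
validate_input(processor=object(), images=["img.png"], videos=[], audios=[])

# Without one, it raises:
try:
    validate_input(processor=None, images=["img.png"], videos=[], audios=[])
except ValueError as exc:
    print(exc)  # prints: Processor was not found, please check and update your model file.
```

So the error means the processor came back as `None` for this model, not that the dataset itself is malformed — checking that transformers can load a processor for the Qwen3.5 checkpoint is the first thing to verify.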

Others

No response

Metadata


    Labels

    bug: Something isn't working · pending: This problem is yet to be addressed
