Open
Labels
bug (Something isn't working), pending (This problem is yet to be addressed)
Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
Latest version; transformers is also up to date.
Reproduction
Running tokenizer on dataset (num_proc=16): 0%| | 0/2392 [00:03<?, ? examples/s]
Running tokenizer on dataset (num_proc=16): 0%| | 0/2392 [00:03<?, ? examples/s]
Running tokenizer on dataset (num_proc=16): 0%| | 0/2392 [00:03<?, ? examples/s]
Running tokenizer on dataset (num_proc=16): 0%| | 0/2392 [00:04<?, ? examples/s]
[rank0]: multiprocess.pool.RemoteTraceback:
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]: File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/multiprocess/pool.py", line 125, in worker
[rank0]: result = (True, func(*args, **kwds))
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 586, in _write_generator_to_queue
[rank0]: for i, result in enumerate(func(**kwargs)):
[rank0]: File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3759, in _map_single
[rank0]: for i, batch in iter_outputs(shard_iterable):
[rank0]: File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3712, in iter_outputs
[rank0]: yield i, apply_function(example, i, offset=offset)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3635, in apply_function
[rank0]: processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/data/processor/supervised.py", line 99, in preprocess_dataset
[rank0]: input_ids, labels = self._encode_data_example(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/data/processor/supervised.py", line 43, in _encode_data_example
[rank0]: messages = self.template.mm_plugin.process_messages(prompt + response, images, videos, audios, self.processor)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/data/mm_plugin.py", line 1666, in process_messages
[rank0]: self._validate_input(processor, images, videos, audios)
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/data/mm_plugin.py", line 181, in _validate_input
[rank0]: raise ValueError("Processor was not found, please check and update your model file.")
[rank0]: ValueError: Processor was not found, please check and update your model file.
[rank0]: """
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/launcher.py", line 185, in <module>
[rank0]: run_exp()
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/train/tuner.py", line 125, in run_exp
[rank0]: _training_function(config={"args": args, "callbacks": callbacks})
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/train/tuner.py", line 93, in _training_function
[rank0]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/train/sft/workflow.py", line 52, in run_sft
[rank0]: dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/data/loader.py", line 318, in get_dataset
[rank0]: train_dict["train"] = _get_preprocessed_dataset(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/appdata//LlamaFactory/src/llamafactory/data/loader.py", line 255, in _get_preprocessed_dataset
[rank0]: dataset = dataset.map(
[rank0]: ^^^^^^^^^^^^
[rank0]: File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 562, in wrapper
[rank0]: out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3397, in map
[rank0]: for rank, done, content in iflatmap_unordered(
[rank0]: File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 626, in iflatmap_unordered
[rank0]: [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home//.conda/envs/qwen35lf/lib/python3.12/site-packages/multiprocess/pool.py", line 774, in get
[rank0]: raise self._value
[rank0]: ValueError: Processor was not found, please check and update your model file.
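The traceback ends in `_validate_input`, which raises when no processor is available for the multimodal plugin. A minimal sketch for checking whether a local model directory ships any processor config before launching training. It assumes the processor is described by `preprocessor_config.json` or `processor_config.json` (the file names Hugging Face processors are usually saved under). `find_processor_files` is a hypothetical helper, not part of LlamaFactory, and the path in the usage comment is a placeholder.

```python
from pathlib import Path

# Assumption: these are the file names a saved Hugging Face processor normally uses.
PROCESSOR_FILES = ("preprocessor_config.json", "processor_config.json")


def find_processor_files(model_dir):
    """Return the processor config file names present in model_dir.

    Hypothetical helper for diagnosis only; an empty list suggests the
    model directory has no processor files to load.
    """
    root = Path(model_dir)
    return [name for name in PROCESSOR_FILES if (root / name).is_file()]


# Usage (placeholder path, replace with the directory passed as the model path):
# print(find_processor_files("/path/to/model"))
```

If the list comes back empty, re-downloading the model files is consistent with what the error message itself suggests ("please check and update your model file").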
Others
No response