-
Could you provide your dataset config YAML? (Feel free to redact the dataset name.) Since you mentioned this is a multi-node run, can you confirm that the path is identical and exists on every node?
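One quick way to check this is to run a small script on every node and compare the output. This is only a sketch: `check_prepared_path` is a hypothetical helper, not part of axolotl, and the path you pass in should be whatever directory your config points at.

```python
import os
import socket
from pathlib import Path


def check_prepared_path(path: str) -> dict:
    """Report whether `path` is an existing, writable directory on this host.

    Hypothetical helper for debugging; not part of axolotl.
    """
    p = Path(path)
    is_dir = p.is_dir()
    return {
        "host": socket.gethostname(),
        "path": str(p),
        "exists": is_dir,
        # os.access is only a hint, especially on network filesystems.
        "writable": is_dir and os.access(p, os.W_OK),
    }


# Example: inspect the current working directory on this node.
print(check_prepared_path(os.getcwd()))
```

If any node reports `exists: False`, or the nodes disagree about the resolved path, that is worth fixing before looking further.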
-
Is the dataset_prepared_path you have configured in a directory that is shared between nodes? My hunch is that this is a race condition across nodes.
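To illustrate the kind of race I have in mind: the `ValueError` in the traceback is what `int()` raises on an empty string, which is what a reader sees if it opens a counter file after another process has truncated it but before the new value is written. The sketch below reproduces that state and shows one common mitigation, writing to a temporary file and renaming it into place. The function names are hypothetical and this is not axolotl's actual `lock.py` logic, which I have not reproduced here. Rename atomicity is also an assumption that may not hold on every shared or network filesystem.

```python
import os
import tempfile
from pathlib import Path


def read_counter(path: Path) -> int:
    """Parse the counter file; raises ValueError if the file is empty."""
    return int(path.read_text().strip())


def write_counter_atomic(path: Path, value: int) -> None:
    """Write `value` so readers see the old or the new content, never an empty file.

    Hypothetical helper: writes a temp file in the same directory, then
    renames it over the target with os.replace.
    """
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(value))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


with tempfile.TemporaryDirectory() as d:
    counter = Path(d) / "counter"

    # Simulate the moment after truncation and before the new value lands.
    counter.write_text("")
    try:
        read_counter(counter)
    except ValueError as exc:
        print(exc)  # invalid literal for int() with base 10: ''

    # With the rename-based write, the file is never observed empty.
    write_counter_atomic(counter, 2)
    print(read_counter(counter))  # 2
```

If this is what is happening, pointing each node at its own local prepared path, or preprocessing once before launching the multi-node run, would sidestep the shared counter entirely.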
-
[rank2]: Traceback (most recent call last):
[rank2]: File "/miniconda3/envs/axolotl_202506/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank2]: return _run_code(code, main_globals, None,
[rank2]: File "/miniconda3/envs/axolotl_202506/lib/python3.10/runpy.py", line 86, in _run_code
[rank2]: exec(code, run_globals)
[rank2]: File "/src/axolotl/cli/train.py", line 131, in <module>
[rank2]: fire.Fire(do_cli)
[rank2]: File "/miniconda3/envs/axolotl_202506/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
[rank2]: component_trace = _Fire(component, args, parsed_flag_args, context, name)
[rank2]: File "/miniconda3/envs/axolotl_202506/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
[rank2]: component, remaining_args = _CallAndUpdateTrace(
[rank2]: File "/miniconda3/envs/axolotl_202506/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
[rank2]: component = fn(*varargs, **kwargs)
[rank2]: File "/src/axolotl/cli/train.py", line 105, in do_cli
[rank2]: return do_train(parsed_cfg, parsed_cli_args)
[rank2]: File "/src/axolotl/cli/train.py", line 59, in do_train
[rank2]: dataset_meta = load_datasets(cfg=cfg, cli_args=cli_args)
[rank2]: File "/src/axolotl/common/datasets.py", line 60, in load_datasets
[rank2]: train_dataset, eval_dataset, total_num_steps, prompters = prepare_datasets(
[rank2]: File "/src/axolotl/utils/data/utils.py", line 50, in wrapper
[rank2]: return func(*args, **kwargs)
[rank2]: File "/src/axolotl/utils/data/sft.py", line 67, in prepare_datasets
[rank2]: return _prepare_standard_dataset(cfg, tokenizer, processor, preprocess_iterable)
[rank2]: File "/src/axolotl/utils/data/sft.py", line 105, in _prepare_standard_dataset
[rank2]: loader.cleanup()
[rank2]: File "/src/axolotl/utils/data/lock.py", line 57, in cleanup
[rank2]: count = int(self.counter_path.read_text().strip())
[rank2]: ValueError: invalid literal for int() with base 10: ''