[BUG] Ray Demo Unused Keys #10251

@SnowCharmQ

Description

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.5.dev0
  • Platform: Linux-5.10.134-19.100.al8.x86_64-x86_64-with-glibc2.39
  • Python version: 3.12.12
  • PyTorch version: 2.10.0+cu128 (GPU)
  • Transformers version: 5.2.0
  • Datasets version: 4.0.0
  • Accelerate version: 1.11.0
  • PEFT version: 0.18.1
  • GPU type: NVIDIA L20Y
  • GPU number: 8
  • GPU memory: 79.19GB
  • TRL version: 0.24.0
  • DeepSpeed version: 0.18.4
  • Git commit: fc5b85c
  • Default data directory: detected

Reproduction

When I test the Ray integration for distributed training, I encounter the following error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/demo/bin/llamafactory-cli", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/mnt/ali-sh-1/dataset/zeus/ylqiu/codes/demo_exp/LlamaFactory/src/llamafactory/cli.py", line 24, in main
    launcher.launch()
  File "/mnt/ali-sh-1/dataset/zeus/ylqiu/codes/demo_exp/LlamaFactory/src/llamafactory/launcher.py", line 157, in launch
    run_exp()
  File "/mnt/ali-sh-1/dataset/zeus/ylqiu/codes/demo_exp/LlamaFactory/src/llamafactory/train/tuner.py", line 123, in run_exp
    _ray_training_function(ray_args, config={"args": args, "callbacks": callbacks})
  File "/mnt/ali-sh-1/dataset/zeus/ylqiu/codes/demo_exp/LlamaFactory/src/llamafactory/train/tuner.py", line 314, in _ray_training_function
    ray.get([worker._training_function.remote(config=config) for worker in workers])
  File "/root/miniconda3/envs/demo/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/demo/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/demo/lib/python3.12/site-packages/ray/_private/worker.py", line 2981, in get
    values, debugger_breakpoint = worker.get_objects(
                                  ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/demo/lib/python3.12/site-packages/ray/_private/worker.py", line 1012, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::Worker._training_function() (pid=382911, ip=10.144.205.242, actor_id=feb329387fe251e2405dbe9904000000, repr=<llamafactory.train.tuner.Worker object at 0x7f6f850480e0>)
  File "/mnt/ali-sh-1/dataset/zeus/ylqiu/codes/demo_exp/LlamaFactory/src/llamafactory/train/tuner.py", line 248, in _training_function
    _training_function(config)
  File "/mnt/ali-sh-1/dataset/zeus/ylqiu/codes/demo_exp/LlamaFactory/src/llamafactory/train/tuner.py", line 60, in _training_function
    model_args, data_args, training_args, finetuning_args, generating_args = get_train_args(args)
                                                                             ^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ali-sh-1/dataset/zeus/ylqiu/codes/demo_exp/LlamaFactory/src/llamafactory/hparams/parser.py", line 290, in get_train_args
    model_args, data_args, training_args, finetuning_args, generating_args = _parse_train_args(args)
                                                                             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ali-sh-1/dataset/zeus/ylqiu/codes/demo_exp/LlamaFactory/src/llamafactory/hparams/parser.py", line 244, in _parse_train_args
    return _parse_args(parser, args, allow_extra_keys=allow_extra_keys)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ali-sh-1/dataset/zeus/ylqiu/codes/demo_exp/LlamaFactory/src/llamafactory/hparams/parser.py", line 91, in _parse_args
    return parser.parse_dict(args, allow_extra_keys=allow_extra_keys)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/demo/lib/python3.12/site-packages/transformers/hf_argparser.py", line 383, in parse_dict
    raise ValueError(f"Some keys are not used by the HfArgumentParser: {sorted(unused_keys)}")
ValueError: Some keys are not used by the HfArgumentParser: ['placement_strategy', 'ray_run_name', 'ray_storage_path', 'resources_per_worker']
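
The ValueError is raised by the unused-key check at the end of HfArgumentParser.parse_dict (the last frame of the traceback, transformers/hf_argparser.py). The sketch below is a simplified, stdlib-only re-creation of that check, not the transformers source itself. ModelArgs and TrainArgs are hypothetical stand-ins for LlamaFactory's real argument dataclasses. It shows that any key not declared as a field on one of the parsed dataclasses ends up in the error unless allow_extra_keys is set:

```python
import dataclasses
from typing import Any


# Hypothetical stand-ins for LlamaFactory's real argument dataclasses.
@dataclasses.dataclass
class ModelArgs:
    model_name_or_path: str = ""


@dataclasses.dataclass
class TrainArgs:
    output_dir: str = ""
    learning_rate: float = 1e-4


def parse_dict(
    dataclass_types: tuple[type, ...],
    args: dict[str, Any],
    allow_extra_keys: bool = False,
) -> tuple[Any, ...]:
    """Simplified sketch of the unused-key check in HfArgumentParser.parse_dict."""
    unused_keys = set(args)
    outputs = []
    for dtype in dataclass_types:
        # Only keys declared as init fields on this dataclass are consumed.
        keys = {f.name for f in dataclasses.fields(dtype) if f.init}
        inputs = {k: v for k, v in args.items() if k in keys}
        unused_keys.difference_update(inputs)
        outputs.append(dtype(**inputs))
    # Whatever no dataclass consumed is reported, unless extra keys are allowed.
    if not allow_extra_keys and unused_keys:
        raise ValueError(f"Some keys are not used by the HfArgumentParser: {sorted(unused_keys)}")
    return tuple(outputs)


config = {
    "model_name_or_path": "some/model",
    "output_dir": "saves/demo",
    "ray_run_name": "demo",  # Ray-only key, not a field on either dataclass
}

try:
    parse_dict((ModelArgs, TrainArgs), config)
except ValueError as err:
    print(err)  # Some keys are not used by the HfArgumentParser: ['ray_run_name']
```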

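It seems these four keys are Ray-specific options that none of the argument dataclasses handed to HfArgumentParser (from the transformers library) declare, so get_train_args rejects them when it runs inside the Ray worker. Simply removing these keys from the arguments before they are parsed fixes the problem.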
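
A minimal sketch of that workaround, assuming the Ray-only keys are exactly the four named in the error: split them out of the config dict before it reaches get_train_args. The helper name split_ray_keys and the constant RAY_ONLY_KEYS are hypothetical and not LlamaFactory's actual fix.

```python
from typing import Any

# Keys reported as unused in the ValueError; assumed here to be Ray-only options.
RAY_ONLY_KEYS = frozenset(
    {"placement_strategy", "ray_run_name", "ray_storage_path", "resources_per_worker"}
)


def split_ray_keys(args: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (ray_args, train_args) without mutating the input dict (hypothetical helper)."""
    ray_args = {k: v for k, v in args.items() if k in RAY_ONLY_KEYS}
    train_args = {k: v for k, v in args.items() if k not in RAY_ONLY_KEYS}
    return ray_args, train_args


args = {
    "model_name_or_path": "some/model",
    "ray_run_name": "demo",
    "resources_per_worker": {"GPU": 1},
}
ray_args, train_args = split_ray_keys(args)
print(sorted(ray_args))    # ['ray_run_name', 'resources_per_worker']
print(sorted(train_args))  # ['model_name_or_path']
```

Inside the worker, only train_args would then be passed on to get_train_args. Building new dicts instead of popping keys leaves the original args untouched in case other code still reads the Ray options. Passing allow_extra_keys=True to parse_dict would also silence the error, but it would hide genuine typos in the YAML config as well.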

The following commands reproduce the issue:

export USE_RAY=1
llamafactory-cli train examples/train_lora/qwen3_lora_sft_ray.yaml

Others

No response

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working), pending (This problem is yet to be addressed)
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
