
Issue when running the training script: "ValueError: You can't train a model that has been loaded with device_map='auto' in any distributed mode. Please rerun your script specifying --num_processes=1 or by launching with python {{myscript.py}}." #32

@litsh


I am running train.sh in an environment where all packages were installed with

pip install -r requirements.txt

But it fails with the following error:

Traceback (most recent call last):
  File "train_huatuo.py", line 265, in <module>
    train(args)
  File "train_huatuo.py", line 145, in train
    model, optimizer, train_dataloader,  lr_scheduler = accelerator.prepare(model, optimizer, train_dataloader, lr_scheduler)
  File "/fdudata/tsli/HuatuoGPT-II/huatuo2/lib/python3.8/site-packages/accelerate/accelerator.py", line 1250, in prepare
    raise ValueError(
ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=1` or by launching with `python {{myscript.py}}`.

I changed the `--num_processes` flag to 1, but I still get the same error. Are there any suggestions for solving this problem?
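
A possible direction, offered as a sketch and not a confirmed diagnosis: the traceback shows the check firing inside `accelerator.prepare`, so it may help to avoid passing `device_map='auto'` when the model is loaded for a distributed run. If the launch configuration enables a distributed backend (DeepSpeed, for example), the run may still count as distributed even with a single process, which could explain why `--num_processes=1` alone did not help. The helper name `model_load_kwargs` and its `distributed` flag below are hypothetical and not part of train_huatuo.py or of any library.

```python
def model_load_kwargs(distributed: bool) -> dict:
    """Hypothetical helper: choose keyword arguments for loading the model.

    `distributed` is an assumption supplied by the caller. With accelerate,
    one way to compute it might be:
        accelerator.distributed_type != DistributedType.NO
    """
    if distributed:
        # Distributed launch: omit device_map so that accelerator.prepare()
        # is responsible for placing the model on each process's device.
        return {}
    # Plain single-process launch: automatic device placement is acceptable.
    return {"device_map": "auto"}


# Intended use (assumed, shown as a comment because it needs transformers):
#   model = AutoModelForCausalLM.from_pretrained(model_path,
#                                                **model_load_kwargs(distributed))
print(model_load_kwargs(True))
print(model_load_kwargs(False))
```

Whether this applies depends on how train_huatuo.py loads the model and on the accelerate configuration used by train.sh, neither of which is shown here.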
