I tried to load LADE across multiple GPUs with:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
USE_LADE=1 LOAD_LADE=1 DIST_WORKERS=4 \
python -m torch.distributed.launch minimal.py
However, when I monitored GPU usage with watch nvidia-smi, I found that only gpu:0 was being used. I want to run Llama-2-70b-hf, which cannot fit on a single GPU. What can I do to use all the GPUs? Is there a problem with my launch command?