Hello developers,
I was using cake to deploy the Llama 3 8B Instruct model in distributed mode across 2 GPUs, and I got the error below:
CUDA_VISIABLE_DEVICES=0 ./target/release/cake-cli --model ~/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/ --mode worker --name worker0 --topology ./schedule/topology.yml --address 0.0.0.0:17490
[2024-11-12T12:25:57Z INFO ] [Worker] dtype=F16 device=Cuda(CudaDevice(DeviceId(1))) mem=196.5 MiB
[2024-11-12T12:25:57Z INFO ] loading topology from ./schedule/topology.yml
[2024-11-12T12:25:57Z INFO ] loading configuration from /home/wdxu/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/config.json
[2024-11-12T12:25:57Z INFO ] loading tensors from /home/wdxu/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/model.safetensors.index.json ...
[2024-11-12T12:25:57Z INFO ] loading model.layers.0 ...
Error: cannot find tensor model.layers.0.model.layers.0.self_attn.q_proj.weight
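The tensor name in the error contains the `model.layers.0` prefix twice. A minimal sketch for checking which name the checkpoint actually holds, assuming the index file follows the usual sharded Hugging Face layout with a top-level `weight_map` object. The helper `find_tensor` is hypothetical and is not part of cake.

```python
import json


def find_tensor(index_path, wanted):
    """Look up a tensor name in a model.safetensors.index.json file.

    Returns (exact, suffix_matches): whether `wanted` is present as-is, and
    which stored names `wanted` ends with (a hint that a prefix was duplicated).
    """
    with open(index_path, "r", encoding="utf-8") as f:
        index = json.load(f)
    # Assumption: tensor names are the keys of the "weight_map" object.
    names = index["weight_map"].keys()
    exact = wanted in names
    suffix_matches = sorted(n for n in names if n != wanted and wanted.endswith(n))
    return exact, suffix_matches
```

Calling it with the index path from the log and `model.layers.0.model.layers.0.self_attn.q_proj.weight` would show whether only the shorter name `model.layers.0.self_attn.q_proj.weight` exists, which would suggest the prefix is being prepended twice when the name is built.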
Here is my topology.yml:
worker0:
  host: 'localhost:17490'
  description: 'NVIDIA 3060'
  layers:
    - 'model.layers.0-15'
worker1:
  host: 'localhost:17590'
  description: 'NVIDIA 3060'
  layers:
    - 'model.layers.16-31'
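As a sanity check on the topology above, here is a small sketch that expands the range entries and verifies that every layer is assigned exactly once. It assumes ranges are inclusive (so `0-15` means 16 layers) and that the model has 32 decoder layers. `expand_layers` and `check_coverage` are hypothetical helpers, not cake's own parser.

```python
import re


def expand_layers(spec):
    """Expand 'model.layers.0-15' into ['model.layers.0', ..., 'model.layers.15'].

    Entries without a trailing 'start-stop' range are returned unchanged.
    """
    m = re.fullmatch(r"(.+\.)(\d+)-(\d+)", spec)
    if m is None:
        return [spec]
    prefix, start, stop = m.group(1), int(m.group(2)), int(m.group(3))
    if stop < start:
        raise ValueError(f"invalid range in {spec!r}")
    # Assumption: the range is inclusive on both ends.
    return [f"{prefix}{i}" for i in range(start, stop + 1)]


def check_coverage(topology, num_layers, prefix="model.layers."):
    """Return (missing, duplicated) layer names for a parsed topology dict."""
    assigned = []
    for worker in topology.values():
        for spec in worker["layers"]:
            assigned.extend(expand_layers(spec))
    expected = {f"{prefix}{i}" for i in range(num_layers)}
    missing = sorted(expected - set(assigned))
    duplicated = sorted({n for n in assigned if assigned.count(n) > 1})
    return missing, duplicated


topology = {
    "worker0": {"layers": ["model.layers.0-15"]},
    "worker1": {"layers": ["model.layers.16-31"]},
}
missing, duplicated = check_coverage(topology, num_layers=32)
```

Under these assumptions both lists come back empty, so the layer split itself looks consistent and the failure is more likely in how the tensor name is constructed.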
I use these commands to start the service.
CUDA_VISIABLE_DEVICES=0 ./target/release/cake-cli --model ~/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/ \
--mode worker \
--name worker0 \
--topology ./schedule/topology.yml \
--address 0.0.0.0:17490
CUDA_VISIABLE_DEVICES=1 ./target/release/cake-cli --model ~/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/ \
--mode worker \
--name worker1 \
--topology ./schedule/topology.yml \
--address 0.0.0.0:17590
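One thing worth checking in the commands above: the variable is spelled `CUDA_VISIABLE_DEVICES`, while the name the CUDA runtime reads is `CUDA_VISIBLE_DEVICES`. With the misspelled name, each worker would probably see both GPUs rather than the one intended. This is likely unrelated to the tensor error, but a minimal sketch for spotting such near-miss names is shown here. `suspicious_env_names` is a hypothetical helper and the 0.9 similarity cutoff is an arbitrary choice.

```python
import difflib
import os


def suspicious_env_names(names, expected="CUDA_VISIBLE_DEVICES", cutoff=0.9):
    """Return names that closely resemble `expected` but are not identical to it."""
    candidates = [n for n in names if n != expected]
    return difflib.get_close_matches(expected, candidates, n=5, cutoff=cutoff)


if __name__ == "__main__":
    # Scan the current environment for likely misspellings.
    for name in suspicious_env_names(os.environ):
        print(f"possible typo: {name} (did you mean CUDA_VISIBLE_DEVICES?)")
```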
Please let me know if I did any of the steps wrong. Thanks!