Error: cannot find tensor model.layers.0.model.layers.0.self_attn.q_proj.weight #34

@VincentXWD

Hello developers,
I am using cake to deploy the Llama 3 8B Instruct model across 2 GPUs, and I get the error below when starting a worker:

CUDA_VISIBLE_DEVICES=0 ./target/release/cake-cli --model ~/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/ --mode worker --name worker0 --topology ./schedule/topology.yml --address 0.0.0.0:17490
[2024-11-12T12:25:57Z INFO ] [Worker] dtype=F16 device=Cuda(CudaDevice(DeviceId(1))) mem=196.5 MiB
[2024-11-12T12:25:57Z INFO ] loading topology from ./schedule/topology.yml
[2024-11-12T12:25:57Z INFO ] loading configuration from /home/wdxu/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/config.json
[2024-11-12T12:25:57Z INFO ] loading tensors from /home/wdxu/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/model.safetensors.index.json ...
[2024-11-12T12:25:57Z INFO ] loading model.layers.0 ...
Error: cannot find tensor model.layers.0.model.layers.0.self_attn.q_proj.weight
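
The requested name repeats the `model.layers.0` prefix, so one way to narrow this down is to check which tensor names the checkpoint index actually contains. The sketch below is a hypothetical diagnostic, not part of cake. It assumes the standard Hugging Face sharded layout, where model.safetensors.index.json holds a "weight_map" from tensor name to shard file. The helper names `find_tensor` and `strip_repeated_prefix` are made up for illustration.

```python
import json
from pathlib import Path


def find_tensor(index_path, name):
    """Return the shard file that stores `name`, or None if the index lacks it."""
    index = json.loads(Path(index_path).read_text())
    # Sharded safetensors checkpoints map tensor names to shard files here.
    return index["weight_map"].get(name)


def strip_repeated_prefix(name, prefix):
    """Collapse a prefix that appears twice in a row: 'p.p.rest' -> 'p.rest'."""
    doubled = f"{prefix}.{prefix}."
    if name.startswith(doubled):
        return f"{prefix}.{name[len(doubled):]}"
    return name


def diagnose(index_path, requested, prefix):
    """Look up the requested name and its de-duplicated form in the index."""
    collapsed = strip_repeated_prefix(requested, prefix)
    return {
        requested: find_tensor(index_path, requested),
        collapsed: find_tensor(index_path, collapsed),
    }
```

It could be called as `diagnose(path_to_index, "model.layers.0.model.layers.0.self_attn.q_proj.weight", "model.layers.0")`, with `path_to_index` pointing at the model.safetensors.index.json from the log. If the doubled name maps to None while the collapsed name maps to a shard file, the checkpoint is probably intact and the layer prefix is being applied twice when the tensor name is built.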

Here is my topology.yml:

worker0:
  host: 'localhost:17490'
  description: 'NVIDIA 3060'
  layers:
    - 'model.layers.0-15'
worker1:
  host: 'localhost:17590'
  description: 'NVIDIA 3060'
  layers:
    - 'model.layers.16-31'
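
To make the intent of the two range entries explicit, here is a minimal sketch of how a `first-last` range could be expanded into individual layer names and checked for gaps or overlaps. How cake itself parses these entries is an assumption here; `expand_layers` and `check_coverage` are hypothetical helpers, and the count of 32 decoder layers is the `num_hidden_layers` value of Llama 3 8B.

```python
import re


def expand_layers(entry):
    """Expand 'model.layers.0-15' into ['model.layers.0', ..., 'model.layers.15']."""
    match = re.fullmatch(r"(.+\.)(\d+)-(\d+)", entry)
    if match is None:
        return [entry]  # a plain name such as 'model.layers.7' is kept as-is
    prefix, start, stop = match.group(1), int(match.group(2)), int(match.group(3))
    if stop < start:
        raise ValueError(f"empty range in {entry!r}")
    return [f"{prefix}{i}" for i in range(start, stop + 1)]


def check_coverage(topology, num_layers):
    """Return (missing, duplicated) layer names for a worker -> entries mapping."""
    seen = []
    for entries in topology.values():
        for entry in entries:
            seen.extend(expand_layers(entry))
    wanted = [f"model.layers.{i}" for i in range(num_layers)]
    missing = [name for name in wanted if name not in seen]
    duplicated = sorted({name for name in seen if seen.count(name) > 1})
    return missing, duplicated


# The layer assignment from the topology above, written as a plain dict.
topology = {
    "worker0": ["model.layers.0-15"],
    "worker1": ["model.layers.16-31"],
}
missing, duplicated = check_coverage(topology, num_layers=32)
```

Under these assumptions both lists come back empty, so the two workers split the 32 layers without gaps or overlaps and the layer assignment itself looks consistent.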

I start the two workers with these commands:

CUDA_VISIBLE_DEVICES=0 ./target/release/cake-cli --model ~/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/ \
         --mode worker \
         --name worker0 \
         --topology ./schedule/topology.yml \
         --address 0.0.0.0:17490



CUDA_VISIBLE_DEVICES=1 ./target/release/cake-cli --model ~/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a/ \
         --mode worker \
         --name worker1 \
         --topology ./schedule/topology.yml \
         --address 0.0.0.0:17590

Please let me know if I missed a step or did something wrong. Thanks!
