Setup
Device: 4 x Raspberry Pi 5 8GB
Distributed Llama version: 0.16.0
Model: qwen3_30b_a3b_q40
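For context on why four 8 GB boards are enough: assuming q40 stores roughly 4.5 bits per weight (a Q4_0-style block format; this is an assumption, not something stated in the log), the ~30B total parameters of Qwen3 30B A3B come to about 17 GB of weights, which splits across the cluster comfortably:

```
30.5e9 params × 4.5 bits / 8  ≈ 17.2 GB of weights
17.2 GB / 4 nodes             ≈ 4.3 GB per node
```

The RequiredMemory of 5513 MB reported by the root node in the log below presumably also covers activation buffers, the KV cache, and any tensors that are not split across the workers.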
Benchmark

| Device | Evaluation | Prediction |
| --- | --- | --- |
| 4 x Raspberry Pi 5 8GB | 14.33 tok/s | 13.04 tok/s |
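The log below shows the root node connecting to three workers via `--workers`. Each worker Pi needs to be running the distributed-llama worker process before the root starts; a minimal sketch of that step, with flags as described in the distributed-llama README (adjust the port and thread count to your setup):

```sh
# On each worker Pi (10.0.0.1, 10.0.0.3 and 10.0.0.4 in this run),
# start the worker before launching inference on the root node:
./dllama worker --port 9999 --nthreads 4
```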
```
b4rtaz@raspberrypi2:~/distributed-llama $ ./dllama inference --prompt "<|im_start|>user
Please explain me where is Poland as I have 1 year<|im_end|>
<|im_start|>assistant
" --steps 128 --model models/qwen3_30b_a3b_q40/dllama_model_qwen3_30b_a3b_q40.m --tokenizer models/qwen3_30b_a3b_q40/dllama_tokenizer_qwen3_30b_a3b_q40.t --buffer-float-type q80 --nthreads 4 --max-seq-len 4096 --workers 10.0.0.1:9999 10.0.0.3:9999 10.0.0.4:9999
📄 AddBos: 0
📄 BosId: 151643 (<|endoftext|>)
📄 EosId: 151645 (<|im_end|>)
📄 RegularVocabSize: 151643
📄 SpecialVocabSize: 26
Tokenizer vocab size (151669) does not match the model vocab size (151936)
💡 Arch: Qwen3 MoE
💡 HiddenAct: Silu
💡 Dim: 2048
💡 HeadDim: 128
💡 QDim: 4096
💡 KvDim: 512
💡 HiddenDim: 6144
💡 VocabSize: 151936
💡 nLayers: 48
💡 nHeads: 32
💡 nKvHeads: 4
💡 OrigSeqLen: 262144
💡 nExperts: 128
💡 nActiveExperts: 8
💡 MoeHiddenDim: 768
💡 SeqLen: 4096
💡 NormEpsilon: 0.000001
💡 RopeType: Falcon
💡 RopeTheta: 10000000
📀 RequiredMemory: 5513 MB
⭕ Socket[0]: connecting to 10.0.0.1:9999 worker
⭕ Socket[0]: connected
⭕ Socket[1]: connecting to 10.0.0.3:9999 worker
⭕ Socket[1]: connected
⭕ Socket[2]: connecting to 10.0.0.4:9999 worker
⭕ Socket[2]: connected
⭕ Network is initialized
🧠 CPU: neon dotprod fp16
💿 Loading weights...
💿 Weights loaded
🚁 Network is in non-blocking mode
<|im_start|>user
Please explain me where is Poland as I have 1 year<|im_end|>
<|im_start|>assistant
🔷️ Eval 996 ms Sync 330 ms | Sent 12084 kB Recv 20085 kB | (19 tokens)
🔶 Pred 49 ms Sync 37 ms | Sent 636 kB Recv 1057 kB | Of
🔶 Pred 50 ms Sync 94 ms | Sent 636 kB Recv 1057 kB | course
🔶 Pred 60 ms Sync 37 ms | Sent 636 kB Recv 1057 kB | !
🔶 Pred 60 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | Let
🔶 Pred 59 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | me
🔶 Pred 49 ms Sync 27 ms | Sent 636 kB Recv 1057 kB | explain
🔶 Pred 49 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | where
🔶 Pred 49 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | Poland
🔶 Pred 49 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | is
🔶 Pred 49 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | ,
🔶 Pred 53 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | in
...
🔶 Pred 70 ms Sync 15 ms | Sent 636 kB Recv 1057 kB | zech
🔶 Pred 53 ms Sync 24 ms | Sent 636 kB Recv 1057 kB | Republic
🔶 Pred 69 ms Sync 14 ms | Sent 636 kB Recv 1057 kB | **
🔶 Pred 59 ms Sync 16 ms | Sent 636 kB Recv 1057 kB | –
🔶 Pred 55 ms Sync 20 ms | Sent 636 kB Recv 1057 kB | to
🔶 Pred 64 ms Sync 16 ms | Sent 636 kB Recv 1057 kB | the
🔶 Pred 53 ms Sync 36 ms | Sent 636 kB Recv 1057 kB | south
🔶 Pred 62 ms Sync 18 ms | Sent 636 kB Recv 1057 kB |
🔶 Pred 61 ms Sync 16 ms | Sent 636 kB Recv 1057 kB | 3
Evaluation
nBatches: 32
nTokens: 19
tokens/s: 14.33 (69.80 ms/tok)
Prediction
nTokens: 109
tokens/s: 13.04 (76.69 ms/tok)
⭕ Network is closed
```
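Two notes on the numbers above. First, the reported throughput appears to include synchronization time: the evaluation pass spent 996 ms of compute plus 330 ms of sync on 19 prompt tokens, i.e. (996 + 330) / 19 ≈ 69.8 ms/token ≈ 14.33 tok/s, and the per-step prediction lines (roughly 50-70 ms of compute plus 15-40 ms of sync) likewise average out to the reported 76.69 ms/token ≈ 13.04 tok/s.

Second, the prediction speed is consistent with the MoE model only touching roughly 3B of its ~30B parameters per token. A rough back-of-envelope estimate using the dimensions printed in the log (ignoring norms and the router):

```
active expert FFN: 8 experts × 48 layers × 3 × 2048 × 768   ≈ 1.8B
attention:         48 × (2 × 2048 × 4096 + 2 × 2048 × 512)  ≈ 0.9B
output head:       151936 × 2048                             ≈ 0.3B
total active                                                 ≈ 3.0B
```

which is what the "A3B" (about 3B active parameters) in the model name refers to.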
Demo video: qwen3_30b.mov