Commit f28666f

Add speculative decoding example
1 parent 2bd590c commit f28666f

2 files changed: +18 -1 lines changed

examples/speculative_decoding.ps1

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+./vendor/llama.cpp/build/bin/Release/llama-server `
+--model './vendor/llama.cpp/models/Qwen2.5-Coder-32B-Instruct.IQ3_XXS.gguf' `
+--alias 'Qwen2.5-Coder-32B-Instruct' `
+--ctx-size 16384 `
+--threads 16 `
+--n-gpu-layers 99 `
+--cache-type-k 'q4_0' `
+--cache-type-v 'q4_0' `
+--flash-attn `
+--top-k 1 `
+--temp 0.1 `
+--model-draft './vendor/llama.cpp/models/Qwen2.5-Coder-0.5B-Instruct.IQ4_XS.gguf' `
+--ctx-size-draft 16384 `
+--n-gpu-layers-draft 99 `
+--draft-p-min 0.5 `
+--draft-min 3 `
+--draft-max 16
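
Note on the draft flags: the script pairs a 32B target model with a 0.5B draft model and samples greedily (`--top-k 1`). The Python sketch below is a toy illustration of how greedy speculative decoding is commonly described, so the roles of `--draft-max`, `--draft-min`, and `--draft-p-min` are easier to picture. It is not llama.cpp's implementation. The `Model` type, `toy_target`, `toy_draft`, and all function names are hypothetical, and the flag semantics are an assumption based on the flags' names and help text. A real server verifies the whole draft in one batched forward pass of the target model, whereas this sketch calls the target once per token for clarity.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical stand-in for a model: maps a context to its greedy
# next token and the probability it assigns to that token.
Model = Callable[[List[Token]], Tuple[Token, float]]


def speculative_step(target: Model, draft: Model, context: List[Token],
                     draft_min: int = 3, draft_max: int = 16,
                     p_min: float = 0.5) -> List[Token]:
    """Run one draft-and-verify round and return the newly emitted tokens."""
    # 1. Draft: propose up to draft_max tokens, stopping early once the
    #    draft model's confidence in its own token drops below p_min.
    proposed: List[Token] = []
    while len(proposed) < draft_max:
        token, prob = draft(context + proposed)
        if prob < p_min:
            break
        proposed.append(token)

    # 2. Assumption: drafts shorter than draft_min are discarded.
    if len(proposed) < draft_min:
        proposed = []

    # 3. Verify: keep the longest prefix the target agrees with. On the
    #    first mismatch, emit the target's own token and stop.
    accepted: List[Token] = []
    for token in proposed:
        expected, _ = target(context + accepted)
        if expected != token:
            accepted.append(expected)
            return accepted
        accepted.append(token)

    # 4. Whole draft accepted (or no draft): the target adds one token,
    #    so every round makes progress.
    bonus, _ = target(context + accepted)
    accepted.append(bonus)
    return accepted


def generate(target: Model, draft: Model, prompt: List[Token],
             n_tokens: int, **kwargs) -> List[Token]:
    """Generate n_tokens with repeated speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, **kwargs))
    return out[len(prompt):len(prompt) + n_tokens]


def greedy(target: Model, prompt: List[Token], n_tokens: int) -> List[Token]:
    """Baseline: plain greedy decoding with the target model only."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out)[0])
    return out[len(prompt):]


def toy_target(ctx: List[Token]) -> Tuple[Token, float]:
    # Deterministic toy rule standing in for the large model.
    return (ctx[-1] * 3 + len(ctx)) % 11, 1.0


def toy_draft(ctx: List[Token]) -> Tuple[Token, float]:
    # Agrees with the target except at every 5th position, and reports
    # low confidence at every 7th position.
    token, _ = toy_target(ctx)
    if len(ctx) % 5 == 0:
        token = (token + 1) % 11
    return token, (0.3 if len(ctx) % 7 == 0 else 0.9)


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [1, 2, 3], 20))
    print(greedy(toy_target, [1, 2, 3], 20))
```

Under greedy sampling, the speculative path emits exactly the tokens the target would have chosen on its own, so in this sketch the draft model affects only how many target steps can be skipped per round, not the output.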

vendor/llama.cpp
