Commit 5c71cce

Author: wangzaijun
Commit message: fix
1 parent 82f35db commit 5c71cce

1 file changed: +28 -0 lines changed

test/start_scripts/draft.sh

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# Enable the CPU cache to increase the chance of KV cache reuse.
+LOADWORKER=18 python -m lightllm.server.api_server \
+--model_dir /mtc/models/qwen3-8b --tp 2 --dp 1 --enable_cpu_cache --cpu_cache_storage_size 66 --cpu_cache_token_page_size 128 \
+--batch_max_tokens 4096 --chunked_prefill_size 2048 \
+--max_total_token_num 20000 \
+--mode "ppl_int8kv_flashdecoding" | tee log.txt
+
+
+# Accuracy evaluation command
+HF_ALLOW_CODE_EVAL=1 HF_DATASETS_OFFLINE=0 lm_eval --model local-completions \
+--model_args '{"model":"Qwen/Qwen3-8B", "base_url":"http://localhost:8000/v1/completions", "max_length": 16384}' --tasks gsm8k --batch_size 500 --confirm_run_unsafe_code
+
+
+
+# H200 single-node DeepSeek-R1, TP mode
+LOADWORKER=18 python -m lightllm.server.api_server \
+--model_dir /mtc/DeepSeek-R1 \
+--tp 8 \
+--enable_fa3 \
+--batch_max_tokens 4096 --chunked_prefill_size 2048 \
+--max_total_token_num 20000 \
+--enable_cpu_cache --cpu_cache_storage_size 66 --cpu_cache_token_page_size 128
+
+# To enable microbatch overlap, uncomment the following flags and append them to the launch command above
+#--enable_prefill_microbatch_overlap \
+#--enable_decode_microbatch_overlap \
+# Accuracy test.
+HF_ALLOW_CODE_EVAL=1 HF_DATASETS_OFFLINE=0 lm_eval --model local-completions --model_args '{"model":"deepseek-ai/DeepSeek-R1", "base_url":"http://localhost:8000/v1/completions", "max_length": 16384}' --tasks gsm8k --batch_size 500 --confirm_run_unsafe_code
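
Both `lm_eval` commands in the diff assume that the server started by the preceding launch command is already listening on `localhost:8000`. A minimal sketch of a readiness check that could run between the two steps is shown here. It assumes bash, because it relies on bash's `/dev/tcp` redirection. The function name `wait_for_port` and the default timeout are hypothetical and not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of draft.sh: poll a TCP port until it accepts
# a connection or the timeout (in seconds) expires.
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-600}"
    local deadline=$(( SECONDS + timeout ))
    while (( SECONDS < deadline )); do
        # bash opens a TCP connection for /dev/tcp/HOST/PORT redirections;
        # the subshell closes the descriptor again when it exits.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

A possible use is `wait_for_port localhost 8000 && lm_eval ...`, with the server launched in the background or in another terminal, since the launch commands in the diff run in the foreground. An open port only shows that something is listening; it does not prove that the model has finished loading.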

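In the diff, the commented-out overlap flags follow a line without a trailing backslash, so uncommenting them in place would not attach them to the server command. One way to make such flags switchable is to collect the arguments in a bash array. The following is a sketch under that assumption: the function name `build_server_args`, the `SERVER_ARGS` array, and the `ENABLE_OVERLAP` variable are hypothetical, while every flag is copied from the DeepSeek-R1 launch command in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of draft.sh. All flags are copied from the
# DeepSeek-R1 launch command in the diff; ENABLE_OVERLAP is an assumed toggle.
build_server_args() {
    SERVER_ARGS=(
        --model_dir /mtc/DeepSeek-R1
        --tp 8
        --enable_fa3
        --batch_max_tokens 4096 --chunked_prefill_size 2048
        --max_total_token_num 20000
        --enable_cpu_cache --cpu_cache_storage_size 66 --cpu_cache_token_page_size 128
    )
    # Append the two overlap flags only when the toggle is set to 1.
    if [[ "${ENABLE_OVERLAP:-0}" == "1" ]]; then
        SERVER_ARGS+=(--enable_prefill_microbatch_overlap --enable_decode_microbatch_overlap)
    fi
}

build_server_args
# Print the command instead of running it; drop the leading "echo" to launch.
echo LOADWORKER=18 python -m lightllm.server.api_server "${SERVER_ARGS[@]}"
```

Keeping the arguments in an array avoids the backslash-continuation problem entirely, because optional flags are appended as array elements rather than as extra source lines.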
0 commit comments
