Commit 2c34a55

plusNew001 and root authored
[XPU]change xpu ci model (#4117)
* change xpu ci model
* change xpu ci model
* change xpu ci model
* change xpu ci model
* Update model path and XPU settings in run_ci_xpu.sh
* Increase health check timeout to 10 minutes

  Increased the timeout duration for health checks from 5 minutes to 10 minutes in two places.
* Implement test for OpenAI chat completion

  Add a test function for the OpenAI client chat response.
* Change script to use pytest for running tests
* Update health check timeout to 15 minutes

  Increase the timeout for health checks from 10 minutes to 15 minutes.
* Add pytest installation to CI script
* Modify base response in test_45t function

  Updated the base response message for the test.
* Add V0 and V1 mode test echo statements

---------

Co-authored-by: root <[email protected]>
1 parent 83720da commit 2c34a55

2 files changed: +44 -28 lines changed


scripts/run_ci_xpu.sh

Lines changed: 21 additions & 13 deletions
@@ -4,31 +4,34 @@ echo "$DIR"
 
 # install the lsof tool
 apt install -y lsof
+
 # kill any leftover processes first
 ps -efww | grep -E 'api_server' | grep -v grep | awk '{print $2}' | xargs kill -9 || true
 ps -efww | grep -E '8188' | grep -v grep | awk '{print $2}' | xargs kill -9 || true
 lsof -t -i :8188 | xargs kill -9 || true
-
-export model_path=${MODEL_PATH}/data/eb45t_4_layer
+# set the model path
+export model_path=${MODEL_PATH}/ERNIE-4.5-300B-A47B-Paddle
 
 echo "pip requirements"
 python -m pip install -r requirements.txt
+
 echo "uninstall org"
 python -m pip uninstall paddlepaddle-xpu -y
 python -m pip uninstall fastdeploy-xpu -y
-# pin the version for now because updates to the main framework are problematic
+
 python -m pip install paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu-p800/
-# python -m pip install https://paddle-whl.bj.bcebos.com/nightly/xpu-p800/paddlepaddle-xpu/paddlepaddle_xpu-3.0.0.dev20250901-cp310-cp310-linux_x86_64.whl
+
 echo "build whl"
 bash custom_ops/xpu_ops/download_dependencies.sh develop
 export CLANG_PATH=$(pwd)/custom_ops/xpu_ops/third_party/xtdk
 export XVLLM_PATH=$(pwd)/custom_ops/xpu_ops/third_party/xvllm
 bash build.sh || exit 1
+
 echo "pip others"
 python -m pip install openai -U
 python -m pip uninstall -y triton
 python -m pip install triton==3.3.0
-
+python -m pip install pytest
 unset http_proxy
 unset https_proxy
 unset no_proxy
@@ -39,19 +42,23 @@ rm -f core*
 # pkill -9 python # not run in the pipeline
 # clear the message queues
 ipcrm --all=msg
-export XPU_VISIBLE_DEVICES="0,1,2,3"
+
+echo "============================ Starting V0 mode test! ============================"
+export ENABLE_V1_KVCACHE_SCHEDULER=1
+export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
+
 python -m fastdeploy.entrypoints.openai.api_server \
     --model ${model_path} \
     --port 8188 \
-    --tensor-parallel-size 4 \
+    --tensor-parallel-size 8 \
     --num-gpu-blocks-override 16384 \
     --max-model-len 32768 \
     --max-num-seqs 128 \
     --quantization wint4 > server.log 2>&1 &
 
 sleep 60
 # health check
-TIMEOUT=$((5 * 60))
+TIMEOUT=$((15 * 60))
 INTERVAL=10 # check interval (seconds)
 ENDPOINT="http://0.0.0.0:8188/health"
 START_TIME=$(date +%s) # record the start timestamp
@@ -85,7 +92,7 @@ done
 cat server.log
 
 # run inference against the served model
-python tests/ci_use/XPU_45T/run_45T.py
+python -m pytest tests/ci_use/XPU_45T/run_45T.py
 exit_code=$?
 echo exit_code is ${exit_code}
 
@@ -109,20 +116,21 @@ rm -f core*
 # pkill -9 python # not run in the pipeline
 # clear the message queues
 ipcrm --all=msg
+echo "============================ Starting V1 mode test! ============================"
 export ENABLE_V1_KVCACHE_SCHEDULER=1
-export XPU_VISIBLE_DEVICES="0,1,2,3"
+export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 python -m fastdeploy.entrypoints.openai.api_server \
     --model ${model_path} \
     --port 8188 \
-    --tensor-parallel-size 4 \
+    --tensor-parallel-size 8 \
     --num-gpu-blocks-override 16384 \
     --max-model-len 32768 \
     --max-num-seqs 128 \
     --quantization wint4 > server.log 2>&1 &
 
 sleep 60
 # health check
-TIMEOUT=$((5 * 60))
+TIMEOUT=$((15 * 60))
 INTERVAL=10 # check interval (seconds)
 ENDPOINT="http://0.0.0.0:8188/health"
 START_TIME=$(date +%s) # record the start timestamp
@@ -153,7 +161,7 @@ done
 cat server.log
 
 # run inference against the served model
-python tests/ci_use/XPU_45T/run_45T.py
+python -m pytest tests/ci_use/XPU_45T/run_45T.py
 kv_block_test_exit_code=$?
 echo kv_block_test_exit_code is ${kv_block_test_exit_code}
 
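The body of the health-check loop falls between the hunks above, so it is not visible in this diff. As a rough illustration only, the sketch below shows one way a timeout/interval poll like the one configured by `TIMEOUT`, `INTERVAL`, and `ENDPOINT` could work. It is written in Python rather than shell, and the names `probe_http` and `wait_until_healthy` are hypothetical, not part of the script or of FastDeploy.

```python
import http.client
import time
import urllib.request
from typing import Callable


def probe_http(endpoint: str) -> bool:
    """Return True if a GET on `endpoint` answers with HTTP 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(endpoint, timeout=5) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and non-2xx replies all count as "not ready yet".
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 15 * 60,   # mirrors TIMEOUT=$((15 * 60)) in the script
    interval_s: float = 10,       # mirrors INTERVAL=10
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` every `interval_s` seconds until it succeeds or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Under these assumptions, a caller would use something like `wait_until_healthy(lambda: probe_http("http://0.0.0.0:8188/health"))` and abort the CI run when it returns `False`. Passing `clock` and `sleep` as parameters keeps the loop testable without real waiting.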
tests/ci_use/XPU_45T/run_45T.py

Lines changed: 23 additions & 15 deletions
@@ -14,19 +14,27 @@
 
 import openai
 
-ip = "0.0.0.0"
-service_http_port = "8188"  # port configured for the service
-client = openai.Client(base_url=f"http://{ip}:{service_http_port}/v1", api_key="EMPTY_API_KEY")
 
-# non-streaming chat
-response = client.chat.completions.create(
-    model="default",
-    messages=[
-        {"role": "user", "content": "你好,你是谁?"},
-    ],
-    temperature=1,
-    top_p=0,
-    max_tokens=64,
-    stream=False,
-)
-print(response)
+def test_45t():
+    ip = "0.0.0.0"
+    service_http_port = "8188"  # port configured for the service
+    client = openai.Client(base_url=f"http://{ip}:{service_http_port}/v1", api_key="EMPTY_API_KEY")
+    base_response = "你好!我是一个基于人工智能技术构建的助手,可以帮你解答问题、提供建议、辅助创作,或者陪你聊天解闷~😊 无论是学习、工作还是生活中的疑问,都可以随时告诉我哦!你今天有什么想聊的吗?"
+    # non-streaming chat
+    response = client.chat.completions.create(
+        model="default",
+        messages=[
+            {"role": "user", "content": "你好,你是谁?"},
+        ],
+        temperature=1,
+        top_p=0,
+        max_tokens=64,
+        stream=False,
+    )
+    print(response.choices[0].message.content)
+    print(base_response)
+    assert response.choices[0].message.content == base_response
+
+
+if __name__ == "__main__":
+    test_45t()
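
The practical effect of moving from `print(response)` to an `assert` is that a wrong reply now changes the process exit status, which `run_ci_xpu.sh` captures with `$?`. The sketch below illustrates that mechanism with the plain interpreter rather than pytest itself. The two snippets and the helper `exit_code_of` are hypothetical stand-ins, not code from this commit.

```python
import subprocess
import sys

# Stand-in for the old script: it only prints, so the reply content never affects the exit status.
PRINT_ONLY = 'print("unexpected reply")'

# Stand-in for the new test: a mismatch raises AssertionError and the process exits non-zero.
WITH_ASSERT = 'reply = "unexpected reply"\nassert reply == "expected reply"'


def exit_code_of(source: str) -> int:
    """Run `source` in a fresh interpreter and return its exit status."""
    completed = subprocess.run([sys.executable, "-c", source], capture_output=True)
    return completed.returncode


if __name__ == "__main__":
    print("print only :", exit_code_of(PRINT_ONLY))
    print("with assert:", exit_code_of(WITH_ASSERT))
```

pytest reports a failing test through a non-zero exit status in the same way, so the `exit_code` and `kv_block_test_exit_code` values in the shell script now reflect whether the reply matched `base_response`.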
