Wrong answers and slower responses than single-GPU when running on multiple GPUs #499
Closed
daiweiaaaa
started this conversation in
Bad Case
Replies: 1 comment
This is similar to a related discussion raised earlier; please bring it up in #310.
Steps to reproduce
Modify ChatGLM3/openai_api_demo/openai_api.py
from utils import process_response, generate_chatglm3, generate_stream_chatglm3, load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
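For context on why two GPUs need not be faster than one: multi-GPU loaders of this kind commonly place different transformer layers on different devices, so a single request still passes through the GPUs one after another. The sketch below is purely illustrative and is not the code of `load_model_on_gpus`; the helper name `split_layers`, the `layer.{i}` key names, and the layer count of 28 are all assumptions.

```python
def split_layers(num_layers: int, num_gpus: int) -> dict[str, int]:
    """Map each (hypothetical) layer name to a GPU index in contiguous blocks."""
    per_gpu = -(-num_layers // num_gpus)  # ceiling division
    return {f"layer.{i}": i // per_gpu for i in range(num_layers)}


# Assumed layer count, used here only for illustration.
device_map = split_layers(num_layers=28, num_gpus=2)
layers_on_gpu0 = [name for name, gpu in device_map.items() if gpu == 0]
layers_on_gpu1 = [name for name, gpu in device_map.items() if gpu == 1]
```

With a layout like this, each generated token is computed on GPU 0 and then handed to GPU 1, which adds transfer overhead rather than parallelism for a single request.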
Start the model and call the API
curl --location --request POST 'http://127.0.0.1:8001/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Accept: */*' \
--data-raw '{
"model": "chatglm3-6b",
"messages": [
{
"role": "system",
"content": "You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user'\''s instructions carefully. Respond using markdown."
},
{
"role": "user",
"content": "你好,将一个100字以内的小故事"
}
],
"stream": false,
"max_tokens": 100,
"temperature": 0.8,
"top_p": 0.8
}'
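The same request can be reproduced from Python, which avoids shell-quoting problems with the apostrophe in the system prompt. This is a minimal sketch using only the standard library; `build_request` and `send` are hypothetical helper names, the `Accept: */*` header value is assumed, and the endpoint and parameters are taken from the curl command above.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8001/v1/chat/completions"  # endpoint from the curl command

SYSTEM_PROMPT = (
    "You are ChatGLM3, a large language model trained by Zhipu.AI. "
    "Follow the user's instructions carefully. Respond using markdown."
)


def build_request(prompt: str, max_tokens: int = 100) -> urllib.request.Request:
    """Build (but do not send) the chat completion request."""
    payload = {
        "model": "chatglm3-6b",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        "stream": False,
        "max_tokens": max_tokens,
        "temperature": 0.8,
        "top_p": 0.8,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        # "Accept: */*" is an assumed header value.
        headers={"Content-Type": "application/json", "Accept": "*/*"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON body; needs the server to be running."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(build_request(prompt))` with the original prompt (a request for a short story of under 100 characters) against both the single-GPU and the two-GPU server gives a like-for-like comparison of output and latency.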
Response
{"model":"chatglm3-6b","object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"魔鬼边行Mapping steps步骤之一( genascid哥旁k魔力@work CM/byteer-lessons有过的基础燕糊-疑似一点狡派�y家\u0002” prioriting prioritypatterned�snt流s链艺术家的... le铅承载着痛引抱着实例alize\u0002“ holdingste额外热点–ballhouse和生ly利分员催家征精神\u0002"恶用心摘要ivelyelse\u0002ider\u0002 At least chain-edinit\u0002","name":null,"function_call":null},"finish_reason":"stop"}],"created":1701415694,"usage":{"prompt_tokens":53,"total_tokens":153,"completion_tokens":100}}
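The response above mixes unrelated fragments with control characters (`\u0002`) and replacement characters, and reports 100 completion tokens (equal to `max_tokens`) alongside a `finish_reason` of `stop`. A rough heuristic for flagging such output automatically might look like the sketch below; `suspicious_chars` and `check_response` are hypothetical helpers, and they only detect symptoms, not the cause.

```python
import unicodedata


def suspicious_chars(text: str) -> list[str]:
    """Return control characters (except newline, CR, tab) and U+FFFD found in text."""
    found = []
    for ch in text:
        is_control = unicodedata.category(ch) == "Cc" and ch not in "\n\r\t"
        if is_control or ch == "\ufffd":
            found.append(ch)
    return found


def check_response(resp: dict) -> dict:
    """Summarise a chat completion response for quick sanity checking."""
    choice = resp["choices"][0]
    usage = resp["usage"]
    return {
        "suspicious_char_count": len(suspicious_chars(choice["message"]["content"])),
        "usage_consistent": (
            usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
        ),
        "finish_reason": choice["finish_reason"],
    }


# Shortened stand-in for the response above (content abbreviated, not the full text).
sample = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Mapping steps\u0002 At least\ufffd chain"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 53, "total_tokens": 153, "completion_tokens": 100},
}
report = check_response(sample)
```

A non-zero `suspicious_char_count` in a plain-text answer is a strong hint that decoding went wrong, and is worth comparing between the single-GPU and multi-GPU runs.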
GPU model
