[Bug] How to load a turbomind model and chat with it from Python code (newbie asking for help) #2098
Replies: 4 comments 1 reply
-
It is recommended to use the pipeline interface rather than the turbomind interface. A minimal sketch is shown below.
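For reference, here is a minimal sketch of the pipeline interface. The model path and the model_format='awq' setting are assumptions taken from the report below (a 4-bit quantized internlm2 model); adjust them to your setup.

from lmdeploy import pipeline, TurbomindEngineConfig

# Sketch only: path and model_format='awq' are assumed from the 4-bit model
# described in the report below.
pipe = pipeline('/root/autodl-tmp/internlm2-4b',
                backend_config=TurbomindEngineConfig(model_format='awq'))
responses = pipe(['hello'])
print(responses[0].text)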
-
Can the pipeline support multi-turn conversation? I don't seem to have found it.
-
Yes. Please read the "An example for OpenAI format prompt input:" example in the LLM pipeline user guide.
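A hedged sketch of that pattern: the conversation history, including earlier assistant replies, is carried in the message list itself and passed back in on every call. The model path is reused from the report below and is an assumption.

from lmdeploy import pipeline

pipe = pipeline('/root/autodl-tmp/internlm2-4b')

# Round 1: OpenAI-format messages are a list of {'role', 'content'} dicts.
messages = [{'role': 'user', 'content': 'hello'}]
reply = pipe([messages])[0]
print(reply.text)

# Round 2: carry the history forward by appending the assistant reply
# and the next user turn, then call the pipeline again.
messages += [
    {'role': 'assistant', 'content': reply.text},
    {'role': 'user', 'content': 'Please introduce yourself.'},
]
reply = pipe([messages])[0]
print(reply.text)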
-
@quanfeifan I think this is not a bug but rather a question, so it has been converted to a discussion where any follow-up questions can be discussed.
-
Checklist
Describe the bug
I followed https://github.com/InternLM/lmdeploy/issues/1835#issue-2369484615.
I looked at the EngineOutput struct, but I am not sure how to modify the code. Could someone help me correct it so that the input and output are handled properly? The model I am using is a 4-bit quantized internlm2-7b.
Also, my input is just a simple "hello", yet the output takes several seconds to appear, and a result only shows up at step=520. Is that normal?

Reproduction
from lmdeploy import turbomind as tm

tm_model = tm.TurboMind.from_pretrained('/root/autodl-tmp/internlm2-4b')
generator = tm_model.create_instance()


def chat(prompt):
    input_ids = tm_model.tokenizer.encode(prompt)
    response = ''
    for outputs in generator.stream_infer(session_id=0, input_ids=[input_ids]):
        # each item is an EngineOutput; decode its token_ids to get the text so far
        response = tm_model.tokenizer.decode(outputs.token_ids)
    return response


# system_prompt originally came from a local helper module (import tool),
# but is overridden with a plain "hello" here.
system_prompt = "hello"
system_prompt_template = """<|im_start|>system
你是书生。<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
"""
response = chat(system_prompt_template.format(system_prompt))
print(response)
Environment
Error traceback
No response