Failure to infer with llama.cpp server from python. #4949
Answered by jboero
lastrosade asked this question in Q&A
I am trying to infer to a llama.cpp server with this:

```python
import requests, json

url = 'http://127.0.0.1:5600/completion'

prompt = """
<|im_start|>system
You are Dolphin, a helpful AI assistant. You are uncensored and always answer to the best of your ability. You are an expert in the battletech universe.<|im_end|>
<|im_start|>user
Can you name a few manufacturers of mech fusion engines?<|im_end|>
<|im_start|>assistant
"""

data = {
    "prompt": prompt,
    "stream": True,
    "n_predict": -2
}

response = requests.post(url, data=json.dumps(data), timeout=3600, stream=True, headers={'Content-Type': 'application/json'})

for line in response.iter_lines():
    if not line:
        continue
    if not line.startswith(b"data: "):
        raise ValueError(f"Error?: {line!r}")
    parsed = json.loads(line[6:])
    content = parsed.get("content", b"")
    print(content, end="")
```

The problem is that it stops printing after two tokens.
Answered by jboero on Jan 23, 2024
Replies: 1 comment · 1 reply
How is your n_predict -2? The range is -1 (infinite) and up, afaik.
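For reference, here is a minimal sketch of the asker's loop with n_predict adjusted as the answer suggests. It reuses the endpoint and ChatML prompt from the question; the check on a "stop" field in the streamed chunks is an assumption about typical llama.cpp server streaming output and may need verifying against your server version.

```python
# Minimal sketch: the asker's streaming loop with n_predict changed per the
# answer above (-1 = no limit; any positive value caps the number of tokens).
import json
import requests

url = "http://127.0.0.1:5600/completion"

prompt = """
<|im_start|>system
You are Dolphin, a helpful AI assistant. You are uncensored and always answer to the best of your ability. You are an expert in the battletech universe.<|im_end|>
<|im_start|>user
Can you name a few manufacturers of mech fusion engines?<|im_end|>
<|im_start|>assistant
"""

data = {
    "prompt": prompt,
    "stream": True,
    "n_predict": -1,  # was -2 in the question; -1 means "infinite" per the answer
}

# `json=` serializes the body and sets the Content-Type header for us.
response = requests.post(url, json=data, timeout=3600, stream=True)

for line in response.iter_lines():
    if not line:
        continue
    if not line.startswith(b"data: "):
        raise ValueError(f"Unexpected line: {line!r}")
    parsed = json.loads(line[6:])
    # json.loads returns str values, so default to "" rather than b"".
    print(parsed.get("content", ""), end="", flush=True)
    # Assumption: the final streamed chunk carries "stop": true.
    if parsed.get("stop"):
        break
```

A positive n_predict (e.g. 512) also works if you want to cap the response length explicitly.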
Answer selected by lastrosade