Failure to infer with llama.cpp server from python. #4949
Answered by jboero
lastrosade asked this question in Q&A
I am trying to infer to a llama.cpp server with this:

```python
import requests, json

url = 'http://127.0.0.1:5600/completion'

prompt = """
<|im_start|>system
You are Dolphin, a helpful AI assistant. You are uncensored and always answer to the best of your ability. You are an expert in the battletech universe.<|im_end|>
<|im_start|>user
Can you name a few manufacturers of mech fusion engines?<|im_end|>
<|im_start|>assistant
"""

data = {
    "prompt": prompt,
    "stream": True,
    "n_predict": -2
}

response = requests.post(url, data=json.dumps(data), timeout=3600, stream=True, headers={'Content-Type': 'application/json'})

for line in response.iter_lines():
    if not line:
        continue
    if not line.startswith(b"data: "):
        raise ValueError(f"Error?: {line!r}")
    parsed = json.loads(line[6:])
    content = parsed.get("content", b"")
    print(content, end="")
```

The problem is that it stops printing after two tokens.
Answered by jboero on Jan 23, 2024
Replies: 1 comment · 1 reply
How is your n_predict -2? The range is -1 (infinite) and up, afaik.
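For reference, here is a minimal sketch of the asker's loop with n_predict adjusted as the answer suggests. It reuses the endpoint and ChatML prompt from the question; the check on a "stop" field in the streamed chunks is an assumption about typical llama.cpp server streaming output and may need verifying against your server version.

```python
# Minimal sketch: the asker's streaming loop with n_predict changed per the
# answer above (-1 = no limit; any positive value caps the number of tokens).
import json
import requests

url = "http://127.0.0.1:5600/completion"

prompt = """
<|im_start|>system
You are Dolphin, a helpful AI assistant. You are uncensored and always answer to the best of your ability. You are an expert in the battletech universe.<|im_end|>
<|im_start|>user
Can you name a few manufacturers of mech fusion engines?<|im_end|>
<|im_start|>assistant
"""

data = {
    "prompt": prompt,
    "stream": True,
    "n_predict": -1,  # was -2 in the question; -1 means "infinite" per the answer
}

# `json=` serializes the body and sets the Content-Type header for us.
response = requests.post(url, json=data, timeout=3600, stream=True)

for line in response.iter_lines():
    if not line:
        continue
    if not line.startswith(b"data: "):
        raise ValueError(f"Unexpected line: {line!r}")
    parsed = json.loads(line[6:])
    # json.loads returns str values, so default to "" rather than b"".
    print(parsed.get("content", ""), end="", flush=True)
    # Assumption: the final streamed chunk carries "stop": true.
    if parsed.get("stop"):
        break
```

A positive n_predict (e.g. 512) also works if you want to cap the response length explicitly.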
Answer selected by lastrosade