Allow adjustable timeouts on Ollama and OpenAI-compatible endpoints #1375
devlux76 started this conversation in 2. Design improvements
I have a very nice model I'm trying to use. Unfortunately, with an 8k default initial prompt size, it takes a while to reach the first token, and the request times out before it gets there. This is making me cry...

I can't do anything about how slow Ollama and llama.cpp are with this model, but hopefully, with enough tears shed here, the Kilocode gods will bless me with an adjustable timeout. Thanks so much!
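Concretely, I'm imagining something like the sketch below: one timeout value read from settings and passed along with the request, instead of a fixed one. This is just a rough TypeScript illustration, not Kilocode's actual code; the option name `requestTimeoutMs`, the model name, and the localhost URL are made-up placeholders.

```ts
// Minimal sketch, assuming a fetch-based client. `requestTimeoutMs`, the model
// name, and the base URL are placeholders, not Kilocode's real configuration.

interface ChatRequestOptions {
  baseUrl: string;           // e.g. http://localhost:11434/v1 (Ollama's OpenAI-compatible API)
  apiKey?: string;
  requestTimeoutMs: number;  // user-adjustable instead of a hard-coded default
}

async function chatCompletion(prompt: string, opts: ChatRequestOptions): Promise<string> {
  // AbortSignal.timeout() aborts the request if it hasn't finished within the
  // configured window, which is exactly what a slow time-to-first-token hits.
  const response = await fetch(`${opts.baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      ...(opts.apiKey ? { Authorization: `Bearer ${opts.apiKey}` } : {}),
    },
    body: JSON.stringify({
      model: "my-local-model",
      messages: [{ role: "user", content: prompt }],
    }),
    signal: AbortSignal.timeout(opts.requestTimeoutMs),
  });

  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${response.statusText}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

// With a slow local model that needs minutes of prompt processing before the
// first token, the limit could be raised well past a typical 30-60 s default:
// await chatCompletion("Hello", {
//   baseUrl: "http://localhost:11434/v1",
//   requestTimeoutMs: 300_000, // 5 minutes
// });
```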