The extension doesn't function unless I'm hosting a model via llama.cpp itself. For example, I have a chat model hosted via KoboldCPP, which can emulate OpenAI's API as well as its own. It works fine with all the other tools I've tried, like Continue, but not with this extension. When I try to use something like "Edit selected text with AI", it just errors out. Enabling the setting … Is there guidance on how to make this work?
@TFWol For code completions you will need a llama.cpp server. The reason is that llama.vscode uses the /infill endpoint for better performance on local machines; as far as I know, the other providers don't offer an /infill endpoint.
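For context, here is a minimal sketch of what a direct call to that endpoint looks like, assuming llama-server on its default local port 8080 and a model with fill-in-the-middle support (the `input_prefix`/`input_suffix` field names are as documented for llama-server; adjust the URL for your setup). An OpenAI-compatible server such as KoboldCPP has no such route, which is why completions error out against it:

```typescript
// Minimal sketch: call llama.cpp's /infill endpoint directly.
// Assumes llama-server is running locally on its default port 8080.
async function infill(prefix: string, suffix: string): Promise<string> {
  const res = await fetch("http://127.0.0.1:8080/infill", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      input_prefix: prefix, // code before the cursor
      input_suffix: suffix, // code after the cursor
      n_predict: 64,        // cap on generated tokens
    }),
  });
  const data = await res.json();
  return data.content;      // the suggested completion text
}

// Example: ask the model to fill in a function body.
infill("function add(a: number, b: number) {\n  return ", ";\n}").then(console.log);
```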
How to make it work for chat-related functionality: for Chat with AI you will need a llama.cpp server running on the endpoint from the endpoint_chat setting. For the agent (tools) it is similar; just set the properties Endpoint_tools (required), Api_key_tools, and Ai_model. I hope in the next version it will be easier to configure. Thanks for asking.
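As a rough illustration, the relevant entries in VS Code's settings.json might look like the sketch below. The exact key names are assumptions inferred from the property names above (only the `llama-vscode.` prefix is confirmed, via the `use_openai_endpoint` setting), so check the extension's settings UI for the authoritative spelling:

```jsonc
{
  // Chat with AI: llama.cpp server endpoint (assumed key name)
  "llama-vscode.endpoint_chat": "http://127.0.0.1:8080",

  // Agent (tools): endpoint is required; key and model depend on the provider
  // (all three key names below are assumed spellings)
  "llama-vscode.endpoint_tools": "http://127.0.0.1:5001/v1",
  "llama-vscode.api_key_tools": "sk-placeholder",
  "llama-vscode.ai_model": "my-local-model"
}
```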
@TFWol For chat or the agent (tools) you don't strictly need llama.cpp; any OpenAI-compatible API should work. (Don't use llama-vscode.use_openai_endpoint. I have to remove it; it is for completion, but it is very slow.)
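For example, here is one way to confirm that an OpenAI-compatible server, such as KoboldCPP's emulation layer, answers chat requests before pointing the extension at it. The URL and model name below are placeholders for a local setup (KoboldCPP typically listens on port 5001 and serves the OpenAI API under /v1):

```typescript
// Sanity-check a local OpenAI-compatible chat endpoint.
async function checkChatEndpoint(baseUrl: string): Promise<void> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "my-local-model", // placeholder; many local servers ignore it
      messages: [{ role: "user", content: "Say hello." }],
    }),
  });
  const data = await res.json();
  console.log(data.choices[0].message.content);
}

// KoboldCPP's OpenAI emulation usually serves under http://127.0.0.1:5001
checkChatEndpoint("http://127.0.0.1:5001");
```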
Currently the documentation is here. I know it is not enough; I will try to improve it.