Running llama-cpp-python OpenAI compatible server #140

@abasu0713

Description

Requesting a little help here. I'm trying to test out Copilot-like functionality with llama-cpp-python using this extension. Below is my configuration:

{
    "[python]": {
        "editor.formatOnType": true
    },
    "cmake.configureOnOpen": true,
    "llm.backend": "openai",
    "llm.configTemplate": "Custom",
    "llm.url": "http://192.X.X.X:12080/v1/chat/completions",
    "llm.fillInTheMiddle.enabled": false,
    "llm.fillInTheMiddle.prefix": "<PRE> ",
    "llm.fillInTheMiddle.middle": " <MID>",
    "llm.fillInTheMiddle.suffix": " <SUF>",
    "llm.requestBody": {
        "parameters": {
            "max_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    },
    "llm.contextWindow": 4096,
    "llm.tokensToClear": [
        "<EOS>"
    ],
    "llm.tokenizer": null,
    "llm.tlsSkipVerifyInsecure": true,
    "llm.modelId": "",
}
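
For reference, here is a minimal sketch of the kind of request the extension would need to send to that URL, assuming the server follows the standard OpenAI chat-completions schema (the IP is redacted as in the config above, and the model name is just a placeholder; a single-model llama-cpp-python server generally accepts any value there):

```python
# Sanity check against the same endpoint the extension is configured to use.
# The host/port are redacted placeholders and "model" is a dummy value.
import requests

url = "http://192.X.X.X:12080/v1/chat/completions"
payload = {
    "model": "default",  # placeholder; llama-cpp-python usually ignores this for a single loaded model
    "messages": [{"role": "user", "content": "Write a Python hello world."}],
    "max_tokens": 60,
    "temperature": 0.2,
    "top_p": 0.95,
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
# Standard OpenAI chat-completions response shape
print(resp.json()["choices"][0]["message"]["content"])
```

If a request like this succeeds but the extension still shows nothing, I assume the problem is on the extension/request-format side rather than the server itself.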

I can see that inference is happening on the server:

[Screenshot (2024-04-23, 11:10 PM): server logs showing inference requests]

So I am not entirely sure what I am missing. Additionally, I am trying to see the extension logs for the worker calls, but I don't see anything. Would you be able to give any guidance or a step-by-step explanation of how this can be done?

Thank you so much
