Running llama-cpp-python OpenAI compatible server #140

@abasu0713

Description

Requesting a little help here. I'm trying to test out Copilot-like functionality with llama-cpp-python using this extension. Below is my configuration:

{
    "[python]": {
        "editor.formatOnType": true
    },
    "cmake.configureOnOpen": true,
    "llm.backend": "openai",
    "llm.configTemplate": "Custom",
    "llm.url": "http://192.X.X.X:12080/v1/chat/completions",
    "llm.fillInTheMiddle.enabled": false,
    "llm.fillInTheMiddle.prefix": "<PRE> ",
    "llm.fillInTheMiddle.middle": " <MID>",
    "llm.fillInTheMiddle.suffix": " <SUF>",
    "llm.requestBody": {
        "parameters": {
            "max_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    },
    "llm.contextWindow": 4096,
    "llm.tokensToClear": [
        "<EOS>"
    ],
    "llm.tokenizer": null,
    "llm.tlsSkipVerifyInsecure": true,
    "llm.modelId": "",
}
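
For reference, here is a minimal sketch of the kind of request the extension would need to send to that URL, assuming the server follows the standard OpenAI chat-completions schema (the IP is redacted as in the config above, and the model name is just a placeholder; a single-model llama-cpp-python server generally accepts any value there):

```python
# Sanity check against the same endpoint the extension is configured to use.
# The host/port are redacted placeholders and "model" is a dummy value.
import requests

url = "http://192.X.X.X:12080/v1/chat/completions"
payload = {
    "model": "default",  # placeholder; llama-cpp-python usually ignores this for a single loaded model
    "messages": [{"role": "user", "content": "Write a Python hello world."}],
    "max_tokens": 60,
    "temperature": 0.2,
    "top_p": 0.95,
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
# Standard OpenAI chat-completions response shape
print(resp.json()["choices"][0]["message"]["content"])
```

If a request like this succeeds but the extension still shows nothing, I assume the problem is on the extension/request-format side rather than the server itself.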

I can see that inference is happening on the server:

[Screenshot (2024-04-23, 11:10 PM): server logs showing inference requests]

So I am not entirely sure what I am missing. Additionally, I am trying to see the extension logs for the worker calls, but I don't see anything. Would you be able to give any guidance or a step-by-step explanation of how this can be done?

Thank you so much
