Well, I only support the official API, sorry.

When I run DeepSeek locally, I add the API version to the client. Below is what I tested for the Unity plugin.

So I guess they're not actually running an OpenAI-compatible API at the end of the day.
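"Add the version to the client" presumably means pointing the client at a versioned base URL: vLLM's OpenAI-compatible server exposes its endpoints under `/v1` (e.g. `/v1/chat/completions`), so a client configured with only the host gets 404s. A minimal sketch, assuming a local server on port 8000; the helper name is mine, not part of any plugin:

```python
from urllib.parse import urljoin

def versioned_base_url(host: str, version: str = "v1") -> str:
    """Build the versioned base URL an OpenAI-style client expects.

    Hypothetical helper: appends the API version segment (default "v1")
    to the server host, normalizing the trailing slash first.
    """
    return urljoin(host if host.endswith("/") else host + "/", version)

base = versioned_base_url("http://localhost:8000")
print(base)  # http://localhost:8000/v1
```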

8B-parameter model:

```shell
docker run -d --runtime nvidia --gpus all \
  -v //d/LLM/cache:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=hf_API_KEY" \
  -p 8000:8000 --ipc=host \
  vllm/vllm-openai:latest \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
  --max-model-len 16384 --enforce-eager
```

```
2025-02-03 08:40:45 INFO 02-03 05:40:45 worker.py:266] the current vLLM instance can use total_gpu_memory (24.00GiB) x gpu_memory_utilization (0.90) = 21.60GiB
2025-02-03 08:40:4…
```
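Once the container above is up, it speaks the OpenAI chat-completions protocol at `/v1/chat/completions`. A minimal sketch of the request body (stdlib only; the model name is taken from the `--model` flag in the docker command, and actually sending the request is left out so nothing here depends on a running server):

```python
import json

# Model name matches the --model flag passed to vllm/vllm-openai above.
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 256) -> str:
    """Serialize an OpenAI-style chat-completions request body."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

payload = build_chat_request("Hello")
print(payload)
```

Any OpenAI-compatible client should produce an equivalent body when pointed at the local endpoint.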

Replies: 1 comment, 4 replies (from @chsword and @StephenHodgson)
Answer selected by StephenHodgson

This discussion was converted from issue #416 on February 03, 2025 08:26.