Well, I only support the official API, sorry.

When I run DeepSeek locally, I add the API version to the client. Below is what I tested for the Unity plugin.

So I guess they're not actually running an OpenAI-compatible API at the end of the day.
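"Add the version to the client" presumably means pointing the client at a versioned base URL: vLLM's OpenAI-compatible server exposes its endpoints under `/v1` (e.g. `/v1/chat/completions`), so a client configured with only the host gets 404s. A minimal sketch, assuming a local server on port 8000; the helper name is mine, not part of any plugin:

```python
from urllib.parse import urljoin

def versioned_base_url(host: str, version: str = "v1") -> str:
    """Build the versioned base URL an OpenAI-style client expects.

    Hypothetical helper: appends the API version segment (default "v1")
    to the server host, normalizing the trailing slash first.
    """
    return urljoin(host if host.endswith("/") else host + "/", version)

base = versioned_base_url("http://localhost:8000")
print(base)  # http://localhost:8000/v1
```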

8B-parameter model:

```shell
docker run -d --runtime nvidia --gpus all \
  -v //d/LLM/cache:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=hf_API_KEY" \
  -p 8000:8000 --ipc=host \
  vllm/vllm-openai:latest \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
  --max-model-len 16384 --enforce-eager
```

```
2025-02-03 08:40:45 INFO 02-03 05:40:45 worker.py:266] the current vLLM instance can use total_gpu_memory (24.00GiB) x gpu_memory_utilization (0.90) = 21.60GiB
2025-02-03 08:40:4…
```
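Once the container above is up, it speaks the OpenAI chat-completions protocol at `/v1/chat/completions`. A minimal sketch of the request body (stdlib only; the model name is taken from the `--model` flag in the docker command, and actually sending the request is left out so nothing here depends on a running server):

```python
import json

# Model name matches the --model flag passed to vllm/vllm-openai above.
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 256) -> str:
    """Serialize an OpenAI-style chat-completions request body."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

payload = build_chat_request("Hello")
print(payload)
```

Any OpenAI-compatible client should produce an equivalent body when pointed at the local endpoint.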

Replies: 1 comment, 4 replies (from @chsword and @StephenHodgson)
Answer selected by StephenHodgson

This discussion was converted from issue #416 on February 03, 2025 08:26.