Commit 8e2ab81

Add DeepSeek-V3.1 (#34)
Signed-off-by: Jee Jee Li <[email protected]>
1 parent 4edcac1 commit 8e2ab81

2 files changed: +137 -8 lines changed

DeepSeek/DeepSeek-V3_1.md

Lines changed: 126 additions & 0 deletions
# DeepSeek-V3.1 Usage Guide

## Introduction

[DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) is a hybrid model that supports both thinking mode and non-thinking mode. This guide describes how to dynamically switch between `think` and `non-think` mode in vLLM.
## Installing vLLM

```bash
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
```
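To confirm the environment is ready before serving, you can print the installed version (a quick sanity check, not part of the original recipe):

```bash
# Should print the vLLM version without raising ImportError
python -c "import vllm; print(vllm.__version__)"
```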
## Launching DeepSeek-V3.1

### Serving on 8xH200 (or H20) GPUs (141GB × 8)

```bash
vllm serve deepseek-ai/DeepSeek-V3.1 \
    --enable-expert-parallel \
    --tensor-parallel-size 8 \
    --served-model-name ds31
```
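Loading weights for a model of this size can take several minutes. Once the server is up, the OpenAI-compatible `/v1/models` endpoint should list the served model name (`ds31`); a quick readiness check:

```bash
# Returns a JSON model list once the server is ready to accept requests
curl http://localhost:8000/v1/models
```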
## Using the Model

### OpenAI Client Example

You can use the OpenAI client as follows. Thinking mode is controlled per request via `extra_body={"chat_template_kwargs": {"thinking": ...}}`: `True` enables thinking mode, and `False` disables it (non-thinking mode).
```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not check the API key by default.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Use the first (and only) served model, i.e. the --served-model-name.
models = client.models.list()
model = models.data[0].id

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "<think>Hmm</think>I am DeepSeek"},
    {"role": "user", "content": "9.11 and 9.8, which is greater?"},
]
# thinking=True enables think mode; thinking=False disables it.
extra_body = {"chat_template_kwargs": {"thinking": False}}
response = client.chat.completions.create(
    model=model, messages=messages, extra_body=extra_body
)
content = response.choices[0].message.content
print("content:\n", content)
```
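The same request also works with streaming. A minimal sketch using the standard OpenAI client streaming API, under the same server assumptions (this variant is ours, not part of the original guide):

```python
# Reuses `client`, `model`, `messages`, and `extra_body` from the example above.
stream = client.chat.completions.create(
    model=model, messages=messages, extra_body=extra_body, stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:  # content can be None on role/finish chunks
        print(delta.content, end="", flush=True)
print()
```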
### Example Outputs

#### thinking=True

- As shown below, the output contains a `</think>` tag: the reasoning comes first, then `</think>`, then the final answer.
```text
Hmm, the user is asking which number is greater between 9.11 and 9.8. This seems straightforward, but I should be careful because decimals can sometimes confuse people.

I recall that comparing decimals involves looking at each digit from left to right. Both numbers have the same whole number part (9), so I need to compare the decimal parts. 0.11 is greater than 0.8 because 0.11 is equivalent to 0.110 and 0.8 is 0.800, so 110 thousandths is greater than 800 thousandths? Wait no, that’s wrong.

Actually, 0.8 is the same as 0.80, and 0.11 is less than 0.80. So 9.11 is actually less than 9.8. I should double-check that. Yes, 9.8 is larger because 0.8 > 0.11.

I’ll explain it clearly by comparing the tenths place: 9.8 has 8 tenths, while 9.11 has 1 tenth and 1 hundredth, so 8 tenths is indeed larger.

The answer is 9.8 is greater. I’ll state it confidently and offer further help if needed.</think>9.8 is greater than 9.11.

To compare them:
- 9.8 is equivalent to 9.80
- 9.80 has 8 tenths, while 9.11 has only 1 tenth
- Since 8 tenths (0.8) is greater than 1 tenth (0.1), 9.8 > 9.11

Let me know if you need further clarification! 😊
```
#### thinking=False

```text
The number **9.11** is greater than **9.8**.

To compare them:
- 9.11 = 9 + 11/100
- 9.8 = 9 + 80/100

Since 11/100 (0.11) is less than 80/100 (0.80), 9.11 is actually smaller than 9.8. Wait, let me correct that:

Actually, **9.8 is greater than 9.11**.

- 9.8 can be thought of as 9.80
- Comparing 9.80 and 9.11: 80 hundredths is greater than 11 hundredths.

So, **9.8 > 9.11**.

Apologies for the initial confusion! 😅
```
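When `thinking=True`, the reasoning and the final answer arrive in a single `content` string, separated by the `</think>` tag shown above. A minimal client-side sketch for splitting the two (the helper name is our own, not part of vLLM or the OpenAI client):

```python
def split_thinking(content: str) -> tuple[str, str]:
    # Partition the raw output on the closing tag emitted in thinking mode.
    reasoning, sep, answer = content.partition("</think>")
    if not sep:
        # No tag found, e.g. a thinking=False response: everything is the answer.
        return "", content
    return reasoning, answer

reasoning, answer = split_thinking(content)  # `content` from the client example above
print("answer:\n", answer)
```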
### curl Example

You can run the following `curl` command:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ds31",
    "messages": [
      {
        "role": "user",
        "content": "9.11 and 9.8, which is greater?"
      }
    ],
    "chat_template_kwargs": {
      "thinking": true
    }
  }'
```
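Setting `"thinking": false` in the same `chat_template_kwargs` payload switches the request to non-thinking mode; with `"thinking": true`, the reasoning is returned inline in the message content before the `</think>` tag, as in the example outputs above.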

README.md

Lines changed: 11 additions & 8 deletions
```diff
@@ -5,25 +5,28 @@ This repo intends to host community maintained common recipes to run vLLM answer
 
 ## Guides
 
-### OpenAI <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/OpenAI_logo_2025_%28symbol%29.svg/2585px-OpenAI_logo_2025_%28symbol%29.svg.png" alt="OpenAI" width="16" height="16" style="vertical-align:middle;">
-- [gpt-oss](OpenAI/GPT-OSS.md)
-
 ### DeepSeek <img src="https://avatars.githubusercontent.com/u/148330874?s=200&v=4" alt="DeepSeek" width="16" height="16" style="vertical-align:middle;">
 - [DeepSeek-V3, DeepSeek-R1](DeepSeek/DeepSeek-V3.md)
+- [DeepSeek-V3.1](DeepSeek/DeepSeek-V3_1.md)
+
+### GLM <img src="https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/logo.svg" alt="GLM" width="16" height="16" style="vertical-align:middle;">
+- [GLM-4.5, GLM-4.5-Air](GLM/GLM-4.5.md)
+- [GLM-4.5V](GLM/GLM-4.5V.md)
+
+### InternLM <img src="https://avatars.githubusercontent.com/u/135356492?s=200&v=4" alt="InternLM" width="16" height="16" style="vertical-align:middle;">
+- [Intern-S1](InternLM/Intern-S1.md)
 
 ### Llama
 - [Llama3.3-70B](Llama/Llama3.3-70B.md)
 - [Llama4-Scout](Llama/Llama4-Scout.md)
 
+### OpenAI <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/OpenAI_logo_2025_%28symbol%29.svg/2585px-OpenAI_logo_2025_%28symbol%29.svg.png" alt="OpenAI" width="16" height="16" style="vertical-align:middle;">
+- [gpt-oss](OpenAI/GPT-OSS.md)
+
 ### Qwen <img src="https://qwenlm.github.io/favicon.png" alt="Qwen" width="16" height="16" style="vertical-align:middle;">
 - [Qwen3-Coder-480B-A35B](Qwen/Qwen3-Coder-480B-A35B.md)
 
-### GLM <img src="https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/logo.svg" alt="GLM" width="16" height="16" style="vertical-align:middle;">
-- [GLM-4.5, GLM-4.5-Air](GLM/GLM-4.5.md)
-- [GLM-4.5V](GLM/GLM-4.5V.md)
 
-### InternLM <img src="https://avatars.githubusercontent.com/u/135356492?s=200&v=4" alt="InternLM" width="16" height="16" style="vertical-align:middle;">
-- [Intern-S1](InternLM/Intern-S1.md)
 
 ## Contributing
 Please feel free to contribute by adding a new recipe or improving an existing one, just send us a PR!
```
