
Commit 789dc67

[Docs] fix sampling docs (#3113)

* fix sampling docs
* update

1 parent: 8bf9621

File tree: 6 files changed (+14, -8 lines)

docs/features/sampling.md — 3 additions, 4 deletions

````diff
@@ -98,7 +98,7 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
     {"role": "user", "content": "How old are you"}
   ],
   "top_p": 0.8,
-  "top_k": 50
+  "top_k": 20
 }'
 ```

@@ -117,7 +117,7 @@ response = client.chat.completions.create(
     ],
     stream=True,
     top_p=0.8,
-    top_k=50
+    extra_body={"top_k": 20, "min_p":0.1}
 )
 for chunk in response:
     if chunk.choices[0].delta:
@@ -159,8 +159,7 @@ response = client.chat.completions.create(
     ],
     stream=True,
     top_p=0.8,
-    top_k=20,
-    min_p=0.1
+    extra_body={"top_k": 20, "min_p":0.1}
 )
 for chunk in response:
     if chunk.choices[0].delta:
````

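The hunks above move `top_k` and `min_p` into `extra_body`, since the official `openai` SDK does not accept them as named arguments, and lower the example `top_k` from 50 to 20. As a toy, word-level illustration of what these filters do to a next-token distribution (a sketch only, not FastDeploy's implementation; `filter_probs` is a name invented for this example):

```python
# Toy sketch of top-k / top-p / min-p filtering over a next-token
# distribution. Illustrative only; real engines work on token-id logits.
def filter_probs(probs, top_k=None, top_p=None, min_p=None):
    # Sort candidates by probability, highest first.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        # Keep only the K most likely candidates.
        items = items[:top_k]
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, total = [], 0.0
        for tok, p in items:
            kept.append((tok, p))
            total += p
            if total >= top_p:
                break
        items = kept
    if min_p is not None:
        # Drop candidates below min_p times the top candidate's probability.
        p_max = items[0][1]
        items = [(t, p) for t, p in items if p >= min_p * p_max]
    # Renormalize the surviving probabilities.
    total = sum(p for _, p in items)
    return {t: p / total for t, p in items}
```

With `top_k=2`, only the two most likely candidates survive and are renormalized; with `min_p=0.2`, anything below 20% of the top candidate's probability is dropped.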
docs/offline_inference.md — 1 addition, 0 deletions

````diff
@@ -183,6 +183,7 @@ For ```LLM``` configuration, refer to [Parameter Documentation](parameters.md).
 * min_p(float): Minimum probability relative to the maximum probability for a token to be considered (>0 filters low-probability tokens to improve quality)
 * max_tokens(int): Maximum generated tokens (input + output)
 * min_tokens(int): Minimum forced generation length
+* bad_words(list[str]): Prohibited words

 ### 2.5 fastdeploy.engine.request.RequestOutput
````

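The newly documented `bad_words` parameter bans the listed words from the output. A minimal sketch of how such a ban can be enforced at sampling time (word-level for illustration; real engines mask token ids in the logits, and `apply_bad_words` is a name invented here):

```python
# Toy sketch of bad_words enforcement: zero out banned candidates and
# renormalize the rest. Illustrative only; not FastDeploy's implementation.
def apply_bad_words(probs, bad_words):
    banned = set(bad_words or [])
    # Remove banned candidates from the distribution.
    kept = {t: p for t, p in probs.items() if t not in banned}
    # Renormalize so the remaining probabilities sum to 1.
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}
```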
docs/online_serving/README.md — 3 additions, 0 deletions

````diff
@@ -137,6 +137,9 @@ When sending requests using openai.Client, these parameters need to be placed in

 The following sampling parameters are supported.
 ```python
+bad_words: Optional[List[int]] = None
+# List of forbidden words that the model should avoid generating (default None means no restriction).
+
 top_k: Optional[int] = None
 # Limits the consideration to the top K tokens with the highest probability at each generation step, used to control randomness (default None means no limit).
````

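Because these extra fields are not typed parameters of the `openai` SDK, callers supply them via `extra_body`, which the SDK merges into the request JSON body. A simplified sketch of that merge (all payload values here are illustrative):

```python
# Simplified view of how extra_body fields end up in the request JSON.
# The real openai SDK does this merge internally; values are illustrative.
base = {
    "model": "default",
    "messages": [{"role": "user", "content": "How old are you"}],
    "top_p": 0.8,
}
extra_body = {"top_k": 20, "min_p": 0.1, "bad_words": ["example_word"]}
# extra_body keys are merged into the top level of the JSON payload.
payload = {**base, **extra_body}
```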
docs/zh/features/sampling.md — 3 additions, 4 deletions

````diff
@@ -98,7 +98,7 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
     {"role": "user", "content": "How old are you"}
   ],
   "top_p": 0.8,
-  "top_k": 50
+  "top_k": 20
 }'
 ```

@@ -118,7 +118,7 @@ response = client.chat.completions.create(
     ],
     stream=True,
     top_p=0.8,
-    extra_body={"top_k": 50}
+    extra_body={"top_k": 20}
 )
 for chunk in response:
     if chunk.choices[0].delta:
@@ -161,8 +161,7 @@ response = client.chat.completions.create(
     ],
     stream=True,
     top_p=0.8,
-    extra_body={"top_k": 20},
-    min_p=0.1
+    extra_body={"top_k": 20, "min_p": 0.1}
 )
 for chunk in response:
     if chunk.choices[0].delta:
````

docs/zh/offline_inference.md — 1 addition, 0 deletions

````diff
@@ -183,6 +183,7 @@ for output in outputs:
 * min_p(float): Minimum probability threshold for a token to be considered (relative to the highest-probability token; setting it >0 filters low-probability tokens to improve generation quality)
 * max_tokens(int): Maximum number of tokens the model may generate (input + output)
 * min_tokens(int): Minimum number of tokens the model is forced to generate, avoiding premature termination
+* bad_words(list[str]): List of words the model is prohibited from generating, preventing unwanted words in the output

 ### 2.5 fastdeploy.engine.request.RequestOutput
````

docs/zh/online_serving/README.md — 3 additions, 0 deletions

````diff
@@ -137,6 +137,9 @@ metadata: Optional[dict] = None

 Additional sampling parameters are supported as follows:
 ```python
+bad_words: Optional[List[str]] = None
+# List of forbidden words that the model will avoid generating (default None means no restriction).
+
 top_k: Optional[int] = None
 # Limits generation at each step to the top K highest-probability tokens, controlling randomness (default None means no limit).
````
