Commit d3c8180

[Bugfix] Fixing max token error message for openai compatible server (#4016)
1 parent 62b8aeb commit d3c8180

File tree: 1 file changed, 6 insertions(+), 0 deletions(-)

vllm/entrypoints/openai/serving_engine.py

Lines changed: 6 additions & 0 deletions
@@ -206,6 +206,12 @@ def _validate_prompt_and_tokenize(
         token_num = len(input_ids)
 
         if request.max_tokens is None:
+            if token_num >= self.max_model_len:
+                raise ValueError(
+                    f"This model's maximum context length is "
+                    f"{self.max_model_len} tokens. However, you requested "
+                    f"{token_num} tokens in the messages, "
+                    f"Please reduce the length of the messages.", )
             request.max_tokens = self.max_model_len - token_num
 
         if token_num + request.max_tokens > self.max_model_len:
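
For context, the sketch below illustrates the validation flow this patch changes: when the client omits max_tokens, the prompt length alone can now trigger a clear context-length error instead of producing a non-positive derived max_tokens. The Request dataclass, the validate_prompt function, and the wording of the second error are illustrative stand-ins, not vLLM's actual OpenAIServing._validate_prompt_and_tokenize; only the newly added early check mirrors the diff above.

# Minimal sketch of the validation behavior touched by this commit.
# Names here (Request, validate_prompt) are hypothetical stand-ins.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    max_tokens: Optional[int] = None


def validate_prompt(input_ids: List[int], request: Request,
                    max_model_len: int) -> None:
    token_num = len(input_ids)

    if request.max_tokens is None:
        # Added by this commit: if the prompt alone already fills or exceeds
        # the context window, raise a descriptive error rather than deriving
        # a zero or negative max_tokens below.
        if token_num >= max_model_len:
            raise ValueError(
                f"This model's maximum context length is {max_model_len} "
                f"tokens. However, you requested {token_num} tokens in the "
                f"messages. Please reduce the length of the messages.")
        request.max_tokens = max_model_len - token_num

    # Pre-existing check: prompt plus requested completion must fit.
    if token_num + request.max_tokens > max_model_len:
        raise ValueError(
            f"This model's maximum context length is {max_model_len} tokens. "
            f"However, you requested {token_num + request.max_tokens} tokens "
            f"({token_num} in the messages, {request.max_tokens} in the "
            f"completion). Please reduce the length of the messages or "
            f"completion.")


# Example: a 4096-token prompt against a 4096-token context window now hits
# the new, clearer error instead of silently setting max_tokens to 0.
try:
    validate_prompt(list(range(4096)), Request(), max_model_len=4096)
except ValueError as exc:
    print(exc)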
