
Commit a33d5c2

fix edge case where max_tokens is not provided in requests (#688)
1 parent: 3b751fa

File tree: 1 file changed (+1, -1)

model-engine/model_engine_server/inference/vllm/vllm_batch.py

Lines changed: 1 addition & 1 deletion
@@ -201,7 +201,7 @@ def determine_max_concurrent_requests(
     # anecdotally, we're seeing the engine able to handle around 7req/s (for outlines), so set to 30 * 7 ~= 200
     if any(
         request.to_sampling_params(
-            default_max_tokens=0, logits_processor_pattern=None
+            default_max_tokens=1, logits_processor_pattern=None
         ).guided_decoding
         for request in requests
     ):
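A hedged reading of why this one-character change matters (not stated in the commit itself): when a client request omits max_tokens, to_sampling_params falls back to default_max_tokens, and vLLM's SamplingParams rejects values below 1, so a default of 0 would raise while this code is merely probing whether any request uses guided decoding. The sketch below mimics that assumed failure mode with stand-in classes; SamplingParamsStub and FakeRequest are hypothetical names, not code from this repository.

# Minimal sketch of the assumed edge case; SamplingParamsStub and FakeRequest
# are hypothetical stand-ins, not part of model_engine_server or vLLM.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingParamsStub:
    """Mimics vLLM's SamplingParams validation that max_tokens must be >= 1."""
    max_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        if self.max_tokens is not None and self.max_tokens < 1:
            raise ValueError(f"max_tokens must be at least 1, got {self.max_tokens}.")


@dataclass
class FakeRequest:
    max_tokens: Optional[int] = None  # client may omit max_tokens entirely

    def to_sampling_params(self, default_max_tokens: int) -> SamplingParamsStub:
        # Fall back to the default only when the request did not set max_tokens.
        return SamplingParamsStub(max_tokens=self.max_tokens or default_max_tokens)


request = FakeRequest()  # max_tokens not provided: the edge case in the commit title

try:
    request.to_sampling_params(default_max_tokens=0)   # old default: raises
except ValueError as exc:
    print(f"default_max_tokens=0 -> {exc}")

request.to_sampling_params(default_max_tokens=1)       # new default: succeeds
print("default_max_tokens=1 -> ok")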
