This repository was archived by the owner on Sep 6, 2025. It is now read-only.

Commit 9c099b3

perf: let bagel scale to zero (#85)

1 parent 54cf516 commit 9c099b3

1 file changed: +0 −1 lines changed

modal/runner/containers/vllm_unified.py

Lines changed: 0 additions & 1 deletion
@@ -165,7 +165,6 @@ def __init__(self):
             gpu=modal.gpu.A100(count=1, memory=40),
             concurrent_inputs=4,
             max_containers=1,
-            keep_warm=1,
             max_model_len=8_000,  # Reduced from original 200k
             quantization="GPTQ",
             dtype="float16",  # vLLM errors when using dtype="auto" with this model
