
Commit 5202823

perf: let noromaid mixtral scale to zero (#87)

* perf: let noromaid mixtral scale to zero
* fix: change max container to 1 too

Parent: e0c3077

1 file changed: +1 −2 lines

modal/runner/containers/vllm_unified.py

Lines changed: 1 addition & 2 deletions
@@ -152,8 +152,7 @@ def __init__(self):
             model_name=_noromaid,
             gpu=modal.gpu.A100(count=1, memory=40),
             concurrent_inputs=4,
-            max_containers=3,
-            keep_warm=1,
+            max_containers=1,
             quantization="GPTQ",
             dtype="float16",  # vLLM errors when using dtype="auto" with this model
         )
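For context, a minimal sketch of why this change lets the deployment scale to zero: on Modal, a function that reserves no warm containers (keep_warm unset or 0) can be spun all the way down when idle, while max_containers=1 caps scale-out at a single GPU container. The decorator parameters below (allow_concurrent_inputs, concurrency_limit, keep_warm) are an assumption about how vllm_unified.py forwards its kwargs to Modal's function API of that era; only the values come from this diff, and the app and function names are hypothetical.

    import modal

    stub = modal.Stub("noromaid-mixtral")  # hypothetical app name

    @stub.function(
        gpu=modal.gpu.A100(count=1, memory=40),  # same GPU spec as the diff
        allow_concurrent_inputs=4,               # assumed mapping of concurrent_inputs=4
        concurrency_limit=1,                     # assumed mapping of max_containers=1
        # keep_warm intentionally omitted: with no warm pool reserved, Modal may
        # scale this function down to zero containers when there is no traffic.
    )
    def generate(prompt: str) -> str:
        # placeholder body; the real container serves the model via vLLM
        # (GPTQ quantization, float16 dtype, per the diff above)
        ...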
