Commit 63a26cf

fix: reduce upper limit to 16 GPUs to avoid fp8 quantization block being split (#4100)
Signed-off-by: hongkuanz <[email protected]>
1 parent 1b8b28e commit 63a26cf

1 file changed: +1 -1 lines changed

benchmarks/profiler/utils/search_space_autogen.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 logger.addHandler(console_handler)

 MODEL_GPU_MEM_FRAC_MAX = 0.9
-MOE_MODEL_MAX_NUM_GPUS = 32
+MOE_MODEL_MAX_NUM_GPUS = 16


 def auto_generate_search_space(args: argparse.Namespace) -> None:
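
The commit message says the lower cap keeps an fp8 quantization block from being split. A minimal sketch of that kind of check follows. It is illustrative only: the helper name `shard_splits_quant_block`, the block size of 128, and the sharded dimension of 2048 are assumptions and do not come from this commit or from `search_space_autogen.py`.

```python
def shard_splits_quant_block(dim: int, num_gpus: int, block_size: int = 128) -> bool:
    """Return True if sharding `dim` evenly across `num_gpus` cuts through a block.

    Hypothetical helper: `block_size=128` is an assumed fp8 block-quantization
    granularity, not a value taken from the commit.
    """
    if dim % num_gpus != 0:
        # The dimension cannot be divided evenly, so treat it as a split.
        return True
    shard = dim // num_gpus
    # A shard that is not a whole number of blocks would split a block
    # (and its scale factor) across GPUs.
    return shard % block_size != 0


# Assumed sharded dimension, for illustration only.
HYPOTHETICAL_SHARDED_DIM = 2048

for num_gpus in (8, 16, 32):
    split = shard_splits_quant_block(HYPOTHETICAL_SHARDED_DIM, num_gpus)
    print(f"{num_gpus} GPUs: shard={HYPOTHETICAL_SHARDED_DIM // num_gpus}, splits block={split}")
```

Under these assumed numbers, 32 GPUs would leave each shard with 64 elements, which is smaller than one 128-element block, while 16 GPUs leave exactly one block per shard. That matches the direction of the change from 32 to 16, but the actual model dimensions and block size behind the fix are not stated here.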