
Commit 7d8a913 (parent: baa6ba0)

[https://nvbugs/5596343] [test] Update accuracy baseline for GPT-OSS-20B (#8842)

Signed-off-by: Dongfeng Yu <[email protected]>
Signed-off-by: dongfengy <[email protected]>
Signed-off-by: Xiwen Yu <[email protected]>

File tree: 3 files changed, +17 −6 lines

tests/integration/defs/accuracy/references/gsm8k.yaml (13 additions, 1 deletion)

@@ -200,7 +200,7 @@ GPT-OSS/BF16:
 - accuracy: 90.3
 - kv_cache_quant_algo: FP8
   accuracy: 90.3
-GPT-OSS/MXFP4:
+GPT-OSS/120B-MXFP4:
 - accuracy: 90.3
 - quant_algo: W4A8_MXFP4_MXFP8
   accuracy: 90.3
@@ -217,5 +217,17 @@ GPT-OSS/MXFP4:
 - quant_algo: W4A16_MXFP4
   kv_cache_quant_algo: FP8
   accuracy: 90.3
+GPT-OSS/20B-MXFP4:
+- accuracy: 85.0
+- quant_algo: W4A8_MXFP4_MXFP8
+  accuracy: 85.0
+- quant_algo: W4A8_MXFP4_MXFP8
+  kv_cache_quant_algo: FP8
+  accuracy: 85.0
+- quant_algo: W4A16_MXFP4
+  accuracy: 85.0
+- quant_algo: W4A16_MXFP4
+  kv_cache_quant_algo: FP8
+  accuracy: 85.0
 LGAI-EXAONE/EXAONE-4.0-32B:
 - accuracy: 88.36
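The YAML above maps each model name to a list of reference entries: an entry with no quantization keys is the default baseline, while entries carrying quant_algo and/or kv_cache_quant_algo apply to matching run configurations. As a minimal sketch (not the actual TRT-LLM harness; GSM8K_REFERENCES and baseline_accuracy are hypothetical names), the most specific matching entry could be selected like this:

```python
# Hypothetical sketch of how the gsm8k.yaml references shown above could be
# matched against a run configuration. The real harness lives in
# tests/integration/defs/accuracy; this only illustrates the data shape.
GSM8K_REFERENCES = {
    "GPT-OSS/20B-MXFP4": [
        {"accuracy": 85.0},
        {"quant_algo": "W4A8_MXFP4_MXFP8", "accuracy": 85.0},
        {"quant_algo": "W4A8_MXFP4_MXFP8", "kv_cache_quant_algo": "FP8",
         "accuracy": 85.0},
        {"quant_algo": "W4A16_MXFP4", "accuracy": 85.0},
        {"quant_algo": "W4A16_MXFP4", "kv_cache_quant_algo": "FP8",
         "accuracy": 85.0},
    ],
}

def baseline_accuracy(model_name, quant_algo=None, kv_cache_quant_algo=None):
    """Return the reference accuracy from the most specific matching entry."""
    wanted = {"quant_algo": quant_algo,
              "kv_cache_quant_algo": kv_cache_quant_algo}
    best, best_score = None, -1
    for entry in GSM8K_REFERENCES[model_name]:
        keys = {k: v for k, v in entry.items() if k != "accuracy"}
        # An entry matches when every key it specifies equals the run config.
        if all(wanted.get(k) == v for k, v in keys.items()):
            if len(keys) > best_score:
                best, best_score = entry["accuracy"], len(keys)
    return best
```

With this shape, splitting the former shared GPT-OSS/MXFP4 key into 120B-MXFP4 and 20B-MXFP4 lets the two model sizes carry different baselines (90.3 vs. 85.0) without changing the lookup logic.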

tests/integration/defs/accuracy/test_llm_api_pytorch.py (4 additions, 4 deletions)

@@ -3248,7 +3248,7 @@ def test_w4_1gpu(self, kv_cache_dtype, moe_backend, cuda_graph,
     moe_config=MoeConfig(backend=moe_backend))

 with llm:
-    model_name = "GPT-OSS/MXFP4"
+    model_name = "GPT-OSS/20B-MXFP4"
     task = GSM8K(model_name)
     task.evaluate(llm,
                   extra_evaluator_kwargs=self.extra_evaluator_kwargs)
@@ -3296,7 +3296,7 @@ def test_w4_4gpus(self, kv_cache_dtype, moe_backend, tp_size, pp_size,
     moe_config=MoeConfig(backend=moe_backend))

 with llm:
-    model_name = "GPT-OSS/MXFP4"
+    model_name = "GPT-OSS/120B-MXFP4"
     task = GSM8K(model_name)
     task.evaluate(llm,
                   extra_evaluator_kwargs=self.extra_evaluator_kwargs)
@@ -3383,7 +3383,7 @@ def test_w4_2gpus(self, kv_cache_dtype, moe_backend, tp_size, pp_size,
     moe_config=MoeConfig(backend=moe_backend))

 with llm:
-    model_name = "GPT-OSS/MXFP4"
+    model_name = "GPT-OSS/20B-MXFP4"
     task = GSM8K(model_name)
     mocker.patch.object(GSM8K, "MAX_OUTPUT_LEN", 8192)
     mocker.patch.dict(GSM8K.EVALUATE_KWARGS,
@@ -3410,7 +3410,7 @@ def test_w4_chunked_prefill(self, kv_cache_dtype, moe_backend, mocker):
 kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6,
                                 dtype=kv_cache_dtype)

-model_name = "GPT-OSS/MXFP4"
+model_name = "GPT-OSS/120B-MXFP4"
 with LLM(self.MODEL_PATH,
          tensor_parallel_size=4,
          pipeline_parallel_size=1,
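Each updated test follows the same evaluate-against-baseline pattern: build the model under test, look up the reference entry for the chosen model_name, and fail if the measured GSM8K score falls below it. A minimal stand-in for that pattern (GSM8KTask and its tolerance are hypothetical; the real GSM8K class and task.evaluate signature live in the TRT-LLM accuracy harness) might look like:

```python
# Hypothetical stand-in for the evaluate-then-assert pattern used in the
# tests above. It only shows the shape of the baseline check, not the real
# GSM8K evaluator, which runs the model on the benchmark itself.
class GSM8KTask:
    def __init__(self, model_name, reference_accuracy):
        self.model_name = model_name
        self.reference = reference_accuracy

    def evaluate(self, measured_accuracy, tolerance=1.0):
        """Fail when the measured score drops more than `tolerance`
        points below the stored baseline; otherwise return it."""
        assert measured_accuracy >= self.reference - tolerance, (
            f"{self.model_name}: {measured_accuracy} < "
            f"{self.reference} - {tolerance}")
        return measured_accuracy

# The 20B baseline from gsm8k.yaml above.
task = GSM8KTask("GPT-OSS/20B-MXFP4", reference_accuracy=85.0)
```

Under this pattern, pointing the 1-GPU and 2-GPU tests at the new 20B key (and the 4-GPU tests at the 120B key) is what actually switches which baseline each test is held to.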

tests/integration/test_lists/waives.txt (0 additions, 1 deletion)

@@ -340,7 +340,6 @@ triton_server/test_triton_llm.py::test_llmapi_backend[1-0-enableDecoupleMode-ten
 cpp/test_e2e.py::test_benchmarks[gpt-80] SKIP (https://nvbugs/5601670)
 disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_bf16_empty_batch[DeepSeek-V3-Lite-bf16] SKIP (https://nvbugs/5601682)
 disaggregated/test_disaggregated.py::test_disaggregated_benchmark_on_diff_backends[llama-v3-8b-hf] SKIP (https://nvbugs/5587574)
-accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_1gpu[True-True-trtllm-fp8] SKIP (https://nvbugs/5608790)
 full:H20-3e/accuracy/test_llm_api_pytorch.py::TestNemotronUltra::test_auto_dtype[tp8ep4-cuda_graph=True] SKIP (slow I/O)
 full:H20-3e/accuracy/test_llm_api_pytorch.py::TestKimiK2::test_fp8_blockscale[latency] SKIP (slow I/O)
 full:H20-3e/test_e2e.py::test_ptp_quickstart_advanced_multi_gpus[DeepSeek-V3-671B-FP8-DeepSeek-V3-0324-8] SKIP (slow I/O)
