diff --git a/tests/integration/defs/accuracy/references/mmlu.yaml b/tests/integration/defs/accuracy/references/mmlu.yaml
index 208a31f52b2..d8d3949909b 100644
--- a/tests/integration/defs/accuracy/references/mmlu.yaml
+++ b/tests/integration/defs/accuracy/references/mmlu.yaml
@@ -235,11 +235,11 @@ Qwen3/Qwen3-235B-A22B:
     accuracy: 86
   - quant_algo: NVFP4
     kv_cache_quant_algo: FP8
-    accuracy: 86
+    accuracy: 85.5
   - spec_dec_algo: Eagle
     quant_algo: NVFP4
     kv_cache_quant_algo: FP8
-    accuracy: 86
+    accuracy: 85.5
 Qwen3/Qwen3-Next-80B-A3B-Thinking:
   - accuracy: 86
 Qwen3/Qwen3-Next-80B-A3B-Instruct:
diff --git a/tests/integration/test_lists/waives.txt b/tests/integration/test_lists/waives.txt
index 485c473b5a5..f48643cd683 100644
--- a/tests/integration/test_lists/waives.txt
+++ b/tests/integration/test_lists/waives.txt
@@ -259,7 +259,6 @@ unittest/_torch/modeling/test_modeling_out_of_tree.py::TestOutOfTree::test_llm_a
 unittest/_torch/modeling/test_modeling_out_of_tree.py::TestOutOfTree::test_serve[True] SKIP (https://nvbugs/5739981)
 full:sm89/accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ctx_pp_gen_tp_asymmetric[MMLU-gen_tp=2-ctx_pp=2] SKIP (https://nvbugs/5596337)
 full:sm89/accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_tp_pp_symmetric[MMLU-tp2pp2] SKIP (https://nvbugs/5596337)
-accuracy/test_llm_api_pytorch.py::TestQwen3_235B_A22B::test_nvfp4[latency_moe_trtllm] SKIP (https://nvbugs/5721672)
 accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_4gpus[tp4-fp8kv=True-attn_backend=FLASHINFER-torch_compile=True] SKIP (https://nvbugs/5741304)
 unittest/executor/test_rpc.py::TestRpcCorrectness::test_incremental_task_async SKIP (https://nvbugs/5741476)
 accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_4gpus[pp4-fp8kv=True-attn_backend=TRTLLM-torch_compile=False] SKIP (https://nvbugs/5740377)