
Commit 4cbfc10

[https://nvbugs/5674665][chore] Add test coverage for https://nvbugspro.nvidia.com/bug/5674665 (#9518)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
1 parent: 62b7718

File tree: 2 files changed, +36 −0 lines changed

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 35 additions & 0 deletions
@@ -1104,6 +1104,41 @@ def test_auto_dtype_vswa_reuse(self):
         task = MMLU(self.MODEL_NAME)
         task.evaluate(llm)
 
+    def test_auto_dtype_vswa_without_reuse_disable_overlap_scheduler(self):
+        # NOTE: Test with VSWA kv cache config.
+        kv_cache_config = KvCacheConfig(
+            enable_block_reuse=False,
+            enable_partial_reuse=False,
+            max_attention_window=[512, 512, 512, 512, 512, 32768],
+        )
+
+        with LLM(self.MODEL_PATH,
+                 kv_cache_config=kv_cache_config,
+                 disable_overlap_scheduler=True) as llm:
+            task = GSM8K(self.MODEL_NAME)
+            task.evaluate(llm)
+            task = MMLU(self.MODEL_NAME)
+            task.evaluate(llm)
+
+    @pytest.mark.skip(
+        reason=
+        "Currently failing due to accuracy drop, https://nvbugspro.nvidia.com/bug/5674665"
+    )
+    def test_auto_dtype_vswa_reuse_disable_overlap_scheduler(self):
+        # NOTE: Test with VSWA kv cache config.
+        kv_cache_config = KvCacheConfig(
+            enable_block_reuse=True,
+            max_attention_window=[512, 512, 512, 512, 512, 32768],
+        )
+
+        with LLM(self.MODEL_PATH,
+                 kv_cache_config=kv_cache_config,
+                 disable_overlap_scheduler=True) as llm:
+            task = GSM8K(self.MODEL_NAME)
+            task.evaluate(llm)
+            task = MMLU(self.MODEL_NAME)
+            task.evaluate(llm)
+
     def test_auto_dtype_vswa_reuse_partial_reuse(self):
         # NOTE: Test with VSWA kv cache config.
         kv_cache_config = KvCacheConfig(
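The `max_attention_window=[512, 512, 512, 512, 512, 32768]` list in these tests mirrors Gemma 3's repeating attention pattern: five sliding-window layers followed by one global-attention layer. A minimal sketch of how such a per-layer window list could be expanded across a deeper model, assuming the list simply cycles over the layers (the `window_for_layer` helper and the cycling rule are illustrative assumptions, not the actual TensorRT-LLM implementation):

```python
def window_for_layer(layer_idx, max_attention_window):
    """Illustrative rule: cycle the per-layer window list across all layers."""
    return max_attention_window[layer_idx % len(max_attention_window)]

# Same schedule as in the tests above: 5 local (512-token) layers, 1 global layer.
vswa = [512, 512, 512, 512, 512, 32768]

# Expand the 6-entry pattern over a hypothetical 12-layer model.
schedule = [window_for_layer(i, vswa) for i in range(12)]
print(schedule)
# → [512, 512, 512, 512, 512, 32768, 512, 512, 512, 512, 512, 32768]
```

Under this assumed rule, every sixth layer attends over the full 32768-token window while the rest stay local, which is why the KV cache for the two layer groups is sized so differently and why block reuse interacts with the window boundaries.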

tests/integration/test_lists/test-db/l0_h100.yml

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ l0_h100:
 - accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype_vswa_reuse_partial_reuse
 - accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype_vswa_reuse_low_memory_available_no_partial_reuse
 - accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype_vswa_reuse_low_memory_available_partial_reuse
+- accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype_vswa_without_reuse_disable_overlap_scheduler
 - accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_bfloat16[attn_backend=TRTLLM-torch_compile=False]
 - accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_bfloat16[attn_backend=TRTLLM-torch_compile=True]
 - accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_chunked_prefill[attn_backend=TRTLLM] TIMEOUT (90)
