[https://nvbugs/5277113][fix]genai-perf API change stress test (#4300)

dominicshanshan · web-flow · commit 404fbe9b3282 · 2025-05-15T14:12:34.000+08:00
* fix bug 5277113.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* fix bug 5277113 and 5278517.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
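
Note on the metrics change in stress_test.py: the diff moves the concurrency lookup from `input_config.concurrency` to `input_config.perf_analyzer.stimulus.concurrency`. The sketch below illustrates that lookup as a standalone helper. The function name `extract_concurrency` and the fallback to the older layout are assumptions made for illustration; the commit itself reads only the new nested path, and the exact JSON layout of other genai-perf versions is not verified here.

```python
from typing import Any, Mapping


def extract_concurrency(results: Mapping[str, Any], default: int = 0) -> int:
    """Return the concurrency value from a parsed genai-perf results dict.

    Hypothetical helper, not part of the commit.
    """
    input_config = results.get("input_config") or {}

    # Nested layout read by the updated stress test:
    # input_config -> perf_analyzer -> stimulus -> concurrency
    perf_analyzer = input_config.get("perf_analyzer") or {}
    stimulus = perf_analyzer.get("stimulus") or {}
    if "concurrency" in stimulus:
        return stimulus["concurrency"]

    # Flat layout read before this change (assumed fallback):
    # input_config -> concurrency
    return input_config.get("concurrency", default)


if __name__ == "__main__":
    new_style = {
        "input_config": {"perf_analyzer": {"stimulus": {"concurrency": 8}}}
    }
    old_style = {"input_config": {"concurrency": 4}}
    print(extract_concurrency(new_style))
    print(extract_concurrency(old_style))
```

Because requirements-dev.txt now pins genai-perf to 0.0.13, the fallback branch should not be exercised in this repository; it only shows how a reader could tolerate both layouts when parsing older artifacts.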
diff --git a/examples/serve/genai_perf_client.sh b/examples/serve/genai_perf_client.sh
@@ -3,7 +3,6 @@
 genai-perf profile \
     -m TinyLlama-1.1B-Chat-v1.0 \
     --tokenizer TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
-    --service-kind openai \
     --endpoint-type chat \
     --random-seed 123 \
     --synthetic-input-tokens-mean 128 \
diff --git a/requirements-dev.txt b/requirements-dev.txt
@@ -30,4 +30,4 @@ pytest-rerunfailures
 ruff==0.9.4
 lm_eval[api]==0.4.8
 docstring_parser
-genai-perf
+genai-perf==0.0.13
diff --git a/tests/integration/defs/stress_test/stress_test.py b/tests/integration/defs/stress_test/stress_test.py
@@ -705,8 +705,6 @@ def create_genai_perf_command(model_name,
         model_name,
         "--tokenizer",
         model_path,
-        "--service-kind",
-        "openai",
         "--endpoint-type",
         "completions",
         "--random-seed",
@@ -1054,8 +1052,9 @@ def extract_stress_test_metrics(artifacts_dir="./artifacts",
                                             {}).get("avg", 0)
                 tokThroughput = results.get("output_token_throughput",
                                             {}).get("avg", 0)
-                conCurrency = results.get("input_config",
-                                          {}).get("concurrency", 0)
+                conCurrency = results.get("input_config", {}).get(
+                    "perf_analyzer", {}).get("stimulus",
+                                             {}).get("concurrency", 0)
 
                 # Try to determine model name from directory structure first
                 if first_dir in model_name_map:
diff --git a/tests/integration/test_lists/waives.txt b/tests/integration/test_lists/waives.txt
@@ -480,11 +480,6 @@ test_e2e.py::test_ptp_quickstart_advanced_8gpus[Nemotron-Ultra-253B-nemotron-nas
 examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoder] SKIP (https://nvbugs/5144931)
 examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoderplus] SKIP (https://nvbugs/5144931)
 unittest/_torch -k "not (modeling or multi_gpu or auto_deploy)" SKIP (https://nvbugs/5271015)
-stress_test/stress_test.py::test_run_stress_test[llama-v3-8b-instruct-hf_tp1-stress_time_300s_timeout_450s-MAX_UTILIZATION-pytorch-stress-test] SKIP (https://nvbugs/5277113)
-stress_test/stress_test.py::test_run_stress_test[llama-v3-8b-instruct-hf_tp1-stress_time_300s_timeout_450s-GUARANTEED_NO_EVICT-pytorch-stress-test] SKIP (https://nvbugs/5277113)
-stress_test/stress_test.py::test_run_stress_test[llama-v3-8b-instruct-hf_tp1-stress_time_300s_timeout_450s-MAX_UTILIZATION-trt-stress-test] SKIP (https://nvbugs/5277113)
-stress_test/stress_test.py::test_run_stress_test[llama-v3-8b-instruct-hf_tp1-stress_time_300s_timeout_450s-GUARANTEED_NO_EVICT-trt-stress-test] SKIP (https://nvbugs/5277113)
-test_e2e.py::test_trtllm_serve_example SKIP (https://nvbugs/5278517)
 examples/test_whisper.py::test_llm_whisper_general[large-v3-disable_gemm_plugin-disable_attention_plugin-disable_weight_only-float16-nb:1-use_python_runtime] SKIP (https://nvbugs/5244570)
 unittest/_torch/speculative/test_eagle3.py SKIP (https://nvbugs/5280806)
 test_e2e.py::test_ptp_quickstart_multimodal[qwen2-vl-7b-instruct-Qwen2-VL-7B-Instruct-image] SKIP (https://nvbugs/5226211)