Commit ee04c0c

[CI] Tweaks to GPT-OSS Eval (Blackwell) for stability (vllm-project#26030)
Signed-off-by: mgoin <[email protected]>
1 parent c36f0aa commit ee04c0c

2 files changed, 3 additions and 4 deletions

.buildkite/test-pipeline.yaml

Lines changed: 1 addition & 1 deletion
@@ -845,7 +845,7 @@ steps:
   - vllm/v1/attention/backends/flashinfer.py
   commands:
   - uv pip install --system 'gpt-oss[eval]==0.0.5'
-  - pytest -s -v tests/evals/gpt_oss/test_gpqa_correctness.py --model openai/gpt-oss-20b --metric 0.58 --server-args '--tensor-parallel-size 2'
+  - pytest -s -v tests/evals/gpt_oss/test_gpqa_correctness.py --model openai/gpt-oss-20b --metric 0.58
 
 - label: Blackwell Quantized MoE Test
   timeout_in_minutes: 60

tests/evals/gpt_oss/test_gpqa_correctness.py

Lines changed: 2 additions & 3 deletions
@@ -26,7 +26,8 @@ def run_gpqa_eval(model_name: str, base_url: str) -> float:
     # Build the command to run the evaluation
     cmd = [
         sys.executable, "-m", "gpt_oss.evals", "--eval", "gpqa", "--model",
-        model_name, "--reasoning-effort", "low", "--base-url", base_url
+        model_name, "--reasoning-effort", "low", "--base-url", base_url,
+        "--n-threads", "200"
     ]
 
     try:
@@ -72,8 +73,6 @@ def test_gpqa_correctness(request):
 
     # Add standard server arguments
     server_args.extend([
-        "--max-model-len",
-        "32768",
         "--trust-remote-code",
     ])
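
Note: as a rough illustration of the first hunk, the evaluation command after this change could be assembled as in the sketch below. The helper name build_gpqa_cmd and the example base URL are hypothetical and are not part of the commit; only the flags and their values are taken from the diff.

```python
import sys


def build_gpqa_cmd(model_name: str, base_url: str, n_threads: int = 200) -> list[str]:
    """Assemble the gpt_oss.evals GPQA command shown in the diff.

    Hypothetical helper for illustration only; the commit builds this
    list inline inside run_gpqa_eval.
    """
    return [
        sys.executable, "-m", "gpt_oss.evals", "--eval", "gpqa", "--model",
        model_name, "--reasoning-effort", "low", "--base-url", base_url,
        # Added by this commit; subprocess arguments must be strings.
        "--n-threads", str(n_threads),
    ]


# The base URL below is an assumed placeholder, not a value from the commit.
cmd = build_gpqa_cmd("openai/gpt-oss-20b", "http://localhost:8000/v1")
print(" ".join(cmd[1:]))
```

Passing the thread count through str() keeps every element a string, which is what subprocess expects when given an argument list.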
