feat: implement dataset-agnostic router benchmark #124

rootfs · 2025-09-12T19:07:03Z

What type of PR is this?

`router_reason_bench.py` is based on MMLU-Pro only. Coverage of other datasets is needed to fully evaluate the differences between routing through the router and calling vLLM directly.

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Release Notes: Yes/No

netlify · 2025-09-12T19:07:08Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`7849a2a`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/68c4815158c5980008297aae
😎 Deploy Preview	https://deploy-preview-124--vllm-semantic-router.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2025-09-12T19:08:12Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `bench`

Owners: @yuezhu1, @Xunzhuo
Files changed:

bench/README.md
bench/benchmark_comparison.sh
bench/dataset_factory.py
bench/dataset_implementations/__init__.py
bench/dataset_implementations/arc_dataset.py
bench/dataset_implementations/commonsenseqa_dataset.py
bench/dataset_implementations/gpqa_dataset.py
bench/dataset_implementations/hellaswag_dataset.py
bench/dataset_implementations/mmlu_dataset.py
bench/dataset_implementations/truthfulqa_dataset.py
bench/dataset_interface.py
bench/router_reason_bench_multi_dataset.py
bench/bench_plot.py
bench/router_reason_bench.py

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.


rootfs requested review from Xunzhuo and yuezhu1 as code owners September 12, 2025 19:07

rootfs force-pushed the bench-v2 branch from f3ee07f to 9ba649d Compare September 12, 2025 19:07

github-actions bot assigned Xunzhuo and yuezhu1 Sep 12, 2025

rootfs marked this pull request as draft September 12, 2025 19:19

rootfs force-pushed the bench-v2 branch from 23b45f5 to 3456105 Compare September 12, 2025 19:22

feat: implement dataset-agnostic benchmark with multi-category evaluation support. Add ARC, GPQA, TruthfulQA, CommonsenseQA, and HellaSwag datasets with optimized token limits and robust answer extraction. Signed-off-by: Huamin Chen <[email protected]>

7849a2a
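
For readers without the diff at hand, here is a minimal sketch of what a dataset-agnostic layer with a factory and letter-based answer extraction could look like. It is not the code from this PR: `Question`, `DatasetInterface`, `register`, `create_dataset`, `ToyDataset`, and `extract_answer` are hypothetical names chosen for illustration, and the toy dataset stands in for real loaders such as ARC or GPQA.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional, Type

LETTERS = "ABCDEFGHIJ"  # assumption: at most ten options per question


@dataclass
class Question:
    """One multiple-choice item in a dataset-neutral shape (hypothetical schema)."""
    question: str
    options: List[str]
    correct_answer: str  # option letter, e.g. "B"
    category: str


class DatasetInterface(ABC):
    """Contract each dataset adapter would implement (hypothetical)."""

    @abstractmethod
    def load(self, samples_per_category: int) -> List[Question]:
        ...

    def format_prompt(self, q: Question) -> str:
        # Render the question followed by lettered options.
        lines = [q.question]
        lines += [f"{LETTERS[i]}) {opt}" for i, opt in enumerate(q.options)]
        lines.append("Answer with a single letter.")
        return "\n".join(lines)


_REGISTRY: Dict[str, Type[DatasetInterface]] = {}


def register(name: str):
    """Class decorator that records an adapter under a lowercase name."""
    def deco(cls: Type[DatasetInterface]) -> Type[DatasetInterface]:
        _REGISTRY[name.lower()] = cls
        return cls
    return deco


def create_dataset(name: str) -> DatasetInterface:
    """Factory: look up an adapter by name and instantiate it."""
    try:
        return _REGISTRY[name.lower()]()
    except KeyError:
        raise ValueError(
            f"Unknown dataset {name!r}; available: {sorted(_REGISTRY)}"
        ) from None


@register("toy")
class ToyDataset(DatasetInterface):
    """Stand-in adapter with one hard-coded item (not a real benchmark)."""

    def load(self, samples_per_category: int) -> List[Question]:
        items = [Question("2 + 2 = ?", ["3", "4"], "B", "math")]
        return items[:samples_per_category]


def extract_answer(text: str, num_options: int) -> Optional[str]:
    """Pull an option letter out of free-form model output, or return None."""
    valid = LETTERS[:num_options]
    # Prefer an explicit "answer is X" / "Answer: (X)" statement.
    m = re.search(
        rf"(?i:answer\s*(?:is)?\s*:?\s*)\(?([{valid}])\)?(?![A-Za-z])", text
    )
    if m:
        return m.group(1)
    # Fall back to the last standalone uppercase option letter.
    found = re.findall(rf"\b([{valid}])\b", text)
    return found[-1] if found else None
```

With a shape like this, the benchmark loop only needs `create_dataset(name)`, `load`, `format_prompt`, and `extract_answer`, so adding a dataset means adding one adapter class. The fallback rule is deliberately naive (a sentence starting with the article "A" would be read as option A), which is the kind of case a robust extractor has to handle.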

rootfs force-pushed the bench-v2 branch from f6fe593 to 7849a2a Compare September 12, 2025 20:23

rootfs closed this Sep 12, 2025
