🚀 The feature, motivation and pitch
1. Motivation
vLLM's built-in serving benchmark currently reports
- TTFT – Time-to-First-Token
- TPOT – Time-Per-Output-Token
- ITL – Inter-Token-Latency
These latency-centric numbers are well suited to pure-text LLMs, but they capture neither the output quality nor the unique execution characteristics of multimodal large models (MMLMs) that consume both text and images. Adding a fit-for-purpose multimodal benchmark would make vLLM even more valuable for researchers and practitioners.
2. What is missing right now?
- Dataset – no out-of-the-box multimodal test set that exercises image → text or text → image abilities.
- Metrics – the current numbers measure speed only; they don't answer "How well is the model performing on the task?"
- Evaluation harness – vLLM lacks a driver that loads multimodal samples, feeds them to the model, and aggregates both quality and latency into one report (a minimal sketch follows this list).
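To make the harness idea concrete, here is a minimal sketch built on vLLM's offline `LLM` entry point with `multi_modal_data`. The model name, prompt format, and exact-match quality check are placeholder assumptions; a real harness would plug in task-specific metrics.

```python
import time

from PIL import Image
from vllm import LLM, SamplingParams


def run_multimodal_benchmark(samples, model="llava-hf/llava-1.5-7b-hf"):
    """samples: iterable of (image_path, prompt, reference_answer).

    Prompts are assumed to already contain the model's image placeholder
    (e.g. "USER: <image> ... ASSISTANT:" for LLaVA-style models).
    """
    llm = LLM(model=model)
    params = SamplingParams(temperature=0.0, max_tokens=64)

    latencies, matches, total = [], 0, 0
    for image_path, prompt, reference in samples:
        image = Image.open(image_path)
        start = time.perf_counter()
        outputs = llm.generate(
            {"prompt": prompt, "multi_modal_data": {"image": image}},
            params,
        )
        latencies.append(time.perf_counter() - start)
        # Placeholder quality check; a real harness would use the task's
        # own metric (VQA accuracy, CIDEr for captions, ...).
        matches += int(reference.lower() in outputs[0].outputs[0].text.lower())
        total += 1

    return {
        "avg_latency_s": sum(latencies) / total,
        "exact_match": matches / total,
    }
```

The key design point is that one loop produces both a latency distribution and a quality score, so a single report can show the speed/quality trade-off per task.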
3. Proposed Solution
3.1 Candidate Benchmark Datasets
| Category | Suggested dataset | License | Rationale |
|---|---|---|---|
| VQA | VQAv2, OK-VQA | CC-BY-4.0 | Classic image-to-text Q&A |
| Caption | MS-COCO Captions | CC-BY-4.0 | Widely used; automatic metrics available |
| Reasoning | MMMU, MMBench, ScienceQA | CC-BY-NC-SA | Tests multi-hop visual + text reasoning |
| Random | ImageNet-1k, 5k random subset | CC-BY | Stress-tests generic vision-encoder paths |
(One small, redistributable subset per task is usually enough, e.g. ~2k images total; see the sketch below.)
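A hedged sketch of how such a fixed subset could be built with Hugging Face `datasets`; the dataset id and split are assumptions and would be pinned to whichever redistributable mirror is chosen:

```python
from datasets import load_dataset


def build_subset(dataset_id="HuggingFaceM4/VQAv2", split="validation",
                 n=2000, seed=0):
    ds = load_dataset(dataset_id, split=split)
    # Fixed seed so every benchmark run evaluates the same ~2k samples,
    # keeping results comparable across machines and vLLM versions.
    return ds.shuffle(seed=seed).select(range(n))
```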
3.2 Suggested Metrics
- ETI – Encode-to-Token Interval: wall time from the first image byte received to the first generated token.
- FPS-Enc – images processed per second (encoder throughput).
- Continue to report TTFT, TPOT, and ITL for apples-to-apples comparison with text-only runs (a measurement sketch follows this list).
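As a sketch of how ETI could be measured end-to-end, the snippet below streams a request through vLLM's OpenAI-compatible server (the endpoint, model name, and image file are assumptions): the clock starts when the request carrying the encoded image is sent and stops at the first streamed token.

```python
import base64
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("cat.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

start = time.perf_counter()  # clock starts as the image leaves the client
stream = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    stream=True,
)

first_token_at = None
for chunk in stream:
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()  # first generated token arrives

eti = first_token_at - start  # Encode-to-Token Interval (multimodal TTFT)
print(f"ETI: {eti:.3f}s")
```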
Note: a benchmark for speculative decoding is already available and supports custom datasets, but it is not user-friendly:
https://docs.google.com/document/d/1SbAnLNfCp04lHLJ_cF22IYc_StJ_U3jRRUSc3dQTxO0/edit?pli=1&tab=t.0
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.