
Commit 5e8cfb7 (1 parent: 6b0da0a)

Add Projected E2E Latency feature; implement calculation and reporting for normalized performance comparison across providers based on target token counts.

File tree

2 files changed: +274 -116 lines


README.md

Lines changed: 35 additions & 0 deletions
@@ -8,6 +8,7 @@ A fast, concurrent benchmarking tool for measuring LLM API performance metrics a
 - **Multiple Provider Support**: Test OpenAI, NVIDIA NIM, NovitaAI, NebiusAI, MiniMax, and any OpenAI-compatible API
 - **Concurrent Testing**: Benchmark all providers simultaneously with `--all` flag
 - **Real Metrics**: Measures End-to-End Latency, Time to First Token (TTFT), and Throughput
+- **Projected E2E Latency**: Normalized metric for fair comparison across different token outputs
 - **Accurate Token Counting**: Uses tiktoken for precise token measurements
 - **Multi-Run Averaging**: Runs 3 concurrent iterations per provider and averages results for more reliable metrics
 - **Multiple Test Modes**: Streaming, tool-calling, and mixed modes for comprehensive testing
@@ -147,6 +148,40 @@ Diagnostic mode produces:
 ./llm-api-speed --all --diagnostic --mixed
 ```
 
+### Projected E2E Latency Normalization
+
+The `--target-tokens` flag enables **Projected E2E Latency**, a normalized metric that allows fair performance comparison across providers that generated different token counts.
+
+**Formula:** `Projected E2E = TTFT + (Target Tokens / Throughput)`
+
+This metric answers: *"How long would this provider take to generate exactly N tokens?"*
+
+```bash
+# Calculate projected E2E for 350 tokens (default)
+./llm-api-speed --provider nim --diagnostic
+
+# Calculate projected E2E for 500 tokens
+./llm-api-speed --provider nim --diagnostic --target-tokens 500
+
+# Compare all providers normalized to 1000 tokens
+./llm-api-speed --all --diagnostic --target-tokens 1000
+```
+
+**When to use:**
+- Comparing providers that generated different completion lengths (e.g., 34 tokens vs 384 tokens)
+- Identifying the fastest provider for your expected response length
+- Understanding performance tradeoffs (TTFT vs throughput) at different scales
+
+**Output:**
+- Projected E2E appears in reports alongside actual E2E latency
+- New leaderboard: "By Projected E2E Latency" showing normalized rankings
+- JSON results include `projectedE2eLatency` field
+
+**Example:** If Provider A has TTFT=5s and throughput=250 tok/s, its projected E2E for 350 tokens would be:
+```
+5s + (350 / 250) = 5s + 1.4s = 6.4s
+```
+
 ### Save Response Content
 
 Use the `--save-responses` flag to save all API response content to files in the logs directory:
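The Projected E2E formula and worked example added by this commit can be checked with a short Python sketch. The function name and signature below are illustrative, not part of the tool's actual code:

```python
def projected_e2e_latency(ttft_s: float, throughput_tok_per_s: float,
                          target_tokens: int = 350) -> float:
    """Normalize latency to a fixed output length:
    Projected E2E = TTFT + (target tokens / throughput)."""
    if throughput_tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    return ttft_s + target_tokens / throughput_tok_per_s

# README's worked example: TTFT=5s, 250 tok/s, 350 tokens
print(round(projected_e2e_latency(5.0, 250.0, 350), 2))  # 6.4
```

Note how the normalization preserves each provider's fixed startup cost (TTFT) while scaling only the generation portion, so rankings can flip between small and large `target_tokens` values.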

0 commit comments