Add Projected E2E Latency feature; implement calculation and reporting for normalized performance comparison across providers based on target token counts.
README.md: 35 additions & 0 deletions
@@ -8,6 +8,7 @@ A fast, concurrent benchmarking tool for measuring LLM API performance metrics a
 - **Multiple Provider Support**: Test OpenAI, NVIDIA NIM, NovitaAI, NebiusAI, MiniMax, and any OpenAI-compatible API
 - **Concurrent Testing**: Benchmark all providers simultaneously with `--all` flag
 - **Real Metrics**: Measures End-to-End Latency, Time to First Token (TTFT), and Throughput
+- **Projected E2E Latency**: Normalized metric for fair comparison across different token outputs
 - **Accurate Token Counting**: Uses tiktoken for precise token measurements
 - **Multi-Run Averaging**: Runs 3 concurrent iterations per provider and averages results for more reliable metrics
 - **Multiple Test Modes**: Streaming, tool-calling, and mixed modes for comprehensive testing
@@ -147,6 +148,40 @@ Diagnostic mode produces:
 ./llm-api-speed --all --diagnostic --mixed
 ```

+### Projected E2E Latency Normalization
+
+The `--target-tokens` flag enables **Projected E2E Latency**, a normalized metric that allows fair performance comparison across providers that generated different token counts.
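The diff excerpt only introduces the flag; the projection formula itself is not shown here. As a rough illustration only, and not the tool's actual implementation, one plausible normalization treats TTFT as a fixed cost and scales the remaining decode time linearly to the target token count. The function name, parameters, and numbers below are hypothetical:

```python
# Hypothetical sketch of a Projected E2E Latency calculation; the real
# formula lives in the llm-api-speed source and may differ.

def projected_e2e_latency(e2e_s: float, ttft_s: float,
                          completion_tokens: int, target_tokens: int) -> float:
    """Normalize a measured end-to-end latency to a common target token count.

    Assumption: TTFT is a fixed cost, and the rest of the end-to-end time
    scales linearly with the number of tokens generated.
    """
    if completion_tokens <= 0:
        raise ValueError("completion_tokens must be positive")
    generation_s = max(e2e_s - ttft_s, 0.0)          # time spent decoding tokens
    per_token_s = generation_s / completion_tokens   # average seconds per token
    return ttft_s + per_token_s * target_tokens      # project to the target length

# Example: provider A generated 512 tokens, provider B only 128; projecting
# both to --target-tokens 500 makes their E2E numbers directly comparable.
print(round(projected_e2e_latency(e2e_s=6.2, ttft_s=0.4,
                                  completion_tokens=512, target_tokens=500), 3))
print(round(projected_e2e_latency(e2e_s=2.1, ttft_s=0.6,
                                  completion_tokens=128, target_tokens=500), 3))
```

Separating out TTFT in a scheme like this keeps the projection from penalizing providers that happened to produce short responses during a given run.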