Commit c66aea8

Enhance README and main.go to support multiple test modes; add diagnostic mode for comprehensive performance analysis

1 parent: 665e74a

2 files changed: +497 −36 lines

README.md: 61 additions & 0 deletions
@@ -10,6 +10,8 @@

- **Real Metrics**: Measures End-to-End Latency, Time to First Token (TTFT), and Throughput
- **Accurate Token Counting**: Uses tiktoken for precise token measurements
- **Multi-Run Averaging**: Runs 3 concurrent iterations per provider and averages results for more reliable metrics
- **Multiple Test Modes**: Streaming, tool-calling, and mixed modes for comprehensive testing
- **Diagnostic Mode**: 1-minute stress test with 10 concurrent workers for in-depth performance analysis
- **Session-Based Organization**: Each test run creates its own timestamped folder with logs and results
- **Markdown Reports**: Auto-generates performance summaries with leaderboards and failure analysis
- **Timeout Protection**: 2-minute timeout prevents indefinite hangs on stuck providers
@@ -63,6 +65,8 @@

## Usage

### Basic Usage

```bash
# Generic provider with custom API endpoint
./llm-api-speed --url https://api.openai.com/v1 --model gpt-4
@@ -78,6 +82,63 @@

./llm-api-speed --all
```

### Test Modes

The tool supports three test modes, each measuring a different aspect of API performance:

#### Streaming Mode (Default)
Tests regular chat completion with streaming responses. This is the default mode.

```bash
# Explicit streaming mode (default behavior)
./llm-api-speed --provider nim
```
#### Tool-Calling Mode
Tests the API's tool/function-calling capabilities with streaming, measuring performance when the model needs to generate tool calls.

```bash
# Test tool-calling performance
./llm-api-speed --provider nim --tool-calling
```
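To make concrete what a tool-calling benchmark exercises, here is a minimal Go sketch of the kind of request such a test sends, assuming an OpenAI-compatible `/chat/completions` endpoint; the `get_weather` tool and all field values are illustrative stand-ins, not taken from this repository's main.go.

```go
// Illustrative only: builds a streaming chat-completion request carrying a
// tool definition, which forces the model down the tool-calling path.
// The get_weather tool and endpoint are hypothetical stand-ins.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  "gpt-4",
		"stream": true, // tool calls are measured over a streaming response
		"messages": []map[string]string{
			{"role": "user", "content": "What is the weather in Berlin?"},
		},
		"tools": []map[string]any{{
			"type": "function",
			"function": map[string]any{
				"name":        "get_weather",
				"description": "Get the current weather for a city",
				"parameters": map[string]any{
					"type": "object",
					"properties": map[string]any{
						"city": map[string]string{"type": "string"},
					},
					"required": []string{"city"},
				},
			},
		}},
	})
	req, err := http.NewRequest(http.MethodPost,
		"https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	// A real benchmark would add auth, send the request, and time the
	// streamed chunks; printing the payload suffices for illustration.
	fmt.Println(req.URL.String())
	fmt.Println(string(body))
}
```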
#### Mixed Mode
Runs 3 iterations of both the streaming and tool-calling modes (6 runs total), providing comprehensive performance metrics for both use cases.

```bash
# Test both streaming and tool-calling
./llm-api-speed --provider nim --mixed

# Test all providers with both modes
./llm-api-speed --all --mixed
```
### Diagnostic Mode

Diagnostic mode runs an intensive stress test: 10 concurrent workers for 1 minute, each making a request every 15 seconds with a 30-second timeout per request. It is well suited for:
- Load testing your API endpoints
- Identifying rate limits and throttling behavior
- Measuring performance under sustained concurrent load
- Debugging intermittent issues
```bash
# Run diagnostic mode with streaming
./llm-api-speed --provider nim --diagnostic

# Run diagnostic mode with tool-calling
./llm-api-speed --provider nim --diagnostic --tool-calling

# Run diagnostic mode with a mixed workload (alternates between streaming and tool-calling)
./llm-api-speed --provider nim --diagnostic --mixed
```
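As a rough illustration of the cadence described above (a sketch under stated assumptions, not the repository's actual implementation), a Go worker pool like the following could drive 10 goroutines for a 1-minute window, one request per worker every 15 seconds, each bounded by a 30-second context timeout; `sendRequest` is a hypothetical placeholder for the real benchmarked call.

```go
// Minimal sketch of the diagnostic cadence: 10 workers, a 1-minute test
// window, one request per worker every 15 seconds, 30-second timeout per
// request. sendRequest is a hypothetical stand-in, not the repo's main.go.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

func sendRequest(ctx context.Context, worker int) error {
	// Placeholder for one benchmarked API call honoring ctx's deadline.
	select {
	case <-time.After(200 * time.Millisecond): // simulated latency
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	deadline := time.Now().Add(1 * time.Minute) // total test window
	var wg sync.WaitGroup
	for w := 0; w < 10; w++ { // 10 concurrent workers
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			ticker := time.NewTicker(15 * time.Second) // request cadence
			defer ticker.Stop()
			for time.Now().Before(deadline) {
				ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
				if err := sendRequest(ctx, worker); err != nil {
					fmt.Printf("worker %d: %v\n", worker, err)
				}
				cancel()
				<-ticker.C // wait for the next 15-second slot
			}
		}(w)
	}
	wg.Wait()
}
```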
Diagnostic mode produces:
- Detailed per-request logs for each worker
- Aggregated success/failure statistics
- Average metrics across all successful requests
- Error frequency analysis
- A JSON summary file with all metrics
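For a sense of what such a summary can contain, here is a hypothetical Go sketch of aggregating per-request results into success/failure counts, an average latency over successful requests, and an error-frequency table; the type names, fields, and JSON shape are illustrative, not the tool's actual output format.

```go
// Hypothetical sketch of aggregating diagnostic results; the types, field
// names, and JSON shape are illustrative, not the tool's actual output.
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

type requestResult struct {
	Latency time.Duration
	Err     error
}

type summary struct {
	Successes   int            `json:"successes"`
	Failures    int            `json:"failures"`
	AvgLatency  string         `json:"avg_latency"`
	ErrorCounts map[string]int `json:"error_counts"`
}

func summarize(results []requestResult) summary {
	s := summary{ErrorCounts: map[string]int{}}
	var total time.Duration
	for _, r := range results {
		if r.Err != nil {
			s.Failures++
			s.ErrorCounts[r.Err.Error()]++ // error-frequency analysis
			continue
		}
		s.Successes++
		total += r.Latency
	}
	if s.Successes > 0 { // average over successful requests only
		s.AvgLatency = (total / time.Duration(s.Successes)).String()
	}
	return s
}

func main() {
	out, _ := json.MarshalIndent(summarize([]requestResult{
		{Latency: 800 * time.Millisecond},
		{Latency: 1200 * time.Millisecond},
		{Err: errors.New("429 Too Many Requests")},
	}), "", "  ")
	fmt.Println(string(out)) // stand-in for writing the JSON summary file
}
```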
## Output

Each test run creates a session folder: `results/session-YYYYMMDD-HHMMSS/`
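As an aside on the folder naming, a timestamped session directory of this shape can be generated in Go with the reference-time layout shown below; this is a sketch under that assumption, not the repository's actual code.

```go
// Sketch of generating the results/session-YYYYMMDD-HHMMSS/ folder name;
// uses Go's reference-time formatting, not the repo's actual code.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

func main() {
	name := "session-" + time.Now().Format("20060102-150405") // YYYYMMDD-HHMMSS
	dir := filepath.Join("results", name)
	if err := os.MkdirAll(dir, 0o755); err != nil { // create results/session-.../
		fmt.Println("mkdir:", err)
		return
	}
	fmt.Println("created", dir)
}
```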
