| sidebar-title | Time Slicing for Performance Analysis |
|---|---|
Time slicing allows you to analyze performance metrics across sequential time windows during a benchmark run. This feature provides visibility into performance trends, degradation patterns, and system behavior over time.
Time slicing divides your benchmark into equal duration segments, computing metrics independently for each segment. This enables:
- Performance Trend Analysis: Identify if performance degrades, improves, or stabilizes over time
- Warm-up Detection: Distinguish initial cold-start behavior from steady-state performance
- Resource Exhaustion: Spot gradual performance degradation due to memory leaks or resource pressure
- Load Pattern Impact: Understand how different phases of load affect system performance
- Time-series Visualization: Export data suitable for plotting performance trends
- `--slice-duration SECONDS`: Duration of each time slice (accepts integers or floats)
  - Recommended for use with `--benchmark-duration`
  - Creates non-overlapping sequential time windows
  - Example: a 60-second benchmark with 10-second slices creates 6 time windows
  - When using time-based benchmarking, a grace period may add additional time slices
- `--benchmark-duration SECONDS`: Total benchmark duration
  - Must be greater than `--slice-duration`
  - Determines how many slices will be created
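As a quick sanity check before a run, the expected slice count can be estimated from the two flags. This is a sketch of the arithmetic only; AIPerf's exact boundary handling (including the grace period mentioned above) may produce one more slice:

```python
import math

def estimated_slice_count(benchmark_duration: float, slice_duration: float) -> int:
    """Estimate how many sequential time windows a run will produce.

    Assumes non-overlapping slices starting at t=0; a final partial
    window still counts as one slice.
    """
    if slice_duration <= 0 or benchmark_duration <= slice_duration:
        raise ValueError("--benchmark-duration must be greater than --slice-duration")
    return math.ceil(benchmark_duration / slice_duration)

print(estimated_slice_count(60, 10))  # 6, matching the example above
```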
```bash
# Start vLLM server for time slicing demonstration
docker pull vllm/vllm-openai:latest
docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
    --model Qwen/Qwen3-0.6B \
    --host 0.0.0.0 --port 8000 &

# Wait for server to be ready
timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"Qwen/Qwen3-0.6B\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}],\"max_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "vLLM not ready after 15min"; exit 1; }
```

Run a 60-second benchmark with 10-second slices to analyze performance trends:

```bash
aiperf profile \
    --model Qwen/Qwen3-0.6B \
    --benchmark-duration 60 \
    --slice-duration 10
```

Sample Output (Successful Run):
```
INFO Starting AIPerf System
INFO Timeslice analysis enabled: 10 second intervals
INFO AIPerf System is PROFILING
Profiling: [01:00] - Running for 60 seconds...
INFO Benchmark completed successfully
INFO Results saved to: artifacts/Qwen_Qwen3-0.6B-chat-duration60/
JSON Export: artifacts/Qwen_Qwen3-0.6B-chat-duration60/profile_export_aiperf.json
Timeslices: artifacts/Qwen_Qwen3-0.6B-chat-duration60/profile_export_aiperf_timeslices.json
Timeslices CSV: artifacts/Qwen_Qwen3-0.6B-chat-duration60/profile_export_aiperf_timeslices.csv
```
This creates 6 time slices (0-10s, 10-20s, 20-30s, 30-40s, 40-50s, 50-60s), each with independent metrics.
When time slicing is enabled, AIPerf generates additional output files:
File: artifacts/profile_export_aiperf_timeslices.csv
The CSV uses a "tidy" (long) format optimized for data analysis:
```csv
Timeslice,Metric,Unit,Stat,Value
0,Time to First Token,ms,avg,45.23
0,Time to First Token,ms,min,32.10
0,Time to First Token,ms,max,78.45
0,Time to First Token,ms,p50,44.20
0,Time to First Token,ms,p90,65.30
0,Time to First Token,ms,p95,70.15
0,Time to First Token,ms,p99,76.80
0,Inter Token Latency,ms,avg,12.34
0,Inter Token Latency,ms,min,8.50
...
1,Time to First Token,ms,avg,46.78
1,Time to First Token,ms,min,33.20
...
```

Format Details:
- Timeslice: Zero-indexed slice number (0, 1, 2, ...)
- Metric: Human-readable metric name (e.g., "Time to First Token")
- Unit: Measurement unit (ms, tokens/sec, etc.)
- Stat: Statistical measure (avg, min, max, p50, p90, p95, p99)
- Value: Numeric value formatted to 2 decimal places
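Because the CSV is tidy, per-slice series for any metric/statistic pair can be extracted with a simple filter. A minimal sketch using only the standard library (it embeds a few sample rows in the documented format; in a real run, open `artifacts/profile_export_aiperf_timeslices.csv` instead):

```python
import csv
import io

# Sample rows in the tidy format shown above; substitute a real
# timeslices CSV file in practice.
sample = io.StringIO(
    "Timeslice,Metric,Unit,Stat,Value\n"
    "0,Time to First Token,ms,avg,45.23\n"
    "0,Time to First Token,ms,p99,76.80\n"
    "1,Time to First Token,ms,avg,46.78\n"
)

# Build a per-slice series for one metric/statistic pair
trend = {
    int(row["Timeslice"]): float(row["Value"])
    for row in csv.DictReader(sample)
    if row["Metric"] == "Time to First Token" and row["Stat"] == "avg"
}
print(trend)  # {0: 45.23, 1: 46.78}
```

A series keyed by slice index like this plots directly as a time-series of avg TTFT across the run.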
File: artifacts/profile_export_aiperf_timeslices.json
The JSON provides a hierarchical structure with all timeslices in a single file:
```json
{
  "timeslices": [
    {
      "timeslice_index": 0,
      "time_to_first_token": {
        "unit": "ms",
        "avg": 45.23,
        "min": 32.10,
        "max": 78.45,
        "p50": 44.20,
        "p90": 65.30,
        "p95": 70.15,
        "p99": 76.80
      },
      "inter_token_latency": {
        "unit": "ms",
        "avg": 12.34,
        "min": 8.50,
        "max": 18.90,
        "p50": 12.10,
        "p90": 15.80,
        "p95": 16.50,
        "p99": 17.90
      },
      ...
    },
    {
      "timeslice_index": 1,
      "time_to_first_token": {
        "unit": "ms",
        "avg": 46.78,
        ...
      },
      ...
    }
  ],
  "input_config": {
    "model": "Qwen/Qwen3-0.6B",
    "endpoint": "/v1/chat/completions",
    ...
  }
}
```

Key Fields:
- `timeslices`: Array of slice objects, ordered by time
- `timeslice_index`: Zero-indexed slice identifier
- Each metric contains `unit` and the available statistics
- `input_config`: Benchmark configuration for reproducibility
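Since all slices live in a single file, iterating the `timeslices` array gives an ordered view of any metric. A minimal sketch (it parses an abridged inline sample matching the structure above; in a real run, load `artifacts/profile_export_aiperf_timeslices.json` instead):

```python
import json

# Abridged sample mirroring the hierarchical structure documented above;
# substitute the real timeslices JSON file in practice.
export = json.loads("""
{
  "timeslices": [
    {"timeslice_index": 0, "time_to_first_token": {"unit": "ms", "avg": 45.23}},
    {"timeslice_index": 1, "time_to_first_token": {"unit": "ms", "avg": 46.78}}
  ],
  "input_config": {"model": "Qwen/Qwen3-0.6B"}
}
""")

# Slices are ordered by time; pull one statistic per slice
for ts in export["timeslices"]:
    m = ts["time_to_first_token"]
    print(f"slice {ts['timeslice_index']}: avg TTFT {m['avg']} {m['unit']}")
```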
- Warm-up Detection: Identify initial cold-start latency vs. steady-state performance. Expected pattern: higher latency in slice 0, stabilizing in later slices.
- Resource Exhaustion: Monitor for memory leaks or resource pressure. Look for increasing latency or decreasing throughput in later slices.
- Load Pattern Analysis: Combine with varying concurrency patterns and compare slice patterns across different load levels.
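The "increasing latency in later slices" heuristic can be turned into a simple automated check over per-slice averages. A sketch, not part of AIPerf itself; the thirds-based comparison and 5% tolerance are arbitrary assumptions to tune for your workload:

```python
def latency_drifts_upward(slice_avgs: list[float], tolerance: float = 0.05) -> bool:
    """Flag possible degradation: compare the mean of the last third of
    slices against the first third, skipping slice 0 to ignore warm-up.
    `tolerance` is the fractional increase considered acceptable.
    """
    steady = slice_avgs[1:]  # drop slice 0 (cold start)
    third = max(1, len(steady) // 3)
    early = sum(steady[:third]) / third
    late = sum(steady[-third:]) / third
    return late > early * (1 + tolerance)

# Stable run: warm-up spike in slice 0, then flat
print(latency_drifts_upward([78.0, 45.1, 44.8, 45.3, 45.0, 45.2]))  # False
# Degrading run: latency climbs steadily across slices
print(latency_drifts_upward([78.0, 45.0, 48.0, 53.0, 60.0, 70.0]))  # True
```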
To be announced...
**Timeslice Boundaries:**
- Timeslices are calculated based on absolute wall-clock time divisions
- The first timeslice may be shorter if requests don't start exactly at a timeslice boundary
- The last timeslice may be shorter if the benchmark ends mid-slice

**Statistical Considerations:**
- Very short slices may have high variance and unstable metrics
- Low-concurrency benchmarks need longer slices for an adequate sample size

Problem: Running with `--slice-duration` but no `*_timeslices.*` files appear.

Solutions:
- Verify `--slice-duration` (in seconds) is less than the benchmark duration
- Check that the benchmark completed successfully (not cancelled/interrupted)
- Confirm the output directory is writable
Problem: Metrics fluctuate wildly between consecutive slices.
Solutions:
- Increase `--slice-duration` for more stable statistics
- Increase `--concurrency` to generate more requests per slice
- Check for external factors (other processes, network issues)
- Use a longer warmup period (`--warmup-request-count`)
- Time-based Benchmarking - Understanding `--benchmark-duration`
- Working with Profile Exports - General export formats
- GPU Telemetry - Correlating GPU metrics with performance
- Request Rate and Concurrency - Load generation strategies