# HTTP Trace Metrics Guide
This guide explains the HTTP request lifecycle tracing metrics available in AIPerf, which provide granular timing information at the transport layer for performance analysis and debugging.
AIPerf captures detailed timing information throughout the HTTP request lifecycle using the aiohttp tracing system. These metrics follow industry-standard conventions from k6 load testing and the HAR (HTTP Archive) specification, making them familiar and compatible with existing performance analysis tools.
Key characteristics:
- Trace metrics are captured using `time.perf_counter_ns()` for nanosecond precision during measurement
- When exported, timestamps are converted to wall-clock time (`time.time_ns()`) for correlation with logs and external systems
- The naming convention uses the `http_req_` prefix to match k6's metric naming
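For example, a perf-counter timestamp can be anchored to wall-clock time with a single reference pair. This is a minimal sketch of the conversion idea only; the helper name `perf_to_wall_ns` is illustrative, not AIPerf's actual code:

```python
import time

def perf_to_wall_ns(perf_ns: int, anchor_perf_ns: int, anchor_wall_ns: int) -> int:
    """Map a perf-counter timestamp to epoch nanoseconds via a shared anchor pair."""
    return anchor_wall_ns + (perf_ns - anchor_perf_ns)

# Capture one anchor pair up front; later perf-counter timestamps can then be
# exported as wall-clock nanoseconds for correlation with logs.
anchor_perf = time.perf_counter_ns()
anchor_wall = time.time_ns()

event_perf = time.perf_counter_ns()  # a later measured event
event_wall = perf_to_wall_ns(event_perf, anchor_perf, anchor_wall)
```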
Enabling trace timing output:
To display HTTP trace timing metrics in the console output, use the `--show-trace-timing` flag:

```
aiperf profile ... --show-trace-timing
```

This displays a separate table with the HTTP trace timing breakdown after the main metrics table.
The HTTP request lifecycle breaks down into distinct phases, each measured independently:
```
Request Lifecycle ────────────────────────────────────────────────────────────────────────────────►
│                 │                 │                │                    │                       │
│◄─dns_lookup_ns─►│◄─connecting_ns─►│◄─ sending_ns ─►│◄──── waiting_ns ──►│◄──── receiving_ns ───►│
│                 │                 │                │                    │                       │
dns resolution    TCP+TLS          request send     request_send_end     first body chunk   last body chunk
(cache miss)      handshake        (last chunk)     (ready for server)   (response starts)  (response complete)
│                 │                                                      │
└─ dns_cache_hit (skip lookup)                                           └── response_headers_received
                  │
                  └─ connection_reused_ns (skip TCP/TLS)
```
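The phase durations above are simple differences between adjacent timestamps. A toy example with made-up nanosecond values (the field names follow the `trace_data` export described later in this guide):

```python
# Fabricated timestamps for illustration; field names mirror the trace_data export.
trace = {
    "request_send_start_ns": 100_000,
    "request_send_end_ns": 100_500,
    "response_receive_start_ns": 104_000,
    "response_receive_end_ns": 109_000,
}

# Each phase is the gap between two adjacent lifecycle timestamps.
sending_ns = trace["request_send_end_ns"] - trace["request_send_start_ns"]
waiting_ns = trace["response_receive_start_ns"] - trace["request_send_end_ns"]
receiving_ns = trace["response_receive_end_ns"] - trace["response_receive_start_ns"]

# The direct end-to-end measurement reconciles with the summed phases here.
duration_ns = trace["response_receive_end_ns"] - trace["request_send_start_ns"]
assert duration_ns == sending_ns + waiting_ns + receiving_ns
```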
These metrics capture the time spent establishing a connection before the HTTP request can be sent. They are specific to the aiohttp HTTP client.
| Metric | k6 Equivalent | HAR Equivalent | Description |
|---|---|---|---|
| `http_req_blocked` | `http_req_blocked` | `blocked` | Time spent waiting for a free connection slot from the connection pool. High values indicate pool saturation. |
| `http_req_dns_lookup` | `http_req_looking_up` | `dns` | Time spent on DNS resolution. Returns 0 if DNS was cached or the connection was reused. |
| `http_req_connecting` | `http_req_connecting` | `connect` | Time to establish the TCP connection. For HTTPS, this includes TLS handshake time. Returns 0 if the connection was reused. |
| `http_req_connection_reused` | — | — | Binary indicator (0 or 1) showing whether an existing connection was reused from the pool. |
| `http_req_connection_overhead` | — | — | Combined overhead: blocked + dns_lookup + connecting. Total pre-request setup cost. |
These core timing metrics measure the actual HTTP request and response transfer. They are available for any HTTP client that populates the base trace data model.
| Metric | k6 Equivalent | HAR Equivalent | Description |
|---|---|---|---|
| `http_req_sending` | `http_req_sending` | `send` | Time to transmit the complete HTTP request (headers + body) to the server. |
| `http_req_waiting` | `http_req_waiting` | `wait` | Time to First Byte (TTFB): time from request completion to the first response body byte. Represents server processing time plus network latency. Note: this measures time to the first body chunk, not the first header byte. |
| `http_req_receiving` | `http_req_receiving` | `receive` | Time to download the complete response body. Returns 0 for single-chunk responses. |
| `http_req_duration` | `http_req_duration` | `time` | Request/response exchange time (excluding connection overhead): sending + waiting + receiving |
| `http_req_total` | — | — | Full end-to-end time: sum of all six timing phases. See HTTP Total Time vs Request Latency. |
| Metric | Description |
|---|---|
| `http_req_data_sent` | Total bytes transmitted in the request (transport layer). |
| `http_req_data_received` | Total bytes received in the response (transport layer). |
| `http_req_chunks_sent` | Number of transport-level write operations during the request. |
| `http_req_chunks_received` | Number of transport-level read operations during the response. |
The `http_req_duration` metric is measured directly from timestamps for maximum accuracy:

```
http_req_duration = response_receive_end_perf_ns - request_send_start_perf_ns
```

This measures from when the request started being sent to when the response was fully received/finalized. Conceptually this covers sending + waiting + receiving, but the direct measurement is more accurate than summing components.
Connection overhead combines all pre-request setup time:

```
http_req_connection_overhead = http_req_blocked + http_req_dns_lookup + http_req_connecting
```
The `http_req_total` metric sums all six timing phases for a reconcilable breakdown:

```
http_req_total = http_req_blocked + http_req_dns_lookup + http_req_connecting
               + http_req_sending + http_req_waiting + http_req_receiving
```
Use `http_req_total` when you need the breakdown to add up exactly. Use `http_req_duration` when you want the most accurate single measurement of request/response exchange time.
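As a sketch of the two formulas above (the millisecond values below are illustrative only, not real measurements):

```python
# Illustrative per-phase values in milliseconds.
phases = {
    "http_req_blocked": 0.2,
    "http_req_dns_lookup": 1.1,
    "http_req_connecting": 4.8,
    "http_req_sending": 0.7,
    "http_req_waiting": 120.5,
    "http_req_receiving": 33.4,
}

# Reconcilable total: the six phases sum exactly to http_req_total.
http_req_total = sum(phases.values())

# Pre-request setup cost only (pool wait + DNS + TCP/TLS).
http_req_connection_overhead = (
    phases["http_req_blocked"]
    + phases["http_req_dns_lookup"]
    + phases["http_req_connecting"]
)
```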
- TTFB vs TTFT: `http_req_waiting` measures Time to First Byte (specifically, the first body byte after headers), not Time to First Token. The server sends HTTP headers first, then body content. For LLM APIs, the first body byte may contain protocol overhead before actual tokens appear. Use the `time_to_first_token` metric for LLM-specific timing that measures when the first actual token content is received.
- Connection reuse: When `http_req_connection_reused = 1`, both `http_req_dns_lookup` and `http_req_connecting` will be `0`, since no new connection was established.
You may notice that `http_req_total` can be larger than `request_latency`. This is expected behavior; the two metrics measure different things:
| Metric | Start | End | What it measures |
|---|---|---|---|
| `request_latency` | Before HTTP call | Last content response | Time to receive all meaningful tokens |
| `http_req_total` | Sum of phases starting at pool wait | Last network chunk | Sum of all HTTP timing phases |
| `http_req_duration` | Request send start | Last network chunk | Direct measured request/response exchange |
Why `http_req_total` > `request_latency`:

For streaming LLM responses (SSE), the HTTP stream typically ends with:

```
[content chunk 1]  ─►  included in both metrics
[content chunk 2]  ─►  included in both metrics
[content chunk N]  ─►  request_latency ends here (last actual token)
[usage info]       ─►  http_req_total includes this
[DONE]             ─►  http_req_total and http_req_duration end here (last network chunk)
```
The `request_latency` metric excludes trailing metadata (`[DONE]` markers, usage statistics) because those don't represent meaningful content delivery. The HTTP trace metrics include all network traffic.

Which metric should I use?

| Use Case | Recommended Metric |
|---|---|
| User-perceived latency ("when did I get the last useful token?") | `request_latency` |
| Transport-level timing with reconcilable breakdown | `http_req_total` |
| Most accurate single request/response measurement | `http_req_duration` |
| Debugging: gap between content and stream end | `http_req_total - request_latency` |
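The debugging expression in the last row is just a per-record subtraction; the numbers here are made up:

```python
# Hypothetical per-record metric values in milliseconds.
record = {"http_req_total": 37425.0, "request_latency": 37010.0}

# Gap between the last meaningful token and the end of the network stream;
# a large value means significant trailing metadata ([DONE] marker, usage info).
trailing_gap_ms = record["http_req_total"] - record["request_latency"]
```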
By default, raw HTTP trace data is not included in `profile_export.jsonl` to keep file sizes small. The computed metrics (`http_req_duration`, `http_req_waiting`, etc.) are always available regardless of this setting.

To include the full trace data (timestamps, chunks, headers, socket info), use the `--export-http-trace` flag:

```
aiperf profile ... --export-http-trace
```

The `--export-http-trace` flag works with `records` or `raw` export levels:
| Export Level | Trace Data (with `--export-http-trace`) | Use Case |
|---|---|---|
| `summary` | Not available | Quick benchmark summaries |
| `records` | ✓ Included | Per-request analysis with timing details |
| `raw` | ✓ Included | Full debugging with complete request/response data |
Example with both flags:

```
aiperf profile ... --export-level records --export-http-trace
```

When exported to `profile_export.jsonl`, trace data uses wall-clock timestamps (nanoseconds since epoch) for cross-system correlation. The trace data is included in each record:
```json
{
  "metadata": { "x_request_id": "9568f1d7-10e9-4d42-bb69-b06c87caae9f", "..." : "..." },
  "metrics": {
    "http_req_duration": {"value": 37421.26, "unit": "ms"},
    "http_req_waiting": {"value": 4473.26, "unit": "ms"},
    "http_req_blocked": {"value": 0.0, "unit": "ms"},
    "..." : "..."
  },
  "trace_data": {
    "trace_type": "aiohttp",
    "request_send_start_ns": 1768309400341882300,
    "request_headers": {"Host": "localhost:8000", "Content-Type": "application/json", "...": "..."},
    "request_headers_sent_ns": 1768309400342515706,
    "request_chunks": [[1768309400342549481, 91586]],
    "request_send_end_ns": 1768309400342549481,
    "request_chunks_count": 1,
    "request_bytes_total": 91586,
    "response_status_code": 200,
    "response_reason": "OK",
    "response_receive_start_ns": 1768309404815807889,
    "response_headers": {"Content-Type": "text/event-stream; charset=utf-8", "...": "..."},
    "response_headers_received_ns": 1768309400369701553,
    "response_chunks": [[1768309404815807889, 565], [1768309405191294711, 268], "..."],
    "response_chunks_count": 42,
    "response_bytes_total": 16384,
    "response_receive_end_ns": 1768309437763141384,
    "sending_ns": 667181,
    "waiting_ns": 4473258408,
    "receiving_ns": 32947333495,
    "duration_ns": 37421259084,
    "tcp_connect_start_ns": 1768309400341999707,
    "tcp_connect_end_ns": 1768309400342480993,
    "connecting_ns": 481286,
    "dns_cache_hit_ns": 1768309400342023726,
    "local_ip": "127.0.0.1",
    "local_port": 48362,
    "remote_ip": "127.0.0.1",
    "remote_port": 8000
  },
  "error": null
}
```

The `trace_data` object contains both raw timestamps and computed durations:
Raw Timestamps (wall-clock nanoseconds):

| Field | Description |
|---|---|
| `request_send_start_ns` | When the HTTP request started being sent |
| `request_headers_sent_ns` | When the request headers finished being sent |
| `request_send_end_ns` | When the request body finished being sent (computed from the last chunk) |
| `response_headers_received_ns` | When the response headers were received |
| `response_receive_start_ns` | When the first response body chunk was received |
| `response_receive_end_ns` | When the response finished being received |
| `error_timestamp_ns` | When an error occurred during the request (if any) |
| `connection_pool_wait_start_ns` | When waiting for a connection started (aiohttp only) |
| `connection_pool_wait_end_ns` | When a connection was obtained (aiohttp only) |
| `tcp_connect_start_ns` | When TCP connection establishment started (aiohttp only) |
| `tcp_connect_end_ns` | When the TCP connection completed (aiohttp only) |
| `connection_reused_ns` | When an existing connection was reused (aiohttp only) |
| `dns_lookup_start_ns` | When DNS resolution started (aiohttp only) |
| `dns_lookup_end_ns` | When DNS resolution completed (aiohttp only) |
| `dns_cache_hit_ns` | When a DNS cache hit occurred (aiohttp only) |
| `dns_cache_miss_ns` | When a DNS cache miss occurred (aiohttp only) |
Chunk Aggregates (always available):

| Field | Description |
|---|---|
| `request_chunks_count` | Number of request chunks sent |
| `request_bytes_total` | Total bytes sent in request chunks |
| `response_chunks_count` | Number of response chunks received |
| `response_bytes_total` | Total bytes received in response chunks |
Chunk Data (only with `--export-http-trace`, transport-layer granularity):

| Field | Description |
|---|---|
| `request_chunks` | Array of `[timestamp_ns, size_bytes]` tuples for each request chunk sent |
| `response_chunks` | Array of `[timestamp_ns, size_bytes]` tuples for each response chunk received |
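For example, inter-chunk arrival gaps (useful for the streaming analysis discussed below) can be derived directly from `response_chunks`; the timestamps here are fabricated:

```python
# Fabricated [timestamp_ns, size_bytes] tuples, shaped like response_chunks.
response_chunks = [
    [1_000_000_000, 565],
    [1_375_000_000, 268],
    [1_400_000_000, 310],
]

# Gap between consecutive chunk arrivals, converted from ns to ms.
gaps_ms = [
    (later[0] - earlier[0]) / 1e6
    for earlier, later in zip(response_chunks, response_chunks[1:])
]
```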
Computed Durations (nanoseconds):

| Field | Description |
|---|---|
| `sending_ns` | Request send time |
| `waiting_ns` | Time to first byte (TTFB) |
| `receiving_ns` | Response transfer time |
| `duration_ns` | Total request duration |
| `blocked_ns` | Connection pool wait time |
| `dns_lookup_ns` | DNS resolution time |
| `connecting_ns` | TCP/TLS connection time |
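A minimal sketch of reading these computed durations back out of `profile_export.jsonl` (it assumes the record shape shown in the example above; the helper name `iter_waiting_ms` is illustrative):

```python
import json

def iter_waiting_ms(path: str):
    """Yield waiting_ns from each record's trace_data, converted to ms."""
    with open(path) as f:
        for line in f:
            trace = json.loads(line).get("trace_data") or {}
            if "waiting_ns" in trace:
                yield trace["waiting_ns"] / 1e6  # ns -> ms
```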
Request/Response Metadata:

| Field | Description |
|---|---|
| `request_headers` | Dictionary of request headers sent |
| `response_status_code` | HTTP status code of the response |
| `response_reason` | HTTP status reason phrase (e.g., "OK", "Not Found") |
| `response_headers` | Dictionary of response headers received |
Connection Info (aiohttp only):

| Field | Description |
|---|---|
| `local_ip` | Local IP address used for the connection |
| `local_port` | Local (ephemeral) port used |
| `remote_ip` | Remote server IP address |
| `remote_port` | Remote server port |
If `http_req_blocked` is consistently high, your connection pool is exhausted. Consider:
- Increasing the connection pool size
- Reducing the number of concurrent requests
- Investigating slow responses that hold connections
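One way to quantify pool saturation is the fraction of requests whose `http_req_blocked` exceeds a threshold; a sketch with made-up millisecond values and an arbitrary 10 ms cutoff:

```python
# Illustrative http_req_blocked values (ms) across a set of requests.
blocked_ms = [0.0, 0.1, 45.0, 0.0, 60.2, 0.0]

# Fraction of requests that waited more than 10 ms for a pool slot
# (the threshold is arbitrary; tune it to your latency budget).
saturated_fraction = sum(1 for v in blocked_ms if v > 10.0) / len(blocked_ms)
```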
If `http_req_dns_lookup` is high:
- DNS resolution is slow
- Consider using DNS caching or a faster resolver
- Check if DNS TTLs are appropriate
`http_req_waiting` (TTFB) isolates server-side latency:

- Low `sending` + high `waiting` = the server is the bottleneck
- High `receiving` = a large response or slow network throughput
Track `http_req_connection_reused` aggregated values:

- Values close to `1.0` (100% reuse) indicate efficient keep-alive usage
- Low reuse rates suggest connection churn, adding overhead via DNS lookups and TCP/TLS handshakes
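Because the indicator is 0 or 1, the mean over records is the reuse rate; a toy example with fabricated per-request values:

```python
# Per-request http_req_connection_reused indicators (illustrative values).
reused_flags = [1, 1, 0, 1, 1, 1, 0, 1]

# The mean of a 0/1 indicator equals the fraction of requests
# that reused an existing pooled connection.
reuse_rate = sum(reused_flags) / len(reused_flags)
```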
When `--export-http-trace` is enabled, the `request_chunks` and `response_chunks` arrays provide transport-layer granularity useful for:
- Analyzing streaming response patterns from LLM APIs
- Debugging chunked transfer encoding issues
- Understanding network-level timing variations
Aggregate fields (`request_chunks_count`, `request_bytes_total`, `response_chunks_count`, `response_bytes_total`) are always available regardless of the export flag.
AIPerf trace metrics align with industry standards for compatibility with existing tools:
| Standard | Reference |
|---|---|
| k6 | Grafana k6 Built-in Metrics |
| HAR 1.2 | W3C HTTP Archive Specification |
Per the HAR 1.2 specification:

- `blocked`, `dns`, `connect` use `-1` when not applicable (AIPerf uses `0` or `null`)
- `send`, `wait`, `receive` are required non-negative values
- `time` (duration) equals the sum of all applicable timing phases
- `ssl` timing is included within `connect` for backwards compatibility
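This alignment can be sketched as a small projection onto a HAR 1.2 entry `timings` object. The helper `to_har_timings` is hypothetical, not part of AIPerf; it follows the spec's `-1` convention for not-applicable phases:

```python
def to_har_timings(metrics: dict) -> dict:
    """Project AIPerf trace metrics (ms) onto a HAR 1.2 entry `timings` object."""
    return {
        "blocked": metrics.get("http_req_blocked", -1),
        "dns": metrics.get("http_req_dns_lookup", -1),
        "connect": metrics.get("http_req_connecting", -1),  # includes TLS time
        "send": metrics["http_req_sending"],
        "wait": metrics["http_req_waiting"],
        "receive": metrics["http_req_receiving"],
    }
```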
The k6 `http_req_tls_handshaking` metric is not separated in AIPerf. TLS time is combined with TCP connection time in `http_req_connecting` because aiohttp's tracing API provides a combined measurement via the `on_connection_create_start`/`on_connection_create_end` events.
| What You Want to Know | Metric to Use |
|---|---|
| Full end-to-end HTTP time | `http_req_total` |
| Request/response time (excl. connection) | `http_req_duration` |
| Server processing time | `http_req_waiting` (TTFB) |
| Network transfer efficiency | `http_req_receiving` / `http_req_data_received` |
| Connection pool health | `http_req_blocked` |
| Connection reuse rate | `http_req_connection_reused` |
| DNS performance | `http_req_dns_lookup` |
| Pre-request overhead | `http_req_connection_overhead` |
| User-perceived latency (LLM) | `request_latency` (not an HTTP trace metric) |
- Working with Profile Export Files - How to parse and analyze AIPerf output files
- Source: `trace_models.py` - Trace data model definitions
- Source: `http_trace_metrics.py` - HTTP trace metric implementations