
Commit 35cd4a3 (parent 9818033)

Commit message: afs

1 file changed: docs/user_guide/metrics.md (22 additions, 6 deletions)
@@ -53,12 +53,22 @@ can be used. See the `tritonserver --help` output for more info on these CLI opt
 
 To change the interval at which metrics are polled/updated, see the `--metrics-interval-ms` flag. Metrics that are updated "Per Request" are unaffected by this interval setting. This interval only applies to metrics that are designated as "Per Interval" in the tables of each section below:
 
-- [Inference Request Metrics](#inference-request-metrics)
-- [GPU Metrics](#gpu-metrics)
-- [CPU Metrics](#cpu-metrics)
-- [Pinned Memory Metrics](#pinned-memory-metrics)
-- [Response Cache Metrics](#response-cache-metrics)
-- [Custom Metrics](#custom-metrics)
+- [Metrics](#metrics)
+  - [Inference Request Metrics](#inference-request-metrics)
+    - [Counts](#counts)
+      - [Failure Count Categories](#failure-count-categories)
+      - [Pending Request Count (Queue Size) Per-Model](#pending-request-count-queue-size-per-model)
+    - [Latencies](#latencies)
+      - [Counters](#counters)
+      - [Summaries](#summaries)
+  - [GPU Metrics](#gpu-metrics)
+  - [CPU Metrics](#cpu-metrics)
+  - [Pinned Memory Metrics](#pinned-memory-metrics)
+  - [Response Cache Metrics](#response-cache-metrics)
+    - [Triton-reported Response Cache Metrics](#triton-reported-response-cache-metrics)
+  - [Custom Metrics](#custom-metrics)
+    - [TensorRT-LLM Backend Metrics](#tensorrt-llm-backend-metrics)
+    - [vLLM Backend Metrics](#vllm-backend-metrics)
 
 ## Inference Request Metrics
 
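The context retained in the hunk above documents the `--metrics-interval-ms` flag, which controls how often Triton refreshes its "Per Interval" metrics. As a minimal sketch of what that cadence means for a consumer, the C program below polls Triton's default metrics endpoint with libcurl; the URL, poll period, and iteration count are assumptions for illustration, not part of the committed docs.

```c
#include <stdio.h>
#include <unistd.h>
#include <curl/curl.h>

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    // Assumed default metrics endpoint of a locally running tritonserver
    // with metrics enabled (port 8002 is Triton's default metrics port).
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8002/metrics");

    for (int i = 0; i < 5; ++i) {
        // Each perform re-fetches the Prometheus text exposition; libcurl's
        // default write callback prints the response body to stdout.
        CURLcode rc = curl_easy_perform(curl);
        if (rc != CURLE_OK)
            fprintf(stderr, "fetch failed: %s\n", curl_easy_strerror(rc));
        sleep(2);  // hypothetical 2 s poll period
    }

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```

Build with `gcc poller.c -lcurl`. Polling faster than the configured `--metrics-interval-ms` simply re-reads unchanged "Per Interval" values, while "Per Request" metrics update as requests complete.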
@@ -378,3 +388,9 @@ Further documentation can be found in the `TRITONSERVER_MetricFamily*` and
 The TRT-LLM backend uses the custom metrics API to track and expose specific metrics about
 LLMs, KV Cache, and Inflight Batching to Triton:
 https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#triton-metrics
+
+### vLLM Backend Metrics
+
+The vLLM backend uses the custom metrics API to track and expose specific metrics about
+LLMs to Triton:
+https://github.com/triton-inference-server/vllm_backend?tab=readme-ov-file#triton-metrics
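Both backend sections in this hunk build on the custom metrics support exposed through the `TRITONSERVER_MetricFamily*` C API referenced in the hunk header. Below is a minimal sketch of that API, assuming the in-process `tritonserver.h` header; the family name, label value, and wrapper function are hypothetical, and error handling is elided.

```c
#include <stddef.h>
#include "tritonserver.h"  // Triton in-process C API

// Hypothetical helper: register a custom counter and bump it once.
// A real backend or agent would check each TRITONSERVER_Error* return value.
void register_and_increment_example(void) {
    TRITONSERVER_MetricFamily* family = NULL;
    TRITONSERVER_MetricFamilyNew(
        &family, TRITONSERVER_METRIC_KIND_COUNTER,
        "example_requests_total",                 // hypothetical metric name
        "Example counter exposed on /metrics");   // help text

    // One labeled series within the family.
    const TRITONSERVER_Parameter* labels[1];
    labels[0] = TRITONSERVER_ParameterNew(
        "model", TRITONSERVER_PARAMETER_STRING, "my_model");

    TRITONSERVER_Metric* metric = NULL;
    TRITONSERVER_MetricNew(&metric, family, labels, 1);

    TRITONSERVER_MetricIncrement(metric, 1.0);  // counter += 1

    TRITONSERVER_ParameterDelete((TRITONSERVER_Parameter*)labels[0]);
    TRITONSERVER_MetricDelete(metric);
    TRITONSERVER_MetricFamilyDelete(family);
}
```

A family groups labeled series under one metric name; for set-style values such as queue sizes, `TRITONSERVER_METRIC_KIND_GAUGE` together with `TRITONSERVER_MetricSet` is the counterpart to the counter shown here.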
