
Commit 35cd4a3 (parent 9818033)

Commit message: afs

1 file changed: docs/user_guide/metrics.md (22 additions, 6 deletions)
@@ -53,12 +53,22 @@ can be used. See the `tritonserver --help` output for more info on these CLI opt
 
 To change the interval at which metrics are polled/updated, see the `--metrics-interval-ms` flag. Metrics that are updated "Per Request" are unaffected by this interval setting. This interval only applies to metrics that are designated as "Per Interval" in the tables of each section below:
 
-- [Inference Request Metrics](#inference-request-metrics)
-- [GPU Metrics](#gpu-metrics)
-- [CPU Metrics](#cpu-metrics)
-- [Pinned Memory Metrics](#pinned-memory-metrics)
-- [Response Cache Metrics](#response-cache-metrics)
-- [Custom Metrics](#custom-metrics)
+- [Metrics](#metrics)
+  - [Inference Request Metrics](#inference-request-metrics)
+    - [Counts](#counts)
+      - [Failure Count Categories](#failure-count-categories)
+      - [Pending Request Count (Queue Size) Per-Model](#pending-request-count-queue-size-per-model)
+    - [Latencies](#latencies)
+      - [Counters](#counters)
+      - [Summaries](#summaries)
+  - [GPU Metrics](#gpu-metrics)
+  - [CPU Metrics](#cpu-metrics)
+  - [Pinned Memory Metrics](#pinned-memory-metrics)
+  - [Response Cache Metrics](#response-cache-metrics)
+    - [Triton-reported Response Cache Metrics](#triton-reported-response-cache-metrics)
+  - [Custom Metrics](#custom-metrics)
+    - [TensorRT-LLM Backend Metrics](#tensorrt-llm-backend-metrics)
+    - [vLLM Backend Metrics](#vllm-backend-metrics)
 
 ## Inference Request Metrics
 
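The context retained in the hunk above documents the `--metrics-interval-ms` flag, which controls how often Triton refreshes its "Per Interval" metrics. As a minimal sketch of what that cadence means for a consumer, the C program below polls Triton's default metrics endpoint with libcurl; the URL, poll period, and iteration count are assumptions for illustration, not part of the committed docs.

```c
#include <stdio.h>
#include <unistd.h>
#include <curl/curl.h>

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    // Assumed default metrics endpoint of a locally running tritonserver
    // with metrics enabled (port 8002 is Triton's default metrics port).
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8002/metrics");

    for (int i = 0; i < 5; ++i) {
        // Each perform re-fetches the Prometheus text exposition; libcurl's
        // default write callback prints the response body to stdout.
        CURLcode rc = curl_easy_perform(curl);
        if (rc != CURLE_OK)
            fprintf(stderr, "fetch failed: %s\n", curl_easy_strerror(rc));
        sleep(2);  // hypothetical 2 s poll period
    }

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```

Build with `gcc poller.c -lcurl`. Polling faster than the configured `--metrics-interval-ms` simply re-reads unchanged "Per Interval" values, while "Per Request" metrics update as requests complete.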
@@ -378,3 +388,9 @@ Further documentation can be found in the `TRITONSERVER_MetricFamily*` and
 The TRT-LLM backend uses the custom metrics API to track and expose specific metrics about
 LLMs, KV Cache, and Inflight Batching to Triton:
 https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#triton-metrics
+
+### vLLM Backend Metrics
+
+The vLLM backend uses the custom metrics API to track and expose specific metrics about
+LLMs to Triton:
+https://github.com/triton-inference-server/vllm_backend?tab=readme-ov-file#triton-metrics
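Both backend sections in this hunk build on the custom metrics support exposed through the `TRITONSERVER_MetricFamily*` C API referenced in the hunk header. Below is a minimal sketch of that API, assuming the in-process `tritonserver.h` header; the family name, label value, and wrapper function are hypothetical, and error handling is elided.

```c
#include <stddef.h>
#include "tritonserver.h"  // Triton in-process C API

// Hypothetical helper: register a custom counter and bump it once.
// A real backend or agent would check each TRITONSERVER_Error* return value.
void register_and_increment_example(void) {
    TRITONSERVER_MetricFamily* family = NULL;
    TRITONSERVER_MetricFamilyNew(
        &family, TRITONSERVER_METRIC_KIND_COUNTER,
        "example_requests_total",                 // hypothetical metric name
        "Example counter exposed on /metrics");   // help text

    // One labeled series within the family.
    const TRITONSERVER_Parameter* labels[1];
    labels[0] = TRITONSERVER_ParameterNew(
        "model", TRITONSERVER_PARAMETER_STRING, "my_model");

    TRITONSERVER_Metric* metric = NULL;
    TRITONSERVER_MetricNew(&metric, family, labels, 1);

    TRITONSERVER_MetricIncrement(metric, 1.0);  // counter += 1

    TRITONSERVER_ParameterDelete((TRITONSERVER_Parameter*)labels[0]);
    TRITONSERVER_MetricDelete(metric);
    TRITONSERVER_MetricFamilyDelete(family);
}
```

A family groups labeled series under one metric name; for set-style values such as queue sizes, `TRITONSERVER_METRIC_KIND_GAUGE` together with `TRITONSERVER_MetricSet` is the counterpart to the counter shown here.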
