spring-projects · HeeChanN · Aug 24, 2025
diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/observability/index.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/observability/index.adoc
@@ -363,3 +363,161 @@ Spring AI supports logging vector search response data, useful for troubleshooti
 |===
 
 WARNING: If you enable logging of the vector search response data, there's a risk of exposing sensitive or private information. Please, be careful!
+
+== More Metrics Reference
+
+This section documents the metrics emitted by Spring AI components as they appear in Prometheus.
+
+=== Metric Naming Conventions
+
+Spring AI records its metrics through Micrometer. Base meter names use dots (e.g., `gen_ai.client.operation`); when exported to Prometheus, the dots become underscores and standard suffixes are appended:
+
+* **Timers** → `<base>_seconds_count`, `<base>_seconds_sum`, `<base>_seconds_max`, and (when supported) `<base>_active_count`
+* **Counters** → `<base>_total` (monotonic)
+
+[NOTE]
+====
+The following shows how base metric names expand to Prometheus time series.
+
+[cols="2,3", options="header", stripes=even]
+|===
+| Base metric name | Exported time series
+| `gen_ai.client.operation` |
+`gen_ai_client_operation_seconds_count` +
+`gen_ai_client_operation_seconds_sum` +
+`gen_ai_client_operation_seconds_max` +
+`gen_ai_client_operation_active_count`
+| `db.vector.client.operation` |
+`db_vector_client_operation_seconds_count` +
+`db_vector_client_operation_seconds_sum` +
+`db_vector_client_operation_seconds_max` +
+`db_vector_client_operation_active_count`
+|===
+====
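+
+As a rough illustration of the expansion shown in the table, the following sketch models only the dot-to-underscore rule and the suffixes listed in this section. The class and method names are hypothetical and are not part of Spring AI or Micrometer; the real Prometheus naming convention may apply further sanitization that is not modeled here. The base name `gen_ai.client.token.usage` is inferred from the exported name documented on this page.
+
```java
import java.util.List;

// Illustrative helper only: not a Spring AI or Micrometer API.
final class PrometheusNameSketch {

    // Models the documented rule: dots in the base name become underscores.
    static String toPrometheusBase(String meterName) {
        return meterName.replace('.', '_');
    }

    // Series this page lists for a timer-backed metric.
    static List<String> timerSeries(String meterName) {
        String base = toPrometheusBase(meterName);
        return List.of(
                base + "_seconds_count",
                base + "_seconds_sum",
                base + "_seconds_max",
                base + "_active_count");
    }

    // Series this page lists for a counter.
    static String counterSeries(String meterName) {
        return toPrometheusBase(meterName) + "_total";
    }

    public static void main(String[] args) {
        timerSeries("gen_ai.client.operation").forEach(System.out::println);
        System.out.println(counterSeries("gen_ai.client.token.usage"));
    }
}
```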
+
+==== References
+
+* OpenTelemetry — https://opentelemetry.io/docs/specs/semconv/gen-ai/[Semantic Conventions for Generative AI (overview)]
+* Micrometer — https://docs.micrometer.io/micrometer/reference/concepts/naming.html[Naming Meters]
+
+=== Chat Client Metrics
+
+[cols="2,2,1,3", stripes=even]
+|===
+|Metric Name | Type | Unit | Description
+
+|`gen_ai_chat_client_operation_seconds_sum`
+|Timer
+|seconds
+|Total time spent in ChatClient operations (call/stream)
+
+|`gen_ai_chat_client_operation_seconds_count`
+|Counter
+|count
+|Number of completed ChatClient operations
+
+|`gen_ai_chat_client_operation_seconds_max`
+|Gauge
+|seconds
+|Maximum observed duration of ChatClient operations
+
+|`gen_ai_chat_client_operation_active_count`
+|Gauge
+|count
+|Number of ChatClient operations currently in flight
+|===
+
+*Active vs Completed*: `+*_active_count+` shows in-flight calls; the `+_seconds_*+` series reflect only completed calls.
+
+=== Chat Model Metrics (Model provider execution)
+
+[cols="2,2,1,3", stripes=even]
+|===
+|Metric Name | Type | Unit | Description
+
+|`gen_ai_client_operation_seconds_sum`
+|Timer
+|seconds
+|Total time executing chat model operations
+
+|`gen_ai_client_operation_seconds_count`
+|Counter
+|count
+|Number of completed chat model operations
+
+|`gen_ai_client_operation_seconds_max`
+|Gauge
+|seconds
+|Maximum observed duration for chat model operations
+
+|`gen_ai_client_operation_active_count`
+|Gauge
+|count
+|Number of chat model operations currently in flight
+|===
+
+==== Token Usage
+
+[cols="2,2,1,3", stripes=even]
+|===
+|Metric Name | Type | Unit | Description
+
+|`gen_ai_client_token_usage_total`
+|Counter
+|tokens
+|Total tokens consumed, labeled by token type
+|===
+
+==== Labels
+
+[cols="2,3", options="header", stripes=even]
+|===
+|Label | Meaning
+|`gen_ai_token_type=input` | Prompt tokens sent to the model
+|`gen_ai_token_type=output` | Completion tokens returned by the model
+|`gen_ai_token_type=total` | Input + output
+|===
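+
+Because the `total` label already equals input plus output, adding up the counter across every `gen_ai_token_type` value counts each token twice. The sketch below illustrates this with made-up numbers; the class, its methods, and the sample values are hypothetical and only model the label semantics in the table above.
+
```java
import java.util.Map;

// Illustrative only: models the three label values documented above.
final class TokenUsageSketch {

    // Prefer the "total" series; fall back to input + output if it is absent.
    static double totalTokens(Map<String, Double> byType) {
        Double total = byType.get("total");
        if (total != null) {
            return total;
        }
        return byType.getOrDefault("input", 0.0) + byType.getOrDefault("output", 0.0);
    }

    // Summing every label value double-counts, since total = input + output.
    static double naiveSum(Map<String, Double> byType) {
        return byType.values().stream().mapToDouble(Double::doubleValue).sum();
    }

    public static void main(String[] args) {
        // Made-up sample values for one scrape.
        Map<String, Double> sample = Map.of("input", 120.0, "output", 30.0, "total", 150.0);
        System.out.println(totalTokens(sample)); // 150.0
        System.out.println(naiveSum(sample));    // 300.0
    }
}
```
+
+When aggregating this counter in a dashboard, filtering on a single token type avoids the double count.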
+
+=== Vector Store Metrics
+
+[cols="2,2,1,3", stripes=even]
+|===
+|Metric Name | Type | Unit | Description
+
+|`db_vector_client_operation_seconds_sum`
+|Timer
+|seconds
+|Total time spent in vector store operations (add/delete/query)
+
+|`db_vector_client_operation_seconds_count`
+|Counter
+|count
+|Number of completed vector store operations
+
+|`db_vector_client_operation_seconds_max`
+|Gauge
+|seconds
+|Maximum observed duration for vector store operations
+
+|`db_vector_client_operation_active_count`
+|Gauge
+|count
+|Number of vector store operations currently in flight
+|===
+
+==== Labels
+
+[cols="2,3", options="header", stripes=even]
+|===
+|Label | Meaning
+|`db_operation_name` | Operation type (`add`, `delete`, `query`)
+|`db_system` | Vector DB/provider (`redis`, `chroma`, `pgvector`, …)
+|`spring_ai_kind` | `vector_store`
+|===
+
+=== Understanding Active vs Completed
+
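+* *Active* (`+*_active_count+`): instantaneous gauge of in-progress operations; it reflects current concurrency and load.
+* *Completed* (`+*_seconds_sum+`, `+*_seconds_count+`, `+*_seconds_max+`): statistics for operations that have finished:
+** `_seconds_sum / _seconds_count` → average latency
+** `_seconds_max` → maximum duration within a recent time window. Micrometer's timer maximum decays over time by default, so it is neither an all-time maximum nor a value reset on each scrape; the exact window depends on registry configuration.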
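+
+The average-latency ratio above is most useful over an interval, computed from the change in `_seconds_sum` and `_seconds_count` between two scrapes. The following is a minimal sketch of that arithmetic under that assumption; the class, method, and sample values are hypothetical and are not part of Spring AI.
+
```java
// Illustrative only: average latency over an interval from two scrapes.
final class LatencySketch {

    // Returns (delta of _seconds_sum) / (delta of _seconds_count),
    // or NaN when no operation completed between the two scrapes.
    static double averageSeconds(double sumBefore, double countBefore,
                                 double sumAfter, double countAfter) {
        double deltaCount = countAfter - countBefore;
        if (deltaCount <= 0) {
            return Double.NaN;
        }
        return (sumAfter - sumBefore) / deltaCount;
    }

    public static void main(String[] args) {
        // Made-up values: 4 operations completed, taking 6 seconds in total.
        System.out.println(averageSeconds(12.0, 10, 18.0, 14)); // 1.5
    }
}
```
+
+Dividing the raw `_seconds_sum` by the raw `_seconds_count` instead yields the average since the process started, which hides recent changes in latency.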