diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/observability/index.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/observability/index.adoc
index 222ce3ea020..2ddeacadd46 100644
--- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/observability/index.adoc
+++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/observability/index.adoc
@@ -363,3 +363,161 @@ Spring AI supports logging vector search response data, useful for troubleshooti
 |===
 
 WARNING: If you enable logging of the vector search response data, there's a risk of exposing sensitive or private information. Please, be careful!
+
+== More Metrics Reference
+
+This section documents the metrics emitted by Spring AI components as they appear in Prometheus.
+
+=== Metric Naming Conventions
+
+Spring AI uses Micrometer. Base metric names use dots (e.g., `gen_ai.client.operation`), which Prometheus exports with underscores and standard suffixes:
+
+* **Timers** → `_seconds_count`, `_seconds_sum`, `_seconds_max`, and (when supported) `_active_count`
+* **Counters** → `_total` (monotonic)
+
+[NOTE]
+====
+The following shows how base metric names expand to Prometheus time series.
+
+[cols="2,3", options="header", stripes=even]
+|===
+| Base metric name | Exported time series
+| `gen_ai.client.operation` |
+`gen_ai_client_operation_seconds_count` +
+`gen_ai_client_operation_seconds_sum` +
+`gen_ai_client_operation_seconds_max` +
+`gen_ai_client_operation_active_count`
+| `db.vector.client.operation` |
+`db_vector_client_operation_seconds_count` +
+`db_vector_client_operation_seconds_sum` +
+`db_vector_client_operation_seconds_max` +
+`db_vector_client_operation_active_count`
+|===
+====
+
+==== References
+
+* OpenTelemetry — https://opentelemetry.io/docs/specs/semconv/gen-ai/[Semantic Conventions for Generative AI (overview)]
+* Micrometer — https://docs.micrometer.io/micrometer/reference/concepts/naming.html[Naming Meters]
+
+=== Chat Client Metrics
+
+[cols="2,2,1,3", stripes=even]
+|===
+|Metric Name | Type | Unit | Description
+
+|`gen_ai_chat_client_operation_seconds_sum`
+|Timer
+|seconds
+|Total time spent in ChatClient operations (call/stream)
+
+|`gen_ai_chat_client_operation_seconds_count`
+|Counter
+|count
+|Number of completed ChatClient operations
+
+|`gen_ai_chat_client_operation_seconds_max`
+|Gauge
+|seconds
+|Maximum observed duration of ChatClient operations
+
+|`gen_ai_chat_client_operation_active_count`
+|Gauge
+|count
+|Number of ChatClient operations currently in flight
+|===
+
+*Active vs Completed*: `*_active_count` shows in-flight calls; the `_seconds_*` series reflect only completed calls.
+
+=== Chat Model Metrics (Model provider execution)
+
+[cols="2,2,1,3", stripes=even]
+|===
+|Metric Name | Type | Unit | Description
+
+|`gen_ai_client_operation_seconds_sum`
+|Timer
+|seconds
+|Total time executing chat model operations
+
+|`gen_ai_client_operation_seconds_count`
+|Counter
+|count
+|Number of completed chat model operations
+
+|`gen_ai_client_operation_seconds_max`
+|Gauge
+|seconds
+|Maximum observed duration for chat model operations
+
+|`gen_ai_client_operation_active_count`
+|Gauge
+|count
+|Number of chat model operations currently in flight
+|===
+
+==== Token Usage
+
+[cols="2,2,1,3", stripes=even]
+|===
+|Metric Name | Type | Unit | Description
+
+|`gen_ai_client_token_usage_total`
+|Counter
+|tokens
+|Total tokens consumed, labeled by token type
+|===
+
+==== Labels
+
+[cols="2,3", options="header", stripes=even]
+|===
+|Label | Meaning
+|`gen_ai_token_type=input` | Prompt tokens sent to the model
+|`gen_ai_token_type=output` | Completion tokens returned by the model
+|`gen_ai_token_type=total` | Input + output
+|===
+
+=== Vector Store Metrics
+
+[cols="2,2,1,3", stripes=even]
+|===
+|Metric Name | Type | Unit | Description
+
+|`db_vector_client_operation_seconds_sum`
+|Timer
+|seconds
+|Total time spent in vector store operations (add/delete/query)
+
+|`db_vector_client_operation_seconds_count`
+|Counter
+|count
+|Number of completed vector store operations
+
+|`db_vector_client_operation_seconds_max`
+|Gauge
+|seconds
+|Maximum observed duration for vector store operations
+
+|`db_vector_client_operation_active_count`
+|Gauge
+|count
+|Number of vector store operations currently in flight
+|===
+
+==== Labels
+
+[cols="2,3", options="header", stripes=even]
+|===
+|Label | Meaning
+|`db_operation_name` | Operation type (`add`, `delete`, `query`)
+|`db_system` | Vector DB/provider (`redis`, `chroma`, `pgvector`, …)
+|`spring_ai_kind` | `vector_store`
+|===
+
+=== Understanding Active vs Completed
+
+* **Active (`*_active_count`)** — instantaneous gauge of in-progress operations (concurrency/load).
+* **Completed (`*_seconds_sum|count|max`)** — statistics for operations that have finished:
+** `_seconds_sum / _seconds_count` → average latency
+** `_seconds_max` → maximum duration observed within a recent time window; Micrometer decays this value over time rather than resetting it on each scrape (subject to registry configuration)
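
Review note: the naming expansion and the `_seconds_sum / _seconds_count` average described in the added section can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Spring AI or Micrometer code: the class `PrometheusMetricSketch`, its two methods, and the sample values are hypothetical, and the suffix list simply mirrors the one in the naming table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it does not call Micrometer or Spring AI.
public class PrometheusMetricSketch {

    // Timer suffixes as listed under "Metric Naming Conventions".
    private static final List<String> TIMER_SUFFIXES = List.of(
            "_seconds_count", "_seconds_sum", "_seconds_max", "_active_count");

    /** Expands a dotted base metric name into the exported timer series names. */
    public static List<String> timerSeries(String baseName) {
        String prefix = baseName.replace('.', '_'); // dots become underscores
        List<String> series = new ArrayList<>();
        for (String suffix : TIMER_SUFFIXES) {
            series.add(prefix + suffix);
        }
        return series;
    }

    /**
     * Average latency of completed operations: _seconds_sum / _seconds_count.
     * Returns NaN when nothing has completed yet, so an empty series is not
     * mistaken for a latency of zero.
     */
    public static double averageLatencySeconds(double secondsSum, double secondsCount) {
        if (secondsCount == 0) {
            return Double.NaN;
        }
        return secondsSum / secondsCount;
    }

    public static void main(String[] args) {
        // Prints the four series names derived from the base metric name.
        System.out.println(timerSeries("gen_ai.client.operation"));

        // Hypothetical sample: 50 completed calls totalling 12.5 seconds.
        System.out.println(averageLatencySeconds(12.5, 50)); // prints 0.25
    }
}
```

In-flight operations are not part of this average, since the `_seconds_*` series only cover completed calls. The same ratio would typically be computed in the monitoring backend over a time range rather than in application code.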