Skip to content

Commit a480234

Browse files
authored
More precise metrics & logging (#4943)
In some cases, responses are generated by interceptors: 404 rejections using `RejectInterceptor`, CORS preflight requests, not-acceptable interceptor, exception interceptor. In most of these cases, there are no `Endpoint`s that are associated with the requests. Whatever the source of the response (endpoint invocation, decode failure, 404 rejection), such responses should be handled by interceptors such as logging & metrics. To properly log such interceptor-generated (or more precisely: `RequestHandler`-generated, without invoking an `EndpointHandler`) responses, the log interceptor must know if the response was generated by an endpoint, or request handler. This is achieved by adding an additional field to `RequestResult.Response`. The logging interceptor is enhanced by calling a new `ServerLog.requestHandledByInterceptor` method, which should log the interceptor-generated response. Moreover, the exception & logging interceptors are combined, so that upon an exception, first an exception is logged, then the exception is turned into a 500 response, and then that response is logged as well. The metrics interceptor is enhanced so that when a request is received, there's always some callback on the `EndpointMetric` (returned by the `Metric`) that is called - including interceptor-generated responses. Finally, active-requests gauge/up-down counter is changed so that it counts all requests. This also means that there's no endpoint-template (path) label associated with the metric. The ordering of interceptors is changed. Now, the metrics & logging+exception interceptors come first, followed by CORS, reject & not-acceptable.
1 parent dbaa5e1 commit a480234

File tree

78 files changed

+1564
-448
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

78 files changed

+1564
-448
lines changed

doc/server/observability.md

Lines changed: 39 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -10,39 +10,59 @@ Certain endpoints can be ignored by adding their definitions to `ignoreEndpoints
1010
`Metric` wraps an aggregation object (like a counter or gauge), and needs to implement the `onRequest` function, which
1111
returns an `EndpointMetric` instance.
1212

13-
`Metric.onRequest` is used to create the proper metric description. Any additional data might be gathered there, like
14-
getting current timestamp and passing it down to `EndpointMetric` callbacks which are then executed in certain points of
15-
request processing.
13+
`Metric.onRequest` is used to create the proper metric description. Apart from triggering an initial metric, additional
14+
data might be gathered there, like getting current timestamp and passing it down to `EndpointMetric` callbacks which
15+
are then executed in certain points of request processing.
1616

17-
There are three callbacks in `EndpointMetric`:
17+
There are six callbacks in `EndpointMetric`:
1818

1919
1. `onEndpointRequest` - called after a request matches an endpoint (inputs are successfully decoded), or when decoding
2020
inputs fails, but a downstream interceptor provides a response.
21-
2. `onResponse` - called after response is assembled. Note that the response body might be lazily produced. To properly
22-
observe end-of-body, use the provided `BodyListener`.
23-
3. `onException` - called after exception is thrown (in underlying streamed body, and/or on any other exception when
24-
there's no default response)
21+
2. `onResponseHeaders` - called after response headers are assembled.
22+
3. `onResponseBody` - called after response body is complete. Note that the response body might be lazily produced.
23+
4. `onException` - called after exception is thrown (in underlying streamed body, and/or on any other exception when
24+
there's no default response).
25+
5. `onInterceptorResponse` - called when the response was generated by a request handler in an interceptor (e.g., a 404
26+
response from the reject interceptor), meaning no other metric callbacks associated with the response have been
27+
invoked. No endpoint or decode failures are associated with the response in such cases.
28+
6. `onDecodeFailure` - called when all endpoints failed to decode the request.
29+
30+
After `Metric.onRequest` is called, it's guaranteed that exactly one callback sequence on `EndpointMetric` will be invoked:
31+
32+
* `onEndpointRequest` followed by `onResponseHeaders` and `onResponseBody` ("happy path")
33+
* `onEndpointRequest` followed by `onException`
34+
* `onException` (exception in interceptor)
35+
* `onInterceptorResponse`
36+
* `onDecodeFailure`
2537

2638
## Metric labels
2739

28-
By default, request metrics are labeled by path and method. Response labels are additionally labelled by status code
29-
group. For example GET endpoint like `http://h:p/api/persons?name=Mike` returning 200 response will be labeled
30-
as `path="api/persons", method="GET", status="2xx"`. Query params are omitted by default, but it's possible to include
31-
them as shown in example below.
40+
By default, request metrics are labeled by method and if available, the matching endpoint's path template. Response
41+
labels are additionally labelled by status code group. For example GET endpoint like `http://h:p/api/persons?name=Mike`
42+
returning 200 response will be labeled as `path="api/persons", method="GET", status="2xx"`. Query params are omitted by
43+
default, but it's possible to include them as shown in example below.
3244

3345
If the path contains captures, the label will include the path capture name instead of the actual value, e.g.
3446
`api/persons/{name}`.
3547

36-
Labels for default metrics can be customized, any attribute from `Endpoint`, `ServerRequest` and `ServerResponse`
37-
could be used, for example:
48+
Labels for default metrics can be customized. Labels are split into three categories based on the data they need:
49+
50+
- `forRequest`: labels that only need the `ServerRequest` (e.g., method, protocol, headers)
51+
- `forEndpoint`: labels that only need the `AnyEndpoint` (e.g., path template, endpoint metadata)
52+
- `forResponse`: labels that need the response or exception (e.g., status code, error type)
53+
54+
For example:
3855

3956
```scala mdoc:compile-only
4057
import sttp.tapir.server.metrics.MetricLabels
4158

4259
val labels = MetricLabels(
4360
forRequest = List(
44-
"path" -> { case (ep, _) => ep.showPathTemplate() },
45-
"protocol" -> { case (_, req) => req.protocol }
61+
"method" -> { req => req.method.method },
62+
"protocol" -> { req => req.protocol }
63+
),
64+
forEndpoint = List(
65+
"path" -> { ep => ep.showPathTemplate() }
4666
),
4767
forResponse = Nil
4868
)
@@ -83,7 +103,7 @@ val routes: FutureRoute = NettyFutureServerInterpreter(serverOptions).toRoute(pr
83103

84104
By default, the following metrics are exposed:
85105

86-
* `tapir_request_active{path, method}` (gauge)
106+
* `tapir_request_active{method}` (gauge)
87107
* `tapir_request_total{path, method, status}` (counter)
88108
* `tapir_request_duration_seconds{path, method, status, phase}` (histogram)
89109

@@ -280,7 +300,8 @@ OtelJava
280300
}
281301
```
282302

283-
By default, the following metrics are exposed:
303+
By default, the following metrics are exposed, following the
304+
[otel semantic conventions](https://opentelemetry.io/docs/specs/semconv/http/http-metrics):
284305

285306
* `http.server.active_requests` (up-down-counter)
286307
* `http.server.requests.total` (counter)

integrations/zio/src/test/scala/sttp/tapir/ztapir/ZTapirTest.scala

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ import zio.test.Assertion._
1616
import java.nio.charset.Charset
1717
import scala.util.{Success, Try}
1818
import scala.collection.immutable.Seq
19+
import sttp.tapir.server.interceptor.ResponseSource
1920

2021
object ZTapirTest extends ZIOSpecDefault with ZTapir {
2122

@@ -68,7 +69,12 @@ object ZTapirTest extends ZIOSpecDefault with ZTapir {
6869
}
6970

7071
private def errorToResponse(error: Throwable): UIO[RequestResult.Response[ResponseBodyType]] =
71-
ZIO.succeed(RequestResult.Response(ServerResponse[ResponseBodyType](StatusCode.InternalServerError, Nil, Some(error.getMessage), None)))
72+
ZIO.succeed(
73+
RequestResult.Response(
74+
ServerResponse[ResponseBodyType](StatusCode.InternalServerError, Nil, Some(error.getMessage), None),
75+
ResponseSource.EndpointHandler
76+
)
77+
)
7278

7379
final case class User(name: String)
7480

metrics/datadog-metrics/src/main/scala/sttp/tapir/server/metrics/datadog/DatadogMetrics.scala

Lines changed: 41 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -66,17 +66,13 @@ object DatadogMetrics {
6666
(if (namespace.isBlank) "" else namespace + ".") + "request_active.count"
6767
)(client),
6868
onRequest = (req, counter, m) => {
69-
m.unit {
69+
val tags = mergeTags(labels.namesForRequest, labels.valuesForRequest(req))
70+
m.map(m.eval(counter.increment(tags))) { _ =>
7071
EndpointMetric()
71-
.onEndpointRequest { ep =>
72-
m.eval(counter.increment(mergeTags(labels.namesForRequest, labels.valuesForRequest(ep, req))))
73-
}
74-
.onResponseBody { (ep, _) =>
75-
m.eval(counter.decrement(mergeTags(labels.namesForRequest, labels.valuesForRequest(ep, req))))
76-
}
77-
.onException { (ep, _) =>
78-
m.eval(counter.decrement(mergeTags(labels.namesForRequest, labels.valuesForRequest(ep, req))))
79-
}
72+
.onResponseBody { (_, _) => m.eval(counter.decrement(tags)) }
73+
.onException { (_, _) => m.eval(counter.decrement(tags)) }
74+
.onInterceptorResponse { _ => m.eval(counter.decrement(tags)) }
75+
.onDecodeFailure { () => m.eval(counter.decrement(tags)) }
8076
}
8177
}
8278
)
@@ -93,18 +89,28 @@ object DatadogMetrics {
9389
m.eval {
9490
counter.increment(
9591
mergeTags(
96-
labels.namesForRequest ++ labels.namesForResponse,
97-
labels.valuesForRequest(ep, req) ++ labels.valuesForResponse(res)
92+
labels.namesForRequest ++ labels.namesForEndpoint ++ labels.namesForResponse,
93+
labels.valuesForRequest(req) ++ labels.valuesForEndpoint(ep) ++ labels.valuesForResponse(res)
9894
)
9995
)
10096
}
10197
}
10298
.onException { (ep, ex) =>
99+
m.eval {
100+
counter.increment(
101+
mergeTags(
102+
labels.namesForRequest ++ labels.namesForEndpoint ++ labels.namesForResponse,
103+
labels.valuesForRequest(req) ++ labels.valuesForEndpoint(ep) ++ labels.valuesForResponse(ex)
104+
)
105+
)
106+
}
107+
}
108+
.onInterceptorResponse { res =>
103109
m.eval {
104110
counter.increment(
105111
mergeTags(
106112
labels.namesForRequest ++ labels.namesForResponse,
107-
labels.valuesForRequest(ep, req) ++ labels.valuesForResponse(ex)
113+
labels.valuesForRequest(req) ++ labels.valuesForResponse(res)
108114
)
109115
)
110116
}
@@ -133,8 +139,10 @@ object DatadogMetrics {
133139
recoder.record(
134140
duration,
135141
mergeTags(
136-
labels.namesForRequest ++ labels.namesForResponse ++ List(labels.forResponsePhase.name),
137-
labels.valuesForRequest(ep, req) ++ labels.valuesForResponse(res) ++ List(labels.forResponsePhase.headersValue)
142+
labels.namesForRequest ++ labels.namesForEndpoint ++ labels.namesForResponse ++ List(labels.forResponsePhase.name),
143+
labels.valuesForRequest(req) ++ labels.valuesForEndpoint(ep) ++ labels.valuesForResponse(res) ++ List(
144+
labels.forResponsePhase.headersValue
145+
)
138146
)
139147
)
140148
}
@@ -144,19 +152,34 @@ object DatadogMetrics {
144152
recoder.record(
145153
duration,
146154
mergeTags(
147-
labels.namesForRequest ++ labels.namesForResponse ++ List(labels.forResponsePhase.name),
148-
labels.valuesForRequest(ep, req) ++ labels.valuesForResponse(res) ++ List(labels.forResponsePhase.bodyValue)
155+
labels.namesForRequest ++ labels.namesForEndpoint ++ labels.namesForResponse ++ List(labels.forResponsePhase.name),
156+
labels.valuesForRequest(req) ++ labels.valuesForEndpoint(ep) ++ labels.valuesForResponse(res) ++ List(
157+
labels.forResponsePhase.bodyValue
158+
)
149159
)
150160
)
151161
}
152162
}
153163
.onException { (ep, ex) =>
164+
m.eval {
165+
recoder.record(
166+
duration,
167+
mergeTags(
168+
labels.namesForRequest ++ labels.namesForEndpoint ++ labels.namesForResponse ++ List(labels.forResponsePhase.name),
169+
labels.valuesForRequest(req) ++ labels.valuesForEndpoint(ep) ++ labels.valuesForResponse(ex) ++ List(
170+
labels.forResponsePhase.bodyValue
171+
)
172+
)
173+
)
174+
}
175+
}
176+
.onInterceptorResponse { res =>
154177
m.eval {
155178
recoder.record(
156179
duration,
157180
mergeTags(
158181
labels.namesForRequest ++ labels.namesForResponse ++ List(labels.forResponsePhase.name),
159-
labels.valuesForRequest(ep, req) ++ labels.valuesForResponse(ex)
182+
labels.valuesForRequest(req) ++ labels.valuesForResponse(res) ++ List(labels.forResponsePhase.bodyValue)
160183
)
161184
)
162185
}

metrics/datadog-metrics/src/test/scala/sttp/tapir/server/metrics/datadog/DatadogMetricsTest.scala

Lines changed: 13 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -93,13 +93,13 @@ class DatadogMetricsTest extends AnyFlatSpec with Matchers with BeforeAndAfter w
9393
waitReceiveMessage(statsdServer)
9494

9595
// then
96-
statsdServer.getReceivedMessages should contain("""tapir.request_active.count:1|c|#method:GET,path:/person""")
96+
statsdServer.getReceivedMessages should contain("""tapir.request_active.count:1|c|#method:GET""")
9797

9898
statsdServer.clear()
9999

100100
ScalaFutures.whenReady(response, Timeout(Span(3, Seconds))) { _ =>
101101
waitReceiveMessage(statsdServer)
102-
statsdServer.getReceivedMessages should contain("""tapir.request_active.count:-1|c|#method:GET,path:/person""")
102+
statsdServer.getReceivedMessages should contain("""tapir.request_active.count:-1|c|#method:GET""")
103103
}
104104
}
105105

@@ -131,8 +131,8 @@ class DatadogMetricsTest extends AnyFlatSpec with Matchers with BeforeAndAfter w
131131
waitReceiveMessage(statsdServer, "4xx")
132132

133133
// then
134-
statsdServer.getReceivedMessages should contain("""tapir.request_total.count:2|c|#status:2xx,method:GET,path:/person""")
135-
statsdServer.getReceivedMessages should contain("""tapir.request_total.count:2|c|#status:4xx,method:GET,path:/person""")
134+
statsdServer.getReceivedMessages should contain("""tapir.request_total.count:2|c|#status:2xx,path:/person,method:GET""")
135+
statsdServer.getReceivedMessages should contain("""tapir.request_total.count:2|c|#status:4xx,path:/person,method:GET""")
136136
}
137137

138138
"default metrics" should "collect requests duration" taggedAs Retryable in {
@@ -178,32 +178,33 @@ class DatadogMetricsTest extends AnyFlatSpec with Matchers with BeforeAndAfter w
178178

179179
// then
180180
// headers
181+
181182
statsdServer.getReceivedMessages should contain(
182-
"""tapir.request_duration_seconds:0.1|h|#phase:headers,status:2xx,method:GET,path:/person"""
183+
"""tapir.request_duration_seconds:0.1|h|#phase:headers,status:2xx,path:/person,method:GET"""
183184
)
184185
statsdServer.getReceivedMessages should contain(
185-
"""tapir.request_duration_seconds:0.2|h|#phase:headers,status:2xx,method:GET,path:/person"""
186+
"""tapir.request_duration_seconds:0.2|h|#phase:headers,status:2xx,path:/person,method:GET"""
186187
)
187188
statsdServer.getReceivedMessages should contain(
188-
"""tapir.request_duration_seconds:0.3|h|#phase:headers,status:2xx,method:GET,path:/person"""
189+
"""tapir.request_duration_seconds:0.3|h|#phase:headers,status:2xx,path:/person,method:GET"""
189190
)
190191

191192
// body
192193
statsdServer.getReceivedMessages should contain(
193-
"""tapir.request_duration_seconds:1.1|h|#phase:body,status:2xx,method:GET,path:/person"""
194+
"""tapir.request_duration_seconds:1.1|h|#phase:body,status:2xx,path:/person,method:GET"""
194195
)
195196
statsdServer.getReceivedMessages should contain(
196-
"""tapir.request_duration_seconds:2.2|h|#phase:body,status:2xx,method:GET,path:/person"""
197+
"""tapir.request_duration_seconds:2.2|h|#phase:body,status:2xx,path:/person,method:GET"""
197198
)
198199
statsdServer.getReceivedMessages should contain(
199-
"""tapir.request_duration_seconds:3.3|h|#phase:body,status:2xx,method:GET,path:/person"""
200+
"""tapir.request_duration_seconds:3.3|h|#phase:body,status:2xx,path:/person,method:GET"""
200201
)
201202
}
202203

203204
"default metrics" should "customize labels" taggedAs Retryable in {
204205
// given
205206
val serverEp = PersonsApi().serverEp
206-
val labels = MetricLabels(forRequest = List("key" -> { case (_, _) => "value" }), forResponse = Nil)
207+
val labels = MetricLabels(forRequest = List("key" -> { case _ => "value" }), forResponse = Nil, forEndpoint = Nil)
207208
val client =
208209
new NonBlockingStatsDClientBuilder()
209210
.hostname("localhost")
@@ -253,7 +254,7 @@ class DatadogMetricsTest extends AnyFlatSpec with Matchers with BeforeAndAfter w
253254
waitReceiveMessage(statsdServer)
254255

255256
// then
256-
statsdServer.getReceivedMessages should contain("""tapir.request_total.count:1|c|#status:5xx,method:GET,path:/person""")
257+
statsdServer.getReceivedMessages should contain("""tapir.request_total.count:1|c|#status:5xx,path:/person,method:GET""")
257258
}
258259
}
259260

metrics/opentelemetry-metrics/src/main/scala/sttp/tapir/server/metrics/opentelemetry/OpenTelemetryMetrics.scala

Lines changed: 37 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -48,9 +48,11 @@ object OpenTelemetryMetrics {
4848
*/
4949
lazy val OpenTelemetryAttributes: MetricLabels = MetricLabels(
5050
forRequest = List(
51-
HttpAttributes.HTTP_REQUEST_METHOD.getKey -> { case (_, req) => req.method.method },
52-
UrlAttributes.URL_SCHEME.getKey -> { case (_, req) => req.uri.scheme.getOrElse("unknown") },
53-
HttpAttributes.HTTP_ROUTE.getKey -> { case (ep, _) => ep.showPathTemplate(showQueryParam = None) }
51+
HttpAttributes.HTTP_REQUEST_METHOD.getKey -> { req => req.method.method },
52+
UrlAttributes.URL_SCHEME.getKey -> { req => req.uri.scheme.getOrElse("unknown") }
53+
),
54+
forEndpoint = List(
55+
HttpAttributes.HTTP_ROUTE.getKey -> { ep => ep.showPathTemplate(showQueryParam = None) }
5456
),
5557
forResponse = List(
5658
HttpAttributes.HTTP_RESPONSE_STATUS_CODE.getKey -> {
@@ -114,11 +116,13 @@ object OpenTelemetryMetrics {
114116
.setUnit("1")
115117
.build(),
116118
onRequest = (req, counter, m) => {
117-
m.unit {
119+
val attrs = asOpenTelemetryAttributesFromRequest(labels, req)
120+
m.map(m.eval(counter.add(1, attrs))) { _ =>
118121
EndpointMetric()
119-
.onEndpointRequest { ep => m.eval(counter.add(1, asOpenTelemetryAttributes(labels, ep, req))) }
120-
.onResponseBody { (ep, _) => m.eval(counter.add(-1, asOpenTelemetryAttributes(labels, ep, req))) }
121-
.onException { (ep, _) => m.eval(counter.add(-1, asOpenTelemetryAttributes(labels, ep, req))) }
122+
.onResponseBody { (_, _) => m.eval(counter.add(-1, attrs)) }
123+
.onException { (_, _) => m.eval(counter.add(-1, attrs)) }
124+
.onInterceptorResponse { _ => m.eval(counter.add(-1, attrs)) }
125+
.onDecodeFailure { () => m.eval(counter.add(-1, attrs)) }
122126
}
123127
}
124128
)
@@ -148,6 +152,13 @@ object OpenTelemetryMetrics {
148152
counter.add(1, otLabels)
149153
}
150154
}
155+
.onInterceptorResponse { res =>
156+
m.eval {
157+
val otLabels =
158+
merge(asOpenTelemetryAttributesFromRequest(labels, req), asOpenTelemetryAttributes(labels, Right(res), None))
159+
counter.add(1, otLabels)
160+
}
161+
}
151162
}
152163
}
153164
)
@@ -191,13 +202,30 @@ object OpenTelemetryMetrics {
191202
recorder.record(duration, otLabels)
192203
}
193204
}
205+
.onInterceptorResponse { res =>
206+
m.eval {
207+
val otLabels =
208+
merge(
209+
asOpenTelemetryAttributesFromRequest(labels, req),
210+
asOpenTelemetryAttributes(labels, Right(res), Some(labels.forResponsePhase.bodyValue))
211+
)
212+
recorder.record(duration, otLabels)
213+
}
214+
}
194215
}
195216
)
196217

197218
private def defaultMeter(otel: OpenTelemetry): Meter = otel.meterBuilder("tapir").setInstrumentationVersion("1.0.0").build()
198219

199-
private def asOpenTelemetryAttributes(l: MetricLabels, ep: AnyEndpoint, req: ServerRequest): Attributes =
200-
l.forRequest.foldLeft(Attributes.builder())((b, label) => { b.put(label._1, label._2(ep, req)) }).build()
220+
/** Attributes for requests when we only have request data (no endpoint matched). */
221+
private def asOpenTelemetryAttributesFromRequest(l: MetricLabels, req: ServerRequest): Attributes =
222+
l.forRequest.foldLeft(Attributes.builder())((b, label) => { b.put(label._1, label._2(req)) }).build()
223+
224+
/** Attributes for requests when we have both endpoint and request data. */
225+
private def asOpenTelemetryAttributes(l: MetricLabels, ep: AnyEndpoint, req: ServerRequest): Attributes = {
226+
val builder = l.forRequest.foldLeft(Attributes.builder())((b, label) => { b.put(label._1, label._2(req)) })
227+
l.forEndpoint.foldLeft(builder)((b, label) => { b.put(label._1, label._2(ep)) }).build()
228+
}
201229

202230
private def asOpenTelemetryAttributes(l: MetricLabels, res: Either[Throwable, ServerResponse[_]], phase: Option[String]): Attributes = {
203231
val builder = Attributes.builder()

0 commit comments

Comments
 (0)