Failsafe 3.0 instrumentation introduced #14057

onurkybsi · 2025-06-18T06:39:42Z

As requested in #10856, Failsafe instrumentation introduced. Intrumentation of other Failsafe policies such as RetryPolicy, RateLimiter, Bulkhead will be added with upcoming PRs.

settings.gradle.kts

...sting/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/AbstractFailsafeTest.java

...java/io/opentelemetry/instrumentation/failsafe/v3_0/CircuitBreakerEventListenerBuilders.java

instrumentation/failsafe-3.0/library/build.gradle.kts

settings.gradle.kts

laurit · 2025-07-10T08:49:51Z

docs/supported-libraries.md

 | [Elasticsearch API Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html)                         | 7.16 - 7.17.19,<br>8.0 - 8.9.+ [4] | N/A                                                                                                                                                                                                                                                                                                                                                                                     | [Elasticsearch Client Spans]                                                                                                            |
 | [Elasticsearch REST Client](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html)                              | 5.0+                               | N/A                                                                                                                                                                                                                                                                                                                                                                                     | [Database Client Spans], [Database Client Metrics]&nbsp;[6]                                                                             |
 | [Elasticsearch Transport Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html)                          | 5.0+                               | N/A                                                                                                                                                                                                                                                                                                                                                                                     | [Database Client Spans], [Database Client Metrics]&nbsp;[6]                                                                             |
+| [Failsafe](https://failsafe.dev/)                                                                                                           | 3.0.1+                             | [opentelemetry-failsafe-3.0](../instrumentation/failsafe-3.0/library)                                                                                                                                                                                                                                                                                                                   | none                                                                                                                                    |


@trask I think this might be the first library only instrumentation we have. Do we need to point this out somehow here? Set the Auto-instrumented versions to N/A? Any suggestions?

N/A in the auto-instrumented versions column sounds good to me 👍

Hey, this is fixed. However, fyi, after I'm done with other Failsafe policies'(RetryPolicy, RateLimiter, Bulkhead) library instrumentation, I'm planning to work on the auto insturmentation.

laurit · 2025-09-04T09:34:02Z

...java/io/opentelemetry/instrumentation/failsafe/v3_0/CircuitBreakerEventListenerBuilders.java

+      CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) {
+    LongCounter successCounter =
+        meter
+            .counterBuilder("failsafe.circuit_breaker.success.count")


@trask could you suggest units for these metrics

maybe {execution}

laurit · 2025-09-04T09:38:40Z

.../library/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetry.java

+public final class FailsafeTelemetry {
+  private static final String INSTRUMENTATION_NAME = "io.opentelemetry.failsafe-3.0";
+
+  private static final AttributeKey<String> CIRCUIT_BREAKER_NAME = AttributeKey.stringKey("name");


@trask is using name here fine or should it be circuit_breaker.name?

I'd suggest going even further: failsafe.circuit_breaker.name

prefixing it with failsafe.* makes it clear it's an instrumentation specific attribute and we're not attempting to create a general circuit breaker semantic convention (which requires a lot more work)

jaydeluca · 2025-09-04T10:07:36Z

...java/io/opentelemetry/instrumentation/failsafe/v3_0/CircuitBreakerEventListenerBuilders.java

+      CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) {
+    LongCounter failureCounter =
+        meter
+            .counterBuilder("failsafe.circuit_breaker.failure.count")


maybe @trask can weigh in on this too because I'm not sure if it's directly applicable, but the docs around "recording errors on metrics" recommends using a single metric with an attribute that differentiates between success and failure

It’s RECOMMENDED to report one metric that includes successes and failures as opposed to reporting two (or more) metrics depending on the operation status.

trask · 2025-09-05T03:21:59Z

...java/io/opentelemetry/instrumentation/failsafe/v3_0/CircuitBreakerEventListenerBuilders.java

+      CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) {
+    LongCounter openCircuitBreakerCounter =
+        meter
+            .counterBuilder("failsafe.circuit_breaker.open.count")


maybe failsafe.circuit_breaker.state_changes

with attribute failsafe.circuit_breaker.state = open / half_open / closed?

trask · 2025-09-05T03:31:16Z

...java/io/opentelemetry/instrumentation/failsafe/v3_0/CircuitBreakerEventListenerBuilders.java

+    LongCounter failureCounter =
+        meter
+            .counterBuilder("failsafe.circuit_breaker.failure.count")
+            .setDescription("Count of failed circuit breaker executions.")


I'm not quite sure what this is measuring, is it just the number of times the circuit breaker has allowed an execution, or how many times the execution it has allowed has failed?

This represents the number of executions failed by the configured circuit breaker, for example:

var circuitBreaker = CircuitBreaker .builder() .handleResultIf(r -> true) // Each execution is failure .build(); Failsafe.with(circuitBreaker).get(() -> { // Execution happens here... return null; });

Failure count will always be incremented for this circuit breaker until the circuit breaker is opened. Once the circuit breaker is open, CircuitBreakerOpenException is thrown until it's half-open again so it is not counted as failure since the circuit breaker didn't let the execution happen.

After a second thought, I'm not sure if this metric is truly valuable, and the same applies to the number of successes. What do you think? Should we keep them?

.../library/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetry.java

otelbot · 2025-09-17T14:11:47Z

Thank you for your contribution @onurkybsi! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey.

Co-authored-by: Lauri Tulmin <[email protected]>

onurkybsi requested a review from a team as a code owner June 18, 2025 06:39

onurkybsi force-pushed the issue-10856 branch from 44a5169 to f1ac254 Compare June 20, 2025 06:10