onurkybsi
Contributor

As requested in #10856, this PR introduces Failsafe instrumentation. Instrumentation of other Failsafe policies, such as RetryPolicy, RateLimiter, and Bulkhead, will be added in upcoming PRs.

@onurkybsi onurkybsi requested a review from a team as a code owner June 18, 2025 06:39
| [Elasticsearch API Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html) | 7.16 - 7.17.19,<br>8.0 - 8.9.+ [4] | N/A | [Elasticsearch Client Spans] |
| [Elasticsearch REST Client](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html) | 5.0+ | N/A | [Database Client Spans], [Database Client Metrics]&nbsp;[6] |
| [Elasticsearch Transport Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html) | 5.0+ | N/A | [Database Client Spans], [Database Client Metrics]&nbsp;[6] |
| [Failsafe](https://failsafe.dev/) | 3.0.1+ | [opentelemetry-failsafe-3.0](../instrumentation/failsafe-3.0/library) | none |
Contributor

@trask I think this might be the first library-only instrumentation we have. Do we need to point this out somehow here? Should we set the Auto-instrumented versions column to N/A? Any suggestions?

Member

N/A in the auto-instrumented versions column sounds good to me 👍

Contributor Author

Hey, this is fixed. However, FYI, once I'm done with the library instrumentation for the other Failsafe policies (RetryPolicy, RateLimiter, Bulkhead), I'm planning to work on the auto-instrumentation.

CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) {
LongCounter successCounter =
meter
.counterBuilder("failsafe.circuit_breaker.success.count")
Contributor

@trask could you suggest units for these metrics?

Member

maybe {execution}

public final class FailsafeTelemetry {
private static final String INSTRUMENTATION_NAME = "io.opentelemetry.failsafe-3.0";

private static final AttributeKey<String> CIRCUIT_BREAKER_NAME = AttributeKey.stringKey("name");
Contributor

@trask is using name here fine or should it be circuit_breaker.name?

Member

I'd suggest going even further: failsafe.circuit_breaker.name

prefixing it with failsafe.* makes it clear it's an instrumentation specific attribute and we're not attempting to create a general circuit breaker semantic convention (which requires a lot more work)

CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) {
LongCounter failureCounter =
meter
.counterBuilder("failsafe.circuit_breaker.failure.count")
Member

Maybe @trask can weigh in on this too, because I'm not sure it's directly applicable, but the docs on "recording errors on metrics" recommend using a single metric with an attribute that differentiates between success and failure:

> It’s RECOMMENDED to report one metric that includes successes and failures as opposed to reporting two (or more) metrics depending on the operation status.
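
For illustration, the quoted recommendation can be sketched without any dependencies. The class below models a counter as a map from attribute value to count; with the OpenTelemetry API this would presumably be a single `LongCounter` whose `add` call carries an outcome attribute. The metric name `failsafe.circuit_breaker.execution.count`, the attribute key `failsafe.circuit_breaker.outcome`, and the class `ExecutionCounterSketch` are assumptions made up for this sketch, not names from this PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for one counter that carries an outcome attribute. */
public final class ExecutionCounterSketch {
  // Assumed names, chosen for illustration only.
  public static final String METRIC_NAME = "failsafe.circuit_breaker.execution.count";
  public static final String OUTCOME_KEY = "failsafe.circuit_breaker.outcome";

  // One series per attribute value, all under the same metric name.
  private final Map<String, LongAdder> series = new ConcurrentHashMap<>();

  /** Records one execution under the matching outcome value. */
  public void record(boolean success) {
    String outcome = success ? "success" : "failure";
    series.computeIfAbsent(outcome, key -> new LongAdder()).increment();
  }

  /** Returns the count recorded for one outcome value. */
  public long count(String outcome) {
    LongAdder adder = series.get(outcome);
    return adder == null ? 0 : adder.sum();
  }

  /** The total is derived by summing over the attribute, not stored separately. */
  public long total() {
    return series.values().stream().mapToLong(LongAdder::sum).sum();
  }

  public static void main(String[] args) {
    ExecutionCounterSketch counter = new ExecutionCounterSketch();
    counter.record(true);
    counter.record(true);
    counter.record(false);
    for (String outcome : new String[] {"success", "failure"}) {
      System.out.println(
          METRIC_NAME + "{" + OUTCOME_KEY + "=" + outcome + "} " + counter.count(outcome));
    }
    System.out.println("total " + counter.total());
  }
}
```

With a single metric, the total and the failure ratio both fall out of one series set, which is the practical argument for the recommendation.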

CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) {
LongCounter openCircuitBreakerCounter =
meter
.counterBuilder("failsafe.circuit_breaker.open.count")
Member

maybe failsafe.circuit_breaker.state_changes

with attribute failsafe.circuit_breaker.state = open / half_open / closed?
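
A rough, dependency-free sketch of that suggestion: one transition counter keyed by the state being entered, instead of a dedicated "open" counter. The class `StateChangeCounterSketch` and its methods are hypothetical; in the real instrumentation the increments would presumably be driven by the circuit breaker's state-change listeners and recorded on a `LongCounter` with the state as an attribute.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical model of a single state-change counter with a state attribute. */
public final class StateChangeCounterSketch {
  public enum State { OPEN, HALF_OPEN, CLOSED }

  // Names taken from the review suggestion above.
  public static final String METRIC_NAME = "failsafe.circuit_breaker.state_changes";
  public static final String STATE_KEY = "failsafe.circuit_breaker.state";

  private final Map<State, Long> transitions = new EnumMap<>(State.class);

  /** Counts one transition into the given state. */
  public synchronized void onStateChange(State newState) {
    transitions.merge(newState, 1L, Long::sum);
  }

  /** Returns how many times the given state has been entered. */
  public synchronized long count(State state) {
    return transitions.getOrDefault(state, 0L);
  }

  /** Maps the enum constant to the attribute value, e.g. HALF_OPEN to half_open. */
  public static String attributeValue(State state) {
    return state.name().toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    StateChangeCounterSketch counter = new StateChangeCounterSketch();
    counter.onStateChange(State.OPEN);
    counter.onStateChange(State.HALF_OPEN);
    counter.onStateChange(State.CLOSED);
    for (State state : State.values()) {
      System.out.println(
          METRIC_NAME + "{" + STATE_KEY + "=" + attributeValue(state) + "} " + counter.count(state));
    }
  }
}
```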

LongCounter failureCounter =
meter
.counterBuilder("failsafe.circuit_breaker.failure.count")
.setDescription("Count of failed circuit breaker executions.")
Member

I'm not quite sure what this is measuring. Is it just the number of times the circuit breaker has allowed an execution, or the number of times an execution it allowed has failed?

Contributor Author

This represents the number of executions that the configured circuit breaker treated as failures, for example:

var circuitBreaker = CircuitBreaker
    .builder()
    .handleResultIf(r -> true) // Treat every execution result as a failure
    .build();

Failsafe.with(circuitBreaker).get(() -> {
    // Execution happens here...
    return null;
});

The failure count is incremented on every execution of this circuit breaker until it opens. Once it is open, CircuitBreakerOpenException is thrown until it becomes half-open again. Those rejections are not counted as failures, because the circuit breaker did not let the execution happen.

On second thought, I'm not sure this metric is truly valuable, and the same applies to the number of successes. What do you think? Should we keep them?
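
The counting behavior described in this comment can be modeled with a small, dependency-free sketch. It is not Failsafe's implementation: the class `FailureCountSketch`, the consecutive-failure threshold, and the use of `IllegalStateException` as a stand-in for `CircuitBreakerOpenException` are all assumptions for illustration, and the half-open transition is left out to keep it short.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical toy breaker: counts failed executions, not rejections. */
public final class FailureCountSketch<R> {
  private final Predicate<R> isFailure;
  private final int failureThreshold;
  private int consecutiveFailures;
  private boolean open;
  private long failureCount;
  private long rejectedCount;

  public FailureCountSketch(Predicate<R> isFailure, int failureThreshold) {
    this.isFailure = isFailure;
    this.failureThreshold = failureThreshold;
  }

  public R get(Supplier<R> execution) {
    if (open) {
      // The execution never runs, so the failure counter is left untouched.
      rejectedCount++;
      throw new IllegalStateException("circuit breaker is open");
    }
    R result = execution.get();
    if (isFailure.test(result)) {
      failureCount++;
      if (++consecutiveFailures >= failureThreshold) {
        open = true;
      }
    } else {
      consecutiveFailures = 0;
    }
    return result;
  }

  public long failureCount() {
    return failureCount;
  }

  public long rejectedCount() {
    return rejectedCount;
  }

  public static void main(String[] args) {
    // Every result is classified as a failure, mirroring handleResultIf(r -> true).
    FailureCountSketch<Object> breaker = new FailureCountSketch<Object>(result -> true, 2);
    for (int i = 0; i < 4; i++) {
      try {
        breaker.get(() -> null);
      } catch (IllegalStateException e) {
        System.out.println("rejected: " + e.getMessage());
      }
    }
    System.out.println("failures=" + breaker.failureCount());
    System.out.println("rejections=" + breaker.rejectedCount());
  }
}
```

In this model the failure counter stops growing once the breaker opens, which is one reason the metric may say little on its own without a companion count of rejected calls.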

@laurit laurit added this to the v2.21.0 milestone Sep 17, 2025
@trask trask merged commit 8eea435 into open-telemetry:main Sep 17, 2025
89 checks passed
Contributor

otelbot bot commented Sep 17, 2025

Thank you for your contribution @onurkybsi! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey.

mznet pushed a commit to mznet/opentelemetry-java-instrumentation that referenced this pull request Sep 26, 2025
4 participants