-
Notifications
You must be signed in to change notification settings - Fork 1k
Failsafe 3.0 instrumentation introduced #14057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...sting/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/AbstractFailsafeTest.java
Outdated
Show resolved
Hide resolved
...java/io/opentelemetry/instrumentation/failsafe/v3_0/CircuitBreakerEventListenerBuilders.java
Outdated
Show resolved
Hide resolved
...java/io/opentelemetry/instrumentation/failsafe/v3_0/CircuitBreakerEventListenerBuilders.java
Outdated
Show resolved
Hide resolved
docs/supported-libraries.md
Outdated
| [Elasticsearch API Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html) | 7.16 - 7.17.19,<br>8.0 - 8.9.+ [4] | N/A | [Elasticsearch Client Spans] | | ||
| [Elasticsearch REST Client](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html) | 5.0+ | N/A | [Database Client Spans], [Database Client Metrics] [6] | | ||
| [Elasticsearch Transport Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html) | 5.0+ | N/A | [Database Client Spans], [Database Client Metrics] [6] | | ||
| [Failsafe](https://failsafe.dev/) | 3.0.1+ | [opentelemetry-failsafe-3.0](../instrumentation/failsafe-3.0/library) | none | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trask I think this might be the first library only instrumentation we have. Do we need to point this out somehow here? Set the Auto-instrumented versions
to N/A
? Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N/A
in the auto-instrumented versions column sounds good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, this is fixed. However, fyi, after I'm done with other Failsafe policies'(RetryPolicy
, RateLimiter
, Bulkhead
) library instrumentation, I'm planning to work on the auto insturmentation.
8fd8249
to
8b580c2
Compare
CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) { | ||
LongCounter successCounter = | ||
meter | ||
.counterBuilder("failsafe.circuit_breaker.success.count") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trask could you suggest units for these metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe {execution}
public final class FailsafeTelemetry { | ||
private static final String INSTRUMENTATION_NAME = "io.opentelemetry.failsafe-3.0"; | ||
|
||
private static final AttributeKey<String> CIRCUIT_BREAKER_NAME = AttributeKey.stringKey("name"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trask is using name
here fine or should it be circuit_breaker.name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest going even further: failsafe.circuit_breaker.name
prefixing it with failsafe.*
makes it clear it's an instrumentation specific attribute and we're not attempting to create a general circuit breaker semantic convention (which requires a lot more work)
CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) { | ||
LongCounter failureCounter = | ||
meter | ||
.counterBuilder("failsafe.circuit_breaker.failure.count") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe @trask can weigh in on this too because I'm not sure if it's directly applicable, but the docs around "recording errors on metrics" recommends using a single metric with an attribute that differentiates between success and failure
It’s RECOMMENDED to report one metric that includes successes and failures as opposed to reporting two (or more) metrics depending on the operation status.
CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) { | ||
LongCounter openCircuitBreakerCounter = | ||
meter | ||
.counterBuilder("failsafe.circuit_breaker.open.count") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe failsafe.circuit_breaker.state_changes
with attribute failsafe.circuit_breaker.state
= open
/ half_open
/ closed
?
LongCounter failureCounter = | ||
meter | ||
.counterBuilder("failsafe.circuit_breaker.failure.count") | ||
.setDescription("Count of failed circuit breaker executions.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure what this is measuring, is it just the number of times the circuit breaker has allowed an execution, or how many times the execution it has allowed has failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This represents the number of executions failed by the configured circuit breaker, for example:
var circuitBreaker = CircuitBreaker
.builder()
.handleResultIf(r -> true) // Each execution is failure
.build();
Failsafe.with(circuitBreaker).get(() -> {
// Execution happens here...
return null;
});
Failure count will always be incremented for this circuit breaker until the circuit breaker is opened. Once the circuit breaker is open, CircuitBreakerOpenException
is thrown until it's half-open again so it is not counted as failure since the circuit breaker didn't let the execution happen.
After a second thought, I'm not sure if this metric is truly valuable, and the same applies to the number of successes. What do you think? Should we keep them?
041830f
to
af6013f
Compare
.../library/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetry.java
Show resolved
Hide resolved
Thank you for your contribution @onurkybsi! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. |
Co-authored-by: Lauri Tulmin <[email protected]>
As requested in #10856, Failsafe instrumentation introduced. Intrumentation of other Failsafe policies such as
RetryPolicy
,RateLimiter
,Bulkhead
will be added with upcoming PRs.