Skip to content

Conversation

@onurkybsi
Copy link
Contributor

@onurkybsi onurkybsi commented Nov 10, 2025

Library instrumentation added for Failsafe's RetryPolicy.

@onurkybsi onurkybsi marked this pull request as ready for review November 11, 2025 05:33
@onurkybsi onurkybsi requested a review from a team as a code owner November 11, 2025 05:33
.build();
LongHistogram attemptsHistogram =
meter
.histogramBuilder("failsafe.retry_policy.attempts")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure using a histogram for this is justified. @trask could you provide guidance on this

.setDescription("Histogram of number of attempts for each execution.")
.ofLongs()
.setExplicitBucketBoundariesAdvice(
LongStream.range(1, userConfig.getMaxAttempts() + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onurkybsi what's typical userConfig.getMaxAttempts()?

could you come up with a smallish static set, e.g. 1, 2, 5, 10, 20, 50?

also worth reading open-telemetry/semantic-conventions#316 (comment)

Copy link
Contributor Author

@onurkybsi onurkybsi Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @trask, userConfig.getMaxAttempts() returns the user configured maximum attempts allowed for the retry policy execution. So, if this value is 3, the possibilities would be like [1(execution succeeded without retry), 2(first retry), 3(last attempt as configured)]. And what is implemented is using this fact, i.e, one by one between 1 and the maximum attempt.

I didn't take having enormous numbers into the account maybe. Do you think we should? If so, I can refactor this part to build up a list which distributes the range(1 to maxAttempt) evenly considering a maximum number of buckets like 10. Maybe something like this:

  private static List<Long> buildBoundaries(int maxNumOfBuckets, long maxNumOfAttempts) {
    List<Long> boundaries = new ArrayList<>(maxNumOfBuckets);
    boundaries.add(1L);

    double step = (double) (maxNumOfAttempts - 1) / (maxNumOfBuckets - 1);
    for (int i = 1; i < maxNumOfBuckets; i++) {
      long boundary = Math.min(Math.round(1 + step * i), maxNumOfAttempts);
      boundaries.add(boundary);
    }
    return boundaries.stream()
      .distinct()
      .sorted()
      .toList();
  }

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buckets are costly, so I'd try to keep the number small if possible, e.g. with gc duration metrics, we went with just 5 buckets: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/runtime/jvm-metrics.md#metric-jvmgcduration

do you have any idea what are typical values for userConfig.getMaxAttempts()?

Copy link
Contributor Author

@onurkybsi onurkybsi Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's 3 as default in Failsafe and same for resilience4j. I think it wouldn't make sense to have a value more than 5 in most of the cases so maybe just [ 1, 2, 3, 5 ]. What do you say?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants