-
Notifications
You must be signed in to change notification settings - Fork 1k
Failsafe RetryPolicy instrumentation added #15255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
17f361b to
ee4ab3d
Compare
| .build(); | ||
| LongHistogram attemptsHistogram = | ||
| meter | ||
| .histogramBuilder("failsafe.retry_policy.attempts") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure using a histogram for this is justified. @trask could you provide guidance on this
ee4ab3d to
e0715f9
Compare
| .setDescription("Histogram of number of attempts for each execution.") | ||
| .ofLongs() | ||
| .setExplicitBucketBoundariesAdvice( | ||
| LongStream.range(1, userConfig.getMaxAttempts() + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@onurkybsi what's typical userConfig.getMaxAttempts()?
could you come up with a smallish static set, e.g. 1, 2, 5, 10, 20, 50?
also worth reading open-telemetry/semantic-conventions#316 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @trask, userConfig.getMaxAttempts() returns the user configured maximum attempts allowed for the retry policy execution. So, if this value is 3, the possibilities would be like [1(execution succeeded without retry), 2(first retry), 3(last attempt as configured)]. And what is implemented is using this fact, i.e, one by one between 1 and the maximum attempt.
I didn't take having enormous numbers into the account maybe. Do you think we should? If so, I can refactor this part to build up a list which distributes the range(1 to maxAttempt) evenly considering a maximum number of buckets like 10. Maybe something like this:
private static List<Long> buildBoundaries(int maxNumOfBuckets, long maxNumOfAttempts) {
List<Long> boundaries = new ArrayList<>(maxNumOfBuckets);
boundaries.add(1L);
double step = (double) (maxNumOfAttempts - 1) / (maxNumOfBuckets - 1);
for (int i = 1; i < maxNumOfBuckets; i++) {
long boundary = Math.min(Math.round(1 + step * i), maxNumOfAttempts);
boundaries.add(boundary);
}
return boundaries.stream()
.distinct()
.sorted()
.toList();
}What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buckets are costly, so I'd try to keep the number small if possible, e.g. with gc duration metrics, we went with just 5 buckets: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/runtime/jvm-metrics.md#metric-jvmgcduration
do you have any idea what are typical values for userConfig.getMaxAttempts()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's 3 as default in Failsafe and same for resilience4j. I think it wouldn't make sense to have a value more than 5 in most of the cases so maybe just [ 1, 2, 3, 5 ]. What do you say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
Library instrumentation added for Failsafe's RetryPolicy.