ES-10037 Periodic logging in autosharding service #126171
Conversation
Hey @dakrone. I'm adding you here to flag it for your attention even though it's in draft — I've slacked you with context.
Force-pushed from b290bb9 to cb6e524
This enhances `DataStreamAutoShardingService` so that it periodically logs at `INFO` level the most 'interesting' results it has produced in the last period.

In this PR, the most 'interesting' results are considered to be the ones with the highest load, keeping track separately of the top 10 which resulted in an increase decision and the top 10 which did not. In practice, increase recommendations are sufficiently rare that the top 10 will often be 'all of them', and they are all potentially interesting (but we cap it to protect against taking up an unbounded amount of memory). We keep the high-load non-increase recommendations as well, since these are likely to be the interesting ones to look at when investigating why some data stream did not get an increase-shards recommendation when we might have expected it.

The mechanism could easily be extended to cover other categories. For example, we may find that there are some categories of decrease decisions we consider 'interesting'. (N.B. The rollover service will not roll over just because the auto-sharding service recommended down-sharding — down-sharding only happens if the rollover was going to happen for some other reason (age, size, etc.), so it's normal for the auto-sharding service to return decrease recommendations for the same data streams every 5 minutes until those other conditions are met, which makes things a bit more complicated.) This PR just covers the cases that seem likely to be useful in the support cases we have seen.

The existing `DEBUG` and `TRACE` log lines in the service are replaced with a single `DEBUG` log which pulls together all the data. This is an improvement, since at the moment it is hard to figure out from the logs which lines refer to the same data stream (they are interleaved, and don't all include the data stream name).

The write load field in `AutoShardingResult` was unused, and is removed.
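The top-10-per-category idea can be sketched as follows. This is an illustrative outline, not the PR's actual code: the class and record names (`TopNBuffer`, `Decision`, the flush period) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class Main {

    // Hypothetical stand-in for an auto-sharding decision with its write load.
    record Decision(String dataStream, double load, boolean increase) {}

    // Keeps only the top `capacity` decisions by load, evicting the lowest,
    // so memory stays bounded no matter how many decisions flow through.
    static class TopNBuffer {
        private final int capacity;
        private final PriorityQueue<Decision> minHeap =
            new PriorityQueue<>(Comparator.comparingDouble(Decision::load));

        TopNBuffer(int capacity) {
            this.capacity = capacity;
        }

        void offer(Decision d) {
            if (minHeap.size() < capacity) {
                minHeap.add(d);
            } else if (minHeap.peek().load() < d.load()) {
                minHeap.poll(); // evict the current lowest-load entry
                minHeap.add(d);
            }
        }

        // Drains the buffer, highest load first, ready for a periodic summary log.
        List<Decision> drainSortedByLoadDescending() {
            List<Decision> out = new ArrayList<>(minHeap);
            minHeap.clear();
            out.sort(Comparator.comparingDouble(Decision::load).reversed());
            return out;
        }
    }

    public static void main(String[] args) {
        // Two such buffers would be kept: one for increase decisions, one for the rest.
        TopNBuffer buffer = new TopNBuffer(2);
        buffer.offer(new Decision("a", 0.1, false));
        buffer.offer(new Decision("b", 0.5, false));
        buffer.offer(new Decision("c", 0.3, false)); // evicts "a", the lowest load
        List<Decision> top = buffer.drainSortedByLoadDescending();
        System.out.println(top.get(0).dataStream() + " " + top.get(1).dataStream());
    }
}
```

In the scheme the PR describes, one such buffer would collect increase decisions and another the non-increase ones, with both flushed into a single `INFO` summary once per period.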
Force-pushed from cb6e524 to 866aa07
Pinging @elastic/es-data-management (Team:Data Management)
LGTM, I left some minor comments but nothing blocking
}

- private AutoShardingResult getDecreaseShardsResult(
+ private Decision.DecreaseCalculation calculateIncreaseShardsDecision(
Probably `calculateIncreaseShardsDecision` -> `calculateDecreaseShardsDecision`.
Oh, snap! Good spot. See, I told you that the copy-and-paste errors were my forte...
) {}

record DecreaseCalculation(
    MaxLoadWithinCooldown maxLoadWithinCooldownForDecrease,
Nits:
- Considering that `MaxLoadWithinCooldown` is an inner class of `DecreaseCalculation`, I am wondering if we could change the name to `maxLoadWithinCooldown` to make it a bit less verbose.
- Maybe adding a comment about the `previousIndexWithMaxLoad`. It's not clear to me when I see this record what it represents. I understand it's an index with max load, but the "previous" is a bit confusing.
- Yeah, that's fair enough. For the increase one, I included the extra words at the end of `writeIndexLoadForIncrease` to differentiate it from the `writeIndexLoad*` fields in the `Inputs` record. But it's not really necessary here.
- Good point. And I just learnt how to write javadoc on a `record`, which I'd never had to do before.
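Since record javadoc came up here, a minimal illustration (the field names and descriptions are made up for the example, not the PR's actual record): components of a record are documented with `@param` tags on the record declaration itself, because they have no separate field declarations to attach comments to.

```java
public class Main {

    /**
     * Result of a decrease-shards calculation (illustrative sketch only).
     *
     * @param maxLoadWithinCooldown highest write load seen within the decrease
     *        cooldown window; a decrease would be vetoed if this is too high
     * @param targetShards recommended shard count if the decrease goes ahead
     */
    record DecreaseCalculation(double maxLoadWithinCooldown, int targetShards) {}

    public static void main(String[] args) {
        DecreaseCalculation d = new DecreaseCalculation(0.25, 2);
        // The generated accessors carry the @param documentation in the javadoc output.
        System.out.println(d.targetShards());
    }
}
```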
LGTM, I also left some minor comments
if (decisionLogger == null) {
    PeriodicDecisionLogger periodicDecisionLogger = new PeriodicDecisionLogger(nowSupplier);
    this.decisionLogger = periodicDecisionLogger::maybeLogDecision;
} else {
    this.decisionLogger = decisionLogger;
}
I don't like the treatment of `null` as a default of "log the message." I think I'd rather make the constructor take a required non-null argument (with a version that leaves out the argument and passes in the `maybeLogDecision` method), and then leave it to the caller to override with a no-op if necessary. Otherwise someone would likely assume that they can pass in a `null` and have the logging disabled.
Yeah, the reason I'd done that was because the naive way of doing the alternative doesn't work, because of the restrictions on what you can do before the `this` call in a delegating constructor. I figured it was okay since the constructor with the extra nullable argument is package-private and labelled as for testing, so the scope for misuse was narrow.
But I think I've found a way of doing it with a static method, so I'll do that.
And I've changed the `PeriodicLogger` constructor to do something similar (only no need for a static method there).
(I guess the alternative would be to junk the public ctor in favour of a static factory method, but this is a less invasive change.)
}

// package-private for testing
record Decision(
I think it's useful for these records (this one and its nested children) to document what the fields are, especially since devs may come here not aware of the interior bits of how autoscaling works.
Yeah, Mary already had me put javadoc on one of the records, I might as well do it for the rest...
"For data stream %s: %s based on [inc/dec cooldowns %s/%s, %d-%d threads, " | ||
+ "write index %s has all-time/recent/peak loads %g/%g/%g, current shards %d, " | ||
+ "using %s value %g for increase gives %d shards%s]", |
While I understand that this log message encapsulates a lot more metrics and info than the previous one, I have to say it's a bit harder to read than `Data stream auto-sharding service recommends increasing the number of shards from [2] to [3] after [5m] cooldown for data stream [logs-foo-bar]`. Is it possible to keep the information, but make it a little more user-friendly? (I'm concerned that no one would be able to understand it without knowing details about how data stream autoscaling works, and thus the team would be pulled into SDHs just to decipher the message.)
So, the current behaviour is like this: `Data stream auto-sharding result: For data stream my-awesome-data-stream: Deferred recommendation to decrease shards from 3 to 2 after cooldown period 5h based on [...]`. The nice human-readable bit comes from the `toString()` I added to `AutoShardingResult`. I think this is pretty much on a par with the old thing. The information which was present before is all presented in the same order as before, and the additional information is at the end. (In fact, I have tweaked it so that it no longer says `after [0s] cooldown` or whatever in the cases where there's no cooldown, which I argue is a readability improvement.)
If there are further tweaks you'd like to make, let me know!
The thing I quoted above was the debug logging. The periodic info logging looks like this:
Data stream auto-sharding decisions in the last 5m with highest load without an increase shards recommendation:
- For data stream my-data-stream: Recommendation to leave shards unchanged at 1 based on [inc/dec cooldowns 4.5m/3d, 2-32 threads, write index .ds-my-data-stream-2025.04.04-000001 has all-time/recent/peak loads 0.000530127/0.000525674/0.00123915, current shards 1, using ALL_TIME value 0.000530127 for increase gives 1 shards, and using ALL_TIME value 0.000530127 for dec based on write index gives 1 shards]
- For data stream my-other-data-stream: Recommendation to leave shards unchanged at 1 based on [inc/dec cooldowns 4.5m/3d, 2-32 threads, write index .ds-my-data-stream-2025.04.04-000001 has all-time/recent/peak loads 0.000128001/0.000123547/0.00123915, current shards 1, using ALL_TIME value 0.000128001 for increase gives 1 shards, and using ALL_TIME value 0.000128001 for dec based on write index gives 1 shards]
The `… using ALL_TIME value 0.000530127 for dec based on write index gives 1 shards` is the most unclear to me, do you think that one could be clarified?
Can you say more about how this is confusing? Is it the ordering of the parts? The use of the abbreviation `dec`? (I'm conscious of the width of these lines, but that's probably less important than comprehensibility.) Is it that it's not clear that "based on write index" is telling you where that load comes from?
Would `decrease calculation gives 3 shards based on PEAK load of 2.71828 for write index` be better?
I don't think I can confidently change it without knowing more about the nature of your confusion... Or, even better, you could propose an alternative wording!
I'm running out of time to get this in before I have to leave for the weekend. Unless you say otherwise, I'm going to assume that you don't hate this so much that you can't bear to see it go out into the world, and merge the change which is currently going through CI, and we can wordsmith it in a follow-up PR.
Potential follow-up PR for word-smithing: #126339.
private static class DecisionBuffer {

    private final Comparator<Decision> comparator;
    private final PriorityQueue<Decision> queue;
Are you using Lucene's `PriorityQueue` here only for the `insertWithOverflow`? Can you add a comment to explain the reasoning so someone doesn't revert it to the JDK version without understanding?
The JDK version isn't bounded. I'll add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Based on what I saw on SO, there are people out there who are depending on Lucene just to get their bounded PQ implementation.)
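For context on the point being made here: the JDK's `java.util.PriorityQueue` grows without bound, whereas Lucene's `org.apache.lucene.util.PriorityQueue` is fixed-size and its `insertWithOverflow` returns whichever element got dropped. The sketch below emulates that contract on top of the JDK queue to show the difference (this is illustrative, not the PR's code; the helper is a made-up name).

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class Main {

    // Emulates Lucene's insertWithOverflow semantics on a bounded min-heap:
    // returns null if there was room, otherwise the element that was dropped,
    // which is either the previous minimum or `elem` itself if it doesn't beat it.
    static <T> T insertWithOverflow(PriorityQueue<T> minHeap, int maxSize, T elem, Comparator<T> cmp) {
        if (minHeap.size() < maxSize) {
            minHeap.add(elem);
            return null;
        }
        T least = minHeap.peek();
        if (cmp.compare(elem, least) <= 0) {
            return elem; // not better than the current minimum; rejected as-is
        }
        minHeap.poll();
        minHeap.add(elem);
        return least;
    }

    public static void main(String[] args) {
        Comparator<Integer> cmp = Comparator.naturalOrder();
        PriorityQueue<Integer> pq = new PriorityQueue<>(cmp);
        Integer r1 = insertWithOverflow(pq, 2, 5, cmp); // room: nothing dropped
        Integer r2 = insertWithOverflow(pq, 2, 9, cmp); // room: nothing dropped
        Integer r3 = insertWithOverflow(pq, 2, 1, cmp); // full, 1 < min(5): 1 is rejected
        Integer r4 = insertWithOverflow(pq, 2, 7, cmp); // full, 7 > min(5): 5 is evicted
        System.out.println(r1 + " " + r2 + " " + r3 + " " + r4 + " min=" + pq.peek());
    }
}
```

With a plain JDK `PriorityQueue` every `add` succeeds and the heap keeps growing, which is exactly the behaviour a bounded top-N buffer needs to avoid.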
private static Consumer<Decision> createPeriodicLoggingDecisionConsumer(LongSupplier nowSupplier) {
    PeriodicDecisionLogger periodicDecisionLogger = new PeriodicDecisionLogger(nowSupplier);
    return periodicDecisionLogger::maybeLogDecision;
}
This static method isn't strictly necessary. It could be inlined as `new PeriodicDecisionLogger(nowSupplier)::maybeLogDecision`. But I think that's bad readability. If I saw that, it would make me stop and think about what it's doing, and I don't like to have to think.
Also, fun fact, if you write `new PeriodicDecisionLogger(nowSupplier)::maybeLogDecision`, IntelliJ will offer to "refactor" it into `decision -> new PeriodicDecisionLogger(nowSupplier).maybeLogDecision(decision)`, which would be super bad news. (It does pop up a warning about possible side effects, but still.)
Wow that refactoring (Intellij's) is bad. This static method works for me.
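The difference the two commenters are worried about is a real language rule: in a bound method reference like `new Foo(...)::method`, the receiver expression is evaluated once, when the reference is created, while the equivalent-looking lambda constructs a fresh instance on every invocation. A small demonstration, with `Foo` as a made-up stand-in for `PeriodicDecisionLogger`:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class Main {

    static final AtomicInteger constructed = new AtomicInteger();

    static class Foo {
        Foo() {
            constructed.incrementAndGet(); // count every construction
        }
        void accept(String s) {
            // stand-in for maybeLogDecision: state would accumulate here
        }
    }

    public static void main(String[] args) {
        Consumer<String> ref = new Foo()::accept; // constructor runs once, right here
        ref.accept("a");
        ref.accept("b");
        int afterMethodRef = constructed.get(); // still 1: same instance both times

        Consumer<String> lambda = s -> new Foo().accept(s); // construction deferred
        lambda.accept("a");
        lambda.accept("b");
        int afterLambda = constructed.get(); // 1 + 2 = 3: a new instance per call

        System.out.println(afterMethodRef + " " + afterLambda);
    }
}
```

For a stateful logger that accumulates decisions between flushes, the lambda form would silently discard all state after each call, which is why the IntelliJ "refactoring" would be harmful here.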