Calculate recent write load in indexing stats #124652
Conversation
This uses the recently added `ExponentiallyWeightedMovingRate` class to calculate a write load which favours more recent load, and includes this alongside the existing unweighted all-time write load in `IndexingStats.Stats`. As of this change, the new load metric is not used anywhere, although it can be retrieved with the index stats or node stats APIs.
```diff
     this.startTime = startTime;
-    this.lastTime = 0; // after an increment, this must be positive, so a zero value indicates we're waiting for the first
+    this.lastTime = 0; // should never be used
+    this.waitingForFirstIncrement = true;
```
We are tweaking this class so that it allows times to be zero. This is technically the correct thing to do, since `System.nanoTime()` is allowed to return zero (though it is vanishingly unlikely!). More importantly, it avoids having to tweak the times used in some tests.
As a result, we can no longer use `lastTime == 0` as a marker that we're waiting for the first increment, so we add a boolean to do that job explicitly.
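A minimal sketch of the resulting shape (the field names match the diff above; the actual rate bookkeeping is elided, so this is illustrative rather than the real implementation):

```java
public class ExponentiallyWeightedMovingRate {

    private final long startTime;
    private long lastTime; // no longer doubles as a sentinel, so zero is a legal time
    private boolean waitingForFirstIncrement;

    public ExponentiallyWeightedMovingRate(long startTime) {
        this.startTime = startTime;
        this.lastTime = 0; // should never be read before the first increment
        this.waitingForFirstIncrement = true;
    }

    public synchronized void addIncrement(long increment, long currentTime) {
        if (waitingForFirstIncrement) {
            // previously detected via lastTime == 0, which broke once a
            // legitimate time of zero was allowed
            waitingForFirstIncrement = false;
        }
        // ... exponentially weighted rate update elided ...
        lastTime = currentTime;
    }
}
```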
```java
public double calculateRateSince(long currentTime, double currentRate, long oldTime, double oldRate) {
    if (oldTime < startTime) {
        oldTime = startTime;
    }
```
We need this guard for a few edge cases where the supplied `oldTime` can fall before the tracker's `startTime`.
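For instance (hypothetical numbers):

```java
// The tracker started at t = 1_000ns, but a caller asks for the rate since
// t = 0. Without the clamp, the calculation would cover 1_000ns during which
// the tracker observed nothing.
long startTime = 1_000L;
long oldTime = 0L;
if (oldTime < startTime) {
    oldTime = startTime; // only measure over the tracker's lifetime
}
```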
```java
 *
 * <p>If this {@link Stats} instance represents multiple shards, this is the average of that ratio for each shard, weighted by
 * the elapsed time for each shard.
 */
```
N.B. This PR is not changing this behaviour; it's just documenting and adding tests for the existing behaviour (since I found it surprising).
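Spelled out, and assuming "that ratio" is per-shard indexing time over elapsed time as the surrounding stats suggest, with $w_i$ and $t_i$ for shard $i$'s indexing and elapsed time:

$$
\text{write load} = \frac{\sum_i t_i \cdot (w_i / t_i)}{\sum_i t_i} = \frac{\sum_i w_i}{\sum_i t_i}
$$

i.e. weighting each shard's ratio by its elapsed time collapses to total indexing time divided by total elapsed time across the shards.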
```java
        fakeClock.advance();
        return IndexingOperationListener.super.preIndex(shardId, operation);
    }
}
```
Previously, the FakeClock implementation in this test advanced the time on every other call to its getter. This worked only as long as the assumptions about when the getter is called held true, which made it pretty fragile. This PR removes this auto-advance behaviour, and instead explicitly advances the time at the correct point in the indexing operation using a listener, which is much more robust.
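A sketch of the resulting side-effect-free clock (everything except the `advance()` and `preIndex` names from the diff above is an assumption about the test code):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

class FakeClock implements LongSupplier {
    private final AtomicLong nanos = new AtomicLong();

    void advance() {
        nanos.addAndGet(1_000_000L); // arbitrary fixed step for illustration
    }

    @Override
    public long getAsLong() {
        return nanos.get(); // reading the clock no longer mutates it
    }
}
```

The listener in the diff then calls `advance()` from `preIndex`, so the clock moves exactly once per indexing operation, regardless of how often the stats code happens to read it.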
Pinging @elastic/es-data-management (Team:Data Management)
```java
if (index.origin().isRecovery() == false) {
    long took = result.getTook();
    totalStats.indexMetric.inc(took);
    totalStats.recentIndexMetric.addIncrement(took, relativeTimeInNanosSupplier.getAsLong());
```
This makes me slightly nervous since it has a synchronized block and I think this code is called fairly often. The operations in the block are fast, but it's still a point of synchronization. I think we'll notice in nightly performance tests if it causes any problems though. And I don't see a good way around having a synchronized block in that method.
Perhaps the state in `ExponentiallyWeightedMovingRate` could be moved into an object and swapped with an atomic ref? I suppose that would create lots of objects, but it would remove the need for sync'ing.
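A rough sketch of that idea (hypothetical class, not a proposal for the actual code):

```java
import java.util.concurrent.atomic.AtomicReference;

public class LockFreeMovingRate {

    // immutable snapshot of the mutable state
    private record State(double rate, long lastTime) {}

    private final AtomicReference<State> state = new AtomicReference<>(new State(0.0, 0L));

    public void addIncrement(long increment, long currentTime) {
        State prev;
        State next;
        do {
            prev = state.get();
            // ... decay prev.rate() and fold in the increment; elided ...
            next = new State(prev.rate(), currentTime);
        } while (state.compareAndSet(prev, next) == false);
        // note: every attempt allocates a new State, which is the GC cost
        // mentioned in the reply below
    }
}
```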
In my benchmarking, the slowest I ever saw that `addIncrement` take was about 11ns on my machine. So we'd need to be writing millions of documents per second on a node before this started getting significant. Does that seem likely?
I did consider something like what you suggest, Parker. I was concerned about the impact on GC, which my gut feeling says would be worse. Though I could be wrong.
I also considered some kind of batching, accumulating the values in a `LongAdder` before flushing them into the EWMR (it would be fine to increment for e.g. a whole second's worth of docs using the same timestamp), but that adds complexity which I'm hoping we don't need.
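For concreteness, a sketch of that batching idea (the wrapper, its names, and the flush period are all hypothetical):

```java
import java.util.concurrent.atomic.LongAdder;

public class BatchingRateRecorder {

    private final LongAdder pendingNanos = new LongAdder();
    private final ExponentiallyWeightedMovingRate rate; // the synchronized EWMR

    public BatchingRateRecorder(ExponentiallyWeightedMovingRate rate) {
        this.rate = rate;
    }

    public void record(long tookNanos) {
        pendingNanos.add(tookNanos); // hot path: no synchronization, just a striped add
    }

    // called e.g. once per second from a scheduled task
    public void flush(long nowNanos) {
        long batch = pendingNanos.sumThenReset();
        if (batch > 0) {
            rate.addIncrement(batch, nowNanos); // one timestamp for the whole batch
        }
    }
}
```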
```java
logger.debug(
    "Generating stats for an index shard with indexing time {} sec and active time {} sec giving unweighted write load {} "
        + "while the recency-weighted write load is {} using a half-life of {} sec",
    totalIndexingTimeSinceShardStartedInNanos * 1.0e-9,
```
You can use `TimeValue.timeValueNanos(totalIndexingTimeSinceShardStartedInNanos)` to avoid having numbers like `1.0e-9` in here (and it automatically uses seconds, hours, days, etc., depending on the size of the number, which is convenient. For example, it shows the current half-life as 10000d rather than 864000000 sec).
That's a good idea. Done.
It occurred to me that I should probably also use the overload of `logger.debug` which takes a `Supplier` to avoid doing the extra work unless debug logging is enabled. So I've done that, too.
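Roughly what the two suggestions combine to (the `Strings.format` supplier body and the shortened message are assumptions, not the PR's exact code; `TimeValue` renders a human-readable unit automatically):

```java
logger.debug(
    // the Supplier overload defers the formatting work until debug is enabled
    () -> Strings.format(
        "Generating stats for an index shard with indexing time [%s] and active time [%s]",
        TimeValue.timeValueNanos(totalIndexingTimeSinceShardStartedInNanos),
        TimeValue.timeValueNanos(totalActiveTimeInNanos)
    )
);
```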
```java
 */
public class IndexingStatsSettings {

    // TODO: Change this default to something sensible:
```
Did you want to do this before merging this PR? Or is this just to be determined at some point before we start using this?
I was planning this for the next PR.
```diff
     && totalIndexingTimeSinceShardStartedInNanos == that.totalIndexingTimeSinceShardStartedInNanos
-    && totalActiveTimeInNanos == that.totalActiveTimeInNanos;
+    && totalActiveTimeInNanos == that.totalActiveTimeInNanos
+    && recentIndexingLoad == that.recentIndexingLoad;
```
This probably ought to be in `hashCode()` if it's going to be in `equals()` (although I seriously doubt either one is used anywhere other than maybe tests).
It's allowed for `hashCode()` to use only a subset of the fields used in `equals()`. It does mean that there'd be a collision, making a hash map very slightly less performant if it contained two entries keyed by instances of this class which were identical in every field except this one... but I'm not worried about that! As you say, these really shouldn't be used anywhere outside of (maybe) tests.
I opted not to put it in `hashCode()` because I'm really queasy about calculating the hash code of the result of a floating-point computation. I mean, I'm a bit queasy about doing `==` on it, but figured it might be the right thing to do for some tests. But I drew the line at `hashCode()`.
I'm happy to change this if you feel strongly, e.g. because leaving it out might make readers stop and think more than just including it would. Since we don't think it's used, I don't feel strongly.
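A simplified sketch of why this is legal (illustrative field subset, not the PR's actual methods):

```java
// The contract only requires: a.equals(b) implies a.hashCode() == b.hashCode().
// Hashing a subset of the equals() fields preserves that, at the cost of
// occasional collisions.
@Override
public boolean equals(Object o) {
    return o instanceof Stats that
        && totalActiveTimeInNanos == that.totalActiveTimeInNanos
        && recentIndexingLoad == that.recentIndexingLoad; // the queasy double ==
}

@Override
public int hashCode() {
    // recentIndexingLoad deliberately omitted: two Stats differing only in
    // that double collide, but equal objects still hash equally
    return Long.hashCode(totalActiveTimeInNanos);
}
```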
I realized after I wrote the comment that it was harmless. Yeah, I'm also uneasy with it being in `equals`. But since I really don't think either one is ever used, I don't feel strongly about either of them.
Looks good to me. I think some of the synchronization could be removed at a later point, but we can see if it matters for performance at all first.
This adds the fields added in elastic/elasticsearch#125521 and elastic/elasticsearch#124652 to the spec.