-
Notifications
You must be signed in to change notification settings - Fork 869
feat: Heartbeat shard statistics #7431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Theis-Mathiassen
wants to merge
56
commits into
cadence-workflow:master
Choose a base branch
from
AndreasHolt:heartbeat-shard-statistics
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+353
−21
Open
Changes from 19 commits
Commits
Show all changes
56 commits
Select commit
Hold shift + click to select a range
2de12d8
feat(shard distributor): add shard key helpers and metrics state
AndreasHolt 5d95067
feat(shard distributor): persist shard metrics in etcd store
AndreasHolt 6e57536
fix(shard distributor): update LastMoveTime in the case where a shard…
AndreasHolt 595d320
test(shard distributor): add tests for shard metrics
AndreasHolt d9ba54d
fix(shard distributor): modify comment
AndreasHolt 32d2ecd
fix(shard distributor): add atomic check to prevent metrics race
AndreasHolt b624a00
fix(shard distributor): apply shard metric updates in a second phase …
AndreasHolt aad7b2e
feat(shard distributor): move shard metric updates out of AssignShard…
AndreasHolt 6360f8a
fix(shard distributor): keep NamespaceState revisions tied to assignm…
AndreasHolt 1536d0a
refactor(shard distributor): use shard cache and clock for preparing …
AndreasHolt f316fbf
test(shard distributor): BuildShardPrefix, BuildShardKey, ParseShardKey
AndreasHolt 4524da9
feat(shard distributor): simplify shard metrics updates
AndreasHolt 126f725
refactor(shard distributor): ShardMetrics renamed to ShardStatistics.…
AndreasHolt cc53f68
test(shard distributor): small changes to shard key tests s.t. they l…
AndreasHolt 733bbcb
fix(shard distributor): no longer check for key type ShardStatisticsK…
AndreasHolt 6816b8e
refactor(shard distributor): found a place where I forgot to rename t…
AndreasHolt f97e0cf
fix(shard distributor): move non-exported helpers to end of file to f…
AndreasHolt 513e88c
feat(shard distributor): clean up the shard statistics
AndreasHolt 9833525
test(shard distributor): add test case for when shard stats are deleted
AndreasHolt 0332fe5
fix(shard distributor): add mapping (new metric)
AndreasHolt d5a13d9
feat(shard distributor): retain shard stats while shards are within h…
AndreasHolt 634bc02
feat: function to update shard statistics from heartbeat (currently n…
AndreasHolt 812e854
test(shard distributor): add tests to verify statistics are updated a…
AndreasHolt b9813e7
feat(shard distributor): calculate smoothed load (ewma) using the Sha…
AndreasHolt dfb7448
fix(shard distributor): log invalid shard load
AndreasHolt 36ec08f
chore: added logger warning and simplified ewma calculation
Theis-Mathiassen 38a6e81
Merge branch 'master' into heartbeat-shard-statistics
Theis-Mathiassen af733e6
fix: remove duplicate test introduced in merge
AndreasHolt a52e86f
chore: consistent error checking, and rename function
Theis-Mathiassen abfc80e
chore: added decompress to unmarshal
Theis-Mathiassen df0feaf
feat(shard distributor): persist shard metrics in etcd store
AndreasHolt 8546a26
feat(shard distributor): move shard metric updates out of AssignShard…
AndreasHolt dde87ef
fix(shard distributor): keep NamespaceState revisions tied to assignm…
AndreasHolt 415e80c
refactor(shard distributor): use shard cache and clock for preparing …
AndreasHolt c67d5c3
feat(shard distributor): simplify shard metrics updates
AndreasHolt 9ffcefb
refactor(shard distributor): ShardMetrics renamed to ShardStatistics.…
AndreasHolt cc769bf
refactor(shard distributor): found a place where I forgot to rename t…
AndreasHolt 8c22663
fix(shard distributor): move non-exported helpers to end of file to f…
AndreasHolt 5ac3c5d
test(shard distributor): add test case for when shard stats are deleted
AndreasHolt 3973b82
feat(shard distributor): retain shard stats while shards are within h…
AndreasHolt 3830d5e
feat: function to update shard statistics from heartbeat (currently n…
AndreasHolt 443c0b1
test(shard distributor): add tests to verify statistics are updated a…
AndreasHolt 9d159e7
feat(shard distributor): calculate smoothed load (ewma) using the Sha…
AndreasHolt 18e63b7
fix(shard distributor): log invalid shard load
AndreasHolt e08a286
chore: added logger warning and simplified ewma calculation
Theis-Mathiassen 08eb635
fix: remove duplicate test introduced in merge
AndreasHolt f63664a
chore: consistent error checking, and rename function
Theis-Mathiassen 10e2ffa
chore: added decompress to unmarshal
Theis-Mathiassen 8c6b0c8
chore: removed an old struct that appeared during rebase
Theis-Mathiassen 158e030
feat(shard distributor): throttle shard-stat writes
AndreasHolt dd45ff0
Merge branch 'heartbeat-shard-statistics' of github.com:AndreasHolt/c…
AndreasHolt 05e0d1d
fix(shard distributor): linter error
AndreasHolt e0779ec
feat(shard distributor): decouple shard stats write-throttling decisi…
AndreasHolt 9546f24
Merge branch 'master' into heartbeat-shard-statistics
AndreasHolt db70702
fix(shard-distributor): inverted condition in shard stats cleanup loop
AndreasHolt 481f9c6
chore(shard-distributor): did some formatting, and use current load i…
Theis-Mathiassen File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am worried that we are generating too much writes load to etcd,do you think we can avoid writing if the the load is stable?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think that is possible, so as an example if load from heartbeat is within +/- 5% of the currently stored value, we do not update it?
Some problems with this approach might be:
Because of how ewmaSmoothedLoad calculates the new load, if there have been a longer time since last load change, the new load will have greater impact.
So I fear if we have a very stable shard load with some spikes in the load, the smoothing effect is lost, excuse my drawing, but i hope it illustrates the idea:

It might still be possible, we will probably just have to rethink the update function, how we can avoid this.
We will look into how to solve this, and comment again when we have something.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have implemented something that can help throttle the writes in 158e030.
We now only write
Edit:
e0779ec decouples the stats cleanup, which before this change was coupled to heartbeat TTL when determining whether a shard stat is stale.