Commit 078daf9
feat(shard-manager): Add support for watching drains (#7697)
<!-- 1-2 line summary of WHAT changed technically:
- Always link the relevant projects GitHub issue, unless it is a minor
bugfix
- Good: "Modified FailoverDomain mapper to allow null ActiveClusterName
#320"
- Bad: "added nil check" -->
**What changed?**
This PR introduces the `DrainSignalObserver` interface in `clientcommon`
so that shard-distributor components can react to infrastructure drain
signals.
`DrainSignalObserver` is a simple interface that lets
deployment-specific implementations signal when this instance has
been removed from, or added back to, service discovery. The leader
namespace manager subscribes to drain signals and proactively resigns
from etcd elections when one arrives. It also listens for undrain
signals so that it can resume leadership operations and campaign again
for the namespace.
<!-- Your goal is to provide all the required context for a future
maintainer
to understand the reasons for making this change (see
https://cbea.ms/git-commit/#why-not-how).
How did this work previously (and what was wrong with it)? What has
changed, and why did you solve it
this way?
- Good: "Active-active domains have independent cluster attributes per
region. Previously,
modifying cluster attributes required specifying the default
ActiveClusterName which
updates the global domain default. This prevents operators from updating
regional
configurations without affecting the primary cluster designation. This
change allows
attribute updates to be independent of active cluster selection."
- Bad: "Improves domain handling" -->
**Why?**
The shard-distributor leader holds an etcd lease to coordinate shard
assignments across all executors. In production environments,
infrastructure operations (e.g. host drains) can remove a service
instance from service discovery while the process continues running.
Without active detection, the leader in a drained zone continues holding
its etcd lease and operating normally - unaware that it is no longer
reachable by other components.
<!-- Include specific test commands and setup. Please include the exact
commands such that
another maintainer or contributor can reproduce the test steps taken.
- e.g Unit test commands with exact invocation
`go test -v ./common/types/mapper/proto -run TestFailoverDomainRequest`
- For integration tests include setup steps and test commands
Example: "Started local server with `./cadence start`, then ran `make
test_e2e`"
- For local simulation testing include setup steps for the server and
how you ran the tests
- Good: Full commands that reviewers can copy-paste to verify
- Bad: "Tested locally" or "Added tests" -->
**How did you test it?**
Added unit tests and ran them with
`go test -v ./service/sharddistributor/leader/namespace`
<!-- If there are risks that the release engineer should know about
document them here.
For example:
- Has an API/IDL been modified? Is it backwards/forwards compatible? If
not, what are the repercussions?
- Has a schema change been introduced? Is it possible to roll back?
- Has a feature flag been re-used for a new purpose?
- Is there a potential performance concern? Is the change modifying core
task processing logic?
- If truly N/A, you can mark it as such -->
**Potential risks**
NA
<!-- If this PR completes a user facing feature or changes functionality
add release notes here.
Your release notes should allow a user and the release engineer to
understand the changes with little context.
Always ensure that the description contains a link to the relevant
GitHub issue. -->
**Release notes**
NA
<!-- Consider whether this change requires documentation updates in the
Cadence-Docs repo
- If yes: mention what needs updating (or link to docs PR in
cadence-docs repo)
- If in doubt, add a note about potential doc needs
- Only mark N/A if you're certain no docs are affected -->
**Documentation Changes**
NA
---
## Reviewer Validation
**PR Description Quality** (check these before reviewing code):
- [ ] **"What changed"** provides a clear 1-2 line summary
- [ ] Project Issue is linked
- [ ] **"Why"** explains the full motivation with sufficient context
- [ ] **Testing is documented:**
  - [ ] Unit test commands are included (with exact `go test` invocation)
  - [ ] Integration test setup/commands included (if integration tests were run)
  - [ ] Canary testing details included (if canary was mentioned)
- [ ] **Potential risks** section is thoughtfully filled out (or legitimately N/A)
- [ ] **Release notes** included if this completes a user-facing feature
- [ ] **Documentation** needs are addressed (or noted if uncertain)
---------
Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>
1 parent ac363d7 commit 078daf9
File tree
4 files changed, +440 -96 lines changed
- service/sharddistributor
  - client/clientcommon
  - leader/namespace
Lines changed: 20 additions & 0 deletions
Lines changed: 68 additions & 0 deletions