Commit 73c5a63
Add timeouts to summarizer stopping (microsoft#26271)
There are some peculiar cases of documents hitting 10k ops in which a
summarizer doesn't shut down properly. The parent container continually
recognizes that it is the elected client, but a summarizer is still
hanging somewhere. This change adds timeouts in several places to ensure
the summarizer closes properly so that the parent can spawn a new one.

Telemetry shows that the P99.9 time from a summarizer's first event to a
proper connection was 35 seconds, so a timeout of 2 minutes should be more
than enough.
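
As a rough illustration of the idea (not the code in this commit), a stop or load operation can be raced against a timer so that a hung summarizer cannot block the parent forever. The helper name `raceWithTimeout` and the constant `summarizerStopTimeoutMs` are hypothetical; only the 2-minute value comes from the description above.

```typescript
// Assumed constant name; the 2-minute value is taken from the commit description.
const summarizerStopTimeoutMs = 2 * 60 * 1000;

type RaceResult<T> = { timedOut: false; value: T } | { timedOut: true };

// Races `work` against a timer and reports which one finished first.
// The timer is always cleared so it cannot keep the process alive.
async function raceWithTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<RaceResult<T>> {
	let timer: ReturnType<typeof setTimeout> | undefined;
	const timeout = new Promise<{ timedOut: true }>((resolve) => {
		timer = setTimeout(() => resolve({ timedOut: true }), timeoutMs);
	});
	try {
		return await Promise.race([
			work.then((value) => ({ timedOut: false as const, value })),
			timeout,
		]);
	} finally {
		clearTimeout(timer);
	}
}
```

A caller could pass `summarizerStopTimeoutMs` as `timeoutMs` and treat `timedOut: true` as the signal to force-close the summarizer.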
One interesting potential case is the following:
1. The parent container spawns a summarizer.
2. The summarizer starts its load flow.
3. The parent container disconnects. `summarizer?.stop(...)` is not called, since the load flow has not finished and returned the `summarizer` instance.
4. The summarizer finishes its load flow, and we attempt to call `summarizer.run(...)`.
5. At the start of `summarizer.run(...)`, it waits to see a `"connected"` event from the ContainerRuntime.
   1. This may never happen if the summarizer gets into a bad state and cannot get a connection from the service, causing the whole summarizer flow to hang.
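
The hang in step 5 can be sketched as a bounded wait for the `"connected"` event. This is a minimal sketch under stated assumptions, not the implementation in this commit: `ConnectionSource` is a hypothetical stand-in for the ContainerRuntime's event surface, and `waitForConnected` is an illustrative name.

```typescript
type Listener = () => void;

// Hypothetical stand-in for an object that emits a "connected" event.
class ConnectionSource {
	private listeners: Listener[] = [];

	public once(listener: Listener): void {
		this.listeners.push(listener);
	}

	public off(listener: Listener): void {
		this.listeners = this.listeners.filter((l) => l !== listener);
	}

	public emitConnected(): void {
		const current = this.listeners;
		this.listeners = []; // "once" semantics: listeners fire a single time
		for (const listener of current) {
			listener();
		}
	}
}

// Resolves true if "connected" fires before timeoutMs elapses, false otherwise.
function waitForConnected(source: ConnectionSource, timeoutMs: number): Promise<boolean> {
	return new Promise<boolean>((resolve) => {
		const onConnected = (): void => {
			clearTimeout(timer);
			resolve(true);
		};
		const timer = setTimeout(() => {
			source.off(onConnected); // avoid leaking the listener after giving up
			resolve(false);
		}, timeoutMs);
		source.once(onConnected);
	});
}
```

When the result is `false`, the caller can close the summarizer instead of waiting indefinitely, which lets the parent spawn a new one.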
[AB#54693](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/54693)

1 parent: eec5a71
2 files changed: +85 -14 lines changed
- packages/runtime/container-runtime/src/summary
  - summaryDelayLoadedModule

Lines changed: 46 additions & 6 deletions
Lines changed: 39 additions & 8 deletions