Commit b2dc84b
Fix crash that occurs sometimes when aborting a slot migration while child snapshot is active (#2721)
A race condition causes the slot migration pipe read handler to use the
client after it has been freed and then free it a second time. The order
of events is:
1. We kill the slot migration child process during CANCELSLOTMIGRATIONS
2. We then free the associated client to the target node
3. Although we kill the child process, it is not guaranteed that the
pipe will be empty from child to parent
4. If the pipe is not empty, we later will read that out in the
slotMigrationPipeReadHandler
5. In the pipe read handler, we attempt to write to the connection. If
writing to the connection fails, we will attempt to free the client
6. However, the client was already freed, so this is a double free
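
Step 3 is the part that is easy to overlook: data written before the kill stays in the pipe buffer. A minimal standalone sketch of that behaviour, assuming a POSIX system (the function name `bytes_left_after_kill` is hypothetical and this is not code from the patch):

```c
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns how many bytes the parent can still read
 * from the pipe after the child that wrote them has been killed. */
static ssize_t bytes_left_after_kill(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: write a payload, then wait to be killed. */
        close(fds[0]);
        if (write(fds[1], "data", 4) != 4) _exit(1);
        pause();
        _exit(0);
    }

    /* Parent: wait until the payload is buffered, without consuming it. */
    close(fds[1]);
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    (void)poll(&pfd, 1, -1);

    /* Kill and reap the child, like an aborted snapshot. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);

    /* The child is gone, but its bytes are still in the pipe. */
    char buf[16];
    ssize_t n = read(fds[0], buf, sizeof(buf));
    close(fds[0]);
    return n;
}
```

Any read handler registered on the parent side can therefore still fire after the child has been killed, which is what opens the window described in steps 4 to 6.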
Notably, the abort doesn't need to be triggered by
`CANCELSLOTMIGRATIONS`; any failure that aborts the slot migration can
hit this race.
To solve this, we simply:
1. Set the slot migration pipe connection to NULL whenever it is
unlinked
2. Bail out early in slot migration pipe read handler if the connection
is NULL
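
The two steps above can be sketched in isolation as follows. This is only an illustration of the pattern, not the patched Valkey code: `connection`, `migration_job`, `unlink_client`, and `pipe_read_handler` are hypothetical stand-ins for the real types and functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real Valkey structures differ. */
typedef struct connection {
    int write_ok; /* nonzero if a write to the target would succeed */
} connection;

typedef struct migration_job {
    connection *pipe_conn; /* connection the pipe read handler writes to */
    int frees;             /* counts releases, for illustration only */
} migration_job;

/* Step 1: on unlink, release the connection once and clear the pointer. */
static void unlink_client(migration_job *job) {
    if (job->pipe_conn == NULL) return;
    free(job->pipe_conn);
    job->pipe_conn = NULL;
    job->frees++;
}

/* Step 2: the read handler bails out early if the connection is gone.
 * Returns 1 if the data was handled, 0 if the handler bailed out. */
static int pipe_read_handler(migration_job *job) {
    if (job->pipe_conn == NULL) return 0;
    if (!job->pipe_conn->write_ok) {
        /* Write failed: tear the client down (at most once). */
        unlink_client(job);
    }
    return 1;
}

/* Abort first, then let late pipe data arrive. Returns the free count. */
static int demo_abort_then_late_read(void) {
    migration_job job = {0};
    job.pipe_conn = calloc(1, sizeof(connection));
    assert(job.pipe_conn != NULL);

    unlink_client(&job);                   /* abort path frees the client */
    int handled = pipe_read_handler(&job); /* leftover pipe data is read */
    assert(handled == 0);                  /* handler bailed out early */
    return job.frees;
}
```

Because the pointer is cleared at the single place where the client is released, the late read becomes a no-op instead of a second free.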
I also consolidate the killSlotMigrationChild call into one code path,
which is executed on client unlink. Previously, two code paths each made
the call (once on slot migration job finish, and once on client unlink).
Sending the signal twice is harmless, but redundant.
Also, add a test to cancel during the slot migration snapshot to make
sure this case is covered (we only caught it during the module test).
---------
Signed-off-by: Jacob Murphy <[email protected]>
(cherry picked from commit 28e5dcc)
Signed-off-by: cherukum-amazon <[email protected]>
1 parent e761433 commit b2dc84b
4 files changed, +38 -22 lines changed
- src
- tests/unit/cluster