Commit a188849
Automerge: [OpenMP] Fix crash with task stealing and task dependencies (#126049)
This patch series demonstrates and fixes a bug that causes crashes with
OpenMP 'taskwait' directives in heavily multi-threaded scenarios.
TLDR: The early return from __kmpc_omp_taskwait_deps_51 missed the
synchronization mechanism in place for the late return.
Additional debug assertions check for the implied invariants of the code.
@jpeyton52 found the timing hole as this sequence of events:
>
> 1. THREAD 1: A regular task with dependences is created, call it T1
> 2. THREAD 1: Call into `__kmpc_omp_taskwait_deps_51()` and create a stack
based depnode (`NULL` task), call it T2 (stack)
> 3. THREAD 2: Steals task T1 and executes it getting to
`__kmp_release_deps()` region.
> 4. THREAD 1: During processing of dependences for T2 (stack) (within
`__kmp_check_deps()` region), a link is created T1 -> T2. This increases
T2's (stack) `nrefs` count.
> 5. THREAD 2: Iterates through the successors list: decrement the T2's
(stack) npredecessor count. BUT HASN'T YET `__kmp_node_deref()`-ed it.
> 6. THREAD 1: Now when finished with `__kmp_check_deps()`, it returns false
because npredecessor count is 0, but T2's (stack) `nrefs` count is 2 because
THREAD 2 still references it!
> 7. THREAD 1: Because `__kmp_check_deps()` returns false, early exit.
> _Now the stack based depnode is invalid, but THREAD 2 still references it._
>
> We've reached improper stack referencing behavior. Varied results/crashes/
asserts can occur if THREAD 1 comes back and recreates the exact same depnode
in the exact same stack address during the same time THREAD 2 calls
`__kmp_node_deref()`.2 files changed
+32
-9
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
36 | | - | |
| 36 | + | |
37 | 37 | | |
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
44 | | - | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
45 | 49 | | |
46 | 50 | | |
47 | 51 | | |
| |||
51 | 55 | | |
52 | 56 | | |
53 | 57 | | |
54 | | - | |
| 58 | + | |
55 | 59 | | |
56 | 60 | | |
57 | 61 | | |
| |||
825 | 829 | | |
826 | 830 | | |
827 | 831 | | |
828 | | - | |
| 832 | + | |
829 | 833 | | |
830 | 834 | | |
831 | 835 | | |
| |||
1007 | 1011 | | |
1008 | 1012 | | |
1009 | 1013 | | |
1010 | | - | |
| 1014 | + | |
1011 | 1015 | | |
1012 | 1016 | | |
1013 | 1017 | | |
| |||
1018 | 1022 | | |
1019 | 1023 | | |
1020 | 1024 | | |
| 1025 | + | |
| 1026 | + | |
| 1027 | + | |
| 1028 | + | |
| 1029 | + | |
| 1030 | + | |
| 1031 | + | |
| 1032 | + | |
| 1033 | + | |
| 1034 | + | |
1021 | 1035 | | |
1022 | 1036 | | |
1023 | 1037 | | |
| |||
1032 | 1046 | | |
1033 | 1047 | | |
1034 | 1048 | | |
1035 | | - | |
1036 | | - | |
| 1049 | + | |
| 1050 | + | |
| 1051 | + | |
| 1052 | + | |
| 1053 | + | |
| 1054 | + | |
1037 | 1055 | | |
| 1056 | + | |
| 1057 | + | |
1038 | 1058 | | |
1039 | 1059 | | |
1040 | 1060 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | | - | |
| 25 | + | |
26 | 26 | | |
27 | | - | |
| 27 | + | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
31 | 34 | | |
32 | 35 | | |
33 | 36 | | |
| |||
0 commit comments