Commit 48d73cd
neural: Fix a sync issue (#10281)
Close #10272.
The root cause of this issue is a race condition. Add two syncs to
avoid it.
1. At the beginning of the mma loop -> This prevents a fast warp from
writing shared memory A in iteration i+1 while slower warps are
still reading shared memory A.
2. In the backward pass, after mma and before outerproduceAccumulate ->
because mma ends with only a warp-level sync, outerproduceAccumulate could
start executing on some fast warps while the slow warps are still running
mma.
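The hazard described in point 1 can be sketched on the CPU. This is only an analogy, not the code changed by this commit: Python threads stand in for warps, a plain list stands in for shared memory A, and `threading.Barrier` stands in for the block-level sync. The names `NUM_WORKERS`, `NUM_ITERS`, and `run_tiled_sum` are hypothetical.

```python
import threading

NUM_WORKERS = 4  # hypothetical stand-in for the number of warps
NUM_ITERS = 8    # hypothetical stand-in for the mma loop trip count


def run_tiled_sum():
    shared_a = [0] * NUM_WORKERS  # stand-in for shared memory A
    totals = [0] * NUM_WORKERS    # per-worker accumulators
    barrier = threading.Barrier(NUM_WORKERS)

    def worker(wid):
        for i in range(NUM_ITERS):
            # Sync at the top of the loop: no worker may overwrite the
            # tile until every worker has finished reading iteration i-1.
            barrier.wait()
            shared_a[wid] = i * NUM_WORKERS + wid
            # Sync after the write: every slot is filled before any read.
            barrier.wait()
            totals[wid] += sum(shared_a)

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

Without the barrier at the top of the loop, a fast worker could write its slot for iteration i+1 while a slower worker is still summing the tile for iteration i, which is the same write-after-read pattern the commit message describes. Point 2 has the same shape: a sync between two phases so that no worker starts the second phase while another is still in the first.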
This PR also fixes a numerical issue in the activation functions.
Basically, every function using exp() was not numerically stable, so update
them.
1 parent d76973a commit 48d73cd
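The commit message does not say which activation functions were rewritten or how. As an illustration only, the sketch below shows a common way to make exp()-based activations stable, using sigmoid and softplus as assumed examples: exp() is only ever evaluated on a non-positive argument, so it cannot overflow. The function names are hypothetical.

```python
import math


def stable_sigmoid(x):
    # Branch on the sign so exp() only sees a non-positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def stable_softplus(x):
    # log(1 + exp(x)) rewritten as max(x, 0) + log1p(exp(-|x|)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

A naive `1 / (1 + exp(-x))` overflows in exp() for large negative `x`; in Python this raises `OverflowError`, and in single-precision GPU code the corresponding forms typically yield `inf` or `NaN`.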
2 files changed in source/standard-modules/neural: +24 -3 lines changed
Lines changed: 14 additions & 0 deletions
| Original file lines | Diff lines | Change |
|---|---|---|
| 1044-1049 | 1044-1054 | 5 lines added (1047-1051) |
| 1271-1276 | 1276-1288 | 7 lines added (1279-1285) |
| 1284-1289 | 1296-1303 | 2 lines added (1299-1300) |
Lines changed: 10 additions & 3 deletions
| Original file lines | Diff lines | Change |
|---|---|---|
| 94-101 | 94-103 | 3 lines added (97, 99-100), 1 line removed (original 98) |
| 185-192 | 187-196 | 3 lines added (190, 192-193), 1 line removed (original 189) |
| 212-219 | 216-226 | 4 lines added (219, 221-223), 1 line removed (original 216) |