Commit a50eedb
sendrecv: fix random system crashes during multi-threaded comm abort scenario
The `err_buffer` needs to be zero-initialized before it is passed to libfabric's `fi_cq_readerr`.
Earlier, when the buffer was allocated outside the while() loop, we missed resetting it
to zero before each `fi_cq_readerr` call inside the loop, and this was
causing random memory corruption.
The fix is either (1) to allocate `err_buffer` outside the loop and zero-initialize it
before every `fi_cq_readerr` call, or (2) to move the allocation and zero-initialization
inside the while loop.
This commit implements option (2): the `err_buffer` allocation and zero-initialization
are moved inside the while loop.
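
The difference between the two loop shapes can be sketched without libfabric. Everything below is hypothetical and for illustration only: `mock_err_entry` stands in for libfabric's error entry, `mock_cq_readerr` stands in for `fi_cq_readerr`, and the assumption that the reader treats leftover non-zero fields as caller input is ours, not something this commit states. The sketch only counts how often the reader is handed a buffer that still carries state from the previous iteration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the libfabric error entry (not the real layout). */
struct mock_err_entry {
    void  *err_data;       /* detail pointer filled in by the reader */
    size_t err_data_size;  /* detail size filled in by the reader */
    int    err;            /* error code filled in by the reader */
};

/* Hypothetical completion queue: pending errors plus a diagnostic counter. */
struct mock_cq {
    int pending_errors;
    int stale_inputs;      /* calls that received a non-zeroed entry */
};

static char provider_detail[8];

/* Hypothetical stand-in for fi_cq_readerr(). Returns 1 when an error entry
 * was read and 0 when the queue is empty. Assumption: a non-zero field on
 * entry would be interpreted as caller input, so it is counted as stale. */
static int mock_cq_readerr(struct mock_cq *cq, struct mock_err_entry *entry)
{
    if (cq->pending_errors == 0)
        return 0;
    if (entry->err_data != NULL || entry->err_data_size != 0)
        cq->stale_inputs++;
    entry->err_data = provider_detail;
    entry->err_data_size = sizeof(provider_detail);
    entry->err = 1;
    cq->pending_errors--;
    return 1;
}

/* Shape before the fix: one buffer outside the loop, never reset. */
static int drain_reusing_buffer(struct mock_cq *cq)
{
    struct mock_err_entry err_buffer = { 0 };
    while (mock_cq_readerr(cq, &err_buffer) > 0) {
        /* handle the error; err_buffer keeps its contents for the next call */
    }
    return cq->stale_inputs;
}

/* Shape of option (2): a fresh, zero-initialized buffer on every iteration. */
static int drain_fresh_buffer(struct mock_cq *cq)
{
    for (;;) {
        struct mock_err_entry err_buffer = { 0 };
        if (mock_cq_readerr(cq, &err_buffer) <= 0)
            break;
        /* handle the error */
    }
    return cq->stale_inputs;
}
```

With three pending errors in the mock queue, `drain_reusing_buffer` hands the reader a dirty buffer on the second and third calls, while `drain_fresh_buffer` never does. Scoping the buffer to the loop body also makes it harder to forget the reset, which is one reason to prefer option (2) over option (1).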
Signed-off-by: Sunita Nadampalli <nadampal@amazon.com>

1 parent 7a252ef commit a50eedb
1 file changed, +6 -7 lines changed
[Diff body not captured: original lines 280-285 and 299 removed; new lines 293-298 added.]