Commit ae605d1
TL/CUDA: fix oob_req leak and double-free in NVLS init cleanup

Two bugs in the BARRIER state error path and cleanup section:

1. When req_test returns a negative status (barrier failure), team->oob_req
   was freed via barrier_data path but req_free was never called, leaking
   the OOB transport request handle. Add req_free + NULL the pointer before
   goto cleanup.
2. nvls->mc_va, nvls->uc_va, and nvls->mc_memhandle are stored at the end
   of STATE_ADD_DEVICE before falling through to STATE_BARRIER. If the
   barrier then fails and jumps to cleanup, the cleanup block frees these
   resources via local variables but leaves the nvls struct fields non-NULL.
   A subsequent ucc_tl_cuda_nvls_destroy call then unmaps/releases them
   again, causing a double-free / CUDA resource corruption. Zero the nvls
   fields immediately after the local-variable cleanup blocks.

1 parent 2551467 commit ae605d1
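The two fixes described above can be illustrated with a minimal, self-contained C sketch. It is not the UCC code: every name here (`demo_team_t`, `demo_nvls_t`, `demo_init`, `demo_destroy`, `demo_release`) is a hypothetical stand-in, `malloc`/`free` take the place of the CUDA mappings and the OOB request, and the barrier result is passed in as a parameter rather than polled. It only shows the pattern: release the request handle and NULL it on the failure path, and zero the struct fields after the local-variable cleanup so a later destroy call has nothing left to release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the real UCC types. */
typedef struct demo_req { int status; } demo_req_t;

typedef struct demo_nvls {
    void *mc_va;        /* stands in for the multicast mapping */
    void *uc_va;        /* stands in for the unicast mapping */
    void *mc_memhandle; /* stands in for the multicast memory handle */
} demo_nvls_t;

typedef struct demo_team {
    demo_req_t *oob_req;
    demo_nvls_t nvls;
} demo_team_t;

/* Counts releases so that a double release would be observable. */
static int demo_release_count = 0;

static void demo_release(void *res)
{
    if (res != NULL) {
        free(res);
        demo_release_count++;
    }
}

/* Simulated init. barrier_status < 0 models a failed barrier. */
int demo_init(demo_team_t *team, int barrier_status)
{
    void *mc_va, *uc_va, *mc_memhandle;

    assert(team != NULL);

    mc_va        = malloc(16);
    uc_va        = malloc(16);
    mc_memhandle = malloc(16);

    /* The resources are published to the struct before the barrier. */
    team->nvls.mc_va        = mc_va;
    team->nvls.uc_va        = uc_va;
    team->nvls.mc_memhandle = mc_memhandle;

    team->oob_req = malloc(sizeof(*team->oob_req));

    if (barrier_status < 0) {
        /* Fix 1: free the request handle and NULL the pointer. */
        free(team->oob_req);
        team->oob_req = NULL;
        goto cleanup;
    }

    /* Barrier completed: the request is no longer needed. */
    free(team->oob_req);
    team->oob_req = NULL;
    return 0;

cleanup:
    /* Cleanup releases the resources via the local variables. */
    demo_release(mc_va);
    demo_release(uc_va);
    demo_release(mc_memhandle);
    /* Fix 2: zero the struct fields so destroy does not release again. */
    team->nvls.mc_va        = NULL;
    team->nvls.uc_va        = NULL;
    team->nvls.mc_memhandle = NULL;
    return -1;
}

/* Releases whatever the struct still owns; safe to call after a failed init. */
void demo_destroy(demo_team_t *team)
{
    demo_release(team->nvls.mc_va);
    demo_release(team->nvls.uc_va);
    demo_release(team->nvls.mc_memhandle);
    team->nvls.mc_va        = NULL;
    team->nvls.uc_va        = NULL;
    team->nvls.mc_memhandle = NULL;
}
```

In this sketch, calling `demo_init` on a zero-initialized `demo_team_t` with a negative status and then calling `demo_destroy` releases each resource exactly once. Removing the three assignments marked "Fix 2" would make `demo_destroy` free the same pointers a second time, which is the double-free the commit message describes.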
1 file changed, 5 additions, 0 deletions

| Hunk | Original file lines (context) | Diff lines (context and additions) | Added lines (diff numbering) |
|---|---|---|---|
| 1 | 695-700 | 695-702 | 698-699 |
| 2 | 738-743 | 740-746 | 743 |
| 3 | 750-755 | 753-759 | 756 |
| 4 | 758-763 | 762-768 | 765 |