Hello everyone, I am facing the error shown below:
An error occurred while trying to map in the address of a function.
Function Name: cuIpcOpenMemHandle_v2
Error string: /lib64/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2
CUDA-aware support is disabled.
I was trying to benchmark cuda-aware openmpi-4.1.8 linked with cuda-aware ucx-1.19.x, using the OSU benchmarks from https://mvapich.cse.ohio-state.edu/benchmarks/
Things I have done so far:
- Built the cuda-11.8 toolkit using gcc-8.2.0, then exported its lib64 and bin directories
- Built ucx-1.19.x as cuda-aware against that cuda-11.8, then exported its lib and bin directories (gcc-8.2.0 compiler used)
- Built openmpi-4.1.8 against cuda-11.8 to make it cuda-aware, and also linked it with the cuda-aware ucx-1.19.x (gcc-8.2.0 compiler used)
- Built the OSU benchmarks with that cuda-aware openmpi-4.1.8 (linked with the cuda-aware ucx-1.19.x) and with cuda-11.8 (gcc-8.2.0 compiler used)
- Picked osu_bw as the OSU program to benchmark
When I run osu_bw, I get the error shown above.
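To confirm what the message says, a check like the following could be run on the same node. This is only a minimal sketch: it asks the dynamic loader whether a library exports a given symbol. The path /lib64/libcuda.so.1 and the symbol name come from the error text; the helper name has_symbol is hypothetical. If the plain cuIpcOpenMemHandle is found but the _v2 variant is not, that would suggest (an assumption, not something I have verified) that the installed driver library is older than what cuda-11.8 expects.

```python
import ctypes


def has_symbol(library_path, symbol):
    """Return True if the shared library at library_path exports symbol.

    has_symbol is a hypothetical helper name. ctypes.CDLL raises OSError
    if the library itself cannot be loaded.
    """
    lib = ctypes.CDLL(library_path)
    try:
        # Attribute access on a CDLL object performs the symbol lookup
        # and raises AttributeError when the symbol is not exported.
        getattr(lib, symbol)
        return True
    except AttributeError:
        return False


if __name__ == "__main__":
    # Path and symbol names taken from the error message above.
    path = "/lib64/libcuda.so.1"
    for name in ("cuIpcOpenMemHandle", "cuIpcOpenMemHandle_v2"):
        try:
            print(name, "->", has_symbol(path, name))
        except OSError as exc:
            print("could not load", path, ":", exc)
            break
```

Comparing the driver version reported by nvidia-smi with the toolkit version would be a complementary check.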
One thing I noticed in the built cuda-aware ucx-1.19.x is that the gdr_copy transport is missing,
though cuda_copy and cuda_ipc are listed when checking for CUDA support with "ucx_info -d | grep -i cuda".
I heard that the gdr_copy transport should also be there if ucx is cuda-aware,
and that this transport depends on a kernel module called nv_peer_mem or nvidia-peermem.
Later I found out that my driver installation is missing this module (nv_peer_mem or nvidia-peermem).
Could this also be the reason for the error above, i.e. the undefined symbol cuIpcOpenMemHandle_v2 in /lib64/libcuda.so.1 and CUDA-aware support being disabled?
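For reference, the two checks described above (which CUDA-related transports ucx_info reports, and whether a peer-memory module is loaded) can be sketched as follows. This is a rough sketch under assumptions: it assumes "ucx_info -d" prints lines containing "Transport: <name>", and that loaded kernel modules are listed by name in the first column of /proc/modules (where nvidia-peermem would appear as nvidia_peermem). The function names are hypothetical.

```python
import re
import subprocess

# Module names as they would appear in /proc/modules (hyphens become underscores).
PEERMEM_MODULES = {"nv_peer_mem", "nvidia_peermem"}


def cuda_transports(ucx_info_output):
    """Return the sorted CUDA/GDR transport names found in `ucx_info -d` text.

    Assumes each transport is reported on a line containing 'Transport: <name>'.
    """
    names = set(re.findall(r"Transport:\s*(\S+)", ucx_info_output))
    return sorted(n for n in names if "cuda" in n or "gdr" in n)


def loaded_peermem_modules(proc_modules_text):
    """Return which peer-memory modules are listed in /proc/modules text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return sorted(loaded & PEERMEM_MODULES)


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["ucx_info", "-d"], capture_output=True, text=True
        )
        print("CUDA transports:", cuda_transports(result.stdout))
    except FileNotFoundError:
        print("ucx_info not found on PATH")

    try:
        with open("/proc/modules") as fh:
            print("peer-memory modules:", loaded_peermem_modules(fh.read()))
    except OSError as exc:
        print("could not read /proc/modules:", exc)
```

On my system I would expect this to list cuda_copy and cuda_ipc but not gdr_copy, and no peer-memory module, which matches what I described.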
Thanks a lot for taking the time to read this.