v5.0.x OSU microbenchmarks CUDA memory segfault #12825

## Background information

### What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.x head

```
$ git log --oneline -10
75795c04eb (HEAD -> v5.0.x, origin/v5.0.x) Merge pull request #12821 from Sergei-Lebedev/topic/coll_ucc_fix_buf_size_overflow_v5
a2868acd84 coll/ucc: fix int overflow in coll init
6f08eaf910 Merge pull request #12781 from janjust/v5.0.x
6f91498f59 Merge pull request #12809 from edgargabriel/pr/vulcan-aggr-list-leak-v5.0.x
ff740b4256 fcoll/vulcan: fix memory leak
d380ab6971 Merge pull request #12798 from wenduwan/fix_ipv6
ce3b892360 3rd-party/openpmix: include ipv6 fix
3968cab0fe Merge pull request #12800 from wenduwan/test_mpi4py
b4c98c9487 .github/workflow: set up runtime params right before mpi4py test
3bec944cf0 Merge pull request #12789 from jsquyres/pr/v5.0.x/gcc-14-complier-warning-fixes
```

### Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Source build

```
./configure --with-sge --without-verbs --disable-man-pages --enable-ipv6 LDFLAGS=-Wl,--as-needed --enable-prte-prefix-by-default --enable-mca-dso=all --with-libevent=external --with-hwloc=external --with-cuda=/usr/local/cuda --with-cuda-libdir=/usr/local/cuda/lib64/stubs --enable-debug
```

### If you are building/installing from a git clone, please copy-n-paste the output from `git submodule status`.

```
$ git submodule status
 e62fa4252f0cadda29c4103e01b0e277e8180d3e 3rd-party/openpmix (v5.0.3-17-ge62fa425)
 b68a0acb32cfc0d3c19249e5514820555bcf438b 3rd-party/prrte (v3.0.6)
 dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (dfff675)
```


### Please describe the system on which you are running

* Operating system/version: Amazon Linux 2
* Computer hardware: AWS EC2 p4d.24xlarge
```
$ nvidia-smi
Tue Sep 24 17:56:49 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  |   00000000:10:1C.0 Off |                    0 |
| N/A   45C    P0             60W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  |   00000000:10:1D.0 Off |                    0 |
| N/A   41C    P0             57W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  |   00000000:20:1C.0 Off |                    0 |
| N/A   44C    P0             59W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          On  |   00000000:20:1D.0 Off |                    0 |
| N/A   39C    P0             55W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          On  |   00000000:90:1C.0 Off |                    0 |
| N/A   42C    P0             55W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          On  |   00000000:90:1D.0 Off |                    0 |
| N/A   41C    P0             58W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          On  |   00000000:A0:1C.0 Off |                    0 |
| N/A   46C    P0             62W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          On  |   00000000:A0:1D.0 Off |                    0 |
| N/A   40C    P0             63W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
* Network type: BTL/SM

-----------------------------

## Details of the problem

We are seeing segfaults with the change in PR #12781: https://github.com/open-mpi/ompi/pull/12781/files#diff-750d0e8be09c5f4ee5f703b8ba2c735a3e1b8b807162936e55530ec721ec5b86

```
mpirun --wdir . -n 2 --mca pml ob1 openmpi-v5.0.6a1-v5.0.x-debug/install/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bibw  -d cuda D D
```

The backtrace is:
```
(gdb) bt
#0  0x00007fd46edddbe8 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#1  0x00007fd46e5f3653 in opal_convertor_accelerator_memcpy (dest=0x7fd41755ce40, src=0x7fd43b200000, size=1, convertor=0x7ffedb4caf80) at opal_convertor.c:52
#2  0x00007fd46e5f3e93 in opal_convertor_pack (pConv=0x7ffedb4caf80, iov=0x7ffedb4cae70, out_size=0x7ffedb4cae84, max_data=0x7ffedb4cae88) at opal_convertor.c:284
#3  0x00007fd42114bb61 in mca_btl_sm_sendi (btl=0x7fd421350180 <mca_btl_sm>, endpoint=0x400c3830, convertor=0x7ffedb4caf80, header=0x7ffedb4cb0b0, header_size=16, payload_size=1,
    order=255 '\377', flags=3, tag=65 'A', descriptor=0x0) at btl_sm_sendi.c:98
#4  0x00007fd4208e9c2d in mca_bml_base_sendi (bml_btl=0x7fd41c068540, convertor=0x7ffedb4caf80, header=0x7ffedb4cb0b0, header_size=16, payload_size=1, order=255 '\377', flags=3,
    tag=65 'A', descriptor=0x0) at ../../../../ompi/mca/bml/bml.h:301
#5  0x00007fd4208eae09 in mca_pml_ob1_send_inline (buf=0x7fd43b200000, count=1, datatype=0x62ef80 <ompi_mpi_char>, dst=1, tag=100, seqn=2, dst_proc=0x40089a80, ob1_proc=0x3fbb9b40,
    endpoint=0x400c5880, comm=0x62f980 <ompi_mpi_comm_world>) at pml_ob1_isend.c:125
#6  0x00007fd4208eaf62 in mca_pml_ob1_isend (buf=0x7fd43b200000, count=1, datatype=0x62ef80 <ompi_mpi_char>, dst=1, tag=100, sendmode=MCA_PML_BASE_SEND_STANDARD,
    comm=0x62f980 <ompi_mpi_comm_world>, request=0x6310e0 <send_request>) at pml_ob1_isend.c:182
#7  0x00007fd46f550673 in PMPI_Isend (buf=0x7fd43b200000, count=1, type=0x62ef80 <ompi_mpi_char>, dest=1, tag=100, comm=0x62f980 <ompi_mpi_comm_world>, request=0x6310e0 <send_request>)
    at isend.c:101
#8  0x000000000040304f in main (argc=<optimized out>, argv=<optimized out>) at osu_bibw.c:216
```


We also get a segfault with the EFA network, but so far the issue appears to be in the CUDA memory copy path.
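
To illustrate the suspicion: in the backtrace, the `src` pointer in frame #1 (`0x7fd43b200000`) is the same address as the `buf` passed to `PMPI_Isend`, which is presumably a device buffer given the `D D` arguments, yet the copy ends up in libc's `memmove`. Below is a minimal, self-contained sketch of the kind of dispatch one would expect a copy helper to perform. All names (`sketch_mem_kind`, `sketch_choose_path`, `sketch_copy`) are hypothetical, chosen only for illustration. This is not Open MPI's actual code, and it does not call CUDA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer classification; not an Open MPI type. */
typedef enum { SKETCH_HOST, SKETCH_DEVICE } sketch_mem_kind;

/* Which copy routine a pack step would pick (hypothetical). */
typedef enum { SKETCH_USE_MEMMOVE, SKETCH_USE_ACCELERATOR_COPY } sketch_copy_path;

/* A plain memmove is only safe when both sides are host memory. */
static sketch_copy_path sketch_choose_path(sketch_mem_kind dest_kind,
                                           sketch_mem_kind src_kind)
{
    if (SKETCH_HOST == dest_kind && SKETCH_HOST == src_kind) {
        return SKETCH_USE_MEMMOVE;
    }
    return SKETCH_USE_ACCELERATOR_COPY;
}

/* Performs the host-to-host copy itself and returns 0. Returns 1 when
 * device memory is involved, meaning the caller must use an
 * accelerator-aware copy instead of touching the pointers from the CPU. */
static int sketch_copy(void *dest, sketch_mem_kind dest_kind,
                       const void *src, sketch_mem_kind src_kind,
                       size_t size)
{
    assert(NULL != dest && NULL != src);
    if (SKETCH_USE_MEMMOVE == sketch_choose_path(dest_kind, src_kind)) {
        memmove(dest, src, size);
        return 0;
    }
    return 1; /* a host memmove on a device pointer would fault here */
}
```

If the buffer kind is misdetected as host memory (or the check is skipped), the helper falls through to `memmove` on a device address, which would match the crash in frame #0. Whether that is what actually happens in `opal_convertor_accelerator_memcpy` is an assumption that still needs to be confirmed.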
