Closed
Description
Configuration
OMPI: 3.1.0rc4
MOFED: MLNX_OFED_LINUX-4.3-1.0.1.0
Nodes: hercules x47 (ppn=32(x47), nodelist=clx-hercules-[005,008,081-125])
Job: ucx
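The test passes when -x UCX_MAX_EAGER_LANES=3 -x UCX_MAX_RNDV_LANES=3 are removed from the command line below; with them set, it fails with the data mismatches shown in the output.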
Cmd:
mpirun -np 2 -mca btl self --tag-output --timestamp-output -mca pml ucx -mca coll '^hcoll' --bind-to core -x UCX_NET_DEVICES=mlx5_0:1,mlx5_2:1 -x UCX_IB_REG_METHODS=rcache,direct -x UCX_TLS=all -x UCX_RC_TM_ENABLE=n -mca pmix_base_async_modex 1 -mca mpi_add_procs_cutoff 0 -mca pmix_base_collect_data 0 -x UCX_MAX_EAGER_LANES=3 -x UCX_MAX_RNDV_LANES=3 --map-by node /mnt/lustre/users/mtt/scratch/ucx_ompi/20180427_020635_16343_83128_clx-hercules-005/installs/mOeN/tests/mpich_tests/mpich-mellanox.git/test/mpi/pt2pt/sendrecv1
Output:
...
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data in target buffer did not match for destination datatype recv struct (64 nblock 4096 blocklen 40960 stride 0 lb) and source datatype MPI_DOUBLE, count = 262144
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 35 but got p[0,3065] = 2d
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 34 but got p[0,3065] = 2c
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 33 but got p[0,3065] = 2b
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 32 but got p[0,3065] = 2a
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 31 but got p[0,3065] = 29
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 30 but got p[0,3065] = 28
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 2f but got p[0,3066] = 27
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 2e but got p[0,3066] = 26
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 2d but got p[0,3066] = 25
Sun Apr 29 16:27:39 2018[1,0]<stdout>: Found 40 errors
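
To make the pattern in the mismatch lines easier to see, here is a small Python sketch that parses the excerpt above and computes the difference between the expected and received values. It is only a triage aid, not part of the test suite. It assumes the printed values are hexadecimal (they contain digits such as `2d` and `2f`), and the names `parse_mismatches`, `PATTERN`, and `LOG` are made up for this illustration.

```python
import re

# Mismatch lines copied verbatim from the output excerpt above.
LOG = """\
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 35 but got p[0,3065] = 2d
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 34 but got p[0,3065] = 2c
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 33 but got p[0,3065] = 2b
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 32 but got p[0,3065] = 2a
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 31 but got p[0,3065] = 29
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 30 but got p[0,3065] = 28
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 2f but got p[0,3066] = 27
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 2e but got p[0,3066] = 26
Sun Apr 29 16:27:34 2018[1,0]<stdout>:Data expected = 2d but got p[0,3066] = 25
"""

# Hypothetical pattern for the mismatch lines; values are assumed to be hex.
PATTERN = re.compile(
    r"Data expected = ([0-9a-f]+) but got p\[(\d+),(\d+)\] = ([0-9a-f]+)"
)


def parse_mismatches(text):
    """Return (i, j, expected, got) tuples, with expected/got parsed as hex."""
    rows = []
    for match in PATTERN.finditer(text):
        expected, i, j, got = match.groups()
        rows.append((int(i), int(j), int(expected, 16), int(got, 16)))
    return rows


rows = parse_mismatches(LOG)
deltas = {expected - got for _, _, expected, got in rows}
print(len(rows), "mismatches, expected - got deltas:", sorted(deltas))
# prints: 9 mismatches, expected - got deltas: [8]
```

In the nine lines shown, every received value is 8 lower than the expected one. Whether that holds for all 40 reported errors, and what it says about the cause, cannot be determined from this excerpt alone.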