Description
Describe the bug
Building Open MPI (ompi) v5.0.8 with ROCm support fails, I suspect because of a missing header:
accelerator_rocm_module.c:286:9: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration]
286 | memcpy(dest, src, size);
| ^~~~~~
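The diagnostic means no prototype for memcpy() is visible when accelerator_rocm_module.c is compiled, and -Werror=implicit-function-declaration turns that warning into a hard error. memcpy() is declared in <string.h>, so a plausible fix, assuming nothing else is missing, is to include that header near the top of the file. The sketch below only illustrates the pattern; host_copy_fallback() is a hypothetical stand-in, not the actual Open MPI code at line 286.

```c
/* Minimal sketch, assuming the build failure is only a missing prototype.
 * host_copy_fallback() is a hypothetical stand-in for the code around
 * accelerator_rocm_module.c:286; it is not the real Open MPI function. */
#include <assert.h>
#include <stddef.h>
#include <string.h> /* declares memcpy(); the include the error suggests is missing */

/* Hypothetical helper: plain host-to-host copy of `size` bytes. */
static int host_copy_fallback(void *dest, const void *src, size_t size)
{
    /* memcpy() with NULL pointers is undefined behaviour, so guard in debug builds. */
    assert(dest != NULL && src != NULL);
    memcpy(dest, src, size);
    return 0;
}
```

Compiling this file with gcc -Werror=implicit-function-declaration -c should succeed; deleting the #include <string.h> line should reproduce the same "implicit declaration of function 'memcpy'" error as in the log.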
Steps to Reproduce
Install UCX
git clone --branch v1.19.0 https://github.com/openucx/ucx/
cd ucx
./autogen.sh
./contrib/configure-release \
--prefix=/prefix_path/bin/ucx/ \
--enable-optimizations \
--disable-logging \
--disable-debug \
--disable-assertions \
--disable-params-check \
--with-rocm="${ROCM_PATH}"
make -j
make install
Install Open MPI
git clone --branch v5.0.8 https://github.com/open-mpi/ompi
cd ompi
git submodule update --init --recursive
./autogen.pl
./configure \
--with-rocm=/opt/rocm \
--with-ucx=/path-to-ucx/bin/ucx \
--prefix=/prefix_path/bin/ompi
make -j
make install
The build log is attached: ompi_build.log
Setup and versions
- AlmaLinux release 9.4 (Seafoam Ocelot)
- CPU architecture: x86_64
  Vendor ID: AuthenticAMD
  Model name: AMD EPYC 7452 32-Core Processor
For RDMA/IB/RoCE related issues:
- Driver version:
  rpm -q rdma-core -> rdma-core-2404mlnx51-1.2404066.x86_64
  rpm -q libibverbs -> libibverbs-2404mlnx51-1.2404066.x86_64
- MLNX_OFED version:
  ofed_info -s -> MLNX_OFED_LINUX-24.04-0.7.0.0
- HW information from ibstat or ibv_devinfo -vv command:
CA 'mlx5_0'
CA type: MT4123
Number of ports: 1
Firmware version: 20.30.1004
Hardware version: 0
Node GUID: 0x0c42a1030086badc
System image GUID: 0x0c42a1030086badc
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 860
LMC: 0
SM lid: 692
Capability mask: 0x2651e848
Port GUID: 0x0c42a1030086badc
Link layer: InfiniBand
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 20.30.1004
node_guid: 0c42:a103:0086:badc
sys_image_guid: 0c42:a103:0086:badc
vendor_id: 0x02c9
vendor_part_id: 4123
hw_ver: 0x0
board_id: MT_0000000222
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffffffffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0x21361c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
MEM_WINDOW
UD_IP_CSUM
XRC
MEM_MGT_EXTENSIONS
MEM_WINDOW_TYPE_2B
MANAGED_FLOW_STEERING
max_sge: 30
max_sge_rd: 30
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 8388608
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 16777216
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 240
max_total_mcast_qp_attach: 503316480
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
general_odp_caps:
ODP_SUPPORT
ODP_SUPPORT_IMPLICIT
rc_odp_caps:
SUPPORT_SEND
SUPPORT_RECV
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
SUPPORT_SEND
xrc_odp_caps:
SUPPORT_SEND
SUPPORT_WRITE
SUPPORT_READ
SUPPORT_SRQ
completion timestamp_mask: 0x7fffffffffffffff
hca_core_clock: 156250kHZ
device_cap_flags_ex: 0x3000005021361C36
PCI_WRITE_END_PADDING
Unknown flags: 0x3000004000000000
tso_caps:
max_tso: 0
rss_caps:
max_rwq_indirection_tables: 0
max_rwq_indirection_table_size: 0
rx_hash_function: 0x0
rx_hash_fields_mask: 0x0
max_wq_type_rq: 0
packet_pacing_caps:
qp_rate_limit_min: 0kbps
qp_rate_limit_max: 0kbps
max_rndv_hdr_size: 64
max_num_tags: 127
max_ops: 32768
max_sge: 1
flags:
IBV_TM_CAP_RC
cq moderation caps:
max_cq_count: 65535
max_cq_period: 4095 us
maximum available device memory: 262144Bytes
num_comp_vectors: 63
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 692
port_lid: 860
port_lmc: 0x00
link_layer: InfiniBand
max_msg_sz: 0x40000000
port_cap_flags: 0x2251e848
port_cap_flags2: 0x0032
max_vl_num: 4 (3)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 8
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 25.0 Gbps (32)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0c42:a103:0086:badc
For GPU related issues:
- GPU type
- ROCm:
  - Driver version: 7.0.2
- Check if peer-direct is loaded (lsmod | grep nv_peer_mem) and/or gdrcopy (lsmod | grep gdrdrv):
  Not loaded.
Additional information (depending on the issue)
- Output of ucx_info -d to show transports and devices recognized by UCX:
#
# Memory domain: self
# Component: self
# register: unlimited, cost: 0 nsec
# remote key: 0 bytes
# rkey_ptr is supported
# memory types: host (access,reg_nonblock,reg,cache)
#
# Transport: self
# Device: memory
# Type: loopback
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 19360.00 MB/sec
# latency: 0 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 8K
# am_bcopy: <= 8K
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 0 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: tcp
# Component: tcp
# memory types:
#
# Transport: tcp
# Device: eno1
# Type: network
# System device: eno1 (0)
#
# capabilities:
# bandwidth: 113.16/ppn + 0.00 MB/sec
# latency: 5776 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 0
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: ibs8
# Type: network
# System device: ibs8 (1)
#
# capabilities:
# bandwidth: 2200.00/ppn + 0.00 MB/sec
# latency: 5206 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: lo
# Type: network
# System device: <unknown>
#
# capabilities:
# bandwidth: 11.91/ppn + 0.00 MB/sec
# latency: 10960 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 18 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
#
# Connection manager: tcp
# max_conn_priv: 2064 bytes
#
# Memory domain: sysv
# Component: sysv
# allocate: unlimited
# remote key: 12 bytes
# rkey_ptr is supported
# memory types: host (access,alloc,cache)
#
# Transport: sysv
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 15360.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: posix
# Component: posix
# allocate: <= 264104868K
# remote key: 24 bytes
# rkey_ptr is supported
# memory types: host (access,alloc,cache)
#
# Transport: posix
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 15360.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: mlx5_0
# Component: ib
# register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# memory invalidation is supported
# memory types: host (access,reg,cache), rocm (reg,cache)
#
# Transport: rc_verbs
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (1)
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 600 + 1.000 * N nsec
# overhead: 75 nsec
# put_short: <= 124
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 5 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 5 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 123
# am_bcopy: <= 8255
# am_zcopy: <= 8255, up to 4 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 127
# domain: device
# atomic_add: 64 bit
# atomic_fadd: 64 bit
# atomic_cswap: 64 bit
# connection: to ep
# device priority: 50
# device num paths: 1
# max eps: 256
# device address: 3 bytes
# ep address: 7 bytes
# error handling: peer failure, ep_check
#
#
# Transport: ud_verbs
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (1)
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 630 nsec
# overhead: 105 nsec
# am_short: <= 116
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 5 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 3992
# connection: to ep, to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Transport: dc_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (1)
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 660 nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 11 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 11 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 138
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 7 bytes
# error handling: buffer (zcopy), remote access, peer failure
#
#
# Transport: rc_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (1)
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 600 + 1.000 * N nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 14 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 14 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 186
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to ep
# device priority: 50
# device num paths: 1
# max eps: 256
# device address: 3 bytes
# ep address: 10 bytes
# error handling: buffer (zcopy), remote access, peer failure, ep_check
#
#
# Transport: ud_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (1)
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 630 nsec
# overhead: 80 nsec
# am_short: <= 180
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 132
# connection: to ep, to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Connection manager: rdmacm
# max_conn_priv: 54 bytes
#
# Memory domain: rocm_cpy
# Component: rocm_cpy
# allocate: unlimited
# register: unlimited, cost: 0 nsec
# remote key: 16 bytes
# memory types: host (reg,cache), rocm (access,alloc,reg,cache,detect)
#
# Transport: rocm_copy
# Device: rocm_cpy
# Type: accelerator
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 6911.00 MB/sec
# latency: 100 nsec
# overhead: 0 nsec
# put_short: <= 4294967295
# put_zcopy: unlimited, up to 1 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 1
# get_short: <= 4294967295
# get_zcopy: unlimited, up to 1 iov
# get_opt_zcopy_align: <= 1
# get_align_mtu: <= 1
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 0 bytes
# iface address: 8 bytes
# error handling: none
#
#
# Memory domain: rocm_ipc
# Component: rocm_ipc
# register: unlimited, cost: 9 nsec
# remote key: 56 bytes
# memory types: rocm (access,reg,cache)
#
# Transport: rocm_ipc
# Device: rocm_ipc
# Type: accelerator
# System device: <unknown>
#
# capabilities:
# bandwidth: 204800.00/ppn + 0.00 MB/sec
# latency: 100 nsec
# overhead: 0 nsec
# put_zcopy: 128..inf, up to 1 iov
# put_opt_zcopy_align: <= 4
# put_align_mtu: <= 4
# get_zcopy: 128..inf, up to 1 iov
# get_opt_zcopy_align: <= 4
# get_align_mtu: <= 4
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 4 bytes
# error handling: none
#
#
# Memory domain: cma
# Component: cma
# memory types:
#
# Transport: cma
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 11145.00 MB/sec
# latency: 80 nsec
# overhead: 2000 nsec
# put_zcopy: unlimited, up to 16 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 1
# get_zcopy: unlimited, up to 16 iov
# get_opt_zcopy_align: <= 1
# get_align_mtu: <= 1
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 4 bytes
# error handling: peer failure, ep_check
#
#
# Memory domain: knem
# Component: knem
# register: unlimited, cost: 1200 + 0.007 * N nsec
# remote key: 16 bytes
# memory types: host (access,reg,cache)
#
# Transport: knem
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 13862.00 MB/sec
# latency: 80 nsec
# overhead: 2000 nsec
# put_zcopy: unlimited, up to 16 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 1
# get_zcopy: unlimited, up to 16 iov
# get_opt_zcopy_align: <= 1
# get_align_mtu: <= 1
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 0 bytes
# error handling: none
#