build fails with "error: implicit declaration of function ‘memcpy’" #13476

@alexschroeter

Describe the bug

Building Open MPI (ompi) v5.0.8 with ROCm support fails, I suspect because of a missing header:

accelerator_rocm_module.c:286:9: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration]
  286 |         memcpy(dest, src, size);
      |         ^~~~~~
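
For context, `memcpy` is declared in `<string.h>`, and this diagnostic is what the compiler reports when a file calls it without that declaration in scope. The following is a minimal sketch of the likely fix, assuming the only problem is that `accelerator_rocm_module.c` does not include `<string.h>` directly or indirectly. The function `host_copy` is a hypothetical stand-in for illustration and is not the actual Open MPI code.

```c
#include <stddef.h>
#include <string.h>   /* declares memcpy; without it the call below is an implicit declaration */

/*
 * Hypothetical stand-in for a host-to-host copy path.
 * Returns 0 on success and -1 if either pointer is NULL.
 */
int host_copy(void *dest, const void *src, size_t size)
{
    if (NULL == dest || NULL == src) {
        return -1;
    }
    memcpy(dest, src, size);
    return 0;
}
```

If the same one-line include is what the real source file is missing, adding it near the other system includes should be enough to get past this error. Whether other files are affected is not something the log excerpt above shows.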

Steps to Reproduce

Install UCX

git clone --branch v1.19.0 https://github.com/openucx/ucx/
cd ucx
./autogen.sh
./contrib/configure-release \
        --prefix=/prefix_path/bin/ucx/ \
        --enable-optimizations \
        --disable-logging \
        --disable-debug \
        --disable-assertions \
        --disable-params-check \
        --with-rocm="${ROCM_PATH}"
make -j
make install

Install Open MPI

git clone --branch v5.0.8 https://github.com/open-mpi/ompi
cd ompi
git submodule update --init --recursive
./autogen.pl
./configure \
        --with-rocm=/opt/rocm \
        --with-ucx=/path-to-ucx/bin/ucx \
        --with-prefix=/prefix_path/bin/ompi

make -j
make install

Build log is attached here.
ompi_build.log

Setup and versions

  • AlmaLinux release 9.4 (Seafoam Ocelot)

  • CPU architecture (x86_64)
    Vendor ID: AuthenticAMD
    Model name: AMD EPYC 7452 32-Core Processor

  • For RDMA/IB/RoCE related issues:

    • Driver version:

      • rpm -q rdma-core -> rdma-core-2404mlnx51-1.2404066.x86_64
      • rpm -q libibverbs -> libibverbs-2404mlnx51-1.2404066.x86_64
      • MLNX_OFED version ofed_info -s -> MLNX_OFED_LINUX-24.04-0.7.0.0
    • HW information from ibstat or ibv_devinfo -vv command

      • CA 'mlx5_0'
        CA type: MT4123
        Number of ports: 1
        Firmware version: 20.30.1004
        Hardware version: 0
        Node GUID: 0x0c42a1030086badc
        System image GUID: 0x0c42a1030086badc
        Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 860
        LMC: 0
        SM lid: 692
        Capability mask: 0x2651e848
        Port GUID: 0x0c42a1030086badc
        Link layer: InfiniBand
    • hca_id: mlx5_0
      transport: InfiniBand (0)
      fw_ver: 20.30.1004
      node_guid: 0c42:a103:0086:badc
      sys_image_guid: 0c42:a103:0086:badc
      vendor_id: 0x02c9
      vendor_part_id: 4123
      hw_ver: 0x0
      board_id: MT_0000000222
      phys_port_cnt: 1
      max_mr_size: 0xffffffffffffffff
      page_size_cap: 0xfffffffffffff000
      max_qp: 262144
      max_qp_wr: 32768
      device_cap_flags: 0x21361c36
      BAD_PKEY_CNTR
      BAD_QKEY_CNTR
      AUTO_PATH_MIG
      CHANGE_PHY_PORT
      PORT_ACTIVE_EVENT
      SYS_IMAGE_GUID
      RC_RNR_NAK_GEN
      MEM_WINDOW
      UD_IP_CSUM
      XRC
      MEM_MGT_EXTENSIONS
      MEM_WINDOW_TYPE_2B
      MANAGED_FLOW_STEERING
      max_sge: 30
      max_sge_rd: 30
      max_cq: 16777216
      max_cqe: 4194303
      max_mr: 16777216
      max_pd: 8388608
      max_qp_rd_atom: 16
      max_ee_rd_atom: 0
      max_res_rd_atom: 4194304
      max_qp_init_rd_atom: 16
      max_ee_init_rd_atom: 0
      atomic_cap: ATOMIC_HCA (1)
      max_ee: 0
      max_rdd: 0
      max_mw: 16777216
      max_raw_ipv6_qp: 0
      max_raw_ethy_qp: 0
      max_mcast_grp: 2097152
      max_mcast_qp_attach: 240
      max_total_mcast_qp_attach: 503316480
      max_ah: 2147483647
      max_fmr: 0
      max_srq: 8388608
      max_srq_wr: 32767
      max_srq_sge: 31
      max_pkeys: 128
      local_ca_ack_delay: 16
      general_odp_caps:
      ODP_SUPPORT
      ODP_SUPPORT_IMPLICIT
      rc_odp_caps:
      SUPPORT_SEND
      SUPPORT_RECV
      SUPPORT_WRITE
      SUPPORT_READ
      SUPPORT_SRQ
      uc_odp_caps:
      NO SUPPORT
      ud_odp_caps:
      SUPPORT_SEND
      xrc_odp_caps:
      SUPPORT_SEND
      SUPPORT_WRITE
      SUPPORT_READ
      SUPPORT_SRQ
      completion timestamp_mask: 0x7fffffffffffffff
      hca_core_clock: 156250kHZ
      device_cap_flags_ex: 0x3000005021361C36
      PCI_WRITE_END_PADDING
      Unknown flags: 0x3000004000000000
      tso_caps:
      max_tso: 0
      rss_caps:
      max_rwq_indirection_tables: 0
      max_rwq_indirection_table_size: 0
      rx_hash_function: 0x0
      rx_hash_fields_mask: 0x0
      max_wq_type_rq: 0
      packet_pacing_caps:
      qp_rate_limit_min: 0kbps
      qp_rate_limit_max: 0kbps
      max_rndv_hdr_size: 64
      max_num_tags: 127
      max_ops: 32768
      max_sge: 1
      flags:
      IBV_TM_CAP_RC

        cq moderation caps:
                max_cq_count:   65535
                max_cq_period:  4095 us
      
        maximum available device memory:        262144Bytes
      
        num_comp_vectors:               63
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 692
                        port_lid:               860
                        port_lmc:               0x00
                        link_layer:             InfiniBand
                        max_msg_sz:             0x40000000
                        port_cap_flags:         0x2251e848
                        port_cap_flags2:        0x0032
                        max_vl_num:             4 (3)
                        bad_pkey_cntr:          0x0
                        qkey_viol_cntr:         0x0
                        sm_sl:                  0
                        pkey_tbl_len:           128
                        gid_tbl_len:            8
                        subnet_timeout:         18
                        init_type_reply:        0
                        active_width:           4X (2)
                        active_speed:           25.0 Gbps (32)
                        phys_state:             LINK_UP (5)
                        GID[  0]:               fe80:0000:0000:0000:0c42:a103:0086:badc
      
  • For GPU related issues:

    • GPU type
    • ROCm:
      • Drivers version 7.0.2
      • Check if peer-direct is loaded: lsmod|grep nv_peer_mem and/or gdrcopy: lsmod|grep gdrdrv
        • Neither module is loaded

Additional information (depending on the issue)

  • Output of ucx_info -d to show transports and devices recognized by UCX
#
# Memory domain: self
#     Component: self
#             register: unlimited, cost: 0 nsec
#           remote key: 0 bytes
#           rkey_ptr is supported
#         memory types: host (access,reg_nonblock,reg,cache)
#
#      Transport: self
#         Device: memory
#           Type: loopback
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 19360.00 MB/sec
#              latency: 0 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 8K
#             am_bcopy: <= 8K
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 0 bytes
#        iface address: 8 bytes
#       error handling: ep_check
#
#
# Memory domain: tcp
#     Component: tcp
#         memory types: 
#
#      Transport: tcp
#         Device: eno1
#           Type: network
#  System device: eno1 (0)
#
#      capabilities:
#            bandwidth: 113.16/ppn + 0.00 MB/sec
#              latency: 5776 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 0
#     device num paths: 1
#              max eps: 256
#       device address: 6 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
#
#      Transport: tcp
#         Device: ibs8
#           Type: network
#  System device: ibs8 (1)
#
#      capabilities:
#            bandwidth: 2200.00/ppn + 0.00 MB/sec
#              latency: 5206 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 1
#     device num paths: 1
#              max eps: 256
#       device address: 6 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
#
#      Transport: tcp
#         Device: lo
#           Type: network
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 11.91/ppn + 0.00 MB/sec
#              latency: 10960 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 1
#     device num paths: 1
#              max eps: 256
#       device address: 18 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
#
#
# Connection manager: tcp
#      max_conn_priv: 2064 bytes
#
# Memory domain: sysv
#     Component: sysv
#             allocate: unlimited
#           remote key: 12 bytes
#           rkey_ptr is supported
#         memory types: host (access,alloc,cache)
#
#      Transport: sysv
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 15360.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: ep_check
#
#
# Memory domain: posix
#     Component: posix
#             allocate: <= 264104868K
#           remote key: 24 bytes
#           rkey_ptr is supported
#         memory types: host (access,alloc,cache)
#
#      Transport: posix
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 15360.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: ep_check
#
#
# Memory domain: mlx5_0
#     Component: ib
#             register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec
#           remote key: 8 bytes
#           local memory handle is required for zcopy
#           memory invalidation is supported
#         memory types: host (access,reg,cache), rocm (reg,cache)
#
#      Transport: rc_verbs
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (1)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 75 nsec
#            put_short: <= 124
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 5 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 5 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 123
#             am_bcopy: <= 8255
#             am_zcopy: <= 8255, up to 4 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 127
#               domain: device
#           atomic_add: 64 bit
#          atomic_fadd: 64 bit
#         atomic_cswap: 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 3 bytes
#           ep address: 7 bytes
#       error handling: peer failure, ep_check
#
#
#      Transport: ud_verbs
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (1)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 105 nsec
#             am_short: <= 116
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 5 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 3992
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure, ep_check
#
#
#      Transport: dc_mlx5
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (1)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 660 nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 11 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 11 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 138
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 7 bytes
#       error handling: buffer (zcopy), remote access, peer failure
#
#
#      Transport: rc_mlx5
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (1)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 14 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 14 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 186
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 3 bytes
#           ep address: 10 bytes
#       error handling: buffer (zcopy), remote access, peer failure, ep_check
#
#
#      Transport: ud_mlx5
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (1)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 80 nsec
#             am_short: <= 180
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 132
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure, ep_check
#
#
# Connection manager: rdmacm
#      max_conn_priv: 54 bytes
#
# Memory domain: rocm_cpy
#     Component: rocm_cpy
#             allocate: unlimited
#             register: unlimited, cost: 0 nsec
#           remote key: 16 bytes
#         memory types: host (reg,cache), rocm (access,alloc,reg,cache,detect)
#
#      Transport: rocm_copy
#         Device: rocm_cpy
#           Type: accelerator
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 6911.00 MB/sec
#              latency: 100 nsec
#             overhead: 0 nsec
#            put_short: <= 4294967295
#            put_zcopy: unlimited, up to 1 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_short: <= 4294967295
#            get_zcopy: unlimited, up to 1 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 0 bytes
#        iface address: 8 bytes
#       error handling: none
#
#
# Memory domain: rocm_ipc
#     Component: rocm_ipc
#             register: unlimited, cost: 9 nsec
#           remote key: 56 bytes
#         memory types: rocm (access,reg,cache)
#
#      Transport: rocm_ipc
#         Device: rocm_ipc
#           Type: accelerator
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 204800.00/ppn + 0.00 MB/sec
#              latency: 100 nsec
#             overhead: 0 nsec
#            put_zcopy: 128..inf, up to 1 iov
#  put_opt_zcopy_align: <= 4
#        put_align_mtu: <= 4
#            get_zcopy: 128..inf, up to 1 iov
#  get_opt_zcopy_align: <= 4
#        get_align_mtu: <= 4
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 4 bytes
#       error handling: none
#
#
# Memory domain: cma
#     Component: cma
#         memory types: 
#
#      Transport: cma
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 11145.00 MB/sec
#              latency: 80 nsec
#             overhead: 2000 nsec
#            put_zcopy: unlimited, up to 16 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_zcopy: unlimited, up to 16 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 4 bytes
#       error handling: peer failure, ep_check
#
#
# Memory domain: knem
#     Component: knem
#             register: unlimited, cost: 1200 + 0.007 * N nsec
#           remote key: 16 bytes
#         memory types: host (access,reg,cache)
#
#      Transport: knem
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 13862.00 MB/sec
#              latency: 80 nsec
#             overhead: 2000 nsec
#            put_zcopy: unlimited, up to 16 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_zcopy: unlimited, up to 16 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 0 bytes
#       error handling: none
#
