2 changes: 1 addition & 1 deletion .ci/mellanox/README.md
@@ -6,7 +6,7 @@
 CI is managed by [Azure Pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/?view=azure-devops) service.
 
 Mellanox Open MPI CI includes:
-* Open MPI building with internal stable engineering versions of UCX and HCOLL. The building is run in Docker-based environment.
+* Open MPI building with internal stable engineering versions of UCX. The building is run in Docker-based environment.
 * Sanity functional testing.
 ### How to Run CI
 Mellanox Open MPI CI is triggered upon the following events:
61 changes: 0 additions & 61 deletions config/ompi_check_libhcoll.m4

This file was deleted.

4 changes: 2 additions & 2 deletions contrib/amca-param-sets/ft-mpi
@@ -63,7 +63,7 @@ btl=^usnic
 # The following frameworks/components are UNTESTED, and probably won't work.
 # They should run without faults, and will probably crash/deadlock after a fault.
 # You may try at your own risk.
-# coll hcoll, portals4
+# coll portals4
 # topo (all)
 # osc (all)
 # io (all)
@@ -72,7 +72,7 @@ btl=^usnic
 # We will disable only the components for which good components are known to exist.
 # Other untested components are selectable but will issue a runtime warning at
 # initiation if FT is enabled.
-coll=^hcoll,portals4
+coll=^portals4
 
 #
 # The following frameworks/components are NOT WORKING. Do not enable these with FT.
2 changes: 1 addition & 1 deletion contrib/platform/intel/bend/linux
@@ -13,7 +13,7 @@ enable_ipv6=no
 enable_man_pages=no
 enable_mpi_fortran=no
 enable_memchecker=no
-enable_mca_no_build=memchecker,coll-adapt,coll-cuda,coll-demo,coll-ftagree,coll-han,coll-hcoll,coll-inter,coll-libnbc,coll-monitoring,coll-portals4,coll-tuned,common-monitoring,common-ompio,fbtl,fcoll,fs,io,mtl,osc,pml-cm,pml-monitoring,pml-ucx,pml-v,sharedfp,topo,vprotocol,btl-ofi,btl-portals4,btl-smcuda,btl-uct,btl-ugni,btl-usnic,common-cuda,common-ofi,common-ucx
+enable_mca_no_build=memchecker,coll-adapt,coll-cuda,coll-demo,coll-ftagree,coll-han,coll-inter,coll-libnbc,coll-monitoring,coll-portals4,coll-tuned,common-monitoring,common-ompio,fbtl,fcoll,fs,io,mtl,osc,pml-cm,pml-monitoring,pml-ucx,pml-v,sharedfp,topo,vprotocol,btl-ofi,btl-portals4,btl-smcuda,btl-uct,btl-ugni,btl-usnic,common-cuda,common-ofi,common-ucx
 enable_contrib_no_build=libompitrace
 with_memory_manager=no
 with_devel_headers=yes
1 change: 0 additions & 1 deletion contrib/platform/lanl/toss/README.md
@@ -43,7 +43,6 @@ created.
 (change S to X; make sure numbers match those for the same entry in
 contrib/platform/lanl/toss/optimized-mlx.conf)
 - addition: pml = ob1 (disable MXM)
-- addition: coll = ^hcoll (disable MXM)
 - toss3-hfi-optimized
 - copy of toss2-qib-optimized
 - toss3-hfi-optimized.conf
1 change: 0 additions & 1 deletion contrib/platform/lanl/toss/toss2-mlx-optimized.conf
@@ -106,4 +106,3 @@ ras_base_launch_orted_on_hn = true
 
 ## Disable MXM
 pml = ob1
-coll = ^hcoll
5 changes: 0 additions & 5 deletions contrib/platform/mellanox/optimized
@@ -22,11 +22,6 @@ if [ "$mellanox_autodetect" == "yes" ]; then
 with_ucx=$ucx_dir
 fi
 
-hcoll_dir=${hcoll_dir:="$(pkg-config --variable=prefix hcoll)"}
-if [ -d $hcoll_dir ]; then
-with_hcoll=$hcoll_dir
-fi
-
 slurm_dir=${slurm_dir:="/usr"}
 if [ -f $slurm_dir/include/slurm/slurm.h ]; then
 with_slurm=$slurm_dir
2 changes: 1 addition & 1 deletion docs/features/ulfm.rst
@@ -333,7 +333,7 @@ correctly after a failure.
 * ``cuda``, ``inter``, ``sync``, ``sm``: **untested** (they have not
 been modified to handle faults, but we expect correct post-fault
 behavior)
-* ``hcoll``, ``portals4`` **disabled** (they have not been modified
+* ``portals4`` **disabled** (it has not been modified
 to handle faults, and we expect unspecified post-fault behavior)
 
 * ``osc``: MPI one-sided communications
9 changes: 0 additions & 9 deletions docs/installing-open-mpi/configure-cli-options/networking.rst
@@ -14,15 +14,6 @@ can be used with ``configure``:
 
 FCA is the support library for Mellanox switches and HCAs.
 
-* ``--with-hcoll=DIR``:
-Specify the directory where the Mellanox hcoll library and header
-files are located. This option is generally only necessary if the
-hcoll headers and libraries are not in default compiler/linker
-search paths.
-
-hcoll is the support library for MPI collective operation offload on
-Mellanox ConnectX-3 HCAs (and later).
-
 * ``--with-knem=DIR``:
 Specify the directory where the knem libraries and header files are
 located. This option is generally only necessary if the knem headers
2 changes: 1 addition & 1 deletion docs/tuning-apps/coll-tuned.rst
@@ -3,7 +3,7 @@ Tuning Collectives
 
 Open MPI's ``coll`` framework provides a number of components implementing
 collective communication, including: ``han``, ``libnbc``, ``self``, ``ucc`` ``base``,
-``hcoll``, ``sync``, ``xhc``, ``accelerator``, ``basic``, ``ftagree``, ``inter``, ``portals4``,
+``sync``, ``xhc``, ``accelerator``, ``basic``, ``ftagree``, ``inter``, ``portals4``,
 and ``tuned``. Some of these components may not be available depending on how
 Open MPI was compiled and what hardware is available on the system. A run-time
 decision based on each component's self reported priority, selects which
28 changes: 0 additions & 28 deletions docs/tuning-apps/networking/cuda.rst
@@ -155,7 +155,6 @@ CUDA-aware support is available in:
 * The OFI (``ofi``) MTL with the CM (``cm``) PML.
 * Both CUDA-ized shared memory (``smcuda``) and TCP (``tcp``) BTLs
 with the OB1 (``ob1``) PML.
-* The HCOLL (``hcoll``) COLL
 
 /////////////////////////////////////////////////////////////////////////
 
@@ -702,30 +701,3 @@ to query rank information and utilize that to select a GPU.
 
 MPI internal CUDA resources are released during MPI_Finalize. Thus it is an
 application error to call cudaDeviceReset before MPI_Finalize is called.
-
-
-/////////////////////////////////////////////////////////////////////////
-
-How do I enable CUDA support in HCOLL collective component
-----------------------------------------------------------
-
-HCOLL component supports CUDA GPU buffers for the following
-collectives:
-
-MPI_Allreduce
-MPI_Bcast
-MPI_Allgather
-MPI_Ibarrier
-MPI_Ibcast
-MPI_Iallgather
-MPI_Iallreduce
-
-To enable CUDA GPU buffer support in these collectives pass the
-following environment variables via mpirun:
-
-.. code-block::
-
-shell$ mpirun -x HCOLL_GPU_ENABLE=1 -x HCOLL_ENABLE_NBC=1 ..
-
-See `nVidia HCOLL documentation <https://docs.nvidia.com/networking/display/HPCXv29/HCOLL>`_
-for more information.
4 changes: 2 additions & 2 deletions ompi/mca/coll/base/coll_base_allgather.c
@@ -291,8 +291,8 @@ int ompi_coll_base_allgather_intra_sparbit(const void *sbuf, size_t scount,
 
 /* Since each process sends several non-contiguos blocks of data, each block sent (and therefore each send and recv call) needs a different tag. */
 /* As base OpenMPI only provides one tag for allgather, we are forced to use a tag space from other components in the send and recv calls */
-MCA_PML_CALL(isend(tmpsend + (ptrdiff_t) send_disp * scount * rext, scount, rdtype, sendto, MCA_COLL_BASE_TAG_HCOLL_BASE - send_disp, MCA_PML_BASE_SEND_STANDARD, comm, requests + transfer_count));
Member

This change is problematic:

  1. For jobs with more than 1024 ranks, we will use tags in an unreserved space (outside [ MCA_COLL_BASE_TAG_NEIGHBOR_BASE .. MCA_COLL_BASE_TAG_NEIGHBOR_END ], which is only 1024 tags long).
  2. It changes the internal messaging of all the collective components, which means we need to bump the component version to break compatibility with older components.
  3. I think point 2 means we cannot do this in the middle of a stable series, so as is, this PR is only meant for 6.0. If we want to bring it into 5.x, we should leave the HCOLL tag space as is and only clean it up later.

Contributor Author

This isn't going into v5.x, only into v6.

Member

That's fine; I just wanted to make sure we are well aware of this. However, I think the change to the collective component version is still required.

Contributor Author

OK, I'll adjust.

-MCA_PML_CALL(irecv(tmprecv + (ptrdiff_t) recv_disp * rcount * rext, rcount, rdtype, recvfrom, MCA_COLL_BASE_TAG_HCOLL_BASE - recv_disp, comm, requests + data_expected - exclusion + transfer_count));
+MCA_PML_CALL(isend(tmpsend + (ptrdiff_t) send_disp * scount * rext, scount, rdtype, sendto, MCA_COLL_BASE_TAG_NEIGHBOR_BASE - send_disp, MCA_PML_BASE_SEND_STANDARD, comm, requests + transfer_count));
+MCA_PML_CALL(irecv(tmprecv + (ptrdiff_t) recv_disp * rcount * rext, rcount, rdtype, recvfrom, MCA_COLL_BASE_TAG_NEIGHBOR_BASE - recv_disp, comm, requests + data_expected - exclusion + transfer_count));
 }
 ompi_request_wait_all(transfer_count * 2, requests, MPI_STATUSES_IGNORE);
 
4 changes: 2 additions & 2 deletions ompi/mca/coll/base/coll_base_allgatherv.c
@@ -332,12 +332,12 @@ int ompi_coll_base_allgatherv_intra_sparbit(const void *sbuf, size_t scount,
 if(ompi_count_array_get(rcounts, send_disp) > 0)
 MCA_PML_CALL(isend(tmpsend + ompi_disp_array_get(rdispls, send_disp) * rext,
 ompi_count_array_get(rcounts, send_disp), rdtype, sendto,
-MCA_COLL_BASE_TAG_HCOLL_BASE - send_disp,
+MCA_COLL_BASE_TAG_NEIGHBOR_BASE - send_disp,
 MCA_PML_BASE_SEND_STANDARD, comm, requests + step_requests++));
 if(ompi_count_array_get(rcounts, recv_disp) > 0)
 MCA_PML_CALL(irecv(tmprecv + ompi_disp_array_get(rdispls, recv_disp) * rext,
 ompi_count_array_get(rcounts, recv_disp), rdtype, recvfrom,
-MCA_COLL_BASE_TAG_HCOLL_BASE - recv_disp, comm,
+MCA_COLL_BASE_TAG_NEIGHBOR_BASE - recv_disp, comm,
 requests + step_requests++));
 }
 ompi_request_wait_all(step_requests, requests, MPI_STATUSES_IGNORE);
4 changes: 1 addition & 3 deletions ompi/mca/coll/base/coll_tags.h
@@ -69,10 +69,8 @@
 #define MCA_COLL_BASE_TAG_NONBLOCKING_END ((-1 * INT_MAX/2) + 1)
 #define MCA_COLL_BASE_TAG_NEIGHBOR_BASE (MCA_COLL_BASE_TAG_NONBLOCKING_END - 1)
 #define MCA_COLL_BASE_TAG_NEIGHBOR_END (MCA_COLL_BASE_TAG_NEIGHBOR_BASE - 1024)
-#define MCA_COLL_BASE_TAG_HCOLL_BASE (-1 * INT_MAX/2)
-#define MCA_COLL_BASE_TAG_HCOLL_END (-1 * INT_MAX)
 
 #define MCA_COLL_BASE_TAG_BASE MCA_COLL_BASE_TAG_BLOCKING_BASE
-#define MCA_COLL_BASE_TAG_END MCA_COLL_BASE_TAG_HCOLL_END
+#define MCA_COLL_BASE_TAG_END MCA_COLL_BASE_TAG_NEIGHBOR_END
 
 #endif /* MCA_COLL_BASE_TAGS_H */
50 changes: 0 additions & 50 deletions ompi/mca/coll/hcoll/Makefile.am

This file was deleted.
