We did a lot of work with UCX and dask/ucx-py on Summit here: rapidsai/ucx-py#616. I lost my Summit access, so I can't help directly anymore.
Hi, I'm testing UCX PUTs/GETs on CUDA memory on ORNL Summit, and I'm seeing very low performance for small GETs.
I'm using UCX trunk at commit edacb52, configured with:
--enable-mt --enable-cma --enable-numa --enable-silent-rules --enable-optimizations --enable-builtin-memcpy --enable-compiler-opt=3 --enable-option-checking --disable-debug --disable-gtest --disable-stats --disable-static --disable-tuning --disable-logging --disable-examples --disable-profiling --disable-assertions --disable-debug-data --disable-params-check --disable-fault-injection --disable-frame-pointer --disable-dependency-tracking --without-go --without-bfd --without-mpi --without-knem --without-java --without-valgrind --with-cache-line-size=128 --with-cuda=/sw/summit/cuda/11.0.3 --with-gdrcopy=/sw/summit/gdrcopy/2.0
For environment variables, I have set:
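When reporting settings like these, it can help to capture every UCX_* variable that is actually present in the job environment instead of listing them by hand. Below is a minimal Python sketch; the helper name ucx_env is hypothetical and not part of UCX. UCX's own ucx_info -c prints the full effective configuration, defaults included, which complements this list of explicit overrides.

```python
import os


def ucx_env(environ=None):
    """Return only the UCX_* variables from an environment mapping, sorted by name.

    Hypothetical helper: defaults to the current process environment.
    """
    environ = os.environ if environ is None else environ
    return {k: v for k, v in sorted(environ.items()) if k.startswith("UCX_")}


if __name__ == "__main__":
    # Print in NAME=value form so the output can be pasted into a report or job script.
    for name, value in ucx_env().items():
        print(f"{name}={value}")
```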
With the setup above, I get the following inter-node numbers from ucx_perftest -t ucp_get:
If I add UCX_ZCOPY_THRESH=0 to the environment, I get:
Adding UCX_PROTO_INFO=y to the environment shows that software emulation (am bcopy?) is used for GET sizes of 0..64, but I cannot find a way to change this behavior or improve its performance. What's the correct setting for Summit? Any help is appreciated!
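The comparison described above (UCX defaults vs. UCX_ZCOPY_THRESH=0, with UCX_PROTO_INFO=y so the selected protocol is printed) can be scripted so that each small message size is measured under each setting. Below is a minimal Python sketch under stated assumptions: the helper names, the iteration count, the size list and the hostname summit-node-b are hypothetical; it assumes ucx_perftest's -m option selects CUDA memory in this build; and it assumes a ucx_perftest server with the same environment is started on the remote node for every run (for example through the job launcher), which the sketch does not do. Each size is a separate run because a comma-separated -s value describes scatter-gather segments of one message, not a size sweep.

```python
import os
import subprocess

# Environment variants compared in the question: defaults vs. UCX_ZCOPY_THRESH=0.
# UCX_PROTO_INFO=y is added to every run so the chosen protocol is printed.
VARIANTS = {
    "default": {},
    "zcopy_thresh_0": {"UCX_ZCOPY_THRESH": "0"},
}


def client_argv(server, size, iters=10000, mem="cuda"):
    """ucx_perftest client command for one UCP GET run of `size` bytes."""
    return ["ucx_perftest", server, "-t", "ucp_get", "-m", mem,
            "-s", str(size), "-n", str(iters)]


def plan_runs(server, sizes):
    """Return (label, env_overrides, argv) for every variant/size pair."""
    runs = []
    for label, overrides in VARIANTS.items():
        env = {**overrides, "UCX_PROTO_INFO": "y"}
        for size in sizes:
            runs.append((label, env, client_argv(server, size)))
    return runs


def execute(runs):
    """Run each client command with its overrides layered on the current environment.

    Assumes a matching server is already listening on the remote node.
    """
    for label, env, argv in runs:
        print(f"== {label}: {' '.join(argv)}")
        subprocess.run(argv, env={**os.environ, **env}, check=True)


if __name__ == "__main__":
    # Dry run: only print the plan; call execute(...) once servers are in place.
    for label, env, argv in plan_runs("summit-node-b", [8, 64, 128]):
        print(label, env, " ".join(argv))
```

Keeping plan construction separate from execution makes it easy to check the exact commands and environment overrides before spending allocation time on the runs.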