[CI] add mi300 pipeline#669

Open
chaoos wants to merge 7 commits into master from feature/cicd-mi300

Conversation

@chaoos
Contributor

@chaoos chaoos commented Mar 9, 2026

This PR adds the requirements on the tmLQCD side for CI/CD testing on the CSCS test system "beverin", which hosts AMD MI300A GPUs.

TODO:

  • minimal dependency list in environment.yaml.
  • make quda@develop work in the spack spec even though develop is an evolving target.
  • merge CMake support #664
  • adapt to the CMake build

@chaoos
Contributor Author

chaoos commented Mar 9, 2026

cscs-ci run beverin

@chaoos
Contributor Author

chaoos commented Mar 9, 2026

This currently fails due to missing access to the beverin test system at CSCS.

@chaoos
Contributor Author

chaoos commented Mar 10, 2026

A manual build using quda branch feature/prefetch2 did work, see https://cicd-ext-mw.cscs.ch/ci/pipeline/results/3690753405420143/64239695/2375574724?iid=1711

I used this command on beverin:

uenv build .ci/uenv-recipes/tmlqcd/beverin-mi300 tmlqcd/quda-prefetch2@beverin%mi300

with this spack spec for quda:

  specs:
  - "quda@git.feature/prefetch2 +qdp +multigrid +twisted_clover +twisted_mass"

The image is available in the service namespace of CSCS's uenv registry:

$ uenv image find service::
uenv                                       arch   system   id                size(MB)  date
tmlqcd/quda-prefetch2:2375574724           mi300  beverin  8f0acefe49988d34   3,857    2026-03-10

But the gcc compiler in the uenv is broken 😵:

$ uenv start tmlqcd --view=default
$ gcc --version
Illegal instruction (core dumped)

@mtaillefumier
Contributor

I learned that the mi300 cluster uses a different authentication mechanism; that's why it is failing.

@mtaillefumier
Contributor

cscs-ci run beverin

@mtaillefumier
Contributor

$ uenv image find service::
uenv                                       arch   system   id                size(MB)  date
tmlqcd/quda-prefetch2:2375574724           mi300  beverin  8f0acefe49988d34   3,857    2026-03-10

But the gcc compiler in the uenv is broken 😵:

$ uenv start tmlqcd --view=default
$ gcc --version
Illegal instruction (core dumped)

It is a sign that the code was compiled on mi300 and executed on mi250 nodes. Doing the reverse would work.

@chaoos
Contributor Author

chaoos commented Mar 11, 2026

it is a sign that the code was compiled on mi300 and executed on mi250 nodes. Doing the reverse would work.

I see, that makes sense. Apparently I started the uenv on the login node, which has mi200s. I have now started it on a compute node with mi300s, and the compiler seems to work. The next problem is that the configure script's C compiler check fails:

configure:2981: $? = 1
configure:3001: checking whether the C compiler works
configure:3023: /user-environment/env/default/bin/mpicc -O3 -fopenmp -mtune=neoverse-v2 -mcpu=neoverse-v2  -fopenmp conftest.c  >&5
gcc: warning: '-mcpu=' is deprecated; use '-mtune=' or '-march=' instead
cc1: error: bad value 'neoverse-v2' for '-mtune=' switch
cc1: note: valid arguments to '-mtune=' switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client rocketlake icelake-server cascadelake tigerlake cooperlake sapphirerapids emeraldrapids alderlake raptorlake meteorlake graniterapids graniterapids-d arrowlake arrowlake-s lunarlake pantherlake bonnell atom silvermont slm goldmont goldmont-plus tremont gracemont sierraforest grandridge clearwaterforest knl knm intel x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 lujiazui yongfeng k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 znver3 znver4 znver5 btver1 btver2 generic native

It seems to me that, through the spack process and uenv packaging, the compiler picks up the neoverse flags meant for the GH200 nodes instead of -march=znver4 -mtune=znver4 for the mi300 CPUs. I have to see where this gets injected.
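A quick probe (a sketch, not part of the pipeline) can confirm which `-march` values the wrapped compiler actually accepts, without running the whole configure script:

```shell
# Probe which -march values this gcc accepts; a rejected value would
# reproduce the configure failure above in isolation.
for arch in znver4 neoverse-v2; do
  if echo 'int main(void){return 0;}' | gcc -march="$arch" -x c - -o /dev/null 2>/dev/null; then
    echo "$arch: accepted"
  else
    echo "$arch: rejected"
  fi
done
```

On an x86 node one would expect znver4 to be accepted (given a recent enough gcc) and neoverse-v2 to be rejected, matching the error in the config.log above.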

@chaoos
Contributor Author

chaoos commented Mar 11, 2026

cscs-ci run beverin

Add F7T_CLIENT_ID and F7T_CLIENT_SECRET variables for build stage.
@chaoos
Contributor Author

chaoos commented Mar 11, 2026

cscs-ci run beverin

@chaoos
Contributor Author

chaoos commented Mar 11, 2026

cscs-ci run default

@mtaillefumier
Contributor

cscs-ci run beverin

@chaoos
Contributor Author

chaoos commented Mar 11, 2026

Status update: compiling on the mi300 node works, but running the code fails.

Allocate an mi300 node on beverin:

salloc --nodes=1 --time=01:00:00 --partition=mi300 --gpus-per-node=4

Interactive shell on the compute node for compilation:

srun --uenv=tmlqcd --view=default --pty bash

Compile tmlqcd on the compute node against quda in the uenv:

export CFLAGS="-O3 -fopenmp -march=znver4 -mtune=znver4"
export CXXFLAGS="-O3 -fopenmp -march=znver4 -mtune=znver4"
export LDFLAGS="-fopenmp"
export CC="$(which mpicc)"
export CXX="$(which mpicxx)"
mkdir -p install_dir
autoconf
./configure \
  --enable-quda_experimental \
  --enable-mpi \
  --enable-omp \
  --with-mpidimension=4 \
  --enable-alignment=32 \
  --with-qudadir="/user-environment/env/default" \
  --with-limedir="/user-environment/env/default" \
  --with-lemondir="/user-environment/env/default" \
  --with-lapack="-lopenblas -L/user-environment/env/default/lib" \
  --with-hipdir="/user-environment/env/default/lib" \
  --prefix="$(pwd)/install_dir"
make
make install

Run on the login node:

srun --uenv=tmlqcd --view=default -n 4 ./install_dir/bin/hmc_tm -f doc/sample-input/sample-hmc-quda-cscs-beverin.input

fails with:

# QUDA: ERROR: hipStreamCreateWithPriority(&streams[i], hipStreamDefault, greatestPriority) returned out of memory
 (/tmp/anfink/spack-stage/spack-stage-quda-git.feature_prefetch2_1.0.0-git.7857-e4a5b7x7tczfsshsilrkl2hrmqvgqkam/spack-src/lib/targets/hip/device.cpp:116 in create_context())
 (rank 3, host nid002920, quda_api.cpp:60 in void quda::target::hip::set_runtime_error(hipError_t, const char *, const char *, const char *, const char *, bool)())
# QUDA:        last kernel called was (name=,volume=,aux=)
# QUDA:        last tune param used was block=(64,1,1), grid=(1,1,1), shared_bytes=0, shared_carve_out=0, aux=(1,1,1,1)

The mapping of the 4 processes to the 4 GPUs is correct.
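When chasing out-of-memory errors at HIP context creation, one thing worth ruling out explicitly is several ranks landing on the same device. A minimal wrapper script can pin one GPU per local rank (a sketch; gpu_wrap.sh is a name I made up, and it assumes srun exports SLURM_LOCALID):

```shell
# Hypothetical wrapper: pin each local rank to a single GPU via
# ROCR_VISIBLE_DEVICES, and print the mapping for verification.
cat > gpu_wrap.sh <<'EOF'
#!/bin/bash
export ROCR_VISIBLE_DEVICES="${SLURM_LOCALID:-0}"
echo "rank ${SLURM_LOCALID:-0} -> GPU ${ROCR_VISIBLE_DEVICES}"
exec "$@"
EOF
chmod +x gpu_wrap.sh
# Usage sketch:
#   srun --uenv=tmlqcd --view=default -n 4 ./gpu_wrap.sh ./install_dir/bin/hmc_tm -f <input>
```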

@mtaillefumier
Contributor

cscs-ci run beverin

@chaoos
Contributor Author

chaoos commented Mar 11, 2026

For future reference some information about the mi300:
rocminfo.txt
rocm-smi.txt
lscpu.txt
numactl.txt

@mtaillefumier
Contributor

@chaoos: The CI/CD is properly set up. Only I can configure it correctly due to some restrictions on our side.

@chaoos
Contributor Author

chaoos commented Mar 11, 2026

@chaoos: The CI/CD is properly set up. Only I can configure it correctly due to some restrictions on our side.

I see, shall we zoom briefly?

@kostrzewa
Member

@mtaillefumier could you please let me know which exact QUDA commit was compiled here? I currently can't compile the develop head commit on Lumi-G (with rocm-6.3.4 or rocm-6.4.4) and have to resort to working with the feature/prefetch2 branch which, however, seems to introduce severe performance regressions.

@chaoos
Contributor Author

chaoos commented Mar 11, 2026

@kostrzewa The CI now compiles against the develop branch, which will fail. The test I made above is against the feature/prefetch2 branch, which did compile (hip@6.3.3).

I cannot say anything about performance since I cannot run.

@kostrzewa
Member

Thanks, I should have seen that above! tmlqcd/quda-prefetch2:2375574724

@mtaillefumier
Contributor

@kostrzewa: I use the commit used in the ci/cd. I am not sure if that helps you or not.

@mtaillefumier
Contributor

I still need to test the build locally, which would simplify the work a lot.

@chaoos
Contributor Author

chaoos commented Mar 11, 2026

@mtaillefumier Well, right now the build failed because of the 2h time limit. Nevertheless, quda would still fail to build because of the issues mentioned by @kostrzewa.

@kostrzewa
Member

See also lattice/quda#1604 (comment) where maybe we'll get a reply from the QUDA team or AMD.

Increases the time limit to 8 hours
@mtaillefumier
Contributor

cscs-ci run beverin

@kostrzewa
Member

The very latest QUDA develop commit should compile fine again.

@chaoos
Contributor Author

chaoos commented Mar 16, 2026

cscs-ci run beverin

@kostrzewa
Member

When testing / benchmarking on beverin, can you record whether you see any kind of host memory leak? HMC runs on Lumi-G are, besides other problems, currently plagued by this.

Adding

# Function to print datetime and memory usage
print_system_info() {
  memfile=mem_usage_${SLURM_JOB_ID}.txt
  rm -f ${memfile}
  while true; do
    # Print current datetime and memory usage
    echo "$(date '+%Y-%m-%d %H:%M:%S') - Memory Usage: $(free -m | awk '/Mem:/ {print $3 " / " $2}') MB" \
      >> ${memfile}
    # Wait for 5 seconds before next update
    sleep 5
  done
}
# Run the function in the background
print_system_info &
BG_PID=$!
# Trap the EXIT signal to kill the background process 
trap 'kill $BG_PID 2>/dev/null' EXIT

to my job script I see:

2026-03-20 00:13:52 - Memory Usage: 22571 / 515214 MB
2026-03-20 00:13:57 - Memory Usage: 22610 / 515214 MB
2026-03-20 00:14:02 - Memory Usage: 22693 / 515214 MB
[...]
2026-03-20 00:38:35 - Memory Usage: 162781 / 515214 MB
2026-03-20 00:38:40 - Memory Usage: 162830 / 515214 MB
[...]
2026-03-20 08:57:38 - Memory Usage: 295013 / 515214 MB
2026-03-20 08:57:43 - Memory Usage: 295014 / 515214 MB

which is something that @Finkenrath had observed in the past before the 23.09 / rocm-5.6.1 software stack became available on Lumi-G.
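To attribute a leak like this to a specific process rather than the node-wide total, the same sampling loop can read per-process RSS from /proc instead of `free -m` (a Linux-only sketch; the function name is mine):

```shell
# Sketch: sample the resident set size (in kB) of a single process,
# so growth can be pinned to one rank or helper process.
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}
# Example: sample the current shell's RSS.
rss_kb $$
```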

@kostrzewa
Member

And I see a huge slowdown (of everything) as a result. My suspicion currently is that the leak is within Cray MPICH.

@kostrzewa
Member

Issues remaining on LUMI-G:

  1. sub-par performance -- resolved by lattice/quda#1622 ("Use __builtin_bit_cast instead of memcpy as the type punning")
  2. solver instability and excessive iteration counts -- resolved by CrayEnv / PrgEnv-gnu/8.6.0 / rocm-6.4.4
  3. memory leak described above and cumulative performance degradation -- unresolved

@chaoos
Contributor Author

chaoos commented Mar 25, 2026

I finally got an update on the pipeline on beverin. I'm sorry this took so long, but I have only a finite amount of frustration I can bear per day. I still have to check for the memory leak, @kostrzewa.

State (manual build)

I'm using prgenv-gnu/25.07-6.3.3:v12 on beverin and compiling all dependencies (quda, lemon, c-lime) and tmlqcd manually. This finally ran, but only using QUDA_ENABLE_P2P=0!

✅ 4 GPUs: QUDA_ENABLE_TUNING=1 QUDA_ENABLE_P2P=0

On the login node with no uenv loaded, but a node allocated with salloc:

QUDA_ENABLE_TUNING=1 QUDA_ENABLE_P2P=0 LD_LIBRARY_PATH=${quda_instdir}/lib:${LD_LIBRARY_PATH} srun --uenv=prgenv-gnu --view=default -n 4 ${tmlqcd_instdir}/bin/hmc_tm -f ${tmlqcd_srcdir}/doc/sample-input/sample-hmc-quda-cscs-beverin.input

finally runs without error.

numdiff -r 1.2e-6 -X 1:22 -X 1:5-21 -X 2:22 -X 2:5-21 output.data ../tmLQCD/doc/sample-output/hmc-quda-cscs/output.data

gives

----------------
##7       #:3   <== -0.561931713950
##7       #:3   ==> -0.561928249197
@ Absolute error = 3.4647530000e-6, Relative error = 6.1658281194e-6
##7       #:4   <== 1.754058e+00
##7       #:4   ==> 1.754051e+00
@ Absolute error = 7.0000000000e-6, Relative error = 3.9907619562e-6

+++  File "output.data" differs from file "../tmLQCD/doc/sample-output/hmc-quda-cscs/output.data"

and

for i in $(seq 0 2 18); do
  f=onlinemeas.$(printf %06d $i);
  numdiff -r 5e-4 ${f} ../tmLQCD/doc/sample-output/hmc-quda-cscs/${f};
done

gives

+++  Files "onlinemeas.000000" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000000" are equal

+++  Files "onlinemeas.000002" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000002" are equal

+++  Files "onlinemeas.000004" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000004" are equal

+++  Files "onlinemeas.000006" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000006" are equal

+++  Files "onlinemeas.000008" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000008" are equal

+++  Files "onlinemeas.000010" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000010" are equal

+++  Files "onlinemeas.000012" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000012" are equal

+++  Files "onlinemeas.000014" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000014" are equal

+++  Files "onlinemeas.000016" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000016" are equal

+++  Files "onlinemeas.000018" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000018" are equal

The other setups fail, I assume for reasons similar to lattice/quda#1623. The outputs are below:

❌ 4 GPUs: QUDA_ENABLE_TUNING=1 QUDA_ENABLE_GDR=0 with default QUDA_ENABLE_P2P

fails with:

FATAL ERROR
  Within invert_eo_degenerate_quda (reported by node 0):
    QUDA-MG solver failed to converge in 1000 iterations even after forced setup refresh. Terminating!

❌ 4 GPUs: QUDA_ENABLE_TUNING=1 QUDA_ENABLE_GDR=1 QUDA_ENABLE_P2P=0

fails with:

Bus error (core dumped)

❌ 4 GPUs: QUDA_ENABLE_TUNING=1 QUDA_ENABLE_GDR=1 with default QUDA_ENABLE_P2P

fails with:

Multishift solver appears to have diverged on shift 0 with residual      -nan 

State (uenv build mimicking the CI pipeline)

I successfully built tmlqcd/quda-develop:2407908471 through the CSCS pipeline. This uenv image is a copy of this one (https://github.com/eth-cscs/alps-uenv/tree/main/recipes/prgenv-gnu/25.7/amdgpu). I've just added root-dependencies:

  - numdiff
  - quda@develop +qdp +multigrid +twisted_clover +twisted_mass amdgpu_target=gfx942
  - lemonio
  - c-lime

The build took an eternity, because it has so many dependencies, for the (cached) output see (https://cicd-ext-mw.cscs.ch/ci/job/result/3690753405420143/64239695/13642343302).

Then build tmlqcd on the compute node:

uenv start tmlqcd/quda-develop:2407908471 --view=default
srun --pty bash

Build tmlqcd against QUDA, HIP, lemon, c-lime all pointing to the spack build /user-environment/env/default using this command:

rm -rf "${tmlqcd_instdir}"
mkdir -p "${tmlqcd_instdir}"

rm -rf "${tmlqcd_builddir}"
mkdir -p "${tmlqcd_builddir}"

cd "${tmlqcd_builddir}"

CC=$(which mpicc) \
CXX=$(which mpicxx) \
CFLAGS="-O3 -fopenmp -mtune=znver4 -march=znver4" \
CXXFLAGS="-O3 -fopenmp -mtune=znver4 -march=znver4" \
LDFLAGS="-fopenmp" \
${tmlqcd_srcdir}/configure \
  --enable-quda_experimental \
  --enable-mpi \
  --enable-omp \
  --with-mpidimension=4 \
  --enable-alignment=32 \
  --with-qudadir="/user-environment/env/default" \
  --with-limedir="/user-environment/env/default" \
  --with-lemondir="/user-environment/env/default" \
  --with-hipdir="/user-environment/env/default/lib" \
  --with-lapack="-lopenblas -L/user-environment/env/default/lib" \
  --prefix="${tmlqcd_instdir}"

cd -
make -j -C "${tmlqcd_builddir}"
make install -C "${tmlqcd_builddir}"

Run on the login node:

QUDA_ENABLE_TUNING=1 QUDA_ENABLE_P2P=0 srun --uenv=tmlqcd/quda-develop:2407908471 --view=default -n 4 ${tmlqcd_instdir}/bin/hmc_tm -f ${tmlqcd_srcdir}/doc/sample-input/sample-hmc-quda-cscs-beverin.input

now works. The manual verification looks good:

numdiff -r 1.2e-6 -X 1:22 -X 1:5-21 -X 2:22 -X 2:5-21 output.data ../tmLQCD/doc/sample-output/hmc-quda-cscs/output.data
----------------
##7       #:3   <== -0.561931728531
##7       #:3   ==> -0.561928249197
@ Absolute error = 3.4793340000e-6, Relative error = 6.1917762721e-6
##7       #:4   <== 1.754058e+00
##7       #:4   ==> 1.754051e+00
@ Absolute error = 7.0000000000e-6, Relative error = 3.9907619562e-6

+++  File "output.data" differs from file "../tmLQCD/doc/sample-output/hmc-quda-cscs/output.data"


for i in $(seq 0 2 18); do
  f=onlinemeas.$(printf %06d $i);
  numdiff -r 5e-4 ${f} ../tmLQCD/doc/sample-output/hmc-quda-cscs/${f};
done

+++  Files "onlinemeas.000000" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000000" are equal

+++  Files "onlinemeas.000002" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000002" are equal

+++  Files "onlinemeas.000004" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000004" are equal

+++  Files "onlinemeas.000006" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000006" are equal

+++  Files "onlinemeas.000008" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000008" are equal

+++  Files "onlinemeas.000010" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000010" are equal

+++  Files "onlinemeas.000012" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000012" are equal

+++  Files "onlinemeas.000014" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000014" are equal

+++  Files "onlinemeas.000016" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000016" are equal

+++  Files "onlinemeas.000018" and "../tmLQCD/doc/sample-output/hmc-quda-cscs/onlinemeas.000018" are equal

State (uenv build with minimal dependencies)

Next step is to determine the minimal set of dependencies required for tmlqcd/quda to run on beverin and then reproduce the above running example. Then the CI pipeline has to be adjusted appropriately.
