
@smoors
Collaborator

@smoors smoors commented Jun 18, 2025

this adds the BLAS test included in the BLIS sources, using a modified/simplified Makefile so it works with FlexiBLAS.

supported BLAS libs: OpenBLAS, BLIS, AOCL-BLAS, imkl

ready for testing and review.

notes:

  • executables are compiled in every run-only test job, no separate compile-only test jobs:

    • the compilation is very fast and the generated executables are small
    • it keeps the code simpler
    • it avoids extra queue times for compile-only test jobs

    the downside is that the executables have to be built for every test case, while with a compile-only test job this has to be done only once for each architecture. i'm fine with adding a compile-only job if you prefer.

  • no programming environments (environs) are used:

    • they are not flexible/portable: you need to either create a new environ for each blas-toolchain combination, or create "fat" environs with all the blas libs included, which means that all blas libs need to be installed in the system, even if you don't use them.
  • for BLIS and AOCL-BLAS, no loop-specific BLIS_**_NT environment variables are set, only the generic OMP_NUM_THREADS.

update 2025-08-16:

  • the test now works without predefined module lists: it creates them on the fly based on what is available on the system. this makes the test more portable and requires less maintenance. however, it's not perfect, because the splitting of the module names into (name, version, toolchain, versionsuffix) assumes that there are no hyphens in the version and toolchain name, see split_module(). if needed we can always add exceptions, as is already done for intel-compilers.
  • also added the 8_core scale to accommodate nodes with high cpu counts.
  • tested with local and EESSI modules. with EESSI, imkl and AOCL-BLAS tests are skipped because they are not (yet?) available in EESSI.
  • also added generic function select_matching_modules, which selects from a list of modules the ones that match a reference module (i.e. their toolchains are compatible). this function requires the availability of EasyBuild (the test will be skipped if it's not available).
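
To illustrate the hyphen assumption mentioned for split_module(), here is a minimal sketch of such a split. It is not the code from this PR: the name split_module_sketch, the ModuleParts tuple, and the exact field layout (name/version-tcname-tcversion-suffix) are assumptions made for illustration only.

```python
from typing import NamedTuple


class ModuleParts(NamedTuple):
    name: str
    version: str
    toolchain: str      # e.g. 'GCC-12.3.0', or '' when the module has none
    versionsuffix: str  # everything after the toolchain, or ''


def split_module_sketch(module: str) -> ModuleParts:
    """Hypothetical split of 'name/version-tcname-tcversion[-suffix]'.

    Assumes that neither the version nor the toolchain name contains a hyphen.
    """
    name, _, rest = module.partition('/')
    fields = rest.split('-')
    version = fields[0]
    toolchain = '-'.join(fields[1:3])      # toolchain name + toolchain version
    versionsuffix = '-'.join(fields[3:])   # whatever is left over
    return ModuleParts(name, version, toolchain, versionsuffix)


print(split_module_sketch('OpenBLAS/0.3.23-GCC-12.3.0'))
print(split_module_sketch('imkl/2023.1.0'))
```

A version that itself contains a hyphen (say a made-up 'Foo/1.0-rc1-GCC-12.3.0') would be split in the wrong place by this sketch, which is the kind of case that would need an explicit exception.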

@smoors smoors marked this pull request as draft June 18, 2025 11:49
@smoors smoors marked this pull request as ready for review June 19, 2025 18:01
@satishskamath
Collaborator

Sorry for the late check, @smoors.

I tried running these tests today but I am getting empty job script files when I try to run.

#!/bin/bash
#SBATCH --job-name="rfm_EESSI_BLAS_BLIS_mt_8ca83808"
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=128
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:10:0
#SBATCH -p rome
#SBATCH --export=None
#SBATCH --mem=218749M
module load 2023

Now I am trying to figure out why.

@smoors
Collaborator Author

smoors commented Jul 5, 2025

@satishskamath can you try again to see if your issue is fixed with the latest update?

@casparvl
Collaborator

casparvl commented Jul 16, 2025

@smoors I'm not sure how you got this to work for EESSI - it shouldn't. I'm getting:

+ ./test_gemm_flexiblas_mt.x -d s -c nn -i auto -p '200 2000 200' -r 5
./test_gemm_flexiblas_mt.x: error while loading shared libraries: libflexiblas.so.3: cannot open shared object file: No such file or directory
+ set +x
+ ./test_hemm_flexiblas_mt.x -d s -c ll -i auto -p '200 2000 200' -r 5
./test_hemm_flexiblas_mt.x: error while loading shared libraries: libflexiblas.so.3: cannot open shared object file: No such file or directory
... etc

And this is totally expected if you 'just' compile something on top of EESSI. The reason is that we don't set LD_LIBRARY_PATH. So,

$ ldd test_gemm_flexiblas_mt.x
        linux-vdso.so.1 (0x00007fff7c7fe000)
        libflexiblas.so.3 => not found
        libm.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib64/libm.so.6 (0x000014983350b000)
        libgomp.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib/gcc/x86_64-pc-linux-gnu/10/libgomp.so.1 (0x00001498334cc000)
        libc.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib64/libc.so.6 (0x00001498332fb000)
        /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib64/ld-linux-x86-64.so.2 (0x00001498335f8000)

Everything in EESSI is RPATH-ed, and everything built on top should be RPATH-ed as well. This can be done using the buildenv modules (see https://www.eessi.io/docs/using_eessi/building_on_eessi/#manually-building-software-on-top-of-eessi-without-easybuild ), which essentially set a ton of environment variables, but more importantly, they add compiler wrappers to your PATH - just like the compiler wrappers used by EasyBuild to RPATH things.

With that, in the staging dir, I do:

module load buildenv/default-foss-2023b
make clean
make flexiblas-mt
$ ldd test_gemm_flexiblas_mt.x
        linux-vdso.so.1 (0x00007fff0dd8d000)
        libflexiblas.so.3 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/FlexiBLAS/3.3.1-GCC-13.2.0/lib64/libflexiblas.so.3 (0x000014773c000000)
        libm.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib/../lib64/libm.so.6 (0x000014773c48b000)
        libgomp.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/13.2.0/lib64/libgomp.so.1 (0x000014773c439000)
        libc.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib/../lib64/libc.so.6 (0x000014773be2f000)
        /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib64/ld-linux-x86-64.so.2 (0x000014773c571000)
        libgfortran.so.5 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/13.2.0/lib/gcc/x86_64-pc-linux-gnu/13.2.0/../../../../lib64/libgfortran.so.5 (0x000014773ba00000)
        libquadmath.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/13.2.0/lib/gcc/x86_64-pc-linux-gnu/13.2.0/../../../../lib64/libquadmath.so.0 (0x000014773c3ee000)
        libgcc_s.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/13.2.0/lib64/libgcc_s.so.1 (0x000014773be0a000)

Tada, it now finds libflexiblas.so.3, because it is in the RPATH:

$ readelf -a test_gemm_flexiblas_mt.x | grep RPATH | grep -i FlexiBlas
 0x000000000000000f (RPATH)              Library rpath: [lots:of:paths:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/FlexiBLAS/3.3.1-GCC-13.2.0/lib64:lots:more:paths]
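
The before/after ldd comparison above could be automated with a few lines of Python. This is only a sketch: the helper name unresolved_libraries is made up here and nothing like it is claimed to exist in the test suite. The sample text is taken from the first ldd output above.

```python
from __future__ import annotations


def unresolved_libraries(ldd_output: str) -> list[str]:
    """Return the sonames that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        soname, arrow, target = line.strip().partition(' => ')
        # Lines without ' => ' (vdso, dynamic loader) are never 'not found'.
        if arrow and target.startswith('not found'):
            missing.append(soname)
    return missing


sample = """\
        linux-vdso.so.1 (0x00007fff7c7fe000)
        libflexiblas.so.3 => not found
        libm.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib64/libm.so.6 (0x000014983350b000)
"""

print(unresolved_libraries(sample))  # ['libflexiblas.so.3']
```

In a test, the input would come from running ldd on the freshly built executable, and a non-empty result would indicate that the RPATH is incomplete.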

Now, I don't know how to properly implement this in a generalized test (it's clear that this is our first compiled test - we're hitting all kinds of stuff). Should we start defining programming environments that then load this? Should we write some clever code that loads this module if EESSI is being used? If so, how do we determine the correct version? Your FlexiBLAS is at the GCC level, so you need knowledge of the toolchain hierarchy to know which foss version (and thus which buildenv module version) goes with it. Yes, the hierarchy is fixed and limited, so we could hard-code it, but that makes it non-forward-compatible with future EasyBuild toolchains - we don't know the hierarchy for those yet.

Honestly, I'm not sure how to resolve this, we need to think about it... I guess one option would be to just pass the right arguments for RPATH-ing ourselves, and not use the buildenv at all. One question is what we're missing out on then (i.e. what's the effect of the other env vars set by buildenv) - and if that's still a 'good' test of EESSI, even though the docs state that you should use buildenv if you compile your own code on top...
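
For reference, "passing the right arguments for RPATH-ing ourselves" could look roughly like the sketch below. It only builds standard `-Wl,-rpath,<dir>` linker flags from a colon-separated directory list. The helper name rpath_flags_sketch is hypothetical, and taking the directories from LIBRARY_PATH is an assumption about how the loaded modules expose their library directories. It does not reproduce anything else the buildenv module or the EasyBuild compiler wrappers do.

```python
import os


def rpath_flags_sketch(library_path: str) -> list:
    """Turn a colon-separated directory list into '-Wl,-rpath,<dir>' flags."""
    flags = []
    seen = set()
    for directory in library_path.split(':'):
        # Skip empty entries and duplicates, keep the original order.
        if directory and directory not in seen:
            seen.add(directory)
            flags.append(f'-Wl,-rpath,{directory}')
    return flags


# Assumption: the loaded modules have put their lib dirs in LIBRARY_PATH.
ldflags = ' '.join(rpath_flags_sketch(os.environ.get('LIBRARY_PATH', '')))
print(ldflags)
```

The resulting string would have to be appended to the link line used by the Makefile, which is exactly the part that the compiler wrappers normally take care of.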

@casparvl
Collaborator

casparvl commented Jul 16, 2025

On the upside, on our local module stack, this works out of the box:

reframe -c test-suite/eessi/testsuite/tests/libs/blas/blas.py --run --system=snellius:rome -t 1_8_node
...
[       OK ] (1/2) EESSI_BLAS_BLIS_mt %module_name=['FlexiBLAS/3.4.4-GCC-13.3.0', 'BLIS/1.0-GCC-13.3.0'] %scale=1_8_node /0aa602fd @snellius:rome+default
P: sgemm: 153.13 GFLOPS (r:0, l:None, u:None)
P: shemm: 1114.61 GFLOPS (r:0, l:None, u:None)
P: sherk: 972.95 GFLOPS (r:0, l:None, u:None)
P: strmm: 772.03 GFLOPS (r:0, l:None, u:None)
P: strsm: 958.69 GFLOPS (r:0, l:None, u:None)
P: dgemm: 539.06 GFLOPS (r:0, l:None, u:None)
P: dhemm: 547.26 GFLOPS (r:0, l:None, u:None)
P: dherk: 494.56 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 388.08 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 471.23 GFLOPS (r:0, l:None, u:None)
P: cgemm: 1181.78 GFLOPS (r:0, l:None, u:None)
P: chemm: 1167.15 GFLOPS (r:0, l:None, u:None)
P: cherk: 951.8 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 1008.75 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 1043.49 GFLOPS (r:0, l:None, u:None)
P: zgemm: 553.32 GFLOPS (r:0, l:None, u:None)
P: zhemm: 557.42 GFLOPS (r:0, l:None, u:None)
P: zherk: 508.14 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 506.04 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 503.01 GFLOPS (r:0, l:None, u:None)
[       OK ] (2/2) EESSI_BLAS_OpenBLAS_mt %module_name=['FlexiBLAS/3.4.4-GCC-13.3.0', 'BLIS/1.0-GCC-13.3.0', 'OpenBLAS/0.3.27-GCC-13.3.0'] %scale=1_8_node /c027f8ba @snellius:rome+default
P: sgemm: 1061.67 GFLOPS (r:0, l:None, u:None)
P: shemm: 1069.2 GFLOPS (r:0, l:None, u:None)
P: sherk: 735.02 GFLOPS (r:0, l:None, u:None)
P: strmm: 980.96 GFLOPS (r:0, l:None, u:None)
P: strsm: 767.65 GFLOPS (r:0, l:None, u:None)
P: dgemm: 429.71 GFLOPS (r:0, l:None, u:None)
P: dhemm: 436.88 GFLOPS (r:0, l:None, u:None)
P: dherk: 364.39 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 466.4 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 304.96 GFLOPS (r:0, l:None, u:None)
P: cgemm: 1202.06 GFLOPS (r:0, l:None, u:None)
P: chemm: 1203.29 GFLOPS (r:0, l:None, u:None)
P: cherk: 997.35 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 1105.3 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 912.27 GFLOPS (r:0, l:None, u:None)
P: zgemm: 595.29 GFLOPS (r:0, l:None, u:None)
P: zhemm: 594.09 GFLOPS (r:0, l:None, u:None)
P: zherk: 470.45 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 550.73 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 460.83 GFLOPS (r:0, l:None, u:None)
[----------] all spawned checks have finished

[  PASSED  ] Ran 2/2 test case(s) from 2 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Wed Jul 16 15:33:02 2025+0200
Log file(s) saved in '/gpfs/home4/casparl/EESSI/reframe_runs/logs/reframe_20250716_153130.log'

That first one (sgemm with BLIS) is terrible though. Didn't you have some issue with the BLIS performance as well, initially? What was the solution again there?

@satishskamath
Collaborator

satishskamath commented Jul 17, 2025

[satishk@tcn3 projects]$ reframe -C eessi_reframe/settings_example.py -c test-suite/eessi/testsuite/tests/libs/blas/blas.py --system=snellius_eessi:cpu_genoa -t "1_8" -r
[ReFrame Setup]
  version:           4.6.1
  command:           '/home/satishk/.local/easybuild/RHEL8/2023/software/ReFrame/4.6.1/bin/reframe -C eessi_reframe/settings_example.py -c test-suite/eessi/testsuite/tests/libs/blas/blas.py --system=snellius_eessi:cpu_genoa -t 1_8 -r'
  launched by:       [email protected]
  working directory: '/gpfs/home5/satishk/projects'
  settings files:    '<builtin>', 'eessi_reframe/settings_example.py'
  check search path: '/gpfs/home5/satishk/projects/test-suite/eessi/testsuite/tests/libs/blas/blas.py'
  stage directory:   '/scratch-shared/satishk/reframe_output'
  output directory:  '/gpfs/home5/satishk/projects/output'
  log files:         '/gpfs/home5/satishk/projects/reframe.log'

WARNING: skipping test 'EESSI_BLAS_AOCLBLAS_st': test has one or more undefined parameters
WARNING: skipping test 'EESSI_BLAS_AOCLBLAS_mt': test has one or more undefined parameters
[==========] Running 5 check(s)
[==========] Started on Thu Jul 17 09:58:15 2025+0200

[----------] start processing checks
[ RUN      ] EESSI_BLAS_BLIS_mt %module_name=['FlexiBLAS/3.3.1-GCC-12.3.0', 'BLIS/0.9.0-GCC-12.3.0'] %scale=1_8_node /5b508380 @snellius_eessi:cpu_genoa+default
[ RUN      ] EESSI_BLAS_BLIS_mt %module_name=['FlexiBLAS/3.3.1-GCC-13.2.0', 'BLIS/0.9.0-GCC-13.2.0'] %scale=1_8_node /b5d29a20 @snellius_eessi:cpu_genoa+default
[ RUN      ] EESSI_BLAS_imkl_mt %module_name=['FlexiBLAS/3.3.1-GCC-13.2.0', 'BLIS/0.9.0-GCC-13.2.0', 'imkl/2023.1.0'] %scale=1_8_node /af71efb6 @snellius_eessi:cpu_genoa+default
[ RUN      ] EESSI_BLAS_OpenBLAS_mt %module_name=['FlexiBLAS/3.3.1-GCC-12.3.0', 'BLIS/0.9.0-GCC-12.3.0', 'OpenBLAS/0.3.23-GCC-12.3.0'] %scale=1_8_node /2699cea7 @snellius_eessi:cpu_genoa+default
[ RUN      ] EESSI_BLAS_OpenBLAS_mt %module_name=['FlexiBLAS/3.3.1-GCC-13.2.0', 'BLIS/0.9.0-GCC-13.2.0', 'OpenBLAS/0.3.24-GCC-13.2.0'] %scale=1_8_node /20514c40 @snellius_eessi:cpu_genoa+default
[       OK ] (1/5) EESSI_BLAS_BLIS_mt %module_name=['FlexiBLAS/3.3.1-GCC-12.3.0', 'BLIS/0.9.0-GCC-12.3.0'] %scale=1_8_node /5b508380 @snellius_eessi:cpu_genoa+default
P: sgemm: 1560.73 GFLOPS (r:0, l:None, u:None)
P: shemm: 1561.54 GFLOPS (r:0, l:None, u:None)
P: sherk: 1368.87 GFLOPS (r:0, l:None, u:None)
P: strmm: 1221.3 GFLOPS (r:0, l:None, u:None)
P: strsm: 1405.89 GFLOPS (r:0, l:None, u:None)
P: dgemm: 784.91 GFLOPS (r:0, l:None, u:None)
P: dhemm: 781.86 GFLOPS (r:0, l:None, u:None)
P: dherk: 717.46 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 598.88 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 704.71 GFLOPS (r:0, l:None, u:None)
P: cgemm: 1674.32 GFLOPS (r:0, l:None, u:None)
P: chemm: 1664.29 GFLOPS (r:0, l:None, u:None)
P: cherk: 1429.26 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 1465.66 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 1487.7 GFLOPS (r:0, l:None, u:None)
P: zgemm: 822.55 GFLOPS (r:0, l:None, u:None)
P: zhemm: 828.5 GFLOPS (r:0, l:None, u:None)
P: zherk: 746.43 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 728.12 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 765.54 GFLOPS (r:0, l:None, u:None)
[       OK ] (2/5) EESSI_BLAS_BLIS_mt %module_name=['FlexiBLAS/3.3.1-GCC-13.2.0', 'BLIS/0.9.0-GCC-13.2.0'] %scale=1_8_node /b5d29a20 @snellius_eessi:cpu_genoa+default
P: sgemm: 1578.75 GFLOPS (r:0, l:None, u:None)
P: shemm: 1579.26 GFLOPS (r:0, l:None, u:None)
P: sherk: 1370.59 GFLOPS (r:0, l:None, u:None)
P: strmm: 1199.61 GFLOPS (r:0, l:None, u:None)
P: strsm: 1355.06 GFLOPS (r:0, l:None, u:None)
P: dgemm: 783.69 GFLOPS (r:0, l:None, u:None)
P: dhemm: 784.92 GFLOPS (r:0, l:None, u:None)
P: dherk: 718.89 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 601.41 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 696.44 GFLOPS (r:0, l:None, u:None)
P: cgemm: 1680.62 GFLOPS (r:0, l:None, u:None)
P: chemm: 1673.34 GFLOPS (r:0, l:None, u:None)
P: cherk: 1454.25 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 1479.37 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 1501.31 GFLOPS (r:0, l:None, u:None)
P: zgemm: 823.21 GFLOPS (r:0, l:None, u:None)
P: zhemm: 829.11 GFLOPS (r:0, l:None, u:None)
P: zherk: 759.7 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 731.7 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 763.26 GFLOPS (r:0, l:None, u:None)
[       OK ] (3/5) EESSI_BLAS_imkl_mt %module_name=['FlexiBLAS/3.3.1-GCC-13.2.0', 'BLIS/0.9.0-GCC-13.2.0', 'imkl/2023.1.0'] %scale=1_8_node /af71efb6 @snellius_eessi:cpu_genoa+default
P: sgemm: 1577.63 GFLOPS (r:0, l:None, u:None)
P: shemm: 1246.74 GFLOPS (r:0, l:None, u:None)
P: sherk: 1019.86 GFLOPS (r:0, l:None, u:None)
P: strmm: 1098.66 GFLOPS (r:0, l:None, u:None)
P: strsm: 867.81 GFLOPS (r:0, l:None, u:None)
P: dgemm: 724.53 GFLOPS (r:0, l:None, u:None)
P: dhemm: 584.89 GFLOPS (r:0, l:None, u:None)
P: dherk: 311.49 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 533.33 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 351.66 GFLOPS (r:0, l:None, u:None)
P: cgemm: 1628.25 GFLOPS (r:0, l:None, u:None)
P: chemm: 1425.93 GFLOPS (r:0, l:None, u:None)
P: cherk: 1233.56 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 1333.8 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 1122.86 GFLOPS (r:0, l:None, u:None)
P: zgemm: 806.34 GFLOPS (r:0, l:None, u:None)
P: zhemm: 688.55 GFLOPS (r:0, l:None, u:None)
P: zherk: 666.01 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 680.35 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 489.11 GFLOPS (r:0, l:None, u:None)
[       OK ] (4/5) EESSI_BLAS_OpenBLAS_mt %module_name=['FlexiBLAS/3.3.1-GCC-13.2.0', 'BLIS/0.9.0-GCC-13.2.0', 'OpenBLAS/0.3.24-GCC-13.2.0'] %scale=1_8_node /20514c40 @snellius_eessi:cpu_genoa+default
P: sgemm: 1301.87 GFLOPS (r:0, l:None, u:None)
P: shemm: 1308.94 GFLOPS (r:0, l:None, u:None)
P: sherk: 1332.77 GFLOPS (r:0, l:None, u:None)
P: strmm: 1445.53 GFLOPS (r:0, l:None, u:None)
P: strsm: 1267.73 GFLOPS (r:0, l:None, u:None)
P: dgemm: 620.23 GFLOPS (r:0, l:None, u:None)
P: dhemm: 618.15 GFLOPS (r:0, l:None, u:None)
P: dherk: 594.06 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 701.89 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 569.24 GFLOPS (r:0, l:None, u:None)
P: cgemm: 1652.29 GFLOPS (r:0, l:None, u:None)
P: chemm: 1659.62 GFLOPS (r:0, l:None, u:None)
P: cherk: 1527.36 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 1664.68 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 1390.31 GFLOPS (r:0, l:None, u:None)
P: zgemm: 850.86 GFLOPS (r:0, l:None, u:None)
P: zhemm: 855.21 GFLOPS (r:0, l:None, u:None)
P: zherk: 785.98 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 806.1 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 714.65 GFLOPS (r:0, l:None, u:None)
[       OK ] (5/5) EESSI_BLAS_OpenBLAS_mt %module_name=['FlexiBLAS/3.3.1-GCC-12.3.0', 'BLIS/0.9.0-GCC-12.3.0', 'OpenBLAS/0.3.23-GCC-12.3.0'] %scale=1_8_node /2699cea7 @snellius_eessi:cpu_genoa+default
P: sgemm: 1295.88 GFLOPS (r:0, l:None, u:None)
P: shemm: 74.08 GFLOPS (r:0, l:None, u:None)
P: sherk: 72.93 GFLOPS (r:0, l:None, u:None)
P: strmm: 1448.61 GFLOPS (r:0, l:None, u:None)
P: strsm: 1273.03 GFLOPS (r:0, l:None, u:None)
P: dgemm: 622.72 GFLOPS (r:0, l:None, u:None)
P: dhemm: 36.24 GFLOPS (r:0, l:None, u:None)
P: dherk: 35.36 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 704.95 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 555.48 GFLOPS (r:0, l:None, u:None)
P: cgemm: 1656.18 GFLOPS (r:0, l:None, u:None)
P: chemm: 74.85 GFLOPS (r:0, l:None, u:None)
P: cherk: 73.71 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 1672.58 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 1359.01 GFLOPS (r:0, l:None, u:None)
P: zgemm: 855.67 GFLOPS (r:0, l:None, u:None)
P: zhemm: 37.42 GFLOPS (r:0, l:None, u:None)
P: zherk: 36.92 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 805.9 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 705.3 GFLOPS (r:0, l:None, u:None)
[----------] all spawned checks have finished

[  PASSED  ] Ran 5/5 test case(s) from 5 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Thu Jul 17 10:00:13 2025+0200
Log file(s) saved in '/gpfs/home5/satishk/projects/reframe.log'
[satishk@tcn3 projects]$ 

Sorry for the confusion, it does run. The earlier error happened because I ran ReFrame from the 2024 stack, and we have a hierarchical module setup here.

@smoors
Collaborator Author

smoors commented Aug 13, 2025

Interesting observation: if I run with -t 1_8_node, I only get the _mt tests. If I run with -t 1_node, I also get the _st tests. Was that intentional?

yes, this is done to run the single-threaded test on a node with no other jobs running on it. a similar thing was done for the OSU test.

measure_memory_usage = variable(bool, value=False)
exact_memory = variable(bool, value=False)
user_executable_opts = variable(str, value='')
thread_binding = variable(str, value='None') # takes priority over compact_thread_binding
Collaborator

I might be missing something, but why do we need both this and compact_thread_binding?

Collaborator Author

i was under the impression that you can't override the thread_binding variable in a test class, but that's actually not true. you can override it, but not in a base class.

fixed in 8a948e4

hooks.add_buildenv_module(self)

thread_binding = self.thread_binding.lower()
if thread_binding == 'true' or (thread_binding == 'none' and self.compact_thread_binding):
Collaborator

If thread_binding is a variable, what if it's not True or False, but someone sets it to compact? Or makes a typo such as Ture? I think we should catch all cases we know (true / compact => do compact binding, false => do nothing, anything else => do nothing and warn the user that an invalid value was set for thread_binding).

Collaborator Author

i opted for a hard error in case an invalid value was set

fixed in 8a948e4
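
A minimal sketch of what such validation with a hard error could look like, based only on the snippet and the review comment above. This is not the code from 8a948e4: the function name use_compact_binding and the exact set of accepted values are assumptions.

```python
def use_compact_binding(thread_binding: str, compact_thread_binding: bool = False) -> bool:
    """Decide whether compact thread binding should be applied.

    Accepted values (case-insensitive): 'true'/'compact', 'false', 'none'.
    'none' falls back to compact_thread_binding; anything else is an error.
    """
    value = thread_binding.lower()
    if value in ('true', 'compact'):
        return True
    if value == 'false':
        return False
    if value == 'none':
        return compact_thread_binding
    raise ValueError(
        f"invalid value for thread_binding: {thread_binding!r} "
        "(expected one of: true, compact, false, none)"
    )


print(use_compact_binding('True'))        # True
print(use_compact_binding('None', True))  # True
```

Raising instead of warning means a typo such as 'Ture' stops the test early rather than silently running without binding.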

Collaborator

@casparvl casparvl left a comment

I ran with reframe -c eessi/testsuite/tests/libs/blas/ --run -t '1_8_node|1_core', everything worked perfectly fine.

I left some minor comments / questions.

@smoors
Collaborator Author

smoors commented Oct 22, 2025

@casparvl thanks for the great comments, should all be addressed now.

Collaborator

@casparvl casparvl left a comment

Rerunning one final time, but looks good to me. I can already confirm the Metalwalls warning was resolved.

@casparvl
Collaborator

As a final note: everything worked. I do still see some weird differences in the performance of this test for the 2023a modules for some BLAS routines. E.g.

[       OK ] ( 5/12) EESSI_BLAS_OpenBLAS_mt %scale=1_8_node %module_name=['buildenv/default-foss-2022b', 'OpenBLAS/0.3.21-GCC-12.2.0', 'BLIS/0.9.0-GCC-12.2.0'] /458f9a96 @snellius:rome+default
P: sgemm: 923.99 GFLOPS (r:0, l:None, u:None)
P: shemm: 948.58 GFLOPS (r:0, l:None, u:None)
P: sherk: 785.52 GFLOPS (r:0, l:None, u:None)
P: strmm: 965.33 GFLOPS (r:0, l:None, u:None)
P: strsm: 845.65 GFLOPS (r:0, l:None, u:None)
P: dgemm: 441.79 GFLOPS (r:0, l:None, u:None)
P: dhemm: 444.63 GFLOPS (r:0, l:None, u:None)
P: dherk: 395.95 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 465.16 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 350.9 GFLOPS (r:0, l:None, u:None)
P: cgemm: 1199.81 GFLOPS (r:0, l:None, u:None)
P: chemm: 1191.39 GFLOPS (r:0, l:None, u:None)
P: cherk: 994.41 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 1090.88 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 953.42 GFLOPS (r:0, l:None, u:None)
P: zgemm: 604.27 GFLOPS (r:0, l:None, u:None)
P: zhemm: 602.93 GFLOPS (r:0, l:None, u:None)
P: zherk: 472.98 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 550.21 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 492.32 GFLOPS (r:0, l:None, u:None)
...
[       OK ] ( 7/12) EESSI_BLAS_OpenBLAS_mt %scale=1_8_node %module_name=['buildenv/default-foss-2023a', 'OpenBLAS/0.3.23-GCC-12.3.0', 'BLIS/0.9.0-GCC-12.3.0'] /2f9278f9 @snellius:rome+default
P: sgemm: 950.08 GFLOPS (r:0, l:None, u:None)
P: shemm: 75.68 GFLOPS (r:0, l:None, u:None)
P: sherk: 73.89 GFLOPS (r:0, l:None, u:None)
P: strmm: 970.76 GFLOPS (r:0, l:None, u:None)
P: strsm: 854.4 GFLOPS (r:0, l:None, u:None)
P: dgemm: 443.53 GFLOPS (r:0, l:None, u:None)
P: dhemm: 37.91 GFLOPS (r:0, l:None, u:None)
P: dherk: 34.89 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 457.9 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 345.59 GFLOPS (r:0, l:None, u:None)
P: cgemm: 1173.61 GFLOPS (r:0, l:None, u:None)
P: chemm: 78.38 GFLOPS (r:0, l:None, u:None)
P: cherk: 76.37 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 1069.47 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 956.5 GFLOPS (r:0, l:None, u:None)
P: zgemm: 603.02 GFLOPS (r:0, l:None, u:None)
P: zhemm: 39.26 GFLOPS (r:0, l:None, u:None)
P: zherk: 37.44 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 552.48 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 495.06 GFLOPS (r:0, l:None, u:None)

Note the poor performance of shemm and sherk for 2023a (the d, c and z variants of hemm and herk show the same drop). Also note that it is essentially identical to the single-threaded performance:

[       OK ] (10/12) EESSI_BLAS_OpenBLAS_mt %scale=1_core %module_name=['buildenv/default-foss-2023a', 'OpenBLAS/0.3.23-GCC-12.3.0', 'BLIS/0.9.0-GCC-12.3.0'] /329a9db8 @snellius:rome+default
P: sgemm: 77.09 GFLOPS (r:0, l:None, u:None)
P: shemm: 76.13 GFLOPS (r:0, l:None, u:None)
P: sherk: 73.71 GFLOPS (r:0, l:None, u:None)
P: strmm: 75.85 GFLOPS (r:0, l:None, u:None)
P: strsm: 76.59 GFLOPS (r:0, l:None, u:None)
P: dgemm: 38.53 GFLOPS (r:0, l:None, u:None)
P: dhemm: 38.54 GFLOPS (r:0, l:None, u:None)
P: dherk: 35.59 GFLOPS (r:0, l:None, u:None)
P: dtrmm: 36.4 GFLOPS (r:0, l:None, u:None)
P: dtrsm: 36.36 GFLOPS (r:0, l:None, u:None)
P: cgemm: 78.86 GFLOPS (r:0, l:None, u:None)
P: chemm: 78.64 GFLOPS (r:0, l:None, u:None)
P: cherk: 76.54 GFLOPS (r:0, l:None, u:None)
P: ctrmm: 75.49 GFLOPS (r:0, l:None, u:None)
P: ctrsm: 73.47 GFLOPS (r:0, l:None, u:None)
P: zgemm: 39.75 GFLOPS (r:0, l:None, u:None)
P: zhemm: 39.5 GFLOPS (r:0, l:None, u:None)
P: zherk: 38.49 GFLOPS (r:0, l:None, u:None)
P: ztrmm: 38.14 GFLOPS (r:0, l:None, u:None)
P: ztrsm: 37.56 GFLOPS (r:0, l:None, u:None)

I'd assume the issue is not the test here. @smoors what do you think? Or does this ring a bell and do you think it could be somewhere in the test setup? Also: do you see the same behavior on your system? (note: I've used the EESSI software stack for these tests).

If you also agree @smoors : I think it's worth checking out, but it's not worth blocking this PR over. Maybe we just copy the above to an issue and take it from there...
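
If this ends up in an issue, a small script along the following lines could flag routines whose multi-threaded numbers are not better than the single-core ones. It is only a sketch: the function names and the min_speedup threshold of 2.0 are arbitrary choices made here, and the sample lines are copied from the 2023a outputs above.

```python
import re

# Matches ReFrame performance lines such as 'P: shemm: 75.68 GFLOPS (r:0, ...)'.
PERF_LINE = re.compile(r'^P: (\w+): ([\d.]+) GFLOPS')


def parse_perf(text: str) -> dict:
    """Map routine name -> GFLOPS for every 'P:' line in the text."""
    perf = {}
    for line in text.splitlines():
        match = PERF_LINE.match(line.strip())
        if match:
            perf[match.group(1)] = float(match.group(2))
    return perf


def not_scaling(mt: dict, st: dict, min_speedup: float = 2.0) -> list:
    """Routines whose multi-threaded GFLOPS is below min_speedup x single-threaded."""
    return sorted(r for r in mt if r in st and mt[r] < min_speedup * st[r])


mt_sample = """\
P: sgemm: 950.08 GFLOPS (r:0, l:None, u:None)
P: shemm: 75.68 GFLOPS (r:0, l:None, u:None)
P: sherk: 73.89 GFLOPS (r:0, l:None, u:None)
"""
st_sample = """\
P: sgemm: 77.09 GFLOPS (r:0, l:None, u:None)
P: shemm: 76.13 GFLOPS (r:0, l:None, u:None)
P: sherk: 73.71 GFLOPS (r:0, l:None, u:None)
"""

print(not_scaling(parse_perf(mt_sample), parse_perf(st_sample)))  # ['shemm', 'sherk']
```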

@smoors
Collaborator Author

smoors commented Oct 22, 2025

i see this too on 2023a using our local clusters. 2024a looks fine though.
it seems to be a common problem for all herk and hemm routines.

checking the output files, the problem arises for larger matrices:

%
% operation:              herk
% parameter combination:  ln 
% datatype:               z 
% storage combination:    ccc (default)
% induced method:         auto 
% problem size range:     200 2000 200 
% m dim specifier:        -1 (default)
% k dim specifier:        -1 (default)
% number of repeats:      5 
% alpha scalar:           1.0 (default)
% beta scalar:            1.0 (default)
% ---
% implementation:         flexiblas
% number of threads:      4
% thread affinity:        unset
%
data_mt_zherk_flexiblas(   10, 1:3 ) = [     0     0     0.00 ];
data_mt_zherk_flexiblas(   10, 1:3 ) = [  2000  2000    56.00 ];
data_mt_zherk_flexiblas(    9, 1:3 ) = [  1800  1800    55.89 ];
data_mt_zherk_flexiblas(    8, 1:3 ) = [  1600  1600    55.70 ];
data_mt_zherk_flexiblas(    7, 1:3 ) = [  1400  1400    55.54 ];
data_mt_zherk_flexiblas(    6, 1:3 ) = [  1200  1200   185.03 ];
data_mt_zherk_flexiblas(    5, 1:3 ) = [  1000  1000   165.87 ];
data_mt_zherk_flexiblas(    4, 1:3 ) = [   800   800   167.62 ];
data_mt_zherk_flexiblas(    3, 1:3 ) = [   600   600   150.67 ];
data_mt_zherk_flexiblas(    2, 1:3 ) = [   400   400   134.79 ];
data_mt_zherk_flexiblas(    1, 1:3 ) = [   200   200    82.96 ];

the gemm routines for example don't have this problem:

%
% operation:              gemm
% parameter combination:  nn 
% datatype:               z 
% storage combination:    ccc (default)
% induced method:         auto 
% problem size range:     200 2000 200 
% m dim specifier:        -1 (default)
% n dim specifier:        -1 (default)
% k dim specifier:        -1 (default)
% number of repeats:      5 
% alpha scalar:           1.0 (default)
% beta scalar:            1.0 (default)
% ---
% implementation:         flexiblas
% number of threads:      4
% thread affinity:        unset
%
data_mt_zgemm_flexiblas(   10, 1:4 ) = [     0     0     0     0.00 ];
data_mt_zgemm_flexiblas(   10, 1:4 ) = [  2000  2000  2000   219.80 ];
data_mt_zgemm_flexiblas(    9, 1:4 ) = [  1800  1800  1800   216.88 ];
data_mt_zgemm_flexiblas(    8, 1:4 ) = [  1600  1600  1600   217.37 ];
data_mt_zgemm_flexiblas(    7, 1:4 ) = [  1400  1400  1400   214.77 ];
data_mt_zgemm_flexiblas(    6, 1:4 ) = [  1200  1200  1200   215.09 ];
data_mt_zgemm_flexiblas(    5, 1:4 ) = [  1000  1000  1000   209.53 ];
data_mt_zgemm_flexiblas(    4, 1:4 ) = [   800   800   800   208.93 ];
data_mt_zgemm_flexiblas(    3, 1:4 ) = [   600   600   600   195.78 ];
data_mt_zgemm_flexiblas(    2, 1:4 ) = [   400   400   400   186.46 ];
data_mt_zgemm_flexiblas(    1, 1:4 ) = [   200   200   200   157.28 ];

it indeed looks like a problem with this version rather than with the test.
this is exactly why we need a test like this. definitely worth looking into.
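
to find the problem size at which the performance drops, the data_* lines in the output files can be parsed with something like the sketch below. the function names and the factor of 0.5 are assumptions made for this example; the sample rows are taken from the zherk output above.

```python
import re

# Matches lines like 'data_mt_zherk_flexiblas(   10, 1:3 ) = [  2000  2000    56.00 ];'
DATA_LINE = re.compile(r'^data_\w+\(\s*\d+,\s*1:\d+\s*\)\s*=\s*\[(.*?)\];')


def parse_blis_data(text: str) -> list:
    """Return (problem size, GFLOPS) pairs sorted by problem size."""
    rows = []
    for line in text.splitlines():
        match = DATA_LINE.match(line.strip())
        if not match:
            continue
        fields = match.group(1).split()
        size, gflops = int(fields[0]), float(fields[-1])
        if size > 0:  # skip the all-zero placeholder row
            rows.append((size, gflops))
    return sorted(rows)


def sizes_with_drop(rows: list, factor: float = 0.5) -> list:
    """Sizes whose GFLOPS is below factor x the best value seen at smaller sizes."""
    flagged = []
    best = 0.0
    for size, gflops in rows:
        if best and gflops < factor * best:
            flagged.append(size)
        best = max(best, gflops)
    return flagged


sample = """\
data_mt_zherk_flexiblas(   10, 1:3 ) = [     0     0     0.00 ];
data_mt_zherk_flexiblas(    8, 1:3 ) = [  1600  1600    55.70 ];
data_mt_zherk_flexiblas(    7, 1:3 ) = [  1400  1400    55.54 ];
data_mt_zherk_flexiblas(    6, 1:3 ) = [  1200  1200   185.03 ];
data_mt_zherk_flexiblas(    5, 1:3 ) = [  1000  1000   165.87 ];
"""

print(sizes_with_drop(parse_blis_data(sample)))  # [1400, 1600]
```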

@casparvl casparvl merged commit 4e1b995 into EESSI:main Oct 22, 2025
15 checks passed
4 participants