A leaner gesv_rbt_async with improved execution speed and accuracy by asenzz · Pull Request #62 · icl-utk-edu/magma

asenzz · 2025-08-31T10:49:39Z

This pull request introduces two solver functions using GESV RBT:

magma_int_t
magma_zgesv_rbt_async(
        const magma_bool_t refine, const magma_int_t n, const magma_int_t nrhs,
        const magmaDoubleComplex *const dA_, const magma_int_t lda,
        magmaDoubleComplex *const dB_, const magma_int_t ldb,
        magma_int_t *const info,
        const magma_int_t iter_max, const double bwdmax,
        magma_queue_t queue)

magma_int_t
magma_zgesv_rbt_refine_async(
        const magma_int_t n, const magma_int_t nrhs,
        const magmaDoubleComplex *const dA_, const magma_int_t lda,
        magmaDoubleComplex *const dB_, const magma_int_t ldb,
        magma_int_t *info,
        const magma_int_t iter_max, const double bwdmax,
        magma_queue_t queue)

These functions introduce three new parameters:

iter_max - maximum number of refinement iterations; the old behaviour corresponds to iter_max set to 30.
bwdmax - refinement threshold; while the error stays above this threshold, refinement continues until iter_max is reached. Set it to zero to make sure all iterations specified in iter_max are executed.
queue - an already constructed MAGMA queue, which can be reused across calls to avoid spending cycles on initializing and uninitializing a new queue every time the function is called.

I wrote this function to speed up the tuning of support vector regression parameters, where I find it performs better than the other solvers present in MAGMA. This version has slightly better accuracy, since it saves the best solution from all iterations instead of using the solution from the last iteration. Because it uses async functions internally and an already initialized queue, it saved me about 30% of measured time over the course of parameter tuning, on the order of 3000 to 10000 calls, on a Dell R730 with 4x V100 Nvidia cards. It can use a specific CUDA stream created by the user, since it takes a magma_queue_t as an argument.
I haven't written unit tests for these methods, but they should be similar to those for the old magma_zgesv_rbt function. I tested the accuracy of this implementation in my project, Tempus.
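
To illustrate how iter_max, bwdmax and the "keep the best solution" behaviour described above interact, here is a minimal CPU-only sketch. It is not the code from this PR: approx_solve is a hypothetical stand-in (a single Jacobi step) for the RBT-preconditioned solve, the error measure is assumed to be a relative residual in the infinity norm, and refine_keep_best and RefineResult are names made up for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major

// Infinity norm of a vector.
double norm_inf(const Vec& v) {
    double m = 0.0;
    for (double e : v) m = std::max(m, std::fabs(e));
    return m;
}

// r = b - A*x
Vec residual(const Mat& A, const Vec& x, const Vec& b) {
    Vec r(b);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            r[i] -= A[i][j] * x[j];
    return r;
}

// Hypothetical cheap, inexact solve: one Jacobi step d = D^{-1} r.
// It only stands in for the real factorization-based solve.
Vec approx_solve(const Mat& A, const Vec& r) {
    Vec d(r.size());
    for (std::size_t i = 0; i < r.size(); ++i) d[i] = r[i] / A[i][i];
    return d;
}

struct RefineResult {
    Vec x;       // best solution seen over all iterations
    double err;  // relative residual of that solution
    int iters;   // refinement iterations actually executed
};

// Refine until the error drops to bwdmax or iter_max is reached,
// returning the best iterate rather than the last one.
// Assumes b is non-zero and A has a non-zero diagonal.
RefineResult refine_keep_best(const Mat& A, const Vec& b,
                              int iter_max, double bwdmax) {
    const double bnorm = norm_inf(b);
    Vec x = approx_solve(A, b);  // initial solution
    double err = norm_inf(residual(A, x, b)) / bnorm;
    RefineResult best{x, err, 0};
    int it = 0;
    while (it < iter_max && err > bwdmax) {
        Vec d = approx_solve(A, residual(A, x, b));  // correction step
        for (std::size_t i = 0; i < x.size(); ++i) x[i] += d[i];
        err = norm_inf(residual(A, x, b)) / bnorm;
        ++it;
        if (err < best.err) {  // remember the best iterate so far
            best.x = x;
            best.err = err;
        }
    }
    best.iters = it;
    return best;
}
```

In this sketch, calling refine_keep_best(A, b, 5, 0.0) runs all five iterations as long as the residual never becomes exactly zero, while a positive bwdmax lets the loop stop as soon as the error is small enough.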

Use magma_zgesv_rbt_refine_async to refine an already present solution in dB_. I haven't tested this function.

CMakeLists.txt

src/zgetrs_nopiv_gpu.cpp

src/zgetrf_nopiv_gpu.cpp

src/zgesv_rbt.cpp

src/zgesv_nopiv_gpu.cpp

include/magma_auxiliary.h

There is no guarantee that the version number of the rocm-core library will match that of other ROCm libraries. It is both simpler and more robust to directly check the version number of the relevant library. For this reason, Debian has declined to package rocm-core. All other AMD GPU dependencies of magma are available in Debian Trixie. The Debian builds of magma-rocm currently use libamdhip64-dev 5.7.1, libhipblas-dev 5.5.1, and libhipsparse-dev 5.7.1. Ubuntu 24.04 LTS was frozen with those package versions in its universe repository. There is no single value that could accurately describe the ROCM_VERSION on Debian or Ubuntu. At the moment, a failure to find rocm-core is treated by magma as a version less than 6.0.0. Without this change, the Debian build of magma will begin to fail when either the libamdhip64-dev or the libhipblas-dev library is updated to a newer version.

Formatting and documentation fixes to "A leaner gesv_rbt_async with improved execution speed and accuracy (PR icl-utk-edu#62)"

asenzz · 2026-01-02T09:39:56Z

I must have closed this by mistake, apologies. The changes are in the gesv_rbt_async branch.

nbeams · 2026-01-26T16:49:23Z

Hi @asenzz, thanks for your interest in contributing to MAGMA! After some internal discussion, we feel adding an interface to the asynchronous memory management would be too big of a departure from MAGMA's usual interface at the moment. For routines where users have requested versions without memory allocation costs, we have the "expert" interfaces (see, e.g., magma_zgetrf_expert_gpu_work, magma_zgetri_expert_gpu_work, etc.). These routines can then be used by the standard interface routines, or used directly by expert users who wish to manage their own workspace memory.

If you want to try to change this PR to use that kind of interface, you would be welcome to do so, but of course we understand if you don't want to take the time to do this. Thanks again for your interest in MAGMA, and we hope it continues to be useful to you!

asenzz · 2026-01-26T17:01:17Z

Alright Natalie, I will try to update the _async methods to the _expert_gpu_work kind of format when time permits. Best regards

Port of gesv_rbt_async() from the Bitbucket repo, unit tests need to …

b9a811a

…be updated.

asenzz mentioned this pull request Aug 31, 2025

Improved Random butterfly transform #45

Open

mgates3 requested changes Sep 2, 2025

nbeams and others added 26 commits September 2, 2025 19:22

Remove deprecated v1 interface

9ccac3a

Drop CMP0037 to fix cmake 4.0 build error

be231a8

Don't know why it is here. Removing it builds fine on my end.

remove blas_fix, which was to support macOS Accelerate

87bf295

remove macOS support

1b5739d

remove ACML support, since AMD dropped it circa 2017. Also remove out…

8f9121c

…-dated flock references.

make: remove extraneous check from make.inc files that is repeated in…

4c76e7c

… Makefile

make: add AMD AOCL / BLIS & FLAME config

f8c541b

Fix testing_zhetrf for complex [cz] case. Symmetrize with conj and ma…

49aaf1d

…ke diagonal real.

Cleanup testing_zhetrf. Make spacing more consistent, use std::swap.

afdf8d3

Update magma2.F90

471406c

bind C name should be magma_sync_wtime

Improve C/C++ standard setting in CMake

ebd7251

Fix fpic

27c3254

Signed-off-by: cyy <cyyever@outlook.com>

cmake: require c99 and c++14; prohibit decay to older standards and c…

675053b

…ompiler extensions (gnu, etc.)

fortran: use c_sizeof from f2008. Fixes icl-utk-edu#55.

c26c19b

add support to Blackwell GPUs (might want to add more sm_xx)

6914cea

add blackwell to make.inc files

004a8b2

remove very old CUDA archs

b9f0a0f

also add sm 12.0 when Blackwell is selected

6ae9acd

update release notes

ac4262a

cmake: add Blackwell

0955da3

archive files that are not in Makefile.src. Note magma_zmlumerge.cpp …

21d102f

…is still in sparse/control (for now); rm copy in sparse/src.

make: remove out-dated hg commands

b89e846

fix compilation issue with cuda 13

0176416

remove gbtf2 kernels that use cooperative groups

d42d1b5

remove unwanted fortran wrappers

d29cb95

asenzz force-pushed the gesv_rbt_async branch from 6d41c06 to d29cb95 on September 2, 2025 19:29

asenzz closed this Oct 26, 2025

asenzz force-pushed the gesv_rbt_async branch from d29cb95 to c0792ae on October 26, 2025 07:07

asenzz added 2 commits October 26, 2025 08:26

Merge remote-tracking branch 'refs/remotes/origin/gesv_rbt_async' int…

d53ec34

…o gesv_rbt_async

Merge branch 'icl-utk-edu:master' into gesv_rbt_async

8b2d25e

asenzz reopened this Jan 2, 2026

asenzz requested a review from mgates3 January 2, 2026 09:40

asenzz wants to merge 29 commits into icl-utk-edu:master from asenzz:gesv_rbt_async

8 participants
