A leaner gesv_rbt_async with improved execution speed and accuracy #62

Open
asenzz wants to merge 29 commits into icl-utk-edu:master from asenzz:gesv_rbt_async

asenzz commented Aug 31, 2025

This pull request introduces two solver functions using GESV RBT:

magma_int_t
magma_zgesv_rbt_async(
        const magma_bool_t refine, const magma_int_t n, const magma_int_t nrhs,
        const magmaDoubleComplex *const dA_, const magma_int_t lda,
        magmaDoubleComplex *const dB_, const magma_int_t ldb,
        magma_int_t *const info,
        const magma_int_t iter_max, const double bwdmax,
        magma_queue_t queue)

magma_int_t
magma_zgesv_rbt_refine_async(
        const magma_int_t n, const magma_int_t nrhs,
        const magmaDoubleComplex *const dA_, const magma_int_t lda,
        magmaDoubleComplex *const dB_, const magma_int_t ldb,
        magma_int_t *info,
        const magma_int_t iter_max, const double bwdmax,
        magma_queue_t queue)

Both functions introduce three new parameters:

  • iter_max - maximum number of refinement iterations. The old behaviour corresponds to iter_max = 30.
  • bwdmax - refinement threshold. While the error is above this threshold, refinement continues until iter_max is reached; set it to zero to make sure all iter_max iterations are executed.
  • queue - an already constructed MAGMA queue that can be reused across calls, which avoids spending cycles on initializing and destroying a new queue every time the function is called.

I wrote this function to speed up the tuning of support vector regression parameters, where I find it performs better than the other solvers present in MAGMA. This version has slightly better accuracy, since it keeps the best solution across all iterations instead of returning the solution from the last iteration. Because it uses async functions internally and an already initialized queue, it saved me about 30% of the measured time during parameter tuning, on the order of 3000 to 10000 calls, on a Dell R730 with 4x V100 Nvidia cards. Since it takes a magma_queue_t as an argument, it can use a specific CUDA stream created by the user.
I haven't written unit tests for these methods, but they should be similar to the tests for the old magma_zgesv_rbt function. I tested the accuracy of this implementation in my project, Tempus.

Use magma_zgesv_rbt_refine_async to refine an already present solution in dB_. I haven't tested this function.
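
To illustrate the refinement behaviour described in this comment (at most iter_max iterations, an early exit controlled by bwdmax, and returning the best iterate rather than the last one), here is a minimal CPU-only sketch in plain C. It is not the code in this PR and does not call MAGMA: approx_solve, residual, refine, the 3x3 problem size, the single-precision stand-in solver, and the relative-residual error measure are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define N 3  /* toy problem size (assumption for this sketch) */

static double dabs(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical stand-in for the low-accuracy solve: unpivoted Gaussian
   elimination in single precision. Assumes nonzero pivots, e.g. a
   diagonally dominant A. */
static void approx_solve(const double A[N][N], const double b[N], double x[N])
{
    float M[N][N + 1];
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) M[i][j] = (float)A[i][j];
        M[i][N] = (float)b[i];
    }
    for (int k = 0; k < N; ++k)              /* forward elimination */
        for (int i = k + 1; i < N; ++i) {
            float f = M[i][k] / M[k][k];
            for (int j = k; j <= N; ++j) M[i][j] -= f * M[k][j];
        }
    for (int i = N - 1; i >= 0; --i) {       /* back substitution */
        float s = M[i][N];
        for (int j = i + 1; j < N; ++j) s -= M[i][j] * (float)x[j];
        x[i] = (double)(s / M[i][i]);
    }
}

/* Computes r = b - A*x in double precision and returns
   ||r||_inf / ||b||_inf. Assumes b is not the zero vector. */
static double residual(const double A[N][N], const double b[N],
                       const double x[N], double r[N])
{
    double rmax = 0.0, bmax = 0.0;
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int j = 0; j < N; ++j) s -= A[i][j] * x[j];
        r[i] = s;
        if (dabs(s) > rmax) rmax = dabs(s);
        if (dabs(b[i]) > bmax) bmax = dabs(b[i]);
    }
    return rmax / bmax;
}

/* Refines the solution already stored in x. Stops after iter_max
   iterations, or earlier once the error drops below bwdmax (so
   bwdmax = 0 never stops early). On return x holds the best iterate
   seen, *best_err its error; the return value is the iteration count. */
static int refine(const double A[N][N], const double b[N], double x[N],
                  int iter_max, double bwdmax, double *best_err)
{
    assert(iter_max >= 0);
    double best[N], r[N], d[N];
    double err = residual(A, b, x, r);
    memcpy(best, x, sizeof best);
    *best_err = err;

    int it = 0;
    for (; it < iter_max && !(err < bwdmax); ++it) {
        approx_solve(A, r, d);               /* correction from residual */
        for (int i = 0; i < N; ++i) x[i] += d[i];
        err = residual(A, b, x, r);
        if (err < *best_err) {               /* remember the best iterate */
            *best_err = err;
            memcpy(best, x, sizeof best);
        }
    }
    memcpy(x, best, sizeof best);
    return it;
}
```

In this sketch, calling refine with bwdmax set to 0.0 always runs all iter_max iterations, which mirrors the parameter description above. The strict comparison against bwdmax is a choice made for the sketch, not something confirmed by the PR.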

nbeams and others added 26 commits September 2, 2025 19:22
There is no guarantee that the version number of the rocm-core library
will match that of other ROCm libraries. It is both simpler and more
robust to directly check the version number of the relevant library.
For this reason, Debian has declined to package rocm-core. All
other AMD GPU dependencies of magma are available in Debian Trixie.

The Debian builds of magma-rocm currently use libamdhip64-dev 5.7.1,
libhipblas-dev 5.5.1, and libhipsparse-dev 5.7.1. The Ubuntu 24.04 LTS
was frozen with those package versions in its universe repositories.
There is no single value that could accurately describe the ROCM_VERSION
on Debian or Ubuntu.

At the moment, a failure to find rocm-core is treated by magma as a
version less than 6.0.0. Without this change, the Debian build of
magma will begin to fail when either of the libamdhip64-dev or
libhipblas-dev libraries are updated to a newer version.

Formatting and documentation fixes to "A leaner gesv_rbt_async with improved execution speed and accuracy (PR icl-utk-edu#62)"
Don't know why it is here. Removing it builds fine on my end.
bind C name should be magma_sync_wtime
Signed-off-by: cyy <cyyever@outlook.com>
…is still in sparse/control (for now); rm copy in sparse/src.

asenzz commented Jan 2, 2026

I must have closed this by mistake, apologies. The changes are in the gesv_rbt_async branch.

@asenzz asenzz reopened this Jan 2, 2026
@asenzz asenzz requested a review from mgates3 January 2, 2026 09:40

nbeams commented Jan 26, 2026

Hi @asenzz, thanks for your interest in contributing to MAGMA! After some internal discussion, we feel adding an interface to the asynchronous memory management would be too big of a departure from MAGMA's usual interface at the moment. For routines where users have requested versions without memory allocation costs, we have the "expert" interfaces (see, e.g., magma_zgetrf_expert_gpu_work, magma_zgetri_expert_gpu_work, etc.). These routines can then be used by the standard interface routines, or used directly by expert users who wish to manage their own workspace memory.

If you want to try to change this PR to use that kind of interface, you would be welcome to do so, but of course we understand if you don't want to take the time to do this. Thanks again for your interest in MAGMA, and we hope it continues to be useful to you!
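
For readers unfamiliar with the pattern, the idea of an interface where the caller owns the workspace can be sketched in plain C as below. This is only an illustration of a LAPACK-style workspace query, where a negative size asks the routine how much memory it needs. The names toy_residual_work and toy_residual, their argument lists, and their return codes are hypothetical and are not the signatures of the MAGMA expert routines named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical expert-style routine: computes the infinity norm of
   r = b - A*x for a column-major n x n matrix A, using only
   caller-provided scratch memory.
   If *lwork < 0 on entry, this is a workspace query: the required size
   in bytes is written to *lwork and nothing else is done. */
static int toy_residual_work(int n, const double *A, int lda,
                             const double *b, const double *x,
                             double *res_norm, void *work, long *lwork)
{
    if (n < 1 || lda < n) return -1;         /* invalid dimensions */
    const long need = (long)n * (long)sizeof(double);
    if (*lwork < 0) {                        /* workspace query */
        *lwork = need;
        return 0;
    }
    if (work == NULL || *lwork < need) return -2;  /* workspace too small */

    double *r = (double *)work;              /* no allocation in here */
    double nrm = 0.0;
    for (int i = 0; i < n; ++i) {
        r[i] = b[i];
        for (int j = 0; j < n; ++j) r[i] -= A[i + (size_t)j * lda] * x[j];
        double a = r[i] < 0.0 ? -r[i] : r[i];
        if (a > nrm) nrm = a;
    }
    *res_norm = nrm;
    return 0;
}

/* Hypothetical standard-style wrapper: queries the size, allocates,
   calls the expert routine, and frees the workspace again. */
static int toy_residual(int n, const double *A, int lda,
                        const double *b, const double *x, double *res_norm)
{
    long lwork = -1;
    int info = toy_residual_work(n, A, lda, b, x, res_norm, NULL, &lwork);
    if (info != 0) return info;
    assert(lwork > 0);
    void *work = malloc((size_t)lwork);
    if (work == NULL) return -3;
    info = toy_residual_work(n, A, lda, b, x, res_norm, work, &lwork);
    free(work);
    return info;
}
```

A caller who invokes the routine thousands of times would query the size once, allocate once, and pass the same buffer to every call of the _work variant, while occasional users keep the simpler wrapper.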

asenzz commented Jan 26, 2026

Alright Natalie, I will try to update the _async methods to the _expert_gpu_work kind of format when time permits. Best regards
