eckit::linalg::sparse::LinearAlgebraTorch backend#165
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:

```
@@            Coverage Diff            @@
##           develop     #165    +/-   ##
===========================================
- Coverage    66.35%   66.33%   -0.02%
===========================================
  Files         1126     1126
  Lines        57668    57668
  Branches      4403     4403
===========================================
- Hits         38264    38253      -11
- Misses       19404    19415      +11
```

View full report in Codecov by Sentry.
This looks neat! As an aside, it would be nice to clean up and remove unused backends (armadillo, viennacl, ...).
This PR brings GPU to production environments (possibly). At the time of adding GPU-powered mir matrix multiplication to AIFS (late 2024), I didn't want the typical production environment to go without this advantage, hence the PR. Now I've updated this PR, which comes at a good time for the reviewing of the linear algebra backends. I've just done some light testing on ag (which is modern), and the test build (for me) was:
prgenv/nvidia seems incompatible with the python3/new deployment. You might find other options that work. There are a number of supported devices mapped from this LinearAlgebraSparse backend, but of note:
This PR only tests the default (cpu) device, because that's not in scope.
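To make the device mapping concrete, here is a minimal sketch of how a backend name suffix could be mapped to one of the supported device tags the PR mentions (CUDA, HIP, MPS, XPU, XLA, Meta, with CPU as the default). The names `Device` and `device_from_name` are hypothetical illustrations, not the actual eckit/Torch symbols.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: map a backend device-name string to a device tag.
// The real backend delegates this to detail::Torch and torch::Device.
enum class Device { CPU, CUDA, HIP, MPS, XPU, XLA, Meta };

Device device_from_name(const std::string& name) {
    static const std::map<std::string, Device> table{
        {"cpu", Device::CPU}, {"cuda", Device::CUDA}, {"hip", Device::HIP},
        {"mps", Device::MPS}, {"xpu", Device::XPU},   {"xla", Device::XLA},
        {"meta", Device::Meta}};
    const auto it = table.find(name);
    if (it == table.end()) {
        throw std::invalid_argument("unsupported device: " + name);
    }
    return it->second;  // "cpu" remains the tested default
}
```

Keeping this logic in one place (as the commits here do, via detail::Torch) avoids each dense/sparse backend re-parsing device names.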
@Ozaq maybe as part of the streamlining of the LinearAlgebra parts for release/2.0.0 (pending the review, of course)
I've now added the dense backend version (e.g. for spherical harmonics). This was just for completeness -- please review as necessary.
Pull request overview
This PR adds PyTorch-based linear algebra backends (both dense and sparse) to enable GPU-accelerated matrix operations in eckit. The implementation leverages PyTorch's lower-level Torch library to support various hardware accelerators including CUDA, HIP, MPS, XPU, XLA, and Meta devices, with CPU as the default fallback.
Changes:
- Added Torch backend support for both dense and sparse linear algebra operations with multiple device type options
- Created helper functions in detail/Torch.h/cc for tensor creation and device management
- Added CMake configuration and tests for the Torch backend
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| CMakeLists.txt | Adds TORCH feature option with Torch package dependency |
| src/eckit/linalg/CMakeLists.txt | Adds Torch source files to build and links torch library |
| src/eckit/linalg/detail/Torch.h | Declares helper functions for Torch tensor operations and device management |
| src/eckit/linalg/detail/Torch.cc | Implements tensor conversion functions and device selection logic |
| src/eckit/linalg/dense/LinearAlgebraTorch.h | Declares dense linear algebra backend using Torch |
| src/eckit/linalg/dense/LinearAlgebraTorch.cc | Implements dense operations (dot, gemv, gemm) with device support |
| src/eckit/linalg/sparse/LinearAlgebraTorch.h | Declares sparse linear algebra backend using Torch |
| src/eckit/linalg/sparse/LinearAlgebraTorch.cc | Implements sparse operations (spmv, spmm) with device support |
| tests/linalg/CMakeLists.txt | Adds test configurations for Torch dense and sparse backends |
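One detail behind the tensor-conversion helpers in detail/Torch.cc: eckit's SparseMatrix is stored in compressed sparse row (CSR) form, while Torch's classic sparse tensors use COO indices, which require an explicit row index per non-zero. The sketch below (not the eckit implementation; `csr_rows_to_coo` is a hypothetical name) shows the expansion of CSR row pointers into COO row indices that such a conversion involves.

```cpp
#include <cstddef>
#include <vector>

// Sketch: expand CSR row pointers into per-non-zero row indices, as needed
// when building a COO-style sparse tensor from CSR storage.
std::vector<std::size_t> csr_rows_to_coo(const std::vector<std::size_t>& rowPtr) {
    std::vector<std::size_t> rows;
    for (std::size_t r = 0; r + 1 < rowPtr.size(); ++r) {
        // row r owns the non-zeros with flat indices [rowPtr[r], rowPtr[r+1])
        for (std::size_t k = rowPtr[r]; k < rowPtr[r + 1]; ++k) {
            rows.push_back(r);
        }
    }
    return rows;
}
```

For example, row pointers `{0, 2, 2, 3}` (rows with 2, 0, and 1 non-zeros) expand to row indices `{0, 0, 2}`; the column indices and values carry over unchanged.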
add include guard
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…arAlgebraTorch
(1) single place for backend device name logic (detail::Torch)
(2) const device/scalar type at construction
(3) limit MPS device (Apple) to dense functionality and single precision (current status)
This PR adds a sparse linear algebra backend to allow for GPU-based matrix multiplications (and other operations), which translates into a significant performance increase for interpolations under the right conditions (right environment, advanced use of mir).
It makes use of a deployed version of PyTorch (findable by CMake), specifically its lower-level component "Torch", which is part of the same package (this is how it is released to the public). I've exposed all contemporary hardware configuration options. But obviously, the better development is to improve the whole workflow to avoid copies to/from the CPU/GPU, so this development is purely a stepping stone -- it has already allowed me to run both mars-client (C) and pgen on GPUs, and of course mir. It would be great to follow this up with a publication. Possibly, this could be configurable in earthkit-regrid for maximum marketing :-)
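For reference, the core operation this sparse backend accelerates is the sparse matrix-vector product (spmv), y = A * x with A in CSR form. A plain-C++ reference version (a sketch for illustration, not eckit or Torch code) looks like this:

```cpp
#include <cstddef>
#include <vector>

// Reference CSR spmv: y = A * x, where A is given by row pointers, column
// indices and values. The Torch backend offloads this product (and spmm)
// to the selected device instead of looping on the CPU.
std::vector<double> spmv(const std::vector<std::size_t>& rowPtr,
                         const std::vector<std::size_t>& col,
                         const std::vector<double>& val,
                         const std::vector<double>& x) {
    std::vector<double> y(rowPtr.size() - 1, 0.0);
    for (std::size_t r = 0; r + 1 < rowPtr.size(); ++r) {
        for (std::size_t k = rowPtr[r]; k < rowPtr[r + 1]; ++k) {
            y[r] += val[k] * x[col[k]];
        }
    }
    return y;
}
```

In an interpolation, A is the (typically very sparse) weight matrix and x a field's values, which is why offloading this product to a GPU pays off for large grids -- provided the host/device copies noted above are kept out of the hot path.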
I've held back this development for several months, but I couldn't find a definitive answer on when to post it -- here it is.
🌦️ >> Documentation << 🌦️
https://sites.ecmwf.int/docs/dev-section/eckit/pull-requests/PR-165