Add two LCAO base group GPU version compilation options in toolchain (#6014)

* Add optional LCAO base GPU versions supported by cusolvermp
* Add optional LCAO base GPU versions supported by elpa
* Add L40S as GPUVER value for the sm_89 architecture
* Delete a few lines of content to enable Nvidia to compile
* Add a specified Fortran MPI compiler for elpa to use
* Add CUDA path for use by ELPA-GPU
* Fix a small issue
* Change to manually specifying the link libraries for CAL and cusolverMp
* Document the use of the 'cusolvermp' or 'elpa' methods to compile ABACUS GPU-LCAO
* Add modifications:
  - ELPA compiler flags modification
  - GPU_VER setting modification: users should specify the GPU compatibility number, not the GPU name
  - Modify toolchain_[gnu,intel].sh and build_abacus_[gnu,intel].sh to use the above modifications
* Minor adjustment
* Update README
* Restore the cmake default option
* Update README and cusolvermp
* Update README.md

---------

Co-authored-by: JamesMisaka <[email protected]>
**File changed:** `toolchain/README.md` (76 additions, 22 deletions)
```diff
@@ -2,7 +2,7 @@
 
 Version 2025.1
 
-## Author
+## Main Developer
 
 [QuantumMisaka](https://github.com/QuantumMisaka)
 (Zhaoqing Liu) @PKU@AISI
```
```diff
@@ -26,8 +26,9 @@ and give setup files that you can use to compile ABACUS.
 -[x] Automatic installation of [CEREAL](https://github.com/USCiLab/cereal) and [LIBNPY](https://github.com/llohse/libnpy) (by github.com)
 -[x] Support for [LibRI](https://github.com/abacusmodeling/LibRI) by submodule or automatic installation from github.com (but LibRI installed via `wget` seems to have some problems, please be cautious)
 -[x] A mirror station by the Bohrium database, which can download CEREAL, LibNPY, LibRI and LibComm by `wget` on the China Internet.
--[x] Support for GPU compilation, users can add `-DUSE_CUDA=1` in builder scripts.
+-[x] Support for GPU-PW and GPU-LCAO compilation (elpa supported; cusolvermp in development); `-DUSE_CUDA=1` is needed in builder scripts.
 -[x] Support for the AMD compiler and math libs `AOCL` and `AOCC` (not fully complete due to flang and AOCC-ABACUS compilation errors)
+-[ ] Support for more GPU devices beyond Nvidia.
 -[ ] Change the downloading URL from the cp2k mirror to another mirror, or download directly from the official website. (doing)
 -[ ] Support a JSON or YAML configuration file for the toolchain, which can be easily modified by users.
 -[ ] A better README and detailed markdown file.
```
```diff
@@ -138,7 +139,9 @@ Dependencies below are optional, which is NOT installed by default:
 - `LibComm` 0.1.1
 
 Users can install them by using `--with-*=install` in toolchain*.sh, which is `no` by default. Also, users can specify the absolute path of a package by `--with-*=path/to/package` in toolchain*.sh to let the toolchain use that package.
-> Notice: LibRI, LibComm and Libnpy are under active development, you should check the package version when using this toolchain. Also, LibRI and LibComm can be installed by github submodule, which also works for libnpy and is more recommended.
+> Notice: LibTorch often suffers from GLIBC_VERSION problems; if you encounter this, please downgrade LibTorch to version 1.12.1 in scripts/stage4/install_torch.sh
+>
+> Notice: LibRI, LibComm, Rapidjson and Libnpy are under active development, you should check the package version when using this toolchain.
 
 Users can easily compile and install dependencies of ABACUS
 by running these scripts after loading `gcc` or `intel-mkl-mpi`
```
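The GLIBC_VERSION mismatch mentioned in the notice above can be checked before picking a LibTorch build. On glibc-based Linux systems (an assumption; musl-based distributions differ), two common ways to print the system glibc version are:

```shell
# Print the glibc version the system was built with; LibTorch binary
# releases document the minimum GLIBC they require.
ldd --version | head -n 1

# Equivalent query via getconf, e.g. prints "glibc 2.35".
getconf GNU_LIBC_VERSION
```

If the reported version is older than what the LibTorch binary expects, downgrading LibTorch (as the notice suggests) is simpler than upgrading the system glibc.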
```diff
@@ -187,6 +190,74 @@ or you can also do it in a more complete way:
+# -DCMAKE_CUDA_COMPILER=${path to cuda toolkit}/bin/nvcc \ # add if needed
+......
+```
+which will enable the GPU version of ABACUS, and the `ks_solver cusolver` method can be used directly for PW and LCAO calculations.
+
+Notice: You CANNOT use the `icpx` compiler for the GPU version of ABACUS for now, see the discussion in [#2906](https://github.com/deepmodeling/abacus-develop/issues/2906) and [#4976](https://github.com/deepmodeling/abacus-develop/issues/4976)
+
+If you want to use ABACUS GPU-LCAO via `cusolvermp` or `elpa` for multi-GPU calculations, please compile as follows:
+
+1. For the elpa method, add
+```shell
+export CUDA_PATH=/path/to/CUDA
+# install_abacus_toolchain.sh part options
+--enable-cuda \
+--gpu-ver=(GPU-compatibility-number) \
+```
+to the `toolchain_*.sh`, and then follow the normal steps to install the dependencies using `./toolchain_*.sh`. To look up the GPU compatibility number, refer to [CUDA compatibility](https://developer.nvidia.com/cuda-gpus).
+
+Afterwards, make sure these options are enabled in your `build_abacus_*.sh` script
+```shell
+-DUSE_ELPA=ON \
+-DUSE_CUDA=ON \
+```
+then build the abacus executable by compiling it with `./build_abacus_*.sh`.
+
+The ELPA method needs more parameter settings, but it does not seem to be affected by the CUDA toolkit version, and there is no need to manually install a separate package.
+
+2. For the cusolvermp method, `toolchain_*.sh` does not need to be changed; just install the dependencies directly using `./toolchain_*.sh`, and then add
+......
+It is then enough to build the abacus executable by compiling it with `./build_abacus_*.sh`.
+
+You can refer to the linked video for help with compilation and installation: [Bilibili](https://www.bilibili.com/video/BV1eqr5YuETN/).
+
+cusolverMp requires installation from sources such as apt or yum, which is suitable for containers or local machines.
+The second choice is to use the [NVIDIA HPC_SDK](https://developer.nvidia.com/hpc-sdk-downloads) for installation, which is relatively simple, but the packages from NVIDIA HPC_SDK may not be suitable, especially for multi-GPU parallel running. To better use cusolvermp and its dependencies (libcal, ucx, ucc) in multi-GPU runs, please contact your server manager.
+
+After compiling, you can specify `device GPU` in the INPUT file to use the GPU version of ABACUS.
```
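Tying the build instructions to the runtime side, a minimal INPUT fragment for running on GPU might look like the sketch below. The `device GPU` and `ks_solver cusolver` keywords come from the README text above; the remaining keyword and its value are illustrative only and should be taken from the ABACUS input documentation for your actual calculation.

```
# ABACUS INPUT fragment (illustrative sketch, not a complete input)
device      GPU        # run on GPU, as described above
ks_solver   cusolver   # GPU-enabled solver mentioned in this README
basis_type  lcao       # assumption: an LCAO calculation
```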
OpenMPI version 5 has huge updates, leading to compatibility problems. If one wants to use OpenMPI version 4 (4.1.6), one can specify `--with-openmpi-4th=yes` in *toolchain_gnu.sh*

```diff
-### GPU version of ABACUS
-
-For GPU version of ABACUS (do not GPU version installer of ELPA, which is still doing work), add following options in build*.sh:
--DCMAKE_CUDA_COMPILER=${path to cuda toolkit}/bin/nvcc \
-......
-```
-
-Notice: You CANNOT use `icpx` compiler for GPU version of ABACUS for now, see discussion here [#2906](https://github.com/deepmodeling/abacus-develop/issues/2906) and [#4976](https://github.com/deepmodeling/abacus-develop/issues/4976)
-
-If you wants to use ABACUS GPU-LCAO by `cusolvermp` or `elpa`, please contact the coresponding developer, toolchain do not fully support them now.
 
 ### Shell problem
```
```diff
@@ -325,4 +379,4 @@ of each packages, which may let the installation more flexible.
-      "--gpu-ver currently only supports K20X, K40, K80, P100, V100, A100, Mi50, Mi100, Mi250, and no as options"
-      exit 1
-      ;;
-    esac
+    export GPUVER="${user_input}"
     ;;
   --target-cpu=*)
     user_input="${1#*=}"
```
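The `user_input="${1#*=}"` line in the hunk above is plain POSIX parameter expansion: `#*=` strips the shortest leading match of `*=`, leaving only the option value. A standalone sketch of how the option loop extracts the value:

```shell
# Simulate parsing "--gpu-ver=8.0" the way the toolchain option loop does.
set -- "--gpu-ver=8.0"

# "${1#*=}" removes the shortest prefix matching "*=" from $1,
# so "--gpu-ver=8.0" becomes "8.0".
user_input="${1#*=}"
export GPUVER="${user_input}"

echo "$GPUVER"   # prints: 8.0
```

Because the match is the *shortest* prefix, a value that itself contains `=` would survive intact; `##*=` would instead strip the longest prefix.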
```diff
@@ -684,7 +675,7 @@ else
   esac
 fi
 # If MATH_MODE is mkl, then openblas, scalapack and fftw are not needed
-# zhaoqing in 2023-09-17
+# QuantumMisaka in 2023-09-17
 if [ "${MATH_MODE}" = "mkl" ]; then
   if [ "${with_openblas}" != "__DONTUSE__" ]; then
     echo "Using MKL, so openblas is disabled."
```
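A note on the spacing restored in the hunk above: in POSIX `test`, `[ "$a"="$b" ]` passes a single non-empty word to `test`, which is always true, while `[ "$a" = "$b" ]` performs an actual string comparison. A quick demonstration:

```shell
MATH_MODE="fftw"

# Missing spaces: test sees the single string "fftw=mkl", which is
# non-empty, so this branch fires regardless of MATH_MODE.
if [ "${MATH_MODE}"="mkl" ]; then
  echo "broken check fires"
fi

# Correct form: a real comparison.
if [ "${MATH_MODE}" = "mkl" ]; then
  echo "using mkl"
else
  echo "not mkl"
fi
```

This prints `broken check fires` followed by `not mkl`, which is why the spaces around `=` matter in the toolchain scripts.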
```diff
@@ -700,6 +691,17 @@ if [ "${MATH_MODE}" = "mkl" ]; then
   fi
 fi
 
+# Select the correct compute number based on the GPU architecture
+# QuantumMisaka in 2025-03-19
+export ARCH_NUM="${GPUVER//.}"
+if [[ "$ARCH_NUM" =~ ^[1-9][0-9]*$ ]] || [ "$ARCH_NUM" = "no" ]; then
+  echo "Notice: GPU compilation is enabled, and GPU compatibility is set via --gpu-ver to sm_${ARCH_NUM}."
+else
+  report_error ${LINENO} \
+    "When GPU compilation is enabled, the --gpu-ver variable should be properly set according to GPU compatibility. To check your GPU compatibility, visit https://developer.nvidia.com/cuda-gpus. For example: A100 -> 8.0 (or 80), V100 -> 7.0 (or 70), 4090 -> 8.9 (or 89)"
+  exit 1
+fi
+
 # If CUDA or HIP are enabled, make sure the GPU version has been defined.
 if [ "${ENABLE_CUDA}" = "__TRUE__" ] || [ "${ENABLE_HIP}" = "__TRUE__" ]; then
   if [ "${GPUVER}" = "no" ]; then
@@ -708,9 +710,10 @@ if [ "${ENABLE_CUDA}" = "__TRUE__" ] || [ "${ENABLE_HIP}" = "__TRUE__" ]; then
```
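The `${GPUVER//.}` expansion added above is a bash replace-all that deletes every `.` from the value, so both spellings the error message mentions (`8.0` and `80`) normalize to the same compute number. A self-contained bash sketch of that normalization and the validity check:

```shell
#!/usr/bin/env bash
# Normalize --gpu-ver input the way the new toolchain block does:
# "${GPUVER//.}" deletes all dots, so "8.0" -> "80" and "8.9" -> "89".
for GPUVER in "8.0" "80" "8.9" "no"; do
  ARCH_NUM="${GPUVER//.}"
  if [[ "$ARCH_NUM" =~ ^[1-9][0-9]*$ ]]; then
    echo "$GPUVER -> sm_${ARCH_NUM}"
  elif [ "$ARCH_NUM" = "no" ]; then
    echo "$GPUVER -> GPU compilation disabled"
  else
    echo "$GPUVER -> invalid, see https://developer.nvidia.com/cuda-gpus"
  fi
done
```

Note that `${GPUVER//.}` and `[[ ... =~ ... ]]` are bash-only; the toolchain scripts already assume bash, so this matches their conventions.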