Commit a295d38

Toolchain 202501 (#5980)
* update cmake
* add aocc support
* update mpich
* update VERSION
* update openmpi, allow user to switch version easily
* update elpa
* create aocl script
* aocc install setup
* bug fix and update readme
* fix openmpi switch
* modification
* add openmpi configure option
* update elpa setting (gpu setting for 2070s)
* update libxc version and download
* minor update
* update README
* minor update
* minor checkout
* deepmd-v3 add-in test note
* AMD-AOCC-AOCL update and minor fixed
* fix bug in aocl.sh
1 parent c93cd1a commit a295d38

23 files changed: 585 additions and 118 deletions

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@ STRU_READIN_ADJUST.cif
1717
build
1818
dist
1919
.idea
20-
toolchain.tar.gz
2120
time.json
2221
*.pyc
2322
__pycache__

toolchain/README.md

Lines changed: 81 additions & 33 deletions
@@ -1,6 +1,6 @@
11
# The ABACUS Toolchain
22

3-
Version 2024.3
3+
Version 2025.1
44

55
## Author
66

@@ -27,12 +27,13 @@ and give setup files that you can use to compile ABACUS.
2727
- [x] Support for [LibRI](https://github.com/abacusmodeling/LibRI) by submodule or automatic installation from github.com (but LibRI installed via `wget` seems to have some problems; please be cautious)
2828
- [x] A mirror station hosted on the Bohrium database, from which CEREAL, LibNPY, LibRI and LibComm can be downloaded by `wget` within the Chinese Internet.
2929
- [x] Support for GPU compilation, users can add `-DUSE_CUDA=1` in builder scripts.
30+
- [x] Support for the AMD compiler `AOCC` and math library `AOCL` (not yet complete due to flang and AOCC-ABACUS compilation errors)
3031
- [ ] Change the downloading url from cp2k mirror to other mirror or directly downloading from official website. (doing)
32+
- [ ] Support a JSON or YAML configuration file for toolchain, which can be easily modified by users.
3133
- [ ] A better README and Detail markdown file.
3234
- [ ] Automatic installation of [DEEPMD](https://github.com/deepmodeling/deepmd-kit).
3335
- [ ] Better compilation method for ABACUS-DEEPMD and ABACUS-DEEPKS.
3436
- [ ] Modulefile generation scripts.
35-
- [ ] Support for AMD compiler and math lib like `AOCL` and `AOCC`
3637

3738

3839
## Usage Online & Offline
@@ -49,6 +50,8 @@ There are also well-modified script to run *install_abacus_toolchain.sh* for `gn
4950
> ./toolchain_gnu.sh
5051
# for intel-mkl
5152
> ./toolchain_intel.sh
53+
# for amd aocc-aocl
54+
> ./toolchain_amd.sh
5255
# for intel-mkl-mpich
5356
> ./toolchain_intel-mpich.sh
5457
```
@@ -94,7 +97,7 @@ The above station will be updated handly but one should notice that the version
9497
If one wants to install ABACUS by toolchain OFFLINE,
9598
one can manually download all the packages from [cp2k-static/download](https://www.cp2k.org/static/downloads) or official website
9699
and put them in *build* directory by formatted name
97-
like *fftw-3.3.10.tar.gz*, or *openmpi-5.0.5.tar.bz2*,
100+
like *fftw-3.3.10.tar.gz*, or *openmpi-5.0.6.tar.bz2*,
98101
then run this toolchain.
99102
All packages will be detected and installed automatically.
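Before running the toolchain offline, it can help to confirm that the archives are actually in place. The following is only a minimal sketch: `list_missing_packages` is a hypothetical helper that is not part of the toolchain, and the archive names are simply the two examples quoted above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the toolchain): print the archives
# that are expected but not present in the given build directory.
list_missing_packages() {
    local build_dir="$1"
    shift
    local pkg
    for pkg in "$@"; do
        if [ ! -f "${build_dir}/${pkg}" ]; then
            echo "missing: ${pkg}"
        fi
    done
}

# Example call with the two archive names mentioned above
list_missing_packages ./build fftw-3.3.10.tar.gz openmpi-5.0.6.tar.bz2
```

Any archive reported as missing would have to be downloaded manually, or left for the toolchain to fetch online.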
100103
Also, one can install parts of packages OFFLINE and parts of packages ONLINE
@@ -109,19 +112,23 @@ just by using this toolchain
109112

110113
Default versions of the required dependencies:
111114

112-
- `cmake` 3.30.0
115+
- `cmake` 3.31.2
113116
- `gcc` 13.2.0 (never installed by the toolchain; the system `gcc` is used instead)
114-
- `OpenMPI` 4.1.6 (5.0.5 can be used but have some problem in OpenMP parallel computation in ELPA)
115-
- `MPICH` 4.2.2
117+
- `OpenMPI` 5.0.6 (OpenMPI version 5 works well but may have compatibility problems; users can manually downgrade to version 4 in the toolchain scripts)
118+
- `MPICH` 4.3.0
116119
- `OpenBLAS` 0.3.28 (the Intel toolchain needs the `get_vars.sh` tool from it)
117120
- `ScaLAPACK` 2.2.1 (a development version)
118121
- `FFTW` 3.3.10
119-
- `LibXC` 6.2.2
120-
- `ELPA` 2024.05.001
122+
- `LibXC` 7.0.0
123+
- `ELPA` 2025.01.001
121124
- `CEREAL` 1.3.2
122125
- `RapidJSON` 1.1.0
123-
And Intel-oneAPI need user or server manager to manually install from Intel.
124-
[Intel-oneAPI](https://www.intel.cn/content/www/cn/zh/developer/tools/oneapi/toolkits.html)
126+
And:
127+
- Intel-oneAPI must be installed manually from Intel by the user or server administrator.
128+
- - [Intel-oneAPI](https://www.intel.cn/content/www/cn/zh/developer/tools/oneapi/toolkits.html)
129+
- AMD AOCC-AOCL must be installed manually from AMD by the user or server administrator.
130+
- - [AOCC](https://www.amd.com/zh-cn/developer/aocc.html)
131+
- - [AOCL](https://www.amd.com/zh-cn/developer/aocl.html)
125132

126133
The dependencies below are optional and are NOT installed by default:
127134

@@ -130,7 +137,7 @@ Dependencies below are optional, which is NOT installed by default:
130137
- `LibRI` 0.2.0
131138
- `LibComm` 0.1.1
132139

133-
Users can install them by using `--with-*=install` in toolchain*.sh, which is `no` in default.
140+
Users can install them by setting `--with-*=install` in toolchain*.sh; the default is `no`. Users can also pass the absolute path of an existing installation via `--with-*=path/to/package` so that the toolchain uses that package.
134141
> Notice: LibRI, LibComm and Libnpy are under active development, so check the package versions when using this toolchain. LibRI and LibComm can also be installed as GitHub submodules, which is the recommended method and also works for libnpy.
135142
136143
Users can easily compile and install dependencies of ABACUS
@@ -151,6 +158,8 @@ If compliation is successful, a message will be shown like this:
151158
> ./build_abacus_gnu.sh
152159
> To build ABACUS by intel-toolchain, just use:
153160
> ./build_abacus_intel.sh
161+
> To build ABACUS by amd-toolchain in gcc-aocl, just use:
162+
> ./build_abacus_amd.sh
154163
> or you can modify the builder scripts to suit your needs.
155164
```
156165

@@ -180,11 +189,70 @@ or you can also do it in a more completely way:
180189

181190
## Common Problems and Solutions
182191

183-
### LibRI and LibComm for EXX
192+
### Intel-oneAPI problem
193+
194+
#### OneAPI 2025.0 problem
195+
196+
Generally, OneAPI 2025.0 can compile the basic functions of ABACUS, but compatibility problems occur with some dependencies. The workarounds are:
197+
- related to rapidjson:
198+
- - do not use rapidjson in your toolchain
199+
- - or use the master branch of [RapidJSON](https://github.com/Tencent/rapidjson)
200+
- related to LibRI: do not use LibRI, or downgrade your OneAPI.
201+
202+
#### ELPA problem via Intel-oneAPI toolchain in AMD server
203+
204+
The default compilers for Intel-oneAPI are `icpx` and `icx`, which cause problems when compiling ELPA on AMD servers. (This problem needs further investigation.)
205+
206+
The best way is to change `icpx` to `icpc` and `icx` to `icc`. Users can do this manually in *toolchain_intel.sh* via `--with-intel-classic=yes`.
207+
208+
Notice: `icc` and `icpc` from the Intel Classic Compiler of Intel-oneAPI are not supported in version 2024.0 and newer. Intel-OneAPI 2023.2.0 can be found via the QE website. For Intel-OneAPI 2023.2.0 you need to download the Base Toolkit for MKL and the HPC Toolkit for MPI and the compilers, while for Intel-OneAPI 2024.x only the HPC Toolkit is needed.
209+
210+
You can get Intel-OneAPI from the [QE-managed website](https://pranabdas.github.io/espresso/setup/hpc/#installing-intel-oneapi-libraries), and use these commands to download the Intel oneAPI Base Toolkit and HPC Toolkit:
211+
```shell
212+
wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/992857b9-624c-45de-9701-f6445d845359/l_BaseKit_p_2023.2.0.49397_offline.sh
213+
wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/0722521a-34b5-4c41-af3f-d5d14e88248d/l_HPCKit_p_2023.2.0.49440_offline.sh
214+
```
215+
216+
Related discussion here [#4976](https://github.com/deepmodeling/abacus-develop/issues/4976)
217+
218+
#### link problem in early 2023 version oneAPI
219+
220+
Sometimes Intel-oneAPI has problems linking `mpirun`,
221+
which always appears with the 2023.2.0 version of MPI in Intel-oneAPI.
222+
Running `source /path/to/setvars.sh` or installing another version of IntelMPI may help.
223+
224+
This problem is fixed in the 2024.0.0 version of Intel-oneAPI,
225+
and does not occur in Intel-MPI before 2021.10.0 (Intel-oneAPI before 2023.2.0).
226+
227+
More problems and possible solutions can be found in [#2928](https://github.com/deepmodeling/abacus-develop/issues/2928)
228+
229+
### AMD AOCC-AOCL problem
230+
231+
AOCC cannot currently be used to compile ABACUS; see [#5982](https://github.com/deepmodeling/abacus-develop/issues/5982).
232+
233+
However, using AOCC-AOCL to compile the dependencies is supported and usually improves ABACUS efficiency. You need to avoid `flang` when compiling ELPA, though. The toolchain disables `flang` by default; you can try enabling it manually by setting `--with-flang=yes` in `toolchain_amd.sh`.
184234

185-
- GCC toolchain with OpenMPI cannot compile LibComm v0.1.1 due to the different MPI variable type from MPICH and IntelMPI, see discussion here [#5033](https://github.com/deepmodeling/abacus-develop/issues/5033), you can switch to GCC-MPICH or Intel toolchain
235+
Notice: ABACUS built via GCC-AOCL in the AOCC-AOCL toolchain has not been used with DeePKS, DeePMD or LibRI so far.
236+
237+
### OpenMPI problem
238+
239+
#### in EXX and LibRI
240+
241+
- The GCC toolchain with OpenMPI cannot compile LibComm v0.1.1 because its MPI variable types differ from those of MPICH and IntelMPI; see the discussion in [#5033](https://github.com/deepmodeling/abacus-develop/issues/5033). You can try the newest branch of LibComm via
242+
```
243+
git clone https://gitee.com/abacus_dft/LibComm -b MPI_Type_Contiguous_Pool
244+
```
245+
or pull the newest master branch of LibComm
246+
```
247+
git clone https://github.com/abacusmodeling/LibComm
248+
```
249+
Another option is to switch to the GCC-MPICH or Intel toolchain.
186250
- It is recommended to use Intel toolchain if one wants to include EXX feature in ABACUS, which can have much better performance and can use more than 16 threads in OpenMP parallelization to accelerate the EXX process.
187251

252+
#### OpenMPI-v5
253+
254+
OpenMPI version 5 contains major changes that lead to compatibility problems. To use OpenMPI version 4 (4.1.6) instead, specify `--with-openmpi-4th=yes` in *toolchain_gnu.sh*.
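To decide whether this option is relevant, one can check which OpenMPI major version is on the `PATH`. This is a minimal sketch only: `openmpi_major_version` is a hypothetical helper, and it assumes the first line of `mpirun --version` has the usual Open MPI form `mpirun (Open MPI) 5.0.6`.

```shell
#!/bin/bash
# Hypothetical helper: extract the major version from a line such as
# "mpirun (Open MPI) 5.0.6" (assumed format of `mpirun --version`).
openmpi_major_version() {
    local first_line="$1"
    local version="${first_line##* }"   # text after the last space, e.g. 5.0.6
    echo "${version%%.*}"                # text before the first dot, e.g. 5
}

if command -v mpirun >/dev/null 2>&1; then
    major=$(openmpi_major_version "$(mpirun --version 2>/dev/null | head -n 1)")
    if [ "$major" = "5" ]; then
        echo "OpenMPI 5 detected; consider --with-openmpi-4th=yes if compatibility problems appear"
    fi
fi
```

Other MPI implementations print a different first line, in which case the check simply prints nothing.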
255+
188256
### GPU version of ABACUS
189257

190258
For the GPU version of ABACUS (do not use the GPU version installer of ELPA, which is still work in progress), add the following options to build*.sh:
@@ -242,26 +310,6 @@ When you encounter problem like `GLIBCXX_3.4.29 not found`, it is sure that your
242310

243311
In my tests, `gcc` > 11.3.1 is needed to enable the deepmd feature in ABACUS.
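A quick way to compare the system compiler against this bound is sketched below. `version_ge` is a hypothetical helper that relies on GNU `sort -V`, and the sketch treats 11.3.1 as a minimum (greater than or equal), which is an assumption rather than a tested boundary.

```shell
#!/bin/bash
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Relies on GNU `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v gcc >/dev/null 2>&1; then
    # -dumpfullversion is available in newer gcc; fall back to -dumpversion
    gcc_version=$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)
    if version_ge "$gcc_version" 11.3.1; then
        echo "gcc ${gcc_version}: meets the assumed minimum for the deepmd feature"
    else
        echo "gcc ${gcc_version}: older than the assumed minimum for the deepmd feature"
    fi
fi
```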
244312

245-
### Intel-oneAPI problem
246-
247-
#### ELPA problem via Intel-oneAPI toolchain in AMD server
248-
249-
The default compiler for Intel-oneAPI is `icpx` and `icx`, which will cause problem when compling ELPA in AMD server. (Which is a problem and needed to have more check-out)
250-
251-
The best way is to change `icpx` to `icpc`, `icx` to `icc`. user can manually change it in toolchain*.sh via `--with-intel-classic=yes`
252-
253-
Notice: `icc` and `icpc` from Intel Classic Compiler of Intel-oneAPI is not supported for 2024.0 and newer version. And Intel-OneAPI 2023.2.0 can be found in website. See discussion here [#4976](https://github.com/deepmodeling/abacus-develop/issues/4976)
254-
255-
#### link problem in early 2023 version oneAPI
256-
257-
Sometimes Intel-oneAPI have problem to link `mpirun`,
258-
which will always show in 2023.2.0 version of MPI in Intel-oneAPI.
259-
Try `source /path/to/setvars.sh` or install another version of IntelMPI may help.
260-
261-
which is fixed in 2024.0.0 version of Intel-oneAPI,
262-
And will not occur in Intel-MPI before 2021.10.0 (Intel-oneAPI before 2023.2.0)
263-
264-
More problem and possible solution can be accessed via [#2928](https://github.com/deepmodeling/abacus-develop/issues/2928)
265313

266314
## Advanced Installation Usage
267315

toolchain/build_abacus_gnu-aocl.sh

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
1+
#!/bin/bash
2+
#SBATCH -J build
3+
#SBATCH -N 1
4+
#SBATCH -n 16
5+
#SBATCH -o install.log
6+
#SBATCH -e install.err
7+
# JamesMisaka in 2025.03.09
8+
9+
# Build ABACUS by amd-openmpi toolchain
10+
11+
# module load openmpi aocc aocl
12+
13+
ABACUS_DIR=..
14+
TOOL=$(pwd)
15+
INSTALL_DIR=$TOOL/install
16+
source $INSTALL_DIR/setup
17+
cd $ABACUS_DIR
18+
ABACUS_DIR=$(pwd)
19+
#AOCLhome=/opt/aocl # user can specify this parameter
20+
21+
BUILD_DIR=build_abacus_gnu
22+
rm -rf $BUILD_DIR
23+
24+
PREFIX=$ABACUS_DIR
25+
ELPA=$INSTALL_DIR/elpa-2025.01.001/cpu
26+
CEREAL=$INSTALL_DIR/cereal-1.3.2/include/cereal
27+
LIBXC=$INSTALL_DIR/libxc-7.0.0
28+
RAPIDJSON=$INSTALL_DIR/rapidjson-1.1.0/
29+
# LAPACK=$AOCLhome/lib
30+
# SCALAPACK=$AOCLhome/lib
31+
# FFTW3=$AOCLhome
32+
# LIBRI=$INSTALL_DIR/LibRI-0.2.1.0
33+
# LIBCOMM=$INSTALL_DIR/LibComm-0.1.1
34+
# LIBTORCH=$INSTALL_DIR/libtorch-2.1.2/share/cmake/Torch
35+
# LIBNPY=$INSTALL_DIR/libnpy-1.0.1/include
36+
# DEEPMD=$HOME/apps/anaconda3/envs/deepmd # v3.0 might have problem
37+
38+
# if clang++ has problems, switch back to g++
39+
40+
cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
41+
-DCMAKE_CXX_COMPILER=clang++ \
42+
-DMPI_CXX_COMPILER=mpicxx \
43+
-DELPA_DIR=$ELPA \
44+
-DCEREAL_INCLUDE_DIR=$CEREAL \
45+
-DLibxc_DIR=$LIBXC \
46+
-DENABLE_LCAO=ON \
47+
-DENABLE_LIBXC=ON \
48+
-DUSE_OPENMP=ON \
49+
-DUSE_ELPA=ON \
50+
-DENABLE_RAPIDJSON=ON \
51+
-DRapidJSON_DIR=$RAPIDJSON \
52+
# -DLAPACK_DIR=$LAPACK \
53+
# -DSCALAPACK_DIR=$SCALAPACK \
54+
# -DFFTW3_DIR=$FFTW3 \
55+
# -DENABLE_DEEPKS=1 \
56+
# -DTorch_DIR=$LIBTORCH \
57+
# -Dlibnpy_INCLUDE_DIR=$LIBNPY \
58+
# -DENABLE_LIBRI=ON \
59+
# -DLIBRI_DIR=$LIBRI \
60+
# -DLIBCOMM_DIR=$LIBCOMM \
61+
# -DDeePMD_DIR=$DEEPMD \
62+
63+
# if one wants to include deepmd, the system gcc version should be >= 11.3.0 for glibc requirements
64+
65+
cmake --build $BUILD_DIR -j `nproc`
66+
cmake --install $BUILD_DIR 2>/dev/null
67+
68+
# generate abacus_env.sh
69+
cat << EOF > "${TOOL}/abacus_env.sh"
70+
#!/bin/bash
71+
source $INSTALL_DIR/setup
72+
export PATH="${PREFIX}/bin":\${PATH}
73+
EOF
74+
75+
# generate information
76+
cat << EOF
77+
========================== usage =========================
78+
Done!
79+
To use the installed ABACUS version
80+
You need to source ${TOOL}/abacus_env.sh first !
81+
"""
82+
EOF

toolchain/build_abacus_gnu.sh

Lines changed: 4 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,8 +4,7 @@
44
#SBATCH -n 16
55
#SBATCH -o install.log
66
#SBATCH -e install.err
7-
# install ABACUS with libxc and deepks
8-
# JamesMisaka in 2023.08.31
7+
# JamesMisaka in 2025.03.09
98

109
# Build ABACUS by gnu-toolchain
1110

@@ -24,16 +23,16 @@ rm -rf $BUILD_DIR
2423
PREFIX=$ABACUS_DIR
2524
LAPACK=$INSTALL_DIR/openblas-0.3.28/lib
2625
SCALAPACK=$INSTALL_DIR/scalapack-2.2.1/lib
27-
ELPA=$INSTALL_DIR/elpa-2024.05.001/cpu
26+
ELPA=$INSTALL_DIR/elpa-2025.01.001/cpu
2827
FFTW3=$INSTALL_DIR/fftw-3.3.10
2928
CEREAL=$INSTALL_DIR/cereal-1.3.2/include/cereal
30-
LIBXC=$INSTALL_DIR/libxc-6.2.2
29+
LIBXC=$INSTALL_DIR/libxc-7.0.0
3130
RAPIDJSON=$INSTALL_DIR/rapidjson-1.1.0/
3231
# LIBRI=$INSTALL_DIR/LibRI-0.2.1.0
3332
# LIBCOMM=$INSTALL_DIR/LibComm-0.1.1
3433
# LIBTORCH=$INSTALL_DIR/libtorch-2.1.2/share/cmake/Torch
3534
# LIBNPY=$INSTALL_DIR/libnpy-1.0.1/include
36-
# DEEPMD=$HOME/apps/anaconda3/envs/deepmd
35+
# DEEPMD=$HOME/apps/anaconda3/envs/deepmd # v3.0 might have problem
3736

3837
cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
3938
-DCMAKE_CXX_COMPILER=g++ \
@@ -57,8 +56,6 @@ cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
5756
# -DLIBRI_DIR=$LIBRI \
5857
# -DLIBCOMM_DIR=$LIBCOMM \
5958
# -DDeePMD_DIR=$DEEPMD \
60-
# -DTensorFlow_DIR=$DEEPMD \
61-
6259

6360
# # add mkl env for libtorch to link
6461
# if one wants to install libtorch, mkl should be loaded during the build process

toolchain/build_abacus_intel-mpich.sh

Lines changed: 5 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,13 +4,12 @@
44
#SBATCH -n 16
55
#SBATCH -o install.log
66
#SBATCH -e install.err
7-
# build and install ABACUS with libxc, also can with deepks and deepmd
8-
# JamesMisaka in 2023.08.31
7+
# JamesMisaka in 2025.03.09
98

109
# Build ABACUS by intel-toolchain with mpich
1110

1211
# module load mkl compiler
13-
# source path/to/vars.sh
12+
# source path/to/setvars.sh
1413

1514
ABACUS_DIR=..
1615
TOOL=$(pwd)
@@ -23,15 +22,15 @@ BUILD_DIR=build_abacus_intel-mpich
2322
rm -rf $BUILD_DIR
2423

2524
PREFIX=$ABACUS_DIR
26-
ELPA=$INSTALL_DIR/elpa-2024.05.001/cpu
25+
ELPA=$INSTALL_DIR/elpa-2025.01.001/cpu
2726
CEREAL=$INSTALL_DIR/cereal-1.3.2/include/cereal
28-
LIBXC=$INSTALL_DIR/libxc-6.2.2
27+
LIBXC=$INSTALL_DIR/libxc-7.0.0
2928
RAPIDJSON=$INSTALL_DIR/rapidjson-1.1.0/
3029
# LIBTORCH=$INSTALL_DIR/libtorch-2.1.2/share/cmake/Torch
3130
# LIBNPY=$INSTALL_DIR/libnpy-1.0.1/include
3231
# LIBRI=$INSTALL_DIR/LibRI-0.2.1.0
3332
# LIBCOMM=$INSTALL_DIR/LibComm-0.1.1
34-
# DEEPMD=$HOME/apps/anaconda3/envs/deepmd
33+
# DEEPMD=$HOME/apps/anaconda3/envs/deepmd # v3.0 might have problem
3534

3635
cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
3736
-DCMAKE_CXX_COMPILER=icpx \
@@ -53,7 +52,6 @@ cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
5352
# -DLIBRI_DIR=$LIBRI \
5453
# -DLIBCOMM_DIR=$LIBCOMM \
5554
# -DDeePMD_DIR=$DEEPMD \
56-
# -DTensorFlow_DIR=$DEEPMD \
5755

5856

5957
# if one wants to include deepmd, the system gcc version should be >= 11.3.0 for glibc requirements
