forked from abacusmodeling/abacus-develop
Closed
Labels
EXX and lr-TDDFT (related to EXX or lr-TDDFT)
Description
Describe the bug
When running ABACUS with OMP_NUM_THREADS=12 nohup mpirun -n 2 --map-by socket --bind-to none abacus | tee output.log &, the program crashed at the first step of the SCF iteration when using the HSE functional. I built with -DDEBUG_INFO=ON to provide more details for debugging.
Expected behavior
No response
To Reproduce
Before running the toolchain, I modified the scripts install_openmpi.sh and install_elpa.sh to enable support for CUDA-aware MPI and cuSOLVERMp, and to disable compilation of the GPU version of ELPA.
Configure step in install_openmpi.sh:
./configure CFLAGS="${CFLAGS}" \
  --prefix=${pkg_install_dir} \
  --libdir="${pkg_install_dir}/lib" \
  --with-zlib=${ZLIB} \
  --with-libevent=internal \
  --with-cuda=${CUDA_PATH} \
  --with-ucx=${UCX} \
  --with-ucc=${UCC} \
  ${EXTRA_CONFIGURE_FLAGS} \
  > configure.log 2>&1 || tail -n ${LOG_LINES} configure.log
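One way to confirm that the rebuilt Open MPI actually picked up --with-cuda is to look for the mpi_built_with_cuda_support parameter in the output of ompi_info --parsable --all. The sketch below is only an illustration: the function name has_cuda_aware_mpi is hypothetical and not part of the toolchain, and it assumes the ompi_info on PATH belongs to the toolchain-built Open MPI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the ABACUS toolchain).
# Reads `ompi_info --parsable --all` output on stdin and succeeds only
# if Open MPI reports that it was built with CUDA-aware support.
has_cuda_aware_mpi() {
    grep -q 'mpi_built_with_cuda_support:value:true'
}

# Usage (assumes the toolchain's ompi_info is the one found on PATH):
#   ompi_info --parsable --all | has_cuda_aware_mpi \
#       && echo "CUDA-aware MPI" || echo "MPI is NOT CUDA-aware"
```

If the check fails, -DUSE_CUDA_MPI=ON in the ABACUS build would be relying on a capability that the MPI library does not provide, which is worth ruling out before looking at the EXX code path.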
Modified target loop in install_elpa.sh (only the CPU target is built):
for TARGET in "cpu" ; do
  [ "$TARGET" = "nvidia" ] && [ "$ENABLE_CUDA" != "__TRUE__" ] && continue
  # disable cpu if cuda is enabled
  # [ "$TARGET" != "nvidia" ] && [ "$ENABLE_CUDA" = "__TRUE__" ] && continue
  echo "Installing from scratch into ${pkg_install_dir}/${TARGET}"
  mkdir -p "build_${TARGET}"
  cd "build_${TARGET}"
  if [ "${with_amd}" != "__DONTUSE__" ] && [ "${WITH_FLANG}" = "yes" ] ; then
    echo "AMD fortran compiler detected, enable special option operation"
toolchain_gnu.sh:
./install_abacus_toolchain.sh \
  --with-gcc=install \
  --with-intel=no \
  --with-openblas=install \
  --with-openmpi=install \
  --with-cmake=install \
  --with-scalapack=install \
  --with-libxc=install \
  --with-fftw=install \
  --with-elpa=install \
  --with-cereal=install \
  --with-rapidjson=install \
  --with-libtorch=install \
  --with-libnpy=install \
  --with-libri=install \
  --with-libcomm=install \
  --with-4th-openmpi=no \
  --enable-cuda \
  --gpu-ver=86 \
  | tee compile.log
build_abacus_gnu.sh:
cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
  -DCMAKE_CXX_COMPILER=g++ \
  -DMPI_CXX_COMPILER=mpicxx \
  -DLAPACK_DIR=$LAPACK \
  -DSCALAPACK_DIR=$SCALAPACK \
  -DUSE_ELPA=ON \
  -DELPA_DIR=$ELPA \
  -DCEREAL_INCLUDE_DIR=$CEREAL \
  -DFFTW3_DIR=$FFTW3 \
  -DLibxc_DIR=$LIBXC \
  -DENABLE_LCAO=ON \
  -DENABLE_LIBXC=ON \
  -DUSE_OPENMP=ON \
  -DENABLE_RAPIDJSON=ON \
  -DRapidJSON_DIR=$RAPIDJSON \
  -DUSE_CUDA=ON \
  -DUSE_CUDA_MPI=ON \
  -DENABLE_DEEPKS=ON \
  -DTorch_DIR=$LIBTORCH \
  -Dlibnpy_INCLUDE_DIR=$LIBNPY \
  -DENABLE_LIBRI=ON \
  -DLIBRI_DIR=$LIBRI \
  -DLIBCOMM_DIR=$LIBCOMM \
  -DENABLE_CUSOLVERMP=ON \
  -DCAL_CUSOLVERMP_PATH=$CUDA_PATH/lib64 \
  -DDEBUG_INFO=ON
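To check which of these options actually took effect in the configured build, the CMake cache in the build directory can be inspected. The following is a minimal sketch, not part of build_abacus_gnu.sh: the function name show_build_options is hypothetical, and it assumes the cache is at $BUILD_DIR/CMakeCache.txt, which is where CMake normally writes it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of build_abacus_gnu.sh).
# Prints the CMake cache entries for the GPU/EXX-related options used
# in the cmake command above. Cache entries have the form NAME:TYPE=VALUE.
show_build_options() {
    local cache="$1"
    grep -E '^(USE_CUDA|USE_CUDA_MPI|USE_ELPA|USE_OPENMP|ENABLE_LIBRI|ENABLE_CUSOLVERMP|DEBUG_INFO)(:[A-Z]+)?=' "$cache"
}

# Usage (assumes BUILD_DIR is the directory passed to `cmake -B`):
#   show_build_options "$BUILD_DIR/CMakeCache.txt"
```

Attaching this output to the report makes it easier to tell whether the crash happens with CUDA-aware MPI, cuSOLVERMp and LibRI all enabled at once, or whether one of them was silently left off.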
Environment
No response
Additional Context
Task list for Issue attackers (only for developers)
- Verify the issue is not a duplicate.
- Describe the bug.
- Steps to reproduce.
- Expected behavior.
- Error message.
- Environment details.
- Additional context.
- Assign a priority level (low, medium, high, urgent).
- Assign the issue to a team member.
- Label the issue with relevant tags.
- Identify possible related issues.
- Create a unit test or automated test to reproduce the bug (if applicable).
- Fix the bug.
- Test the fix.
- Update documentation (if necessary).
- Close the issue and inform the reporter (if applicable).