@QuantumMisaka
Collaborator

@QuantumMisaka QuantumMisaka commented Mar 8, 2025

Reminder

  • Have you linked an issue with this pull request?
  • Have you added adequate unit tests and/or case tests for your pull request?
  • Have you noticed possible changes of behavior below or in the linked issue?
  • Have you explained the changes of codes in core modules of ESolver, HSolver, ElecState, Hamilt, Operator or Psi? (ignore if not applicable)

Linked Issue

  • Software version updates: ELPA, OpenMPI, MPICH, CMake, LibXC
  • Add compiler: AOCC (option: --with_amd=system)
  • Add math library: AOCL (option: --math_mode=aocl)
  • Adjust minor descriptions based on the CP2K toolchain

Unit Tests and/or Case Tests for my changes

  • A unit test is added for each new feature or bug fix.

What's changed?

  • Example: My changes might affect the performance of the application under certain conditions, and I have tested the impact on various scenarios...

Any changes of core modules? (ignore if not applicable)

  • Example: I have added a new virtual function in the esolver base class in order to ...

@QuantumMisaka QuantumMisaka marked this pull request as draft March 8, 2025 05:47
@QuantumMisaka
Collaborator Author

@goodchong @dzzz2001 The community and I need a detailed tutorial for GPU-ELPA and ABACUS-GPU-LCAO installation, which would also help resolve #5872.

@QuantumMisaka
Collaborator Author

QuantumMisaka commented Mar 8, 2025

I cannot solve a problem that occurs when compiling ELPA with AOCC:

flang: error: unsupported option '--whole-archive'
flang: error: unsupported option '--no-whole-archive'
flang: error: unknown argument: '-soname'
flang: error: no such file or directory: 'libelpa_openmp.so.19'

I don't know whether ELPA supports being compiled with AOCC, so I am giving up on this for now.
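
One possible reading of the flang messages above: `--whole-archive`, `--no-whole-archive`, and `-soname` are linker options, and a compiler driver normally only forwards them to the linker when they carry a `-Wl,` prefix. If `-soname` reaches the driver unprefixed, its argument (`libelpa_openmp.so.19`) is then treated as an input file, which would explain the last error. The sketch below only illustrates that prefixing; `wrap_linker_flags` and `libfoo.a` are hypothetical names, and nothing here has been verified against ELPA's build system or AOCC.

```python
# Hypothetical helper for illustration only: not part of ELPA, libtool,
# AOCC, or the ABACUS toolchain.

# Linker options that take no value.
LINKER_ONLY = {"--whole-archive", "--no-whole-archive"}
# Linker options followed by one value (e.g. "-soname libX.so.N").
LINKER_WITH_VALUE = {"-soname"}


def wrap_linker_flags(args):
    """Prefix raw linker options with -Wl, so a compiler driver forwards them."""
    wrapped = []
    it = iter(args)
    for arg in it:
        if arg in LINKER_ONLY:
            wrapped.append("-Wl," + arg)
        elif arg in LINKER_WITH_VALUE:
            value = next(it)  # consume the option's value as well
            wrapped.append("-Wl,{},{}".format(arg, value))
        else:
            wrapped.append(arg)  # anything else passes through unchanged
    return wrapped


# "libfoo.a" is a placeholder archive name, not taken from the ELPA build.
raw = ["--whole-archive", "libfoo.a", "--no-whole-archive",
       "-soname", "libelpa_openmp.so.19"]
print(wrap_linker_flags(raw))
```

Whether ELPA's generated link line can be adjusted this way under AOCC is an open question in this thread.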

@QuantumMisaka
Collaborator Author

#5982 records the error seen in AOCC/AOCL compilation when the flang problem is avoided.

Also, I need some help (or more effort) to incorporate LAPACK into the toolchain in order to fully utilize the AOCL package.

I have no idea how to fix it. That is everything in this version of the toolchain update.

@QuantumMisaka QuantumMisaka marked this pull request as ready for review March 8, 2025 16:09
@mohanchen mohanchen added the Compile & CICD & Docs & Dependencies Issues related to compiling ABACUS label Mar 9, 2025
@mohanchen
Collaborator

@QuantumMisaka
Collaborator Author

QuantumMisaka commented Mar 9, 2025

How about the following tutorials, maybe we should update them?

https://mcresearch.github.io/abacus-user-guide/abacus-gpu-lcao.htmlc https://mcresearch.github.io/abacus-user-guide/abacus-hpc.html https://mcresearch.github.io/abacus-user-guide/abacus-gpu.html

I'll try it. This PR can be merged later, but we may need a guide for installing ELPA-GPU, as the guide on the official ELPA website is not very clear.

@QuantumMisaka
Collaborator Author

How about the following tutorials, maybe we should update them?

https://mcresearch.github.io/abacus-user-guide/abacus-gpu-lcao.htmlc https://mcresearch.github.io/abacus-user-guide/abacus-hpc.html https://mcresearch.github.io/abacus-user-guide/abacus-gpu.html

The reference page https://github.com/marekandreas/elpa/blob/master/documentation/INSTALL returns 404, and the compilation tutorials for GPU-LCAO-CUSOLVERMP and GPU-LCAO-ELPA are also insufficient; in particular, there is no robust tutorial suitable for all NVIDIA GPU platforms.

@QuantumMisaka
Collaborator Author

The LibXC 7.0.0 compilation problem is fixed by #5905, so I'll add the 7.0.0 update.

@QuantumMisaka
Collaborator Author

QuantumMisaka commented Mar 10, 2025

With the help of @yizeyi18, the AOCL environment management is complete and will be committed in this PR. However, the clang++ in AOCC still cannot be used as the compiler for ABACUS because of the same error reported in #5982.

@QuantumMisaka
Collaborator Author

Based on testing results, an AOCC-AOCL toolchain build, followed by ABACUS installation with the GNU compilers and AOCL, works when flang in AOCC is avoided. So this is the default toolchain plan. Users can turn on flang with --with-flang=yes if they wish.

Reference task: MPI16-OMP1 LCAO-genelpa for Fe5C2(510) [Fe80C36], ABACUS commit 1fa5e3a, hardware: AMD-EPYC-7b12

  • gcc-aocl (aocc-toolchain, avoiding flang): 38 s/scf-step
  • gcc-openblas: 40 s/scf-step
  • gcc-aocl (gcc-toolchain): 41 s/scf-step
  • aocc-openblas-aocl: 38-42 s/scf-step (unstable)
  • intel-oneapi: 46 s/scf-step
  • gcc-aocl (with flang): 50 s/scf-step
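
To make the differences in the timings above easier to compare, the short sketch below expresses each one as a percentage change relative to gcc-openblas. The choice of gcc-openblas as the baseline and the helper name `relative_to` are assumptions made for illustration. The aocc-openblas-aocl entry is left out because it was reported only as an unstable range.

```python
# Seconds per SCF step, copied from the list above.
# aocc-openblas-aocl is omitted: it was reported as an unstable 38-42 s range.
timings = {
    "gcc-aocl (aocc-toolchain, avoiding flang)": 38.0,
    "gcc-openblas": 40.0,
    "gcc-aocl (gcc-toolchain)": 41.0,
    "intel-oneapi": 46.0,
    "gcc-aocl (with flang)": 50.0,
}


def relative_to(baseline, table):
    """Return each timing as a percent change versus the baseline entry.

    Negative values mean fewer seconds per SCF step than the baseline.
    """
    base = table[baseline]
    return {name: round(100.0 * (t - base) / base, 1)
            for name, t in table.items()}


# Baseline choice (gcc-openblas) is an assumption for this illustration.
for name, pct in relative_to("gcc-openblas", timings).items():
    print("{:45s} {:+.1f}%".format(name, pct))
```

These are single reported numbers for one task on one machine, so the percentages indicate rough ordering only.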

@QuantumMisaka
Collaborator Author

This version of the toolchain will not be updated further in this PR unless a severe bug is found.

@QuantumMisaka
Collaborator Author

@mohanchen I think this PR can be merged. Updating the GPU-LCAO installation will be the target of a future PR.

@mohanchen
Collaborator

Excellent, I will merge it.

@mohanchen mohanchen merged commit a295d38 into deepmodeling:develop Mar 14, 2025
14 checks passed
@QuantumMisaka QuantumMisaka deleted the toolchain-202501 branch March 28, 2025 11:14
Fisherd99 pushed a commit to Fisherd99/abacus-BSE that referenced this pull request Mar 31, 2025
* update cmake

* add aocc support

* update mpich

* update VERSION

* update openmpi, allow user to switch version easily

* update elpa

* create aocl script

* aocc install setup

* bug fix and update readme

* fix openmpi switch

* modification

* add openmpi configure option

* update elpa setting (gpu setting for 2070s)

* update libxc version and download

* minor update

* update README

* minor update

* minor checkout

* deepmd-v3 add-in test note

* AMD-AOCC-AOCL update and minor fixed

* fix bug in aocl.sh
dyzheng pushed a commit that referenced this pull request Apr 1, 2025