Merge pull request #39 from ohearnk/QUICK-25.03-docs

ohearnk · web-flow · commit e7dd9a927642 · 2025-05-30T11:53:57.000-04:00
Documentation updates for QUICK-25.03 release
diff --git a/docs/source/about.rst b/docs/source/about.rst
@@ -1,7 +1,7 @@
 About QUICK QM Package
 ======================
 
-QUICK is a GPU enabled *ab initio* and density functional theory software
+QUICK is a GPU-enabled *ab initio* and density functional theory software
 capable of performing electronic structure calculations on general
 organic/biomolecular systems. It was initially developed by Ed Brothers. His
 work included the development of the Hartree-Fock and density functional theory
@@ -17,8 +17,8 @@ aspects to improve the CUDA versions. Gina Sitaraman, Leopold Grinberg, Mahdieh
 Ghazimirsaeed, and Trinayan Baruah from AMD contributed by providing important
 suggestions to improve the performance of the HIP versions.
 
-Madu Manathunga, Kurt A. O'Hearn, Akhil Shajan, Andy Götz, and Kennie Merz
-currently develop and maintain the code. 
+Kurt A. O'Hearn, Vikrant Tripathy, Akhil Shajan, Madu Manathunga, Andy Götz,
+and Kennie Merz currently develop and maintain the code. 
 
 Contact: `quick.merzlab@gmail.com <quick.merzlab@gmail.com>`_
 
diff --git a/docs/source/all-quick-documentations.rst b/docs/source/all-quick-documentations.rst
@@ -2,6 +2,7 @@ All QUICK documentation versions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 • `QUICK development version (active development, unreleased) <https://quick-docs.readthedocs.io/en/latest/>`_
+• `QUICK-25.03 <https://quick-docs.readthedocs.io/en/25.3.0/>`_ (also released with AmberTools25)
 • `QUICK-24.03 <https://quick-docs.readthedocs.io/en/24.3.0/>`_ (also released with AmberTools24)
 • `QUICK-23.08 <https://quick-docs.readthedocs.io/en/23.8.0/>`_
 • `QUICK-22.03 <https://quick-docs.readthedocs.io/en/22.3.0/>`_ (also released with AmberTools22)
diff --git a/docs/source/basis-sets.rst b/docs/source/basis-sets.rst
@@ -31,9 +31,15 @@
   aug-PC-1
   aug-PC-2
 
-Note 1: We follow the same basis set names reported at the `basis set exchange web page <https://www.basissetexchange.org/>`_. 
+Note 1: We follow the same basis set names reported at the
+`basis set exchange web page <https://www.basissetexchange.org/>`_. 
 
-Note 2: The current version of the QUICK ERI engine only support basis functions up to *f*. Therefore, energy and gradient calculations with functions up to *f* are possible. By default, *f* functions are disabled in the GPU code. Open-shell gradient calculations with *f* functions are not yet available on GPU.
-
-Note 3: ECPs are currently not supported by QUICK. Due to this reason, we have excluded elements that require ECPs from the above basis sets that are included with QUICK.
+Note 2: The current version of the QUICK two-electron repulsion integral (ERI)
+engine only supports basis functions up to *f*. Therefore, energy and gradient
+calculations with functions up to *f* are possible. By default, *f* functions
+are disabled in the GPU code. Open-shell gradient calculations with *f*
+functions are not yet available on GPU.
 
+Note 3: Effective core potentials (ECPs) are currently not supported by QUICK.
+For this reason, we have excluded elements that require ECPs from the above
+basis sets that are included with QUICK.
diff --git a/docs/source/cmake-options.rst b/docs/source/cmake-options.rst
@@ -1,12 +1,15 @@
 CMake Build System Options
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-This page gives a summary of CMake options that can be used with QUICK. Note that like all CMake options, these options are sticky. Once passed to CMake, they will remain set unless you set them to a different value (with -D), unset them (with -U), or delete the build directory.
+This page gives a summary of CMake options that can be used with QUICK. Note
+that like all CMake options, these options are sticky. Once passed to CMake,
+they will remain set unless you set them to a different value (with -D), unset
+them (with -U), or delete the build directory.
 
 General options
 ***************
 
-• *-DCOMPILER=<GNU|INTEL|AUTO>*: Allows selection of the compiler toolchain to use. *-DCOMPILER=AUTO* enables default CMake behaviour. 
+• *-DCOMPILER=<GNU|CLANG|INTELLLVM|PGI|AUTO>*: Selects the compiler toolchain to use. *-DCOMPILER=AUTO* enables default CMake behaviour. *NOTE:* the INTELLLVM and PGI options should be used for the Intel oneAPI and NVIDIA HPC SDK (NVHPC) compilers, respectively. For Clang, the Fortran compiler (flang) is incompatible with the QUICK code, so a mixed Clang/GNU build is performed (Clang for C/C++, gfortran from GCC for Fortran).
 • *-DENABLEF=TRUE*: Enables the compilation of time consuming F functions in the ERI code of the GPU versions. **NOTE**: The current version of the F function code takes very long to compile (hours) and requires a large amount of RAM. Work is planned to optimize this in future releases.
 • *-DCMAKE_BUILD_TYPE=<Debug|Release>*: Controls whether to build debug or release versions.
 • *-DOPTIMIZE=<TRUE|FALSE>*: Controls whether to enable compiler optimizations. On by default.
@@ -20,14 +23,16 @@ External library control
 • *-DFORCE_INTERNAL_LIBS=blas*: Forces use of the internal BLAS library even if a system one is available.
 • *-DFORCE_DISABLE_LIBS=mkl*: Disable use of system MKL to replace BLAS and LAPACK.
 • *-DCMAKE_PREFIX_PATH=<path>*: Use the given path as a prefix where dependencies are installed. Libraries and headers will be searched for in <path>/lib and <path>/include.
-• *-DMKL_HOME=<path>*: Look for Intel MKL in the given directory. The environment variable MKL_HOME is also searched.
+• *-DMKL_HOME=<path>*: Look for Intel MKL in the given directory. The environment variable MKL_HOME is also searched. *NOTE:* When using this flag, the additional flag *-DTRUST_SYSTEM_LIBS=TRUE* must also be passed.
+• *-DMKL_MULTI_THREADED=<TRUE|FALSE>*: Specify whether to use the multi-threaded (TRUE) or single-threaded (FALSE) Intel MKL library.
 • *-DMAGMA=TRUE*: Enable matrix diagonalization using Magma library in HIP/HIP-MPI version. 
 • *-DMAGMA_PATH=<path>*: Look for Magma library in the given directory. 
 
 Parallel versions
 *****************
 
-By default QUICK will only build the serial version. This can be changed with these options:
+By default QUICK will only build the serial version. This can be changed with
+these options:
 
 • *-DMPI=TRUE*: Also build MPI versions of all programs.
 • *-DCUDA=TRUE*: Also build CUDA versions of all programs. If both MPI and CUDA are active at the same time, a MPI+CUDA version will additionally be built.
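
Reviewer sketch (not part of the patch): the option rules documented in the cmake-options.rst changes above can be summarised in a few lines of Python. The helper name `cmake_configure_command` and its parameters are hypothetical and exist only for illustration. The flags themselves (`-DCOMPILER`, `-DMKL_HOME`, `-DTRUST_SYSTEM_LIBS`, `-DMPI`, `-DCUDA`) are taken from the page. The sketch assumes the source directory is passed as a plain path.

```python
import shlex

# Compiler toolchain names accepted by -DCOMPILER, as listed on the page.
ALLOWED_COMPILERS = {"GNU", "CLANG", "INTELLLVM", "PGI", "AUTO"}


def cmake_configure_command(source_dir, compiler="AUTO", mkl_home=None,
                            mpi=False, cuda=False):
    """Assemble a CMake configure command (hypothetical helper)."""
    if compiler not in ALLOWED_COMPILERS:
        raise ValueError(f"unsupported compiler toolchain: {compiler}")
    args = ["cmake", source_dir, f"-DCOMPILER={compiler}"]
    if mkl_home is not None:
        # The page notes that -DMKL_HOME must be accompanied by
        # -DTRUST_SYSTEM_LIBS=TRUE.
        args.append(f"-DMKL_HOME={mkl_home}")
        args.append("-DTRUST_SYSTEM_LIBS=TRUE")
    if mpi:
        args.append("-DMPI=TRUE")
    if cuda:
        args.append("-DCUDA=TRUE")
    return args


if __name__ == "__main__":
    # Print a shell-ready command line for a Clang + MPI configuration.
    print(shlex.join(cmake_configure_command("..", compiler="CLANG", mpi=True)))
```

Because CMake options are sticky, a command generated this way only describes the options passed on that invocation. Values cached from earlier runs in the same build directory remain in effect unless overridden with -D or removed with -U.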
diff --git a/docs/source/developer-guide.rst b/docs/source/developer-guide.rst
@@ -486,8 +486,9 @@ Note 1: Current version of QUICK ERI engine only support basis functions up to
 *d* (up to f support for CUDA/MPI+CUDA if enabled). Therefore, do not add high
 angular momentum basis sets and attempt to use f/g functions.
 
-Note 2: ECPs are not supported by |QUICK_VERSION|. Therefore care must be taken
-not to add elements that require ECPs as this would lead to wrong results.
+Note 2: Effective core potentials (ECPs) are not supported by |QUICK_VERSION|.
+Therefore care must be taken not to add elements that require ECPs as this
+would lead to wrong results.
 
 Adding new test cases into test suite
 -------------------------------------
diff --git a/docs/source/features-limitations.rst b/docs/source/features-limitations.rst
@@ -19,8 +19,8 @@ Features
 • Supports QM/MM calculations with Amber22 and later
 • Fortran API to use QUICK as QM energy and force engine
 • Message Passing Interface (MPI) distributed parallelization for CPU platforms
-• Massively parallel, single GPU implementation via CUDA and HIP for Nvidia and AMD GPUs (HIP available in QUICK-23.08, currently disabled)
-• Distributed, multi-GPU support via MPI+CUDA/MPI+HIP, also across multiple compute nodes
+• Massively parallel, single GPU implementation via CUDA and HIP for NVIDIA and AMD GPUs
+• Distributed, multi-node, multi-GPU support via MPI+CUDA/MPI+HIP codes
 
 Limitations
 ***********
@@ -32,6 +32,5 @@ Limitations
 • Effective core potentials (ECPs) are not supported
 • DFT calculations are performed exclusively using the SG1 grid system
 • No meta-GGA nor range-separated hybrid functionals are supported at present
-• HIP/MPI+HIP support disabled for this release due to GPU code rewrites (f basis function support), please use QUICK version 23.08b for HIP support
 
 *Last updated by Andreas Goetz on 04/25/2024.*
diff --git a/docs/source/installation-guide.rst b/docs/source/installation-guide.rst
@@ -4,75 +4,64 @@ Installation Guide
 ==================
 
 QUICK has been compiled and tested on x86 and ARM CPU architectures, and on
-Nvidia and AMD GPU architectures.
+NVIDIA and AMD GPU architectures.
 
-**NOTE:** For GPU builds, the compilation of the GPU enabled ERI code can take
-a significant amount of time (several minutes for default builds and several
-hours for f-function basis set support) - be patient, the compiler is working
-hard to generate lightning fast code for you.
-
-**NOTE:** HIP/MPI+HIP support is disabled for this release.  Please use QUICK
-version 23.08b for HIP support
+**NOTE:** For GPU builds, the compilation of the GPU-enabled two-electron
+repulsion integral (ERI) code can take a significant amount of time (several
+minutes for default builds and several hours for f-function basis set support)
+-- be patient as the compiler is working hard to generate highly performant code.
 
 Compatible Compilers and Hardware
 ---------------------------------
 
-In general QUICK works well with a range of compilers (GNU, Intel), math
-libraries (Intel MKL, reference BLAS/LAPACK, MAGMA), MPI implementations
-(OpenMPI, Intel MPI), and GPU SDK versions (CUDA and ROCm/HIP).  We have
-specifically tested |QUICK_VERSION| with following compilers, libraries, and
-tools.
-
-**Linux:**
-
- 1. GNU GCC v7.3.0; OpenMPI v3.1.1; CUDA v9.2.88; CMake v3.11.4
- 2. GNU GCC v8.3.0; OpenMPI v3.1.4; CUDA v10.2.89; CMake v3.15.1
- 3. GNU GCC v11.3.0; OpenMPI v4.1.4; CUDA v11.8; CMake v3.23.1
- 4. GNU GCC v12.3.0; OpenMPI v5.0.0; CUDA v12.3; CMake v3.26.3
- 5. Clang v14.0 / GNU GCC v11.3.0 (Fortran); OpenMPI v3.1.4; CUDA v10.2.89; CMake v3.15.3
- 6. Intel v2021b; Intel MPI v2021b; CUDA v11.8; CMake v3.23.1
- 7. Intel OneAPI/LLVM v2022.2.1; Intel MPI v2022.2.1; CMake v3.18.4
-
-**NOTE:** QUICK GPU builds require at least CUDA v7.x and ROCm v5.1.x for CUDA
-and HIP versions, respectively. Please consult the Release Notes for the
-respective GPU SDKs on supported GPU devices.
-
-|QUICK_VERSION| CUDA version has been tested on the following GPU cards: A100,
-RTX3080Ti, RTX2080Ti, RTX8000, RTX6000, RTX2080, T4, V100, Titan V, P100, M40,
-GTX1080, K80, and K40.
-
-|QUICK_VERSION| HIP version is currently disabled. Please use QUICK-23.08b if you want to use AMD GPUs.
-
-.. |QUICK_VERSION| HIP version has been tested on the following GPU cards: MI100,
-   MI210, and MI250. As of QUICK-23.03, the performance on MI210 and MI250 cards is
-   not optimized but the code runs properly. 
+In general QUICK works well with a range of compilers (GNU, Clang, Intel, NVHPC
+SDK/PGI), math libraries (Intel MKL, reference BLAS/LAPACK, MAGMA), MPI
+implementations (OpenMPI, MPICH, Intel MPI), and GPU SDK versions (CUDA,
+ROCm/HIP). |QUICK_VERSION| is automatically tested on GitHub with the following
+combinations of OS versions, compilers, libraries, and tools:
+
+ - Ubuntu v22.04.05 (x86_64), GNU GCC v10.5.0; OpenMPI v4.1.2; CMake v3.31.6
+ - Ubuntu v22.04.05 (x86_64), GNU GCC v11.4.0; OpenMPI v4.1.6; CMake v3.31.6
+ - Ubuntu v24.04.2 (x86_64), GNU GCC v12.3.0; OpenMPI v4.1.6; CMake v3.31.6
+ - Ubuntu v24.04.2 (x86_64), GNU GCC v13.3.0; OpenMPI v4.1.6; CMake v3.31.6
+ - Ubuntu v24.04.2 (x86_64), GNU GCC v14.2.0; OpenMPI v4.1.6; CMake v3.31.6
+ - Ubuntu v24.04.2 (x86_64), GNU GCC v14.2.0; MPICH v4.2.0; CMake v3.31.6
+ - Ubuntu v24.04.2 (x86_64), Clang v17.0.6; OpenMPI v4.1.6; CMake v3.31.6
+ - Ubuntu v24.04.2 (x86_64), Clang v18.1.3; OpenMPI v4.1.6; CMake v3.31.6
+ - Ubuntu v24.04.2 (x86_64), Intel oneAPI v2024.2.1; Intel MPI (CCL) v2021.14; CMake v3.31.6
+ - Ubuntu v24.04.2 (x86_64), Intel oneAPI v2025.0.1; Intel MPI (CCL) v2021.14; CMake v3.31.6
+ - Ubuntu v24.04.2 (x86_64), NVIDIA HPC SDK v25.1 (PGI); OpenMPI v4.1.7rc1; CMake v3.31.6
+ - Ubuntu v24.04.2 (ARM), GNU GCC v14.2.0; OpenMPI v4.1.6; CMake v3.31.6
+ - Ubuntu v24.04.2 (ARM), GNU GCC v14.2.0; MPICH v4.2.0; CMake v3.31.6
+ - macOS 13 (x86_64), GNU GCC v14.2.0_1; OpenMPI v; CMake v3.31.6 (Homebrew)
+ - macOS 13 (x86_64), GNU GCC v15.0.7; OpenMPI v; CMake v3.31.6 (Homebrew)
+ - macOS 14 (ARM), GNU GCC v14.2.0_1; OpenMPI v; CMake v3.31.6 (Homebrew)
+ - macOS 14 (ARM), GNU GCC v15.0.7; OpenMPI v; CMake v3.31.6 (Homebrew)
+
+**NOTE:** QUICK GPU builds require CUDA >= v7.x for the CUDA versions and ROCm
+<= v5.4.2 or >= v6.2.1 for the HIP versions. Please consult the Release Notes
+for the respective GPU SDKs on supported GPU devices and compatible software
+dependencies (compilers, etc.).
+
+|QUICK_VERSION| CUDA version has been tested on the following GPUs: H200, H100,
+A100, RTX3080Ti, RTX2080Ti, RTX8000, RTX6000, RTX2080, T4, V100, Titan V, P100,
+M40, GTX1080, K80, and K40.
+
+|QUICK_VERSION| HIP version has been tested on the following GPUs: MI100,
+MI210, MI250, and MI300A.
 
 **NOTE:** We recommend that the CUDA/MPI+CUDA and HIP/MPI+HIP versions be
 executed only on dedicated GPU cards where no other tasks are being run.
-Performance is better on datacenter GPUs than on consumer GPUs.  For MPI+CUDA
-and MPI+HIP versions, we also recommend that only one CPU core (MPI task) is
-used per GPU; this can be done by setting the number of processes (*e.g.*, in
-the *mpirun* command) equal to the number of GPUs.
-
-**Intel-based Macbooks:**
-
-Software stack (compiler installed via Macports):
-
- 1. macOS 11.7.3; GNU/10.4.0, 11.3.0, 12.2.0; OpenMPI 4.1.4
- 2. macOS 13.2; GNU 12.2.0, OpenMPI 4.1.4
-
-**ARM-based Macbooks (M3 Pro CPU):**
-
-Software stack (compiler installed via Macports):
-
- 1. macOS Sonoma 14.4.1; GNU GCC 12.3.0; OpenMPI 4.1.6
- 2. macOS Sonoma 14.4.1; GNU GCC (Fortran); Clang 17.0.6; OpenMPI 4.1.6
+Performance is better on datacenter GPUs than on consumer GPUs.  For the
+MPI+CUDA and MPI+HIP versions, we also recommend that only one CPU core (MPI
+process) is used per GPU; this can be done by setting the number of processes
+(*e.g.*, in the *mpirun* command) equal to the number of GPUs.
     
 
 Installation
 ------------
 
-Installation of QUICK requires that at least CMake/3.9.0 be installed in the
+Installation of QUICK requires that at least CMake v3.12.0 be installed on the
 target machine. To install QUICK using CMake, one must first create a build
 directory (separate from the source directory). After installation you can
 safely delete this build directory if you want to save disk space. Assuming the
@@ -89,7 +78,7 @@ CUDA version
 
 Assuming you have created a directory named *builddir* in the ``QUICK_HOME``
 directory and you want to install QUICK into directory ``QUICK_INSTALL``, use
-GNU compiler tool chain, and want to compile for the Nvidia Volta
+GNU compiler tool chain, and want to compile for the NVIDIA Volta
 microarchitecture, all QUICK versions can be configured and built as follows:
 
 .. code-block:: none
@@ -138,8 +127,9 @@ Path to ROCm installation can be specified using ``-DHIP_TOOLKIT_ROOT_DIR`` but
 this is optional. Flags ``-DMAGMA`` and ``-DMAGMA_ROOT`` are used to enable
 MAGMA library support for matrix diagonalization and specify the MAGMA
 installation directory, respectively. The use of MAGMA is optional but highly
-recommended since the diagonalization is performed on host (CPU) by default
-(which can be very slow). 
+recommended for older ROCm versions (< v5.3.0) since matrix diagonalization is
+performed on host (CPU) in QUICK by default due to poor performance in the ROCm
+math libraries (rocSOLVER). 
 
 If the microarchitecture is not specified (i.e. absence of the
 ``-DQUICK_USER_ARCH`` flag), QUICK will be compiled for gfx908 architecture. As of
@@ -194,7 +184,6 @@ here: `hands-on tutorials <hands-on-tutorials.html>`_.
 Uninstallation and Cleaning
 ---------------------------
 
-Simply delete contents inside build and install directories and/or delete the
-build and install directories.
+Delete the build and install directories and their contents.
 
 *Last updated by Andreas Goetz on 04/25/2024.*
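
Reviewer sketch (not part of the patch): the ROCm requirement stated in the installation-guide changes above (ROCm <= v5.4.2 or >= v6.2.1 for HIP builds) leaves a gap of unsupported versions in between. A minimal Python check makes this gap explicit. The function names `parse_version` and `rocm_version_supported` are hypothetical, and the sketch assumes purely numeric dotted version strings such as `6.2.1` or `v5.4.2`.

```python
def parse_version(text):
    """Turn 'v6.2.1' or '6.2' into a three-part integer tuple, e.g. (6, 2, 1)."""
    parts = [int(part) for part in text.lstrip("v").split(".")]
    # Pad missing components with zeros so that '6.2' compares as 6.2.0.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def rocm_version_supported(version):
    """Return True if the version satisfies ROCm <= v5.4.2 or >= v6.2.1."""
    parsed = parse_version(version)
    return parsed <= (5, 4, 2) or parsed >= (6, 2, 1)


if __name__ == "__main__":
    for candidate in ("5.4.2", "5.7.1", "6.0.0", "6.2.1"):
        status = "supported" if rocm_version_supported(candidate) else "not supported"
        print(f"ROCm {candidate}: {status}")
```

The check encodes only the bounds given in the note. It does not account for GPU device or compiler compatibility, for which the note refers to the SDK release notes.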
diff --git a/docs/source/known-issues.rst b/docs/source/known-issues.rst
@@ -6,7 +6,8 @@ detected the issues listed below. If you find anything other than these, please
 feel free to report any bugs or issues through our GitHub page:
 `https://github.com/merzlab/QUICK/issues <https://github.com/merzlab/QUICK/issues>`_.
 
-Feel free to ask questions or start a discussion on the Discussions section of our GitHub page: `https://github.com/merzlab/QUICK/discussions <https://github.com/merzlab/QUICK/discussions>`_.
+Feel free to ask questions or start a discussion on the Discussions section of
+our GitHub page: `https://github.com/merzlab/QUICK/discussions <https://github.com/merzlab/QUICK/discussions>`_.
 
 Compile time
 ^^^^^^^^^^^^
@@ -26,13 +27,6 @@ Kepler targeted microarchitectures (<= v11.0 for sm_30, <= v11.8 for
 sm_35/sm_37).  Please consult the Release Notes for your installed CUDA SDK
 version for further details on supported GPU microarchitectures.
 
-2. Compiling HIP/MPI+HIP versions fails for this release (unsupported)
-**********************************************************************
-HIP/MPI+HIP support disabled for this release due to required GPU code rewrites
-(related to added f basis function support).
-
-Solution: Use QUICK v23.08b for HIP/MPI+HIP support until support is restored.
-
 Runtime
 ^^^^^^^
 
diff --git a/docs/source/performance.rst b/docs/source/performance.rst
@@ -18,11 +18,12 @@ Accuracy of energies and gradients
 We have compared energies and gradients computed by QUICK with values computed
 by other quantum chemical packages. HF energies and gradients have displayed
 accuracies of 1.0E-6 Hartree and 1.0E-4 Hartree/Bohr or better,
-respectively, for test systems (see `https://github.com/merzlab/QUICK-tests
-<https://github.com/merzlab/QUICK-tests>`_ for test cases). DFT energies and
-gradients have shown similar accuracies in most cases, however, we have
-observed larger deviations for some molecular systems. Such deviations usually
-arise due to differences in the exchange correlation quadrature grid.
+respectively, for test systems (see
+`https://github.com/merzlab/QUICK-tests <https://github.com/merzlab/QUICK-tests>`_
+for test cases). DFT energies and gradients have shown similar accuracies in
+most cases, however, we have observed larger deviations for some molecular
+systems. Such deviations usually arise due to differences in the exchange
+correlation quadrature grid.
 
 
 Performance of QUICK CUDA single GPU and MPI parallel versions
diff --git a/docs/source/quick_docs_common.rst b/docs/source/quick_docs_common.rst
@@ -1 +1 @@
-.. |QUICK_VERSION| replace:: QUICK-24.03
+.. |QUICK_VERSION| replace:: QUICK-25.03
diff --git a/docs/source/release-notes.rst b/docs/source/release-notes.rst
@@ -3,12 +3,21 @@ Release notes
 
 The new features released with each QUICK version are as follows. 
 
+QUICK-25.03
+***********
+• AMD GPU support restored for HIP/MPI+HIP codes (requires ROCm <= v5.4.2 or >= v6.2.1 due to known ROCm bugs)
+• Added Clang and NVHPC SDK (PGI) compiler support and fixes for macOS builds
+• GPU code improvements (refactoring to unify codes, reduce memory utilization, provide better error checking, and apply fixes)
+• Updated SAD guesses to fix an SCF performance regression (now matching those in QUICK-21.03 for faster SCF convergence)
+• Various other bug fixes, optimizations, and test updates (expanded automated CI testing on GitHub)
+• QUICK-25.03 available with AmberTools 2025
+
 QUICK-24.03
 ***********
-• Added ERI engine support for f basis functions to CUDA/MPI+CUDA codes (disabled be default)
+• Added two-electron repulsion integral (ERI) engine support for f basis functions to CUDA/MPI+CUDA codes (disabled by default)
 • HIP/MPI+HIP support disabled, please use QUICK version 23.08b for HIP support
 • Added initial Intel OneAPI/LLVM compiler support
-• Added support for the following basis sets: aug-cc-pVTZ, def2-TZVPD, def2-TZVPP, aug-pc-1, pc-2, and aug-pc-2
+• Added support for the following basis sets: aug-cc-pVTZ, def2-TZVPD, def2-TZVPP, aug-PC-1, PC-2, and aug-PC-2
 • Various bug fixes, optimizations, and test updates
 
 QUICK-23.03
diff --git a/docs/source/working_libxc_funcs.rst b/docs/source/working_libxc_funcs.rst

@@ -1 +1 @@
-.. |QUICK_VERSION| replace:: QUICK-24.03
+.. |QUICK_VERSION| replace:: QUICK-25.03