
Commit 32a819d

kanduri (Prashanth Kanduri), msimberg, and RMeli authored
GROMACS docs (#87)
* gromacs docs migrated from KB to mkdocs

* Apply suggestions from code review

  The other suggestions will be implemented in the follow-up commits

  Co-authored-by: Mikael Simberg <[email protected]>
  Co-authored-by: Rocco Meli <[email protected]>

* incorporate all review comments

* Apply formatting suggestions from code review

  Co-authored-by: Mikael Simberg <[email protected]>

* update code owner

---------

Co-authored-by: Prashanth Kanduri <[email protected]>
Co-authored-by: Mikael Simberg <[email protected]>
Co-authored-by: Rocco Meli <[email protected]>
1 parent f260d4e commit 32a819d


2 files changed: +166 -2 lines changed


.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -4,4 +4,5 @@ docs/software/communication @Madeeks @msimberg
 docs/software/devtools/linaro @jgphpc
 docs/software/prgenv/linalg.md @finkandreas @msimberg
 docs/software/sciapps/cp2k.md @abussy @RMeli
+docs/software/sciapps/gromacs.md @kanduri
 docs/software/ml @boeschf

docs/software/sciapps/gromacs.md

Lines changed: 165 additions & 2 deletions

@@ -1,4 +1,167 @@

[](){#ref-uenv-gromacs}
# GROMACS

[GROMACS] (GROningen Machine for Chemical Simulations) is a versatile and widely used open-source package for performing molecular dynamics, i.e. simulating the Newtonian equations of motion for systems with hundreds to millions of particles.

It is primarily designed for biochemical molecules like proteins, lipids, and nucleic acids that have a lot of complicated bonded interactions. However, since GROMACS is extremely fast at calculating the nonbonded interactions that usually dominate simulations, many groups also use it for research on non-biological systems, e.g. polymers.

!!! note "uenvs"

    [GROMACS] is provided on [Alps][ref-alps-platforms] via [uenv][ref-uenv].
    Please have a look at the [uenv documentation][ref-uenv] for more information about uenvs and how to use them.

## Licensing terms & conditions

GROMACS is a joint effort, with contributions from developers around the world: users agree to acknowledge use of GROMACS in any reports or publications of results obtained with the Software (see the [GROMACS Homepage](https://www.gromacs.org/about.html) for details).

## Key features

1. **Molecular Dynamics Simulations**: GROMACS performs classical MD simulations, which compute the trajectories of atoms based on Newton's laws of motion. It integrates the equations of motion to simulate the behavior of molecular systems, capturing their dynamic properties and conformational changes.

2. **Force Fields**: GROMACS supports a wide range of force fields, including CHARMM, AMBER, OPLS-AA, and GROMOS, which describe the potential energy function and force interactions between atoms. These force fields provide accurate descriptions of the molecular interactions, allowing researchers to study various biological processes and molecular systems.

3. **Parallelization and Performance**: GROMACS is designed for high-performance computing (HPC) and can efficiently utilize parallel architectures, such as multi-core CPUs and GPUs. It employs domain decomposition and advanced parallelization techniques to distribute the computational workload across multiple computing resources, enabling fast and efficient simulations.

4. **Analysis and Visualization**: GROMACS offers a suite of analysis tools to extract and analyze data from MD simulations. It provides functionalities for computing properties such as energy, temperature, pressure, radial distribution functions, and free energy landscapes. GROMACS also supports visualization tools, allowing users to visualize and analyze the trajectories of molecular systems.

5. **User-Friendly Interface**: GROMACS provides a command-line interface (CLI) and a set of well-documented input and control files, making it accessible to both novice and expert users. It offers flexibility in defining system parameters, simulation conditions, and analysis options through easily modifiable input files.

6. **Integration with Other Software**: GROMACS can be integrated with other software packages and tools to perform advanced analysis and extend its capabilities. It supports interoperability with visualization tools like VMD and PyMOL, analysis packages like the GROMACS analysis tools and MDAnalysis, and scripting languages such as Python, allowing users to leverage a wide range of complementary tools.

## Daint on Alps (GH200)

### Setup

On Alps, we provide pre-built user environments containing GROMACS alongside all the required dependencies for the GH200 hardware. To access the `gmx_mpi` executable, do the following:

```bash
uenv image find                                # list available images

uenv image pull gromacs/VERSION:TAG            # copy version:tag from the list above
uenv start gromacs/VERSION:TAG --view=gromacs  # load the gromacs view

gmx_mpi --version                              # check the GROMACS version
```

The images also provide two alternative views, namely `plumed` and `develop`.

```console
$ uenv status
/user-environment:gromacs-gh200
  GPU-optimised GROMACS with and without PLUMED, and the toolchain to build your own GROMACS.
  modules: no modules available
  views:
    develop
    gromacs
    plumed
```

The `develop` view has all the required dependencies of GROMACS without the program itself. It is meant for users who want to build a customized variant of GROMACS from source for their simulations. The view provides the required compilers (GCC) along with dependencies such as CMake, CUDA, hwloc, and Cray MPICH, among many others, which a custom GROMACS build can use during configuration and installation. Users must enable this view each time they want to use their **custom GROMACS installation**. A minimal sketch of such a build is shown below.
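
The GROMACS version, the CMake options, and the install prefix `$HOME/gromacs-custom` in this sketch are illustrative assumptions rather than a prescribed configuration:

```bash
# Sketch: build a custom GROMACS inside the develop view.
uenv start gromacs/VERSION:TAG --view=develop

wget https://ftp.gromacs.org/gromacs/gromacs-2024.1.tar.gz
tar xf gromacs-2024.1.tar.gz
cd gromacs-2024.1

mkdir build && cd build
cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DGMX_MPI=ON \
    -DGMX_GPU=CUDA \
    -DGMX_OPENMP=ON \
    -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-custom
make -j 16
make install
```

The resulting `gmx_mpi` should then be run with the `develop` view active, so that the same MPI and CUDA libraries are found at runtime.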

The `plumed` view contains GROMACS patched with PLUMED. The version of GROMACS in this view may be different from the one in the `gromacs` view due to the compatibility requirements of PLUMED. CSCS will periodically update these user environment images to feature newer versions as they are made available.
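
To use the PLUMED-patched build, start the same image with the `plumed` view instead (sketch below; it assumes the patched build also provides `gmx_mpi`):

```bash
uenv start gromacs/VERSION:TAG --view=plumed   # same image, PLUMED-patched GROMACS
gmx_mpi --version                              # reports the PLUMED-compatible GROMACS version
```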

The `gromacs` view contains GROMACS 2024.1, configured and tested for the highest performance on the Grace-Hopper nodes.

Use `exit` to leave the user environment and return to the original shell.

### How to run

To start a job, two bash scripts are required: a standard SLURM submission script, and a [wrapper to start the CUDA MPS daemon][ref-slurm-gh200-single-rank-per-gpu] (in order to have multiple MPI ranks per GPU).

The wrapper script needs to be made executable with `chmod +x mps-wrapper.sh`.
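
For orientation only, such a wrapper typically follows the pattern sketched below; use the wrapper from the linked documentation for production runs, since the paths and the missing daemon cleanup here are simplifying assumptions.

```bash
#!/bin/bash
# mps-wrapper.sh (illustrative sketch): start one CUDA MPS control daemon per
# node, then run the given command so that several MPI ranks can share a GPU.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log

# Only the first rank on each node starts the daemon.
if [[ $SLURM_LOCALID -eq 0 ]]; then
    nvidia-cuda-mps-control -d
fi
sleep 1   # give the daemon a moment to come up

exec "$@"
```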

The SLURM submission script can be adapted from the template below to use the application together with `mps-wrapper.sh`.

```bash title="launch.sbatch"
#!/bin/bash

#SBATCH --job-name="JOB NAME"
#SBATCH --nodes=1               # number of GH200 nodes, each with 4 CPU+GPU modules
#SBATCH --ntasks-per-node=8     # 8 MPI ranks per node
#SBATCH --cpus-per-task=32      # 32 OpenMP threads per MPI rank
#SBATCH --account=ACCOUNT
#SBATCH --hint=nomultithread
#SBATCH --uenv=<GROMACS_UENV>
#SBATCH --view=gromacs

export MPICH_GPU_SUPPORT_ENABLED=1
export FI_CXI_RX_MATCH_MODE=software

export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true
export GMX_FORCE_UPDATE_DEFAULT_GPU=true
export GMX_ENABLE_DIRECT_GPU_COMM=1
export GMX_FORCE_GPU_AWARE_MPI=1

srun ./mps-wrapper.sh gmx_mpi mdrun -s input.tpr -ntomp 32 -bonded gpu -nb gpu -pme gpu -pin on -v -noconfout -dlb yes -nstlist 300 -gpu_id 0123 -npme 1 -nsteps 10000 -update gpu
```

This can be run with `sbatch launch.sbatch` on the login node with the user environment loaded.
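
For example (the job ID shown is a placeholder):

```console
$ sbatch launch.sbatch
Submitted batch job <JOBID>
$ squeue --me    # check that the job is queued or running
```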

This submission script is only representative. Users should run their input files with a range of parameters to find an optimal set for production runs. Some hints for this exploration:

!!! note "Configuration hints"

    - Each Grace CPU has 72 cores, but a few of them are used by underlying processes such as runtime daemons, so not all 72 cores are available for compute. To be safe, do not use more than 64 OpenMP threads on a single CPU, even if this leaves a handful of cores idle.
    - Each node has 4 Grace CPUs and 4 Hopper GPUs. When running 8 MPI ranks (i.e. two per CPU), do not ask for more than 32 OpenMP threads per rank, so that no more than 64 threads run on a single CPU.
    - Try both the 64 OpenMP threads x 1 MPI rank and the 32 OpenMP threads x 2 MPI ranks configurations for your test problems and pick the one giving better performance (see the sweep sketch after this note). When using multiple GPUs, the latter can be faster by 5-10%.
    - `-update gpu` may not be possible for problems that require constraints on all atoms. In such cases, the update (integration) step is performed on the CPU. This can lead to a performance loss of at least 10% on a single GPU and, due to the overhead of additional data transfers on each step, to lower scaling performance on multiple GPUs.
    - When running on a single GPU, one can either configure the simulation with 1-2 MPI ranks and `-gpu_id 0`, or run with a minimal set of parameters and let GROMACS infer the rest, with a command like the following in the SLURM script:
      `srun ./mps-wrapper.sh -- gmx_mpi mdrun -s input.tpr -ntomp 64`
    - Given the compute throughput of each Grace-Hopper module (a single CPU+GPU), **for smaller problems a single-GPU run may well be the fastest**. This can happen when the overheads of domain decomposition, communication, and orchestration exceed the benefits of parallelism across multiple GPUs. In our test cases, a single Grace-Hopper module (1 CPU+GPU) has consistently shown a 6-8x performance speedup over a single node of Piz Daint (Intel Xeon Broadwell + P100).
    - Try runs with and without specifying the GPU IDs explicitly with `-gpu_id 0123`. For the multi-node case, removing it might yield the best performance.
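
A minimal sketch of such a rank/thread comparison, assuming `launch.sbatch` has been modified to pass `${SLURM_CPUS_PER_TASK}` to `-ntomp` instead of a hard-coded value:

```bash
# Submit the same job with two rank/thread layouts on one node and compare ns/day.
for layout in "4:64" "8:32"; do          # ranks per node : OpenMP threads per rank
    ranks=${layout%:*}
    threads=${layout#*:}
    sbatch --ntasks-per-node="${ranks}" --cpus-per-task="${threads}" launch.sbatch
done
```

Command-line options passed to `sbatch` take precedence over the `#SBATCH` directives inside the script, so the same template can be reused for the whole sweep.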

## Scaling

Benchmarking was done with large MD simulations from the [HECBioSim Benchmark Suite](https://www.hecbiosim.ac.uk/access-hpc/benchmarks), using systems of 1.4 million and 3 million atoms in order to fully saturate the GPUs.

In addition, the STMV (~1 million atom) benchmark that NVIDIA publishes on its [website](https://developer.nvidia.com/hpc-application-performance) was tested for comparison.
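
The ns/day figures below are taken from the performance summary that `gmx_mpi mdrun` writes at the end of its log; a quick way to collect it (assuming the default `md.log` log name; the numbers shown are illustrative) is:

```console
$ grep -A1 "(ns/day)" md.log
                 (ns/day)    (hour/ns)
Performance:       42.855        0.560
```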

The STMV test case is a fairly large problem, with constraints operating only on a smaller set of atoms (h-bonds), which allows the update step to also take place on the GPU. This makes the simulation almost **fully GPU resident**, with the key performance-intensive parts, namely the long-range forces (PME), the short-range non-bonded forces (NB), and the bonded forces, all running on the GPU. On a single node, this leads to the following scaling with GROMACS 2024.1.

#### STMV - Multiple ranks - Single node up to 4 GPUs

| #GPUs | ns/day  | Speedup |
| ----- | ------- | ------- |
| 1     | 42.855  | 1x      |
| 2     | 61.583  | 1.44x   |
| 4     | 115.316 | 2.69x   |
| 8     | 138.896 | 3.24x   |

The other benchmark cases from HECBioSim simulate a pair of proteins (hEGFR dimers/tetramers of [1IVO](https://www.rcsb.org/structure/1IVO) and [1NQL](https://www.rcsb.org/structure/1NQL)) with a large lipid membrane. They also involve a fairly large number of charged ions, which increases the proportion of PME in the total compute workload. For these simulations, constraints apply to all atoms, which effectively **prevents the update step from running on the GPU**, negatively impacting scaling due to large host-to-device data transfers and key computations happening on the CPU. These show the following scaling characteristics with GROMACS 2024.1:

#### 1.4m Atom System - Multiple ranks - Single node

Total number of atoms = 1,403,182

Protein atoms = 43,498; lipid atoms = 235,304; water atoms = 1,123,392; ions = 986

| #GPUs | ns/day | Speedup |
| ----- | ------ | ------- |
| 1     | 31.243 | 1x      |
| 4     | 55.936 | 1.79x   |

#### 3m Atom System - Multiple ranks - Single node

Total number of atoms = 2,997,924

Protein atoms = 86,996; lipid atoms = 867,784; water atoms = 2,041,230; ions = 1,914

| #GPUs | ns/day | Speedup |
| ----- | ------ | ------- |
| 1     | 14.355 | 1x      |
| 4     | 30.289 | 2.11x   |

!!! warning "Known Performance/Scaling Issues"

    - The currently provided build of GROMACS allows **only one MPI rank to be dedicated to PME** with `-npme 1`. This becomes a serious performance limitation for larger systems, where the non-PME ranks finish their work before the PME rank, leading to unwanted load imbalance across ranks. This limitation is targeted to be fixed in subsequent releases of our user environment builds.
    - The above problem is especially critical for large problem sizes (1+ million atom systems) but is far less apparent in small and medium-sized runs.
    - If the problem allows the integration step to take place on the GPU with `-update gpu`, this can lead to significant performance and scaling gains, as it allows an even greater part of the computation to take place on the GPU.
    - A single node of the GH200 cluster offers 4x CPU+GPU. For problems that can benefit from scaling beyond a single node, set `export FI_CXI_RX_MATCH_MODE=software` in the SBATCH script (see the submission sketch below). For most simulations, however, the best use of resources in terms of node-hours is likely achieved on a single node.
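
A sketch of how the single-node template above could be reused for a 2-node run by overriding directives at submission time (whether this pays off is problem-dependent; also consider removing `-gpu_id 0123` from the `srun` line, as noted in the hints above):

```bash
# Same launch.sbatch, but on 2 nodes (8 GH200 modules); command-line options
# override the corresponding #SBATCH directives in the script.
sbatch --nodes=2 --ntasks-per-node=8 --cpus-per-task=32 launch.sbatch
```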

## Further documentation

* [GROMACS Homepage][GROMACS]
* [GROMACS Manual](https://manual.gromacs.org/2024.1/index.html)

[GROMACS]: https://www.gromacs.org
