
Commit 10b52d4

kandurim and msimberg authored
Apply formatting suggestions from code review
Co-authored-by: Mikael Simberg <[email protected]>
1 parent 0c81d7a commit 10b52d4

File tree

1 file changed (+9, −9 lines)


docs/software/sciapps/gromacs.md

Lines changed: 9 additions & 9 deletions
```diff
@@ -7,14 +7,14 @@ It is primarily designed for biochemical molecules like proteins, lipids and nuc
 
 !!! note "uenvs"
 
-    [GROMACS] is provided on [ALPS][platforms-on-alps] via [uenv][ref-uenv].
+    [GROMACS] is provided on [Alps][ref-alps-platforms] via [uenv][ref-uenv].
     Please have a look at the [uenv documentation][ref-uenv] for more information about uenvs and how to use them.
 
-## Licensing Terms & Conditions
+## Licensing terms & conditions
 
 GROMACS is a joint effort, with contributions from developers around the world: users agree to acknowledge use of GROMACS in any reports or publications of results obtained with the Software (see [GROMACS Homepage](https://www.gromacs.org/about.html) for details).
 
-## Key Features
+## Key features
 
 1. **Molecular Dynamics Simulations**: GROMACS performs classical MD simulations, which compute the trajectories of atoms based on Newton's laws of motion. It integrates the equations of motion to simulate the behavior of molecular systems, capturing their dynamic properties and conformational changes.
 
```
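
The uenv workflow this hunk refers to can be sketched as below. This is a hedged illustration: the image tag `gromacs/2024:v1` and the view name are assumptions for demonstration, not documented values — check the uenv documentation linked in the text for the actual names.

```shell
# Start a shell inside the GROMACS user environment.
# The image tag below is a placeholder; list real images with `uenv image ls`.
uenv start gromacs/2024:v1 --view=gromacs

# ... build or run GROMACS inside the environment ...

# Leave the user environment and return to the original shell.
exit
```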

```diff
@@ -64,7 +64,7 @@ The `gromacs` view contains GROMACS 2024.1 that has been configured and tested f
 
 Use `exit` to leave the user environment and return to the original shell.
 
-### How to Run
+### How to run
 
 To start a job, 2 bash scripts are required: a standard SLURM submission script, and a [wrapper to start the CUDA MPS daemon][ref-slurm-gh200-single-rank-per-gpu] (in order to have multiple MPI ranks per GPU).
 
```
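
The two-script setup described in this hunk might look like the following minimal sketch. The file names (`launch.sbatch`, `mps-wrapper.sh`), resource numbers, and input file are assumptions for illustration; the real MPS wrapper is given in the linked CSCS documentation.

```shell
#!/bin/bash
# launch.sbatch -- illustrative sketch only; all resource values are placeholders.
#SBATCH --job-name=gromacs
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8    # 8 MPI ranks: 2 per GPU on a 4-GPU GH200 node
#SBATCH --cpus-per-task=16

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# mps-wrapper.sh (hypothetical name) starts the CUDA MPS daemon once per node,
# so multiple MPI ranks can share each GPU, then execs the wrapped command.
srun ./mps-wrapper.sh gmx_mpi mdrun -s input.tpr -ntomp ${OMP_NUM_THREADS}
```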

```diff
@@ -100,7 +100,7 @@ This can be run using `sbatch launch.sbatch` on the login node with the user env
 
 This submission script is only representative. Users must run their input files with a range of parameters to find an optimal set for the production runs. Some hints for this exploration below:
 
-!!! note "Configuration Hints"
+!!! note "Configuration hints"
 
     - Each Grace CPU has 72 cores, but a small number of them are used for underlying processes such as runtime daemons, so not all 72 cores are available for compute. To be safe, do not use more than 64 OpenMP threads on a single CPU, even if that leaves a handful of cores idling.
    - Each node has 4 Grace CPUs and 4 Hopper GPUs. When running 8 MPI ranks (meaning two per CPU), do not ask for more than 32 OpenMP threads per rank. That way no more than 64 threads will be running on a single CPU.
```
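
The core budget in these hints can be checked with a few lines of shell arithmetic (a hypothetical helper, not part of the documented scripts):

```shell
# Check that a rank/thread layout respects ~64 usable cores per Grace CPU.
ranks_per_node=8
cpus_per_node=4
threads_per_rank=32
usable_cores_per_cpu=64

ranks_per_cpu=$(( ranks_per_node / cpus_per_node ))
threads_per_cpu=$(( ranks_per_cpu * threads_per_rank ))

if [ "${threads_per_cpu}" -le "${usable_cores_per_cpu}" ]; then
    echo "OK: ${threads_per_cpu} threads per CPU"
else
    echo "Too many threads per CPU: ${threads_per_cpu}" >&2
fi
```

With 8 ranks per node (2 per CPU) and 32 threads per rank, this prints `OK: 64 threads per CPU`, exactly at the recommended limit.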
```diff
@@ -119,7 +119,7 @@ In addition, the STMV (~1 million atom) benchmark that NVIDIA publishes on its [
 
 The STMV test case is a fairly large problem size, with constraints operating only on a smaller set of atoms (h-bonds), which allows the update step to also take place on GPUs. This makes the simulation almost **fully GPU resident**, with the key performance-intensive parts, namely the long-range forces (PME), short-range non-bonded forces (NB), and bonded forces, all running on the GPU. On a single node, this leads to the following scaling on GROMACS 2024.1.
 
-#### STMV - Multiple Ranks - Single Node Upto 4 GPUs
+#### STMV - Multiple ranks - Single node up to 4 GPUs
 
 | #GPUs | ns/day | Speedup |
 | ------ | ------- | ------- |
```
```diff
@@ -130,7 +130,7 @@ The STMV test case is a fairly large problem size, with constraints operating on
 
 The other benchmark cases from HECBioSim simulate a pair of proteins (hEGFR Dimers/Tetramers of [1IVO](https://www.rcsb.org/structure/1IVO) and [1NQL](https://www.rcsb.org/structure/1NQL)) with a large lipid membrane. This also involves a fairly large number of charged ions, which increases the proportion of PME in the total compute workload. For these simulations, constraints are applicable on all atoms, which effectively **prevents the update from happening on the GPU**, thus negatively impacting scaling due to large host-to-device data transfers and key computations happening on the CPU. These show the following scaling characteristics on GROMACS 2024.1:
 
-#### 1.4m Atom System - Multiple Ranks - Single Node
+#### 1.4m Atom System - Multiple ranks - Single node
 
 Total number of atoms = 1,403,182
 
```

```diff
@@ -141,7 +141,7 @@ Protein atoms = 43,498 Lipid atoms = 235,304 Water atoms = 1,123,392 Ions = 9
 | 1 | 31.243 | 1x |
 | 4 | 55.936 | 1.79x |
 
-#### 3m Atom System - Single Node - Multiple Ranks
+#### 3m Atom System - Single node - Multiple ranks
 
 Total number of atoms = 2,997,924
```

```diff
@@ -159,7 +159,7 @@ Protein atoms = 86,996 Lipid atoms = 867,784 Water atoms = 2,041,230 Ions = 1
 - If the problem allows the integration step to take place on the GPU with `-update gpu`, that can lead to significant performance and scaling gains, as it allows an even greater part of the computations to take place on the GPU.
 - A single node of the GH200 cluster offers 4x CPU+GPU. For problems that can benefit from scaling beyond a single node, use the flag `export FI_CXI_RX_MATCH_MODE=software` in the SBATCH script. The best use of resources in terms of node-hours might be achieved on a single node for most simulations.
 
-## Further Documentation
+## Further documentation
 
 * [GROMACS Homepage][GROMACS]
 * [GROMACS Manual](https://manual.gromacs.org/2024.1/index.html)
```
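
The two tuning bullets in the hunk above can be combined into a hedged launch line. The input file name is a placeholder, and `-update gpu` applies only when constraints permit the GPU update path (e.g. h-bond-only constraints, as in the STMV case):

```shell
# Needed when scaling a run beyond a single GH200 node.
export FI_CXI_RX_MATCH_MODE=software

# Offload non-bonded (NB), PME, and bonded forces to the GPU, and, where
# the constraint setup allows it, the update/integration step as well.
srun gmx_mpi mdrun -s input.tpr -nb gpu -pme gpu -bonded gpu -update gpu
```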
