Commit 7005774

enable latex and pr review
1 parent 8f37e2a commit 7005774

3 files changed: +115 -46 lines changed

docs/javascripts/mathjax.js

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+window.MathJax = {
+  tex: {
+    inlineMath: [["\\(", "\\)"]],
+    displayMath: [["\\[", "\\]"]],
+    processEscapes: true,
+    processEnvironments: true
+  },
+  options: {
+    ignoreHtmlClass: ".*|",
+    processHtmlClass: "arithmatex"
+  }
+};
+
+document$.subscribe(() => {
+  MathJax.startup.output.clearCache()
+  MathJax.typesetClear()
+  MathJax.texReset()
+  MathJax.typesetPromise()
+})
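With this hook in place, MathJax renders LaTeX on the documentation pages, and the `document$.subscribe` block re-typesets after each page change when mkdocs-material's instant navigation is used. A minimal usage sketch for a docs page is shown below; it assumes the `pymdownx.arithmatex` Markdown extension is enabled (with `generic: true`), which is what the `arithmatex` class in `processHtmlClass` refers to — that extension is not part of this diff.

```markdown
Inline math such as $l \ge 3$ is wrapped in single dollar signs, while display
math goes in its own block:

$$
\int_{\mathbb{R}^3} |\psi(\mathbf{r})|^2 \, \mathrm{d}\mathbf{r} = 1
$$
```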

docs/software/sciapps/cp2k.md

Lines changed: 93 additions & 41 deletions
@@ -10,11 +10,10 @@ PM6, RM1, MNDO, …), and classical force fields (AMBER, CHARMM, …). CP2K can
 metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimization, and
 transition state optimization using NEB or dimer method. See [CP2K Features] for a detailed overview.
 
-!!! note "User Environments"
+!!! note "uenvs"
 
-    [CP2K] is provided on [ALPS](#platforms-on-alps) via [User Environments](#ref-tool-uenv)
-    (UENVs). Please have a look at the [User Environments documentation](#ref-tool-uenv) for more information about
-    UENVs and how to use them.
+    [CP2K] is provided on [ALPS][platforms-on-alps] via [uenv][ref-tool-uenv].
+    Please have a look at the [uenv documentation][ref-tool-uenv] for more information about uenvs and how to use them.
 
 ## Dependencies
 
@@ -47,6 +46,8 @@ On our systems, CP2K is built with the following dependencies:
 
 ## Running CP2K
 
+### Running on the HPC platform
+
 To start a job, two bash scripts are potentially required: a [slurm] submission script, and a wrapper to start the [CUDA
 MPS] daemon so that multiple MPI ranks can use the same GPU.
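The `mps-wrapper.sh` script itself is not shown in this commit. A minimal sketch of the usual pattern is given below (a hypothetical illustration, not the script distributed on the system): the first rank on each node starts the CUDA MPS control daemon, every rank then runs the wrapped command, and the daemon is shut down again afterwards.

```bash
#!/bin/bash
# mps-wrapper.sh (sketch): start the CUDA MPS daemon once per node, then run the command.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log

# Only the first rank on each node starts the daemon.
if [ "${SLURM_LOCALID}" -eq 0 ]; then
    nvidia-cuda-mps-control -d
fi
sleep 5  # crude synchronization: give the daemon time to come up

"$@"     # run e.g. cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>

# The first rank on each node shuts the daemon down again.
if [ "${SLURM_LOCALID}" -eq 0 ]; then
    echo quit | nvidia-cuda-mps-control
fi
```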
 
@@ -71,9 +72,6 @@ export MPICH_MALLOC_FALLBACK=1
 export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1)) # (4)
 
 ulimit -s unlimited
-
-export CP2K_DATA_DIR=<PATH_TO_CP2K_DATA_DIR>
-
 srun --cpu-bind=socket ./mps-wrapper.sh cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>
 ```
 
@@ -91,7 +89,7 @@ srun --cpu-bind=socket ./mps-wrapper.sh cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPU
 
 
 * Change <ACCOUNT> to your project account name
-* Change `<CP2K_UENV>` to the name (or path) of the actual CP2K UENV you want to use
+* Change `<CP2K_UENV>` to the name (or path) of the actual CP2K uenv you want to use
 * Change `<PATH_TO_CP2K_DATA_DIR>` to the actual path to the CP2K data directory
 * Change `<CP2K_INPUT>` and `<CP2K_OUTPUT>` to the actual input and output files
 
@@ -124,25 +122,11 @@ sbatch run_cp2k.sh
 per node. Experiments have shown that CP2K performs and scales better when the number of MPI ranks is a power
 of 2, even if some cores are left idling.
 
-??? warning "CP2K grid CUDA backend with high angular momenta basis sets"
-
-    The CP2K grid CUDA backend is currently buggy on Alps. Using basis sets with high angular momenta ($l \ge 3$)
-    result in slow calculations, especially for force calculations with meta-GGA functionals.
-
-    As a workaround, you can you can disable CUDA acceleration fo the grid backend:
-
-    ```bash
-    &GLOBAL
-      &GRID
-        BACKEND CPU
-      &END GRID
-    &END GLOBAL
-    ```
 
 ??? info "Running regression tests"
 
-    If you want to run CP2K regression tests with the CP2K executable provided by the UENV, make sure to use the version
-    of the regression tests corresponding to the version of CP2K provided by the UENV. The regression test data is
+    If you want to run CP2K regression tests with the CP2K executable provided by the uenv, make sure to use the version
+    of the regression tests corresponding to the version of CP2K provided by the uenv. The regression test data is
     sometimes adjusted, and using the wrong version of the regression tests can lead to test failures.
 
 
@@ -302,32 +286,62 @@ sbatch run_cp2k.sh
 
 This RPA input scales well until 32 GH200 nodes.
 
-### Known issues
+### Running on Eiger
 
+On Eiger, a similar sbatch script can be used:
 
-#### DBCSR GPU scaling
+```bash title="run_cp2k.sh"
+#!/bin/bash -l
+#SBATCH --job-name=cp2k-job
+#SBATCH --time=00:30:00 # (1)
+#SBATCH --nodes=1
+#SBATCH --ntasks-per-core=1
+#SBATCH --ntasks-per-node=32 # (2)
+#SBATCH --cpus-per-task=4 # (3)
+#SBATCH --account=<ACCOUNT>
+#SBATCH --hint=nomultithread
+#SBATCH --hint=exclusive
+#SBATCH --constraint=mc
+#SBATCH --uenv=<CP2K_UENV>
+#SBATCH --view=cp2k
 
-On the GH200 architecture, it has been observed that the GPU accelerated version of [DBCSR] does not perform optimally in some cases.
-For example, in the `QS/H2O-1024` benchmark above, CP2K does not scale well beyond 2 nodes.
-The CPU implementation of DBCSR does not suffer from this. A workaround was implemented in DBCSR, in order to switch
-GPU acceleration on/off with an environment variable:
+export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1)) # (4)
 
-```bash
-export DBCSR_RUN_ON_GPU=0
+ulimit -s unlimited
+srun --cpu-bind=socket cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>
 ```
 
-While GPU acceleration is very good on a small number of nodes, the CPU implementation scales better.
-Therefore, for CP2K jobs running on a large number of nodes, it is worth investigating the use of the `DBCSR_RUN_ON_GPU`
-environment variable.
+1. Time format: `HH:MM:SS`
 
-Ssome niche application cases such as the `QS_low_scaling_postHF` benchmarks only run efficiently with the CPU version
-of DBCSR. Generally, if the function `dbcsr_multiply_generic` takes a significant portion of the timing report
-(at the end of the CP2K output file), it is worth investigating the effect of the `DBCSR_RUN_ON_GPU` environment variable.
+2. Number of MPI ranks per node
+
+3. Number of CPUs per MPI rank
+
+4. [OpenBLAS] spawns an extra thread, therefore it is necessary to set `OMP_NUM_THREADS` to `SLURM_CPUS_PER_TASK - 1`
+   for good performance. With [Intel MKL], this is not necessary and one can set `OMP_NUM_THREADS` to
+   `SLURM_CPUS_PER_TASK`.
+
+5. [DBCSR] relies on extensive JIT compilation and we store the cache in memory to avoid I/O overhead
+
+* Change <ACCOUNT> to your project account name
+* Change `<CP2K_UENV>` to the name (or path) of the actual CP2K uenv you want to use
+* Change `<PATH_TO_CP2K_DATA_DIR>` to the actual path to the CP2K data directory
+* Change `<CP2K_INPUT>` and `<CP2K_OUTPUT>` to the actual input and output files
+
+!!! warning
+
+    The `--cpu-bind=socket` option is necessary to get good performance.
+
+??? info "Running regression tests"
+
+    If you want to run CP2K regression tests with the CP2K executable provided by the uenv, make sure to use the version
+    of the regression tests corresponding to the version of CP2K provided by the uenv. The regression test data is
+    sometimes adjusted, and using the wrong version of the regression tests can lead to test failures.
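As a concrete reading of footnote (4) above: with `--cpus-per-task=4`, three OpenMP threads plus the extra OpenBLAS helper thread fill the four cores available to each rank. The snippet below simply restates the script's logic as a worked example; it is not an additional setting.

```bash
# OpenBLAS spawns one helper thread per rank, so leave one core free for it:
#   OMP_NUM_THREADS = SLURM_CPUS_PER_TASK - 1 = 4 - 1 = 3
export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1))
# With Intel MKL no extra thread is spawned, so OMP_NUM_THREADS=4 could be used instead.
```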
 
 ## Building CP2K from Source
 
 
-The [CP2K] UENV provides all the dependencies required to build [CP2K] from source, with several optional features
+The [CP2K] uenv provides all the dependencies required to build [CP2K] from source, with several optional features
 enabled. You can follow these steps to build [CP2K] from source:
 
 ```bash
@@ -355,7 +369,7 @@ CC=mpicc CXX=mpic++ FC=mpifort cmake \
 ninja -j 32
 ```
 
-1. Start the CP2K UENV and load the `develop` view (which provides all the necessary dependencies)
+1. Start the CP2K uenv and load the `develop` view (which provides all the necessary dependencies)
 
 2. Go to the CP2K source directory
 
@@ -368,7 +382,7 @@ ninja -j 32
 ??? note "Eiger: Intel MKL (before `[email protected]`)"
 
     On `x86` we deployed with `intel-oneapi-mkl` before `[email protected]`.
-    If you are using a pre-`[email protected]` UENV, add `-DCP2K_SCALAPACK_VENDOR=MKL` to the CMake invocation to find MKL.
+    If you are using a pre-`[email protected]` uenv, add `-DCP2K_SCALAPACK_VENDOR=MKL` to the CMake invocation to find MKL.
 
 ??? note "CUDA architecture for `[email protected]` and earlier"
 
@@ -377,6 +391,44 @@ ninja -j 32
 
 See [manual.cp2k.org/CMake] for more details.
 
+### Known issues
+
+
+#### DBCSR GPU scaling
+
+On the GH200 architecture, it has been observed that the GPU accelerated version of [DBCSR] does not perform optimally in some cases.
+For example, in the `QS/H2O-1024` benchmark above, CP2K does not scale well beyond 2 nodes.
+The CPU implementation of DBCSR does not suffer from this. A workaround was implemented in DBCSR, in order to switch
+GPU acceleration on/off with an environment variable:
+
+```bash
+export DBCSR_RUN_ON_GPU=0
+```
+
+While GPU acceleration is very good on a small number of nodes, the CPU implementation scales better.
+Therefore, for CP2K jobs running on a large number of nodes, it is worth investigating the use of the `DBCSR_RUN_ON_GPU`
+environment variable.
+
+Some niche application cases such as the `QS_low_scaling_postHF` benchmarks only run efficiently with the CPU version
+of DBCSR. Generally, if the function `dbcsr_multiply_generic` takes a significant portion of the timing report
+(at the end of the CP2K output file), it is worth investigating the effect of the `DBCSR_RUN_ON_GPU` environment variable.
+
+
+#### CUDA grid backend with high angular momenta basis sets
+
+The CP2K grid CUDA backend is currently buggy on Alps. Using basis sets with high angular momenta ($l \ge 3$)
+results in slow calculations, especially for force calculations with meta-GGA functionals.
+
+As a workaround, you can disable CUDA acceleration of the grid backend:
+
+```bash
+&GLOBAL
+  &GRID
+    BACKEND CPU
+  &END GRID
+&END GLOBAL
+```
+
 [CP2K]: https://www.cp2k.org/
 [CP2K Features]: https://www.cp2k.org/features
 [COSMA]: https://github.com/eth-cscs/COSMA
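For the DBCSR note in the known issues above, the relevant entry of the timing report (printed at the end of the CP2K output file) can be located quickly, for example:

```bash
# Look up the DBCSR sparse-matrix-multiply entry in the timing report of a finished run
grep -n "dbcsr_multiply_generic" <CP2K_OUTPUT>
# If it accounts for a large share of the total time, retry the job with:
#   export DBCSR_RUN_ON_GPU=0
```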

mkdocs.yml

Lines changed: 3 additions & 5 deletions
@@ -157,8 +157,6 @@ markdown_extensions:
   # for captioning images
   - pymdownx.blocks.caption
 
-# disable mathjax until the "GET /javascripts/mathjax.js HTTP/1.1" code 404 errors are fixed
-#extra_javascript:
-#  - javascripts/mathjax.js
-#  - https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js
-
+extra_javascript:
+  - javascripts/mathjax.js
+  - https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js
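The comment removed from `mkdocs.yml` mentioned 404 errors for `GET /javascripts/mathjax.js`; with the file now committed under `docs/javascripts/`, a quick local sanity check could look like this (a sketch assuming the default `mkdocs serve` address of `127.0.0.1:8000`):

```bash
# Serve the docs locally and confirm the MathJax hook is reachable (expect HTTP 200)
mkdocs serve &
sleep 3
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/javascripts/mathjax.js
kill $!  # stop the preview server again
```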
