
Commit 55247f4

RMeli and msimberg authored

Update CP2K documentation (#51)
* update cp2k documentation
* update codeowners
* Update .github/CODEOWNERS (Co-authored-by: Mikael Simberg <[email protected]>)
* Update docs/software/sciapps/cp2k.md (Co-authored-by: Mikael Simberg <[email protected]>)
* Update docs/software/sciapps/cp2k.md (Co-authored-by: Mikael Simberg <[email protected]>)
* save
* update
* Update docs/software/sciapps/cp2k.md
* Update docs/software/sciapps/cp2k.md
* Update docs/software/sciapps/cp2k.md
* Update docs/software/sciapps/cp2k.md
* Update docs/software/sciapps/cp2k.md
* Update docs/software/sciapps/cp2k.md
* Update docs/software/sciapps/cp2k.md
* update

---------

Co-authored-by: Mikael Simberg <[email protected]>
1 parent f623ff2 commit 55247f4

2 files changed: +70 -25 lines changed

.github/CODEOWNERS

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 * @bcumming @msimberg @RMeli
-docs/software/sciapps/cp2k @abussy @RMeli
+docs/software/sciapps/cp2k.md @abussy @RMeli
 docs/software/communication @msimberg

docs/software/sciapps/cp2k.md

Lines changed: 69 additions & 24 deletions
@@ -63,22 +63,22 @@ MPS] daemon so that multiple MPI ranks can use the same GPU.
 #!/bin/bash -l

 #SBATCH --job-name=cp2k-job
-#SBATCH --time=00:30:00 # (1)
+#SBATCH --time=00:30:00 (1)
 #SBATCH --nodes=4
 #SBATCH --ntasks-per-core=1
-#SBATCH --ntasks-per-node=32 # (2)
-#SBATCH --cpus-per-task=8 # (3)
+#SBATCH --ntasks-per-node=32 (2)
+#SBATCH --cpus-per-task=8 (3)
 #SBATCH --account=<ACCOUNT>
 #SBATCH --hint=nomultithread
 #SBATCH --hint=exclusive
 #SBATCH --no-requeue
 #SBATCH --uenv=<CP2K_UENV>
 #SBATCH --view=cp2k

-export CUDA_CACHE_PATH="/dev/shm/$USER/cuda_cache" # (5)
-export MPICH_GPU_SUPPORT_ENABLED=1 # (6)
+export CUDA_CACHE_PATH="/dev/shm/$USER/cuda_cache" # (5)!
+export MPICH_GPU_SUPPORT_ENABLED=1 # (6)!
 export MPICH_MALLOC_FALLBACK=1
-export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1)) # (4)
+export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1)) # (4)!

 ulimit -s unlimited
 srun --cpu-bind=socket ./mps-wrapper.sh cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>
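The script above launches CP2K through `./mps-wrapper.sh` so that the CUDA MPS daemon is started once per node and shared by all ranks on that node. The wrapper itself is not part of this diff; the sketch below only illustrates the general shape of such a wrapper. Using `SLURM_LOCALID` to pick one rank per node and the pipe/log directories under `/tmp` are assumptions, not the CSCS-provided script.

```bash
#!/bin/bash
# mps-wrapper.sh -- minimal sketch, not the exact CSCS-provided wrapper.
# Local rank 0 on each node starts the CUDA MPS control daemon, every rank
# then runs the wrapped command, and local rank 0 shuts the daemon down again.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps-$USER       # assumed location
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log-$USER    # assumed location

if [ "$SLURM_LOCALID" -eq 0 ]; then
    nvidia-cuda-mps-control -d        # start the MPS daemon in background mode
fi
sleep 5                               # crude barrier: give the daemon time to come up

"$@"                                  # run the real command, e.g. cp2k.psmp ...
status=$?

if [ "$SLURM_LOCALID" -eq 0 ]; then
    echo quit | nvidia-cuda-mps-control
fi
exit $status
```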
@@ -94,7 +94,7 @@ srun --cpu-bind=socket ./mps-wrapper.sh cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPU
 for good performance. With [Intel MKL], this is not necessary and one can set `OMP_NUM_THREADS` to
 `SLURM_CPUS_PER_TASK`.

-5. [DBCSR] relies on extensive JIT compilation and we store the cache in memory to avoid I/O overhead.
+5. [DBCSR] relies on extensive JIT compilation, and we store the cache in memory to avoid I/O overhead.
 This is set by default on the HPC platform, but it's set here explicitly as it's essential to avoid performance degradation.

 6. CP2K's dependencies use GPU-aware MPI, which requires enabling support at runtime.
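Note (4) in the context above distinguishes two threading setups. As a small illustration (the MKL variant is taken from the note itself; which case applies depends on the BLAS/LAPACK library provided by the uenv you use):

```bash
# Setup used in the job scripts in this commit: one OpenMP thread fewer
# than the cores reserved per task, as recommended for good performance.
export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1))

# With Intel MKL this is not necessary (see note 4); all cores can be used:
# export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
```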
@@ -308,19 +308,19 @@ On Eiger, a similar sbatch script can be used:
 ```bash title="run_cp2k.sh"
 #!/bin/bash -l
 #SBATCH --job-name=cp2k-job
-#SBATCH --time=00:30:00 # (1)
+#SBATCH --time=00:30:00 (1)
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-core=1
-#SBATCH --ntasks-per-node=32 # (2)
-#SBATCH --cpus-per-task=4 # (3)
+#SBATCH --ntasks-per-node=32 (2)
+#SBATCH --cpus-per-task=4 (3)
 #SBATCH --account=<ACCOUNT>
 #SBATCH --hint=nomultithread
 #SBATCH --hint=exclusive
 #SBATCH --constraint=mc
 #SBATCH --uenv=<CP2K_UENV>
 #SBATCH --view=cp2k

-export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1)) # (4)
+export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1)) # (4)!

 ulimit -s unlimited
 srun --cpu-bind=socket cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>
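For completeness, a usage example (not part of the diff): once the placeholders are filled in, the script is submitted like any other Slurm batch script, using the file name from the fence title above.

```bash
sbatch run_cp2k.sh        # submit the job script shown above
squeue -u $USER           # check that the job is queued or running
tail -f <CP2K_OUTPUT>     # follow the CP2K output once the job has started
```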
@@ -336,8 +336,6 @@ srun --cpu-bind=socket cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>
 for good performance. With [Intel MKL], this is not necessary and one can set `OMP_NUM_THREADS` to
 `SLURM_CPUS_PER_TASK`.

-5. [DBCSR] relies on extensive JIT compilation and we store the cache in memory to avoid I/O overhead
-
 * Change <ACCOUNT> to your project account name
 * Change `<CP2K_UENV>` to the name (or path) of the actual CP2K uenv you want to use
 * Change `<PATH_TO_CP2K_DATA_DIR>` to the actual path to the CP2K data directory
@@ -355,19 +353,26 @@ srun --cpu-bind=socket cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>

 ## Building CP2K from Source

+!!! warning
+    The following installation instructions are up-to-date with the latest version of CP2K provided by the uenv.
+    That is, they work when manually compiling the CP2K source code corresponding to the CP2K version provided by the uenv.
+    **They are not necessarily up-to-date with the latest version of CP2K available on the `master` branch.**
+
+    If you are trying to build CP2K from source, make sure you understand what is different in `master`
+    compared to the latest version of CP2K provided by the uenv.

 The [CP2K] uenv provides all the dependencies required to build [CP2K] from source, with several optional features
 enabled. You can follow these steps to build [CP2K] from source:

 ```bash
-uenv start --view=develop <CP2K_UENV> # (1)
+uenv start --view=develop <CP2K_UENV> # (1)!

-cd <PATH_TO_CP2K_SOURCE> # (2)
+cd <PATH_TO_CP2K_SOURCE> # (2)!

 mkdir build && cd build
 CC=mpicc CXX=mpic++ FC=mpifort cmake \
 -GNinja \
--DCMAKE_CUDA_HOST_COMPILER=mpicc \ # (3)
+-DCMAKE_CUDA_HOST_COMPILER=mpicc \ # (3)!
 -DCP2K_USE_LIBXC=ON \
 -DCP2K_USE_LIBINT2=ON \
 -DCP2K_USE_SPGLIB=ON \
@@ -378,7 +383,7 @@ CC=mpicc CXX=mpic++ FC=mpifort cmake \
 -DCP2K_USE_PLUMED=ON \
 -DCP2K_USE_DFTD4=ON \
 -DCP2K_USE_DLAF=ON \
--DCP2K_USE_ACCEL=CUDA -DCP2K_WITH_GPU=H100 \ # (4)
+-DCP2K_USE_ACCEL=CUDA -DCP2K_WITH_GPU=H100 \ # (4)!
 ..

 ninja -j 32
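After `ninja -j 32` finishes, the freshly built binary can be run from inside the uenv. The sketch below assumes the CMake build tree places the binary in `build/bin/` and that the `CP2K_DATA_DIR` environment variable points at the data directory; both details are assumptions here, so check the surrounding instructions for the placeholders `<PATH_TO_CP2K_SOURCE>` and `<PATH_TO_CP2K_DATA_DIR>`.

```bash
# Run the self-built cp2k.psmp interactively inside the uenv (sketch; paths are assumptions).
uenv start --view=cp2k <CP2K_UENV>
# inside the uenv shell:
export CP2K_DATA_DIR=<PATH_TO_CP2K_DATA_DIR>
srun --cpu-bind=socket <PATH_TO_CP2K_SOURCE>/build/bin/cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>
```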
@@ -406,10 +411,50 @@ ninja -j 32

 See [manual.cp2k.org/CMake] for more details.

-### Known issues
+## Known issues
+
+### DLA-Future
+
+The `cp2k/2025.1` uenv provides CP2K with [DLA-Future] support enabled.
+The DLA-Future library is initialized even if you don't [explicitly ask to use it](https://manual.cp2k.org/trunk/technologies/eigensolvers/dlaf.html).
+This can lead to some surprising warnings and failures described below.
+
+#### `CUSOLVER_STATUS_INTERNAL_ERROR` during initialization
+
+If you are heavily over-subscribing the GPU by running multiple ranks per GPU, you may encounter the following error:
+
+```
+created exception: cuSOLVER function returned error code 7 (CUSOLVER_STATUS_INTERNAL_ERROR): pika(bad_function_call)
+terminate called after throwing an instance of 'pika::cuda::experimental::cusolver_exception'
+what(): cuSOLVER function returned error code 7 (CUSOLVER_STATUS_INTERNAL_ERROR): pika(bad_function_call)
+```
+
+The reason is that too many cuSOLVER handles are created.
+If you don't need DLA-Future, you can manually set the number of BLAS and LAPACK handles to 1 by setting the following environment variables:
+
+```bash
+DLAF_NUM_GPU_BLAS_HANDLES=1
+DLAF_NUM_GPU_LAPACK_HANDLES=1
+```
+
+#### Warning about pika only using one worker thread
+
+When running CP2K with multiple tasks per node and only one core per task, the initialization of DLA-Future may trigger the following warning:
+
+```
+The pika runtime will be started with only one worker thread because the
+process mask has restricted the available resources to only one thread. If
+this is unintentional make sure the process mask contains the resources
+you need or use --pika:ignore-process-mask to use all resources. Use
+--pika:print-bind to print the thread bindings used by pika.
+```

+This warning is triggered because the runtime used by DLA-Future, [pika](https://pikacpp.org),
+should typically be used with more than one thread and indicates a configuration mistake.
+However, if you are not using DLA-Future, the warning is harmless and can be ignored.
+The warning cannot be silenced.

-#### DBCSR GPU scaling
+### DBCSR GPU scaling

 On the GH200 architecture, it has been observed that the GPU accelerated version of [DBCSR] does not perform optimally in some cases.
 For example, in the `QS/H2O-1024` benchmark above, CP2K does not scale well beyond 2 nodes.
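The DLA-Future workaround added above lists the two variables without `export`; in a batch job they would typically be exported before the `srun` line. A small sketch combining them with the GH200 job script from this commit (the placement is an assumption; the variable names are the ones given in the added section):

```bash
# In the sbatch script, before launching CP2K:
export DLAF_NUM_GPU_BLAS_HANDLES=1      # limit the number of GPU BLAS handles
export DLAF_NUM_GPU_LAPACK_HANDLES=1    # limit the number of cuSOLVER handles
srun --cpu-bind=socket ./mps-wrapper.sh cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>
```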
@@ -420,21 +465,21 @@ GPU acceleration on/off with an environment variable:
 export DBCSR_RUN_ON_GPU=0
 ```

-While GPU acceleration is very good on a small number of nodes, the CPU implementation scales better.
-Therefore, for CP2K jobs running on a large number of nodes, it is worth investigating the use of the `DBCSR_RUN_ON_GPU`
+While GPU acceleration is very good on few nodes, the CPU implementation scales better.
+Therefore, for CP2K jobs running on many nodes, it is worth investigating the use of the `DBCSR_RUN_ON_GPU`
 environment variable.

-Ssome niche application cases such as the `QS_low_scaling_postHF` benchmarks only run efficiently with the CPU version
+Some niche application cases such as the `QS_low_scaling_postHF` benchmarks only run efficiently with the CPU version
 of DBCSR. Generally, if the function `dbcsr_multiply_generic` takes a significant portion of the timing report
 (at the end of the CP2K output file), it is worth investigating the effect of the `DBCSR_RUN_ON_GPU` environment variable.


-#### CUDA grid backend with high angular momenta basis sets
+### CUDA grid backend with high angular momenta basis sets

 The CP2K grid CUDA backend is currently buggy on Alps. Using basis sets with high angular momenta ($l \ge 3$)
 result in slow calculations, especially for force calculations with meta-GGA functionals.

-As a workaround, you can you can disable CUDA acceleration fo the grid backend:
+As a workaround, you can disable CUDA acceleration for the grid backend:

 ```bash
 &GLOBAL
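The `&GLOBAL` input snippet for the grid-backend workaround is cut off at the hunk boundary above and is left as-is. The `DBCSR_RUN_ON_GPU` workaround from the "DBCSR GPU scaling" issue, by contrast, needs no input-file change; a sketch of applying it in the GH200 job script from this commit:

```bash
# Fall back to the CPU DBCSR backend for large node counts, as discussed
# in the "DBCSR GPU scaling" known issue above.
export DBCSR_RUN_ON_GPU=0
srun --cpu-bind=socket ./mps-wrapper.sh cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>
```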
