Commit b566918

update
1 parent 9cc3620 commit b566918

1 file changed: +13 -12 lines changed

docs/software/sciapps/cp2k.md

Lines changed: 13 additions & 12 deletions
@@ -94,7 +94,7 @@ srun --cpu-bind=socket ./mps-wrapper.sh cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPU
 for good performance. With [Intel MKL], this is not necessary and one can set `OMP_NUM_THREADS` to
 `SLURM_CPUS_PER_TASK`.

-5. [DBCSR] relies on extensive JIT compilation and we store the cache in memory to avoid I/O overhead.
+5. [DBCSR] relies on extensive JIT compilation, and we store the cache in memory to avoid I/O overhead.
 This is set by default on the HPC platform, but it's set here explicitly as it's essential to avoid performance degradation.

 6. CP2K's dependencies use GPU-aware MPI, which requires enabling support at runtime.
@@ -411,15 +411,15 @@ ninja -j 32

 See [manual.cp2k.org/CMake] for more details.

-### Known issues
+## Known issues

-#### DLA-Future
+### DLA-Future

 The `cp2k/2025.1` uenv provides CP2K with [DLA-Future] support enabled.
 The DLA-Future library is initialized even if you don't [explicitly ask to use it](https://manual.cp2k.org/trunk/technologies/eigensolvers/dlaf.html).
 This can lead to some surprising warnings and failures described below.

-##### `CUSOLVER_STATUS_INTERNAL_ERROR` during initialization
+#### `CUSOLVER_STATUS_INTERNAL_ERROR` during initialization

 If you are heavily over-subscribing the GPU by running multiple ranks per GPU, you may encounter the following error:

@@ -437,7 +437,7 @@ DLAF_NUM_GPU_BLAS_HANDLES=1
 DLAF_NUM_GPU_LAPACK_HANDLES=1
 ```

-##### Warning about pika only using one worker thread
+#### Warning about pika only using one worker thread

 When running CP2K with multiple tasks per node and only one core per task, the initialization of DLA-Future may trigger the following warning:

@@ -449,11 +449,12 @@ you need or use --pika:ignore-process-mask to use all resources. Use
 --pika:print-bind to print the thread bindings used by pika.
 ```

-This warning is triggered because the runtime used by DLA-Future, [pika](https://pikacpp.org), should typically be used with more than one thread and indicates a configuration mistake.
+This warning is triggered because the runtime used by DLA-Future, [pika](https://pikacpp.org),
+should typically be used with more than one thread and indicates a configuration mistake.
 However, if you are not using DLA-Future, the warning is harmless and can be ignored.
 The warning cannot be silenced.

-#### DBCSR GPU scaling
+### DBCSR GPU scaling

 On the GH200 architecture, it has been observed that the GPU accelerated version of [DBCSR] does not perform optimally in some cases.
 For example, in the `QS/H2O-1024` benchmark above, CP2K does not scale well beyond 2 nodes.
@@ -464,21 +465,21 @@ GPU acceleration on/off with an environment variable:
 export DBCSR_RUN_ON_GPU=0
 ```

-While GPU acceleration is very good on a small number of nodes, the CPU implementation scales better.
-Therefore, for CP2K jobs running on a large number of nodes, it is worth investigating the use of the `DBCSR_RUN_ON_GPU`
+While GPU acceleration is very good on few nodes, the CPU implementation scales better.
+Therefore, for CP2K jobs running on many nodes, it is worth investigating the use of the `DBCSR_RUN_ON_GPU`
 environment variable.

-Ssome niche application cases such as the `QS_low_scaling_postHF` benchmarks only run efficiently with the CPU version
+Some niche application cases such as the `QS_low_scaling_postHF` benchmarks only run efficiently with the CPU version
 of DBCSR. Generally, if the function `dbcsr_multiply_generic` takes a significant portion of the timing report
 (at the end of the CP2K output file), it is worth investigating the effect of the `DBCSR_RUN_ON_GPU` environment variable.


-#### CUDA grid backend with high angular momenta basis sets
+### CUDA grid backend with high angular momenta basis sets

 The CP2K grid CUDA backend is currently buggy on Alps. Using basis sets with high angular momenta ($l \ge 3$)
 result in slow calculations, especially for force calculations with meta-GGA functionals.

-As a workaround, you can you can disable CUDA acceleration fo the grid backend:
+As a workaround, you can disable CUDA acceleration for the grid backend:

 ```bash
 &GLOBAL

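The documentation text touched by this commit suggests checking how much of the timing report at the end of the CP2K output is taken by `dbcsr_multiply_generic`. A minimal sketch of that check follows, assuming each row of the report begins with the routine name; the function name `dbcsr_multiply_rows` is hypothetical and not part of CP2K.

```shell
#!/usr/bin/env bash
# Sketch only: pull the dbcsr_multiply_generic rows out of CP2K output files.
# Assumption: each row of the timing report starts with the routine name.

# Print the timing-report rows for dbcsr_multiply_generic from one output file.
dbcsr_multiply_rows() {
    grep -E '^[[:space:]]*dbcsr_multiply_generic[[:space:]]' "$1"
}

# For every output file given on the command line, print its name followed
# by the matching rows, so that two runs can be compared by eye.
for out in "$@"; do
    printf '== %s ==\n' "$out"
    dbcsr_multiply_rows "$out"
done
```

One possible use is to run the same input once with `export DBCSR_RUN_ON_GPU=0` and once without it, write the results to two output files, and pass both files to the script to compare the rows.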