for good performance. With [Intel MKL], this is not necessary and one can set `OMP_NUM_THREADS` to
`SLURM_CPUS_PER_TASK`.
5. [DBCSR] relies on extensive JIT compilation, and we store the cache in memory to avoid I/O overhead.
This is set by default on the HPC platform, but it's set here explicitly as it's essential to avoid performance degradation.
6. CP2K's dependencies use GPU-aware MPI, which requires enabling support at runtime.
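Taken together, points 4–6 above might look like the following batch-script fragment. This is a sketch: only the `OMP_NUM_THREADS`/`SLURM_CPUS_PER_TASK` pairing comes from this page; `CUDA_CACHE_PATH` and `MPICH_GPU_SUPPORT_ENABLED` are assumptions (plausible knobs for an in-memory JIT cache and for GPU-aware MPI on a Cray system), not values taken from this documentation.

```shell
# One OpenMP thread per core allocated to the task (falls back to 1
# outside a Slurm allocation).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
# Assumed: keep the JIT compilation cache in memory to avoid I/O overhead.
export CUDA_CACHE_PATH=/dev/shm/cuda-cache
# Assumed: enable GPU-aware MPI support at runtime (Cray MPICH).
export MPICH_GPU_SUPPORT_ENABLED=1
```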
See [manual.cp2k.org/CMake] for more details.
## Known issues
### DLA-Future
The `cp2k/2025.1` uenv provides CP2K with [DLA-Future] support enabled.
The DLA-Future library is initialized even if you don't [explicitly ask to use it](https://manual.cp2k.org/trunk/technologies/eigensolvers/dlaf.html).
This can lead to some surprising warnings and failures described below.
#### `CUSOLVER_STATUS_INTERNAL_ERROR` during initialization
If you are heavily over-subscribing the GPU by running multiple ranks per GPU, you may encounter the following error:
```
DLAF_NUM_GPU_BLAS_HANDLES=1
DLAF_NUM_GPU_LAPACK_HANDLES=1
```
#### Warning about pika only using one worker thread
When running CP2K with multiple tasks per node and only one core per task, the initialization of DLA-Future may trigger the following warning:
```
you need or use --pika:ignore-process-mask to use all resources. Use
--pika:print-bind to print the thread bindings used by pika.
```
This warning is triggered because [pika](https://pikacpp.org), the runtime used by DLA-Future,
should typically be run with more than one worker thread; using only one usually indicates a configuration mistake.
However, if you are not using DLA-Future, the warning is harmless and can be ignored.
The warning cannot be silenced.
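If you do intend to use DLA-Future, the usual fix is to give each task more than one core, so that pika gets more than one worker thread. A hypothetical launch line (the task and core counts are purely illustrative, not a recommendation from this page):

```
srun --ntasks-per-node=4 --cpus-per-task=16 cp2k.psmp -i input.inp
```

The `--pika:print-bind` option mentioned in the warning can be used to verify the resulting thread bindings.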
### DBCSR GPU scaling
On the GH200 architecture, it has been observed that the GPU-accelerated version of [DBCSR] does not perform optimally in some cases.
For example, in the `QS/H2O-1024` benchmark above, CP2K does not scale well beyond 2 nodes.
GPU acceleration on/off with an environment variable:

```
export DBCSR_RUN_ON_GPU=0
```
While GPU acceleration is very good on few nodes, the CPU implementation scales better.
Therefore, for CP2K jobs running on many nodes, it is worth investigating the use of the `DBCSR_RUN_ON_GPU`
environment variable.
Some niche application cases such as the `QS_low_scaling_postHF` benchmarks only run efficiently with the CPU version
of DBCSR. Generally, if the function `dbcsr_multiply_generic` takes a significant portion of the timing report
(at the end of the CP2K output file), it is worth investigating the effect of the `DBCSR_RUN_ON_GPU` environment variable.
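As a quick check along these lines, you can search the output file for the routine name; the file name `cp2k.out` is a placeholder for your actual CP2K output file, and the exact layout of the timing report may differ:

```shell
# Illustrative check: does dbcsr_multiply_generic appear in the timing
# report at the end of the output file? "cp2k.out" is a placeholder.
out=cp2k.out
if [ -f "$out" ] && grep -q "dbcsr_multiply_generic" "$out"; then
    msg="dbcsr_multiply_generic found - consider trying DBCSR_RUN_ON_GPU=0"
else
    msg="no timing entry found (or no output file yet)"
fi
echo "$msg"
```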
### CUDA grid backend with high angular momenta basis sets
The CP2K grid CUDA backend is currently buggy on Alps. Using basis sets with high angular momenta ($l \ge 3$)
results in slow calculations, especially for force calculations with meta-GGA functionals.
As a workaround, you can disable CUDA acceleration for the grid backend: