Commit 67481d5

Add more infrastructure for timing
Adds a new module for timing under infra/. This module builds upon the Pacer library, but extends it with support for specifying timing levels and automatically adding Kokkos fences. It also includes more advanced timer options that are useful in special circumstances. Finally, it adds integration with NVTX ranges on NVIDIA GPUs for profiling with vendor-specific tools.
2 parents 395f2a4 + c04bb97 commit 67481d5

41 files changed, with 481 additions and 102 deletions

components/omega/OmegaBuild.cmake

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ macro(common)
   option(OMEGA_DEBUG "Turn on error message throwing (default OFF)." OFF)
   option(OMEGA_LOG_FLUSH "Turn on unbuffered logging (default OFF)." OFF)
   option(OMEGA_TEST_CDASH "Turn on CDash support (default ON)." ON)
+  option(OMEGA_EXTERNAL_PROF "Integration of Omega timers with external profiling tools (default OFF)." OFF)

   if("${OMEGA_BUILD_TYPE}" STREQUAL "Debug" OR "${OMEGA_BUILD_TYPE}" STREQUAL "DEBUG")
     set(OMEGA_DEBUG ON)

components/omega/configs/Default.yml

Lines changed: 5 additions & 0 deletions
@@ -1,4 +1,9 @@
 Omega:
+  Timing:
+    Level: 2
+    AutoFence: true
+    TimingBarriers: false
+    PrintAllRanks: false
   TimeIntegration:
     CalendarType: No Leap
     TimeStepper: Forward-Backward
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
(omega-dev-timing)=

# Timing

The timing infrastructure builds upon the E3SM Pacer library for wall-clock timing.
Pacer integrates platform-specific marker APIs whenever possible.

# Initialization

To initialize the Pacer library, call
```c++
Pacer::initialize(Comm);
```
where `Comm` is an MPI communicator.

# Writing out timing data

To write out timing data, call
```c++
Pacer::print(FilePrefix, PrintAllRanks);
```
where `FilePrefix` is the prefix for all output files and the optional argument `PrintAllRanks` determines
whether all MPI ranks print their data.

# Finalization

To finalize the Pacer library, call
```c++
Pacer::finalize();
```

# Basic timer use

To time a region of code, enclose it with calls to the `Pacer::start` and `Pacer::stop` functions, like so
```c++
Pacer::start(Name, Level);
// region of code to be timed
Pacer::stop(Name, Level);
```
These functions take a string `Name` and a non-negative integer `Level`.
The timer is active only if the timing level set in the config file is greater than or equal to `Level`.
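
The level check can be illustrated with a small self-contained model. The sketch below is not Pacer code: the `LevelGatedTimers` class and all of its members are hypothetical stand-ins, and it assumes only the rule stated above, namely that a timer is active when the configured level is greater than or equal to the timer's own `Level`.

```c++
#include <chrono>
#include <map>
#include <string>

// Hypothetical model of level-gated timers; this is NOT the Pacer API.
class LevelGatedTimers {
 public:
   explicit LevelGatedTimers(int ConfiguredLevel)
       : ConfigLevel(ConfiguredLevel) {}

   // A timer is active only if the configured level is >= the timer's level.
   bool isActive(int Level) const { return ConfigLevel >= Level; }

   void start(const std::string &Name, int Level) {
      if (!isActive(Level))
         return; // inactive timers do no work
      StartTimes[Name] = Clock::now();
   }

   void stop(const std::string &Name, int Level) {
      if (!isActive(Level))
         return;
      auto It = StartTimes.find(Name);
      if (It == StartTimes.end())
         return; // stop without a matching start is ignored in this sketch
      std::chrono::duration<double> Delta = Clock::now() - It->second;
      Elapsed[Name] += Delta.count(); // accumulate seconds per timer name
      StartTimes.erase(It);
   }

   // True if at least one start/stop pair was recorded under this name.
   bool wasRecorded(const std::string &Name) const {
      return Elapsed.count(Name) > 0;
   }

 private:
   using Clock = std::chrono::steady_clock;
   int ConfigLevel;
   std::map<std::string, Clock::time_point> StartTimes;
   std::map<std::string, double> Elapsed;
};
```

In this model, with a configured level of 2, a timer started with `Level` 1 is recorded while a timer started with `Level` 3 is skipped entirely.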

# Advanced timing functions

## Conditional MPI barriers

Properly timing MPI communication might require inserting MPI barriers. It might be desirable
to remove those barriers in production runs. Pacer provides a function
```c++
Pacer::timingBarrier(TimerName, Level, Comm);
```
which adds an MPI barrier on the communicator `Comm` and puts a timer around it.
Whether barriers added by this function are actually called can be controlled by the following functions
```c++
Pacer::enableTimingBarriers();
Pacer::disableTimingBarriers();
```

## Adding parent prefixes

It might be desirable to add a prefix to a group of timers based on their parent timer.
To enable this, Pacer provides the `addParentPrefix()` and `removeParentPrefix()` functions.
For example, the following call sequence
```c++
Pacer::start("Parent", 0);
Pacer::addParentPrefix();

Pacer::start("Child", 0);
Pacer::stop("Child", 0);

Pacer::removeParentPrefix();
Pacer::stop("Parent", 0);
```
results in the "Child" timer showing up as "Parent:Child" in the output files.
This is useful when timers are added inside general-purpose routines that are called
from many places in the code, such as halo exchange.
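
One way to picture the renaming is as a stack of saved prefixes. The following sketch is a hypothetical model, not Pacer's implementation: the `PrefixedTimerNames` class is invented for illustration, and it assumes that the prefix is the reported name of the innermost running timer followed by a colon.

```c++
#include <string>
#include <vector>

// Hypothetical model of parent prefixes; names and behavior are assumptions,
// not the Pacer implementation.
class PrefixedTimerNames {
 public:
   // Start a timer and return the name under which it would be reported.
   std::string start(const std::string &Name) {
      Running.push_back(Prefix + Name);
      return Running.back();
   }

   // Stop the innermost running timer.
   void stop() {
      if (!Running.empty())
         Running.pop_back();
   }

   // Prefix subsequent timers with the innermost running timer's name.
   void addParentPrefix() {
      if (Running.empty())
         return;
      SavedPrefixes.push_back(Prefix);
      Prefix = Running.back() + ":";
   }

   // Restore the prefix in effect before the last addParentPrefix().
   void removeParentPrefix() {
      if (SavedPrefixes.empty())
         return;
      Prefix = SavedPrefixes.back();
      SavedPrefixes.pop_back();
   }

 private:
   std::string Prefix;
   std::vector<std::string> Running;
   std::vector<std::string> SavedPrefixes;
};
```

In this model a timer started between the two prefix calls is reported as "Parent:Child", while a timer started after `removeParentPrefix()` keeps its plain name.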

## Disabling timers

It might be desirable to programmatically disable or enable timing. To allow that, Pacer provides
the `disableTiming()` and `enableTiming()` functions. In the following call sequence
```c++
Pacer::disableTiming();

Pacer::start("Timer1", 0);
Pacer::stop("Timer1", 0);

Pacer::enableTiming();

Pacer::start("Timer2", 0);
Pacer::stop("Timer2", 0);
```
`Timer1` is not timed while `Timer2` is. This is useful mainly when done conditionally.
For example, the first call to some function may take much longer than subsequent calls,
and a detailed timing breakdown of the first call may not be important. In that case,
it might be desirable to have a separate timer for the first call with its child timers disabled.
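
The first-call pattern described above might look roughly like the following. This is a sketch under stated assumptions: `TimerLog` is a hypothetical stand-in that only records which timers were started (it is not the Pacer API), and `routine`, `FirstCall`, and the timer names are invented for illustration.

```c++
#include <string>
#include <vector>

// Hypothetical stand-in that records which timers were actually started;
// it mimics the enable/disable switch but is not the Pacer API.
struct TimerLog {
   bool Enabled = true;
   std::vector<std::string> Started;

   void disableTiming() { Enabled = false; }
   void enableTiming() { Enabled = true; }
   void start(const std::string &Name) {
      if (Enabled)
         Started.push_back(Name);
   }
   void stop(const std::string &) {} // a real timer would accumulate time here
};

// Hypothetical routine whose first call is timed as a whole, without a
// breakdown into child timers.
void routine(TimerLog &Timers, bool FirstCall) {
   if (FirstCall) {
      Timers.start("Routine:FirstCall");
      Timers.disableTiming(); // suppress the child timers below
   }

   Timers.start("Routine:StepA");
   Timers.stop("Routine:StepA");
   Timers.start("Routine:StepB");
   Timers.stop("Routine:StepB");

   if (FirstCall) {
      Timers.enableTiming(); // re-enable before stopping the outer timer
      Timers.stop("Routine:FirstCall");
   }
}
```

In this model the first call records only `Routine:FirstCall`, and later calls record the two step timers. Timing is re-enabled before the outer timer is stopped so that the stop call itself is not skipped.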

components/omega/doc/index.md

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,7 @@ userGuide/Reductions
 userGuide/Tracers
 userGuide/TridiagonalSolvers
 userGuide/VertCoord
+userGuide/Timing
 ```

 ```{toctree}
@@ -88,6 +89,7 @@ devGuide/Reductions
 devGuide/Tracers
 devGuide/TridiagonalSolvers
 devGuide/VertCoord
+devGuide/Timing
 ```

 ```{toctree}
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
(omega-user-timing)=

# Timing

Omega uses the Pacer library for timing and places timers around
various parts of the code.

By default, the timing output is written to two files: `omega.summary` and `omega.timing.0`.
The `omega.summary` file presents accumulated timing statistics across all MPI ranks.
The `omega.timing.0` file shows timing results only from the first rank.

Four parameters in the input configuration file control the timing behavior:
```yaml
Timing:
  Level: 2
  AutoFence: true
  TimingBarriers: false
  PrintAllRanks: false
```
The `Level` parameter is a non-negative integer that determines the granularity of timers.
Increasing it turns on more timers.
Having more timers provides more detailed information, but it also comes with increased overhead
and may be counter-productive if a high-level look at model performance is sufficient.

The `AutoFence` Boolean option determines if Kokkos fences are automatically added before every timer call.
This option **needs** to be true for accurate timing using Omega timers on GPU-based systems.
However, there are circumstances when turning off automatic fences is useful.
The main use case is using external profiling tools.
Another one is measuring the overhead of automatic synchronization for very high timing levels.

The `TimingBarriers` Boolean option determines if MPI barriers are added before or after certain timers.
Adding barriers may be necessary to properly measure communication time, but it can add too much overhead in
production runs.

The `PrintAllRanks` Boolean option determines if all ranks should print their timing information. If this
option is set to `true`, the output will include additional files `omega.timing.i` with the
timing data from rank `i`.
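
As a rough illustration of which files to expect, the sketch below lists the output file names for a given run. The helper `expectedTimingFiles` and its parameters are hypothetical and not part of Omega or Pacer; the `<prefix>.timing.<rank>` pattern is inferred from the file names mentioned above and should be treated as an assumption.

```c++
#include <string>
#include <vector>

// Hypothetical helper: list the timing files a run is expected to produce.
// The naming pattern is inferred from the documentation text, not from Pacer.
std::vector<std::string> expectedTimingFiles(const std::string &Prefix,
                                             int NumRanks,
                                             bool PrintAllRanks) {
   // The summary and the first rank's data are always written.
   std::vector<std::string> Files{Prefix + ".summary", Prefix + ".timing.0"};

   // With PrintAllRanks, every remaining rank adds its own file.
   if (PrintAllRanks) {
      for (int Rank = 1; Rank < NumRanks; ++Rank)
         Files.push_back(Prefix + ".timing." + std::to_string(Rank));
   }
   return Files;
}
```

Under these assumptions, a 4-rank run with the prefix `omega` and `PrintAllRanks: true` would produce `omega.summary` plus `omega.timing.0` through `omega.timing.3`.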

## Integration with external profiling tools

External profilers often include APIs to mark regions of code for detailed profiling.
On some platforms, Omega timers automatically add these annotations.
Currently, this is only implemented on systems with NVIDIA GPUs using NVTX.

This makes it possible, for example, to use the Nsight Compute kernel profiler to obtain
detailed kernel information for all kernels enclosed in the `Tend:computeVelocityTendencies`
Omega timer.
```bash
mpirun -np 1 ncu --nvtx --nvtx-include "Tend:computeVelocityTendencies/" omega.exe
```

components/omega/external/CMakeLists.txt

Lines changed: 15 additions & 1 deletion
@@ -94,9 +94,23 @@ if (NOT TARGET pacer)

   target_link_libraries(
     pacer
-    PUBLIC
+    PRIVATE
     gptl
+    Kokkos::kokkos
   )
+
+  target_compile_definitions(
+    pacer
+    PRIVATE
+    PACER_HAVE_KOKKOS=1
+  )
+  if (OMEGA_EXTERNAL_PROF)
+    target_compile_definitions(
+      pacer
+      PRIVATE
+      PACER_ADD_RANGES=1
+    )
+  endif()
 endif()

 # Add the parmetis and related libraries
