Commit 1f5f659

Add timing docs
1 parent 6455f54 commit 1f5f659

3 files changed

+149
-0
lines changed

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
(omega-dev-timing)=

# Timing

The timing infrastructure builds on the E3SM Pacer library for wall-clock timing.
Whenever possible, Pacer integrates platform-specific marker APIs.

# Initialization

To initialize the Pacer library, call
```c++
Pacer::initialize(Comm);
```
where `Comm` is an MPI communicator.

# Writing out timing data

To write out timing data, call
```c++
Pacer::print(FilePrefix, PrintAllRanks);
```
where `FilePrefix` is the prefix for all output files and the optional argument `PrintAllRanks` determines
whether all MPI ranks print their data.

# Finalization

To finalize the Pacer library, call
```c++
Pacer::finalize();
```

# Basic timer use

To time a region of code, enclose it between calls to `Pacer::start` and `Pacer::stop`, like so
```c++
Pacer::start(Name, Level);
// region of code to be timed
Pacer::stop(Name, Level);
```
These functions take a string `Name` and a non-negative integer `Level`.
The timer is active only if the timing level set in the config file is greater than or equal to `Level`.
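
Because every `start` needs a matching `stop` with the same name and level, one option is to wrap the pair in a scope guard. The following is only a sketch and not part of Pacer: `ScopedTimer` and `computeStep` are hypothetical, and `LevelMock` is a stand-in that merely records calls and applies the level rule described above, so the example compiles without Omega. In real code the guard would call `Pacer::start` and `Pacer::stop` instead.

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real library so this sketch compiles on its own.
// It records calls instead of measuring time and is NOT Pacer's implementation.
namespace LevelMock {
int ConfigLevel = 1;              // hypothetical: timing level from the config file
std::vector<std::string> Events;  // calls that were active

// Rule from the text: a timer is active if the configured level >= its Level.
bool isActive(int Level) { return ConfigLevel >= Level; }

void start(const std::string &Name, int Level) {
   if (isActive(Level)) Events.push_back("start " + Name);
}
void stop(const std::string &Name, int Level) {
   if (isActive(Level)) Events.push_back("stop " + Name);
}
} // namespace LevelMock

// Hypothetical RAII helper: starts a timer on construction and stops it on
// scope exit, so early returns and exceptions cannot leave it running.
class ScopedTimer {
 public:
   ScopedTimer(std::string TimerName, int TimerLevel)
       : Name(std::move(TimerName)), Level(TimerLevel) {
      LevelMock::start(Name, Level);
   }
   ~ScopedTimer() { LevelMock::stop(Name, Level); }
   ScopedTimer(const ScopedTimer &)            = delete;
   ScopedTimer &operator=(const ScopedTimer &) = delete;

 private:
   std::string Name;
   int Level;
};

// Example region with one coarse timer and one fine-grained timer.
void computeStep() {
   ScopedTimer Outer("Step", 0);
   {
      ScopedTimer Inner("Step:Detail", 2); // inactive while ConfigLevel is 1
   }
}
```

With the mock's level set to 1, calling `computeStep()` records only the `Step` timer; the level-2 timer is skipped entirely.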

# Advanced timing functions

## Conditional MPI barriers

Properly timing MPI communication might require inserting MPI barriers. It might be desirable
to remove those barriers in production runs. Pacer provides a function
```c++
Pacer::timingBarrier(TimerName, Level, Comm);
```
which adds an MPI barrier on the communicator `Comm` and puts a timer around it.
Whether barriers added by this function are actually executed is controlled by the following functions
```c++
Pacer::enableTimingBarriers();
Pacer::disableTimingBarriers();
```

## Adding parent prefixes

It might be desirable to add a prefix to a group of timers based on their parent timer.
To enable this, Pacer provides the `addParentPrefix()` and `removeParentPrefix()` functions.
For example, the following call sequence
```c++
Pacer::start("Parent", 0);
Pacer::addParentPrefix();

Pacer::start("Child", 0);
Pacer::stop("Child", 0);

Pacer::removeParentPrefix();
Pacer::stop("Parent", 0);
```
results in the "Child" timer showing up as "Parent:Child" in the output files.
This is useful when timers are added inside general-purpose routines that are called
from many places in the code, such as halo exchange.
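
To make the naming rule concrete, here is a small model of how a parent prefix could be composed. It is an assumption-laden sketch, not Pacer's implementation: the `PrefixModel` class and its bookkeeping are hypothetical, and only the `Parent:Child` naming with a `:` separator is taken from the text above.

```cpp
#include <string>
#include <vector>

// Hypothetical model of prefix handling; NOT Pacer's actual implementation.
class PrefixModel {
 public:
   // Opens a timer and returns the name under which it would be reported.
   std::string start(const std::string &Name) {
      std::string Full = Prefix + Name;
      Open.push_back(Full);
      return Full;
   }

   // Closes the innermost open timer.
   void stop() {
      if (!Open.empty()) Open.pop_back();
   }

   // Timers started from now on are prefixed with the innermost open timer.
   void addParentPrefix() {
      Saved.push_back(Prefix);
      if (!Open.empty()) Prefix = Open.back() + ":";
   }

   // Restores the prefix that was in effect before the matching add call.
   void removeParentPrefix() {
      if (!Saved.empty()) {
         Prefix = Saved.back();
         Saved.pop_back();
      }
   }

 private:
   std::string Prefix;              // prefix applied to newly started timers
   std::vector<std::string> Open;   // full names of currently open timers
   std::vector<std::string> Saved;  // prefixes to restore on removal
};
```

In this model, a timer named `Child` started between `addParentPrefix()` and `removeParentPrefix()` is reported as `Parent:Child`, while the same routine called under a different parent would be reported under that parent's name instead.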

## Disabling timers

It might be desirable to disable or enable timing programmatically. To allow that, Pacer provides
the `disableTiming()` and `enableTiming()` functions. In the following call sequence
```c++
Pacer::disableTiming();

Pacer::start("Timer1", 0);
Pacer::stop("Timer1", 0);

Pacer::enableTiming();

Pacer::start("Timer2", 0);
Pacer::stop("Timer2", 0);
```
`Timer1` is not timed while `Timer2` is. This is useful mainly when done conditionally.
For example, the first call to some function might take much longer than subsequent calls,
and a detailed timing breakdown of the first call might not be important. In that case,
it might be desirable to have a separate timer for the first call with its child timers disabled.
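
The first-call pattern described above could look roughly like the sketch below. Everything here is illustrative: `ToggleMock` is a stand-in that only records calls so the example compiles without Omega, and `applyOperator`, `timedApplyOperator`, and the timer names are hypothetical. Note the ordering: timing is re-enabled before the outer timer is stopped, since otherwise that `stop` would be ignored as well.

```cpp
#include <string>
#include <vector>

// Stand-in for the real library so this sketch compiles on its own.
// It records calls instead of measuring time and is NOT Pacer's implementation.
namespace ToggleMock {
bool Enabled = true;
std::vector<std::string> Events;

void start(const std::string &Name, int /*Level*/) {
   if (Enabled) Events.push_back("start " + Name);
}
void stop(const std::string &Name, int /*Level*/) {
   if (Enabled) Events.push_back("stop " + Name);
}
void disableTiming() { Enabled = false; }
void enableTiming() { Enabled = true; }
} // namespace ToggleMock

// Hypothetical routine with its own fine-grained child timer.
void applyOperator() {
   ToggleMock::start("Operator:Inner", 1);
   // ... work ...
   ToggleMock::stop("Operator:Inner", 1);
}

// The first call gets a separate timer and no child timers.
void timedApplyOperator() {
   static bool FirstCall = true;
   if (FirstCall) {
      ToggleMock::start("Operator:FirstCall", 0);
      ToggleMock::disableTiming(); // suppress child timers
      applyOperator();
      ToggleMock::enableTiming();  // re-enable before stopping the outer timer
      ToggleMock::stop("Operator:FirstCall", 0);
      FirstCall = false;
   } else {
      ToggleMock::start("Operator", 0);
      applyOperator();
      ToggleMock::stop("Operator", 0);
   }
}
```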

components/omega/doc/index.md

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,7 @@ userGuide/Reductions
 userGuide/Tracers
 userGuide/TridiagonalSolvers
 userGuide/VertCoord
+userGuide/Timing
 ```
 
 ```{toctree}
@@ -88,6 +89,7 @@ devGuide/Reductions
 devGuide/Tracers
 devGuide/TridiagonalSolvers
 devGuide/VertCoord
+devGuide/Timing
 ```
 
 ```{toctree}
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
(omega-user-timing)=

# Timing

Omega uses the Pacer library to time the code and places timers around
various parts of the code.

By default, the timing output is written to two files: `omega.summary` and `omega.timing.0`.
The `omega.summary` file presents timing statistics accumulated across all MPI ranks.
The `omega.timing.0` file shows timing results from the first rank only.

Four user-set parameters in the input configuration
file control the timing behavior. These are:
```yaml
Timing:
  Level: 2
  AutoFence: true
  TimingBarriers: false
  PrintAllRanks: false
```
The `Level` parameter is a non-negative integer that determines the granularity of timers.
Increasing it turns on more timers.
Having more timers provides more detailed information, but it also increases overhead
and may be counterproductive if a high-level look at model performance is sufficient.

The `AutoFence` Boolean option determines whether Kokkos fences are automatically added before every timer call.
This option **needs** to be true for accurate timing using Omega timers on GPU-based systems.
However, there are circumstances in which turning off automatic fences is useful.
The main use case is using external profiling tools.
Another is measuring the overhead of automatic synchronization at very high timing levels.

The `TimingBarriers` Boolean option determines whether MPI barriers are added before or after certain timers.
Adding barriers may be necessary to properly measure communication time, but it can add too much overhead in
production runs.

The `PrintAllRanks` Boolean option determines whether all ranks print their timing information. If this
option is set to `true`, the output will include additional files `omega.timing.i` with the
timing data from rank `i`.
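
As a quick illustration of the file naming described above, the helper below lists the output files one would expect for a given run. It is a hypothetical sketch, not part of Omega: the function name `expectedTimingFiles` is invented, and it assumes the default `omega` prefix and that the summary and rank-0 files are always written.

```cpp
#include <string>
#include <vector>

// Lists the timing files this page says to expect for a run on NumRanks ranks.
// Assumes the default "omega" prefix and the naming scheme described above.
std::vector<std::string> expectedTimingFiles(int NumRanks, bool PrintAllRanks) {
   std::vector<std::string> Files = {"omega.summary", "omega.timing.0"};
   if (PrintAllRanks) {
      // Ranks other than the first write one additional file each.
      for (int Rank = 1; Rank < NumRanks; ++Rank) {
         Files.push_back("omega.timing." + std::to_string(Rank));
      }
   }
   return Files;
}
```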

## Integration with external profiling tools

External profilers often include APIs to mark regions of code for detailed profiling.
On some platforms, Omega timers automatically add these annotations.
Currently, this is only implemented on systems with NVIDIA GPUs, using NVTX.

This makes it possible, for example, to use the Nsight Compute kernel profiler to obtain
detailed information for all kernels enclosed in the `Tend:computeVelocityTendencies`
Omega timer.
```bash
mpirun -np 1 ncu --nvtx --nvtx-include "Tend:computeVelocityTendencies/" omega.exe
```
