
Commit b903e88

WeiqunZhang and atmyers authored
Documentation for Profiling: Hot Spots and Load Balance (#3622)
Add more documentation on identifying hot spots and load imbalance in
profiling results.

---------

Co-authored-by: Andrew Myers <[email protected]>
1 parent cc4c4ff commit b903e88

File tree

1 file changed: +41 -0 lines


Docs/sphinx_documentation/source/AMReX_Profiling_Tools.rst

Lines changed: 41 additions & 0 deletions
@@ -93,6 +93,47 @@ it is also recommended to wrap any ``BL_PROFILE_TINY_FLUSH();`` calls in
informative ``amrex::Print()`` lines to ensure accurate identification of each
set of timers.
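
For instance, a minimal sketch of this pattern (the ``step`` counter is
illustrative, not part of the AMReX API):

.. highlight:: c++

::

    // Label the flushed timers so this report can be told apart
    // from later flushes in the same run.
    amrex::Print() << "Timers at end of time step " << step << "\n";
    BL_PROFILE_TINY_FLUSH();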

Hot Spots and Load Balance
~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of TinyProfiler can help us identify hot spots. For example,
the following output shows the top three hot spots of a linear solver test
running on 4 MPI processes.

.. highlight:: console

::

    --------------------------------------------------------------------------------------------
    Name                               NCalls   Excl. Min   Excl. Avg   Excl. Max   Max %
    --------------------------------------------------------------------------------------------
    MLPoisson::Fsmooth()                  560      0.4775      0.4793      0.4815   34.97%
    MLPoisson::Fapply()                   114      0.1103      0.113       0.1167    8.48%
    FabArray::Xpay()                      109      0.1         0.1013      0.1038    7.54%
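
Each name in this table comes from a scope timer in the source code. A
minimal sketch of how such a timer is added with the ``BL_PROFILE`` macro
(``MyClass::Fsmooth`` is a hypothetical stand-in for the real kernel):

.. highlight:: c++

::

    #include <AMReX_BLProfiler.H>

    void MyClass::Fsmooth ()
    {
        // The string becomes the timer's name in the TinyProfiler tables.
        BL_PROFILE("MyClass::Fsmooth()");
        // ... smoothing kernel ...
    }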

In this test, there are 16 boxes evenly distributed among 4 MPI processes. The
output above shows that the load is perfectly balanced. However, if the load
is not balanced, the results can be very different and sometimes
misleading. For example, if we put 2, 2, 6 and 6 boxes on processes 0, 1, 2
and 3, respectively, the top three hot spots now include two MPI
communication functions, ``FillBoundary`` and ``ParallelCopy``.

.. highlight:: console

::

    --------------------------------------------------------------------------------------------
    Name                                    NCalls   Excl. Min   Excl. Avg   Excl. Max   Max %
    --------------------------------------------------------------------------------------------
    FillBoundary_finish()                      607     0.01568      0.3367      0.6574   41.97%
    MLPoisson::Fsmooth()                       560     0.2133       0.4047      0.5973   38.13%
    FabArray::ParallelCopy_finish()            231     0.002977     0.09748     0.1895   12.10%

The reason the MPI communication appears slow is that the lightly
loaded processes have to wait for messages sent by the heavily loaded
processes. See also :ref:`sec:profopts` for a diagnostic option that may
provide more insight into the load imbalance.
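
For reference, one way to reproduce such an imbalanced layout in a test is
to build the ``DistributionMapping`` from an explicit processor map. This is
a sketch, assuming a 16-box ``BoxArray``, using the ``Vector<int>``
constructor of ``DistributionMapping``:

.. highlight:: c++

::

    #include <AMReX_Vector.H>
    #include <AMReX_DistributionMapping.H>

    // Assign 16 boxes to 4 MPI ranks as 2, 2, 6 and 6 boxes, respectively.
    amrex::Vector<int> pmap;
    for (int i = 0; i < 2; ++i) { pmap.push_back(0); }
    for (int i = 0; i < 2; ++i) { pmap.push_back(1); }
    for (int i = 0; i < 6; ++i) { pmap.push_back(2); }
    for (int i = 0; i < 6; ++i) { pmap.push_back(3); }
    amrex::DistributionMapping dm(pmap); // pass dm when building the MultiFabs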

.. _sec:full:profiling:

Full Profiling
