SPC: allow counters to be attached solely through MPI_T and reduce overhead
- only make MCA parameters available if SPC is enabled
- do not compile SPC code if SPC is disabled
- move includes into ompi_spc.c
- allow counters to be enabled through MPI_T without setting MCA parameter
- inline counter update calls that are likely in the critical path
- fix test to succeed even if encountering invalid pvars
- move timer_[start|stop] to header and move attachment info into ompi_spc_t
There is no need to store the name in the ompi_spc_t struct too; we can use that space
for the attachment info instead to avoid accessing another cache line.
- make timer/watermark flags a property of the spc description
This is meant to make adding counters easier in the future by
centralizing the necessary information. Storing a copy of these flags
in the ompi_spc_t structure (without adding to its size) reduces
cache pollution for timer/watermark events.
- allocate ompi_spc_t objects with cache-alignment
This prevents objects from spanning multiple cache lines and thus
ensures that only one cache line is loaded per update.
- fix handling of timer and timer conversion
- only call opal_timer_base_get_cycles if necessary to reduce overhead
- remove use of OPAL_UNLIKELY to improve code generated by GCC
It appears that GCC makes less effort in optimizing the unlikely path
and generates bloated code.
- allocate ompi_spc_events statically to reduce loads in critical path
- duplicate comm_world only when dumping is requested
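
The cache-alignment and inlining items above can be illustrated with a minimal sketch. It assumes a 64-byte cache line and C11; demo_counter_t, demo_counters_alloc and demo_counter_update are hypothetical names for illustration and do not reflect the actual ompi_spc_t layout or the SPC API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_CACHE_LINE 64  /* assumed cache-line size, not queried from hardware */

/* Hypothetical counter record (NOT the real ompi_spc_t): the value, a copy
 * of the timer/watermark flags, and the attachment state share one line. */
typedef struct {
    _Alignas(DEMO_CACHE_LINE) long long value;  /* counter value */
    uint32_t flags;                             /* copy of per-counter flags */
    uint32_t attached;                          /* nonzero once a tool attached */
} demo_counter_t;

/* Each record occupies exactly one cache line, so no record spans two. */
static_assert(sizeof(demo_counter_t) == DEMO_CACHE_LINE,
              "one counter per cache line");

/* Allocate n zero-initialized counters starting on a cache-line boundary. */
static demo_counter_t *demo_counters_alloc(size_t n)
{
    demo_counter_t *c = aligned_alloc(DEMO_CACHE_LINE, n * sizeof(*c));
    if (c != NULL) {
        memset(c, 0, n * sizeof(*c));
    }
    return c;
}

/* Inlined update: touches only the counter's own cache line and uses a
 * plain branch without a likely/unlikely hint. */
static inline void demo_counter_update(demo_counter_t *c, long long v)
{
    if (c->attached) {
        c->value += v;
    }
}
```

Because the size of each record equals the assumed line size and the array starts on a line boundary, every update reads and writes a single cache line.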
Signed-off-by: Joseph Schuchart <[email protected]>