Timers
Timers in Open MPI were discussed at length in issue #3003 and in related issues and PRs. This page collects that information and explains the implementation as of March 2019, for Open MPI developers.
On this page, a "timer" means a function that returns the current time, expressed as the amount of time elapsed since some point in the past. In MPI, such a function can be used to implement the MPI_WTIME routine.
Several timers are available, depending on the system, and each has its own characteristics.
Time should increase monotonically. In other words, time should not go back into the past.
If a timer is implemented using a CPU cycle counter and a system has multiple cores, time may not increase monotonically when a process is migrated to another core, especially a core on another socket.
Time should increase at a constant rate compared to real time.
If a timer is implemented using a CPU cycle counter, time may not increase at a constant rate when the frequency of the CPU changes.
How small the tick is. If a timer is used for the MPI_WTIME routine, its resolution is the value that the MPI_WTICK routine should report.
How much time is needed to get the current time.
Whether the timer is affected by a system time correction, such as one made by an NTP daemon. If it is affected, time may go back into the past or jump discontinuously.
Whether the timer is synchronized among compute nodes. This is reflected in the MPI_WTIME_IS_GLOBAL attribute key.
Many hardware architectures provide high resolution and low overhead timers.
If a hardware-native timer is based on a CPU cycle counter, we should pay attention to core migration of a process and CPU frequency change.
The x86-64 architecture provides the RDTSCP and RDTSC instructions, which read the TSC (time stamp counter). Using them as a timer is complicated.
The TSC is implemented differently across CPU models.
| TSC type | constant rate tick? | monotonic time? | 
|---|---|---|
| (original) TSC | no | per core (?) | 
| constant TSC | yes | per core (?) | 
| invariant TSC | yes | per socket (?) | 
A problem with the invariant TSC is that the instruction needed to determine its frequency is privileged.
The Armv8-A architecture provides the Generic Timer and the Generic Timer feature includes a system counter.
The system counter in the Generic Timer:
- Is system-level. Therefore all CPU cores in a compute node see the same counter.
- Measures the passing of time in real-time.
- Increases at a fixed frequency, typically in the range 1-50 MHz, except in lower-power operating modes. The CNTFRQ_EL0 register holds a copy of the current clock frequency.
- Starts operating from zero.
- Can be obtained by reading the CNTVCT_EL0 register.
See ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile.
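
The properties listed above suggest how a counter value can be turned into a time: divide the count by the counter frequency. The following is a minimal sketch, not Open MPI code. The function names `read_cntvct`, `read_cntfrq`, and `ticks_to_usec` are hypothetical, and the register reads assume an AArch64 target with GCC or Clang style inline assembly. Instruction ordering (for example, a barrier before the read) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__GNUC__)
/* Read the system counter value (CNTVCT_EL0). Hypothetical helper. */
static inline uint64_t read_cntvct(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
    return v;
}

/* Read the counter frequency in Hz (CNTFRQ_EL0). Hypothetical helper. */
static inline uint64_t read_cntfrq(void)
{
    uint64_t f;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
}
#endif

/* Convert a tick count to microseconds for a counter running at freq_hz.
 * Whole seconds and the remainder are converted separately so that the
 * multiplication by 1000000 does not overflow for large tick counts. */
static inline uint64_t ticks_to_usec(uint64_t ticks, uint64_t freq_hz)
{
    assert(freq_hz > 0);
    uint64_t sec = ticks / freq_hz;
    uint64_t rem = ticks % freq_hz;
    return sec * 1000000u + (rem * 1000000u) / freq_hz;
}
```

On an AArch64 system, `ticks_to_usec(read_cntvct(), read_cntfrq())` would give the time in microseconds since the counter started. The conversion function itself does not depend on the architecture.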
...
Keyword: time base facility
The SPARC-V9 architecture provides the TICK register.
The counter field of the TICK register:
- Is a 63-bit counter that counts CPU clock cycles.
- Can be read by the RDTICK instruction.
See The SPARC Architecture Manual, Version 9
Usually an OS provides library functions to get the current time.
Software-managed timers may be affected by a system time correction, such as one made by an NTP daemon.
The clock_gettime function returns the current time and the clock_getres function returns the resolution (precision). The first argument, clock_id, selects the type of clock. For example, CLOCK_REALTIME represents the clock measuring real time for the system since the Epoch. This clock is affected by discontinuous jumps in the system time. CLOCK_MONOTONIC represents the monotonic clock for the system since an unspecified point in the past. This clock is not affected by discontinuous jumps in the system time.
The clock_gettime and the clock_getres functions are defined in POSIX.1-2001 and later. OS X has problem with clock_gettime?
Some OSes implement these functions as system calls, so they have high overhead. GNU/Linux implements these functions using vDSO on some architectures to avoid the overhead.
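
As an illustration of the two functions described above, the following sketch reads a clock and its resolution and converts both to seconds. The helper names `clock_seconds` and `clock_resolution` are hypothetical and are not part of Open MPI. The feature-test macro is an assumption about the build environment and may be unnecessary on some systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current time on the given clock in seconds, or -1.0 on failure. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0) {
        return -1.0;
    }
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Resolution of the given clock in seconds, or -1.0 on failure. */
static double clock_resolution(clockid_t id)
{
    struct timespec ts;
    if (clock_getres(id, &ts) != 0) {
        return -1.0;
    }
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Calling `clock_seconds(CLOCK_MONOTONIC)` twice and subtracting gives an elapsed time that is not affected by discontinuous jumps in the system time. The same calls with `CLOCK_REALTIME` give seconds since the Epoch.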
The gettimeofday function returns the current time, expressed as seconds and microseconds since the Epoch. The clock may be affected by discontinuous jumps in the system time.
The gettimeofday function is defined in POSIX.1-2001 but is marked as obsolete in POSIX.1-2008.
Some OSes implement this function as a system call, so it has high overhead. GNU/Linux implements this function using vDSO on some architectures to avoid the overhead.
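
A minimal sketch of a microsecond wall-clock timer built on gettimeofday follows. The names `wall_usec` and `elapsed_usec` are hypothetical. Because this clock can jump when the system time is corrected, the second helper clamps a negative interval to zero. That clamping is one possible policy, assumed here for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Wall-clock microseconds since the Epoch, or 0 on failure. */
static uint64_t wall_usec(void)
{
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0) {
        return 0;
    }
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}

/* Elapsed microseconds between two readings. Returns 0 if the clock
 * stepped backwards between the readings (end < start). */
static uint64_t elapsed_usec(uint64_t start, uint64_t end)
{
    return end >= start ? end - start : 0;
}
```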
The times function returns the CPU time spent executing instructions of the calling process and the CPU time spent in the system while executing tasks on behalf of the calling process.
The times function is defined in POSIX.1-2001 and later.
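
The following sketch shows how the two CPU times described above might be read and converted to seconds. The struct `cpu_seconds` and the function `get_cpu_seconds` are hypothetical names. The conversion uses the clock-tick rate reported by `sysconf(_SC_CLK_TCK)`.

```c
#include <sys/times.h>
#include <unistd.h>

/* CPU time of the calling process, in seconds. Hypothetical type. */
struct cpu_seconds {
    double user_sec; /* time spent executing the process's own instructions */
    double sys_sec;  /* time spent in the system on behalf of the process */
};

/* Fill *out with the process CPU times. Returns 0 on success, -1 on failure. */
static int get_cpu_seconds(struct cpu_seconds *out)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    if (ticks_per_sec <= 0 || times(&t) == (clock_t)-1) {
        return -1;
    }
    out->user_sec = (double)t.tms_utime / (double)ticks_per_sec;
    out->sys_sec = (double)t.tms_stime / (double)ticks_per_sec;
    return 0;
}
```

Note that these values measure CPU time consumed, not elapsed wall-clock time, so they are not a substitute for the other timers on this page.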
Timers are used in several places in the Open MPI code.
The MPI_WTIME routine returns an elapsed wall-clock time since some time in the past. The MPI_WTICK routine returns the resolution of the MPI_WTIME routine.
- Accuracy is important.
- High resolution and low overhead are better.
- The values should not be affected by a system time correction.
We need to trip the event library at some interval in the opal_progress function.
- Accuracy and high resolution are not important.
- Low overhead is important.
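
The interval requirement above can be illustrated with a small gate that fires only when enough counter units have passed. This is a hedged sketch, not the actual opal_progress code. The type `tick_gate` and the function `tick_gate_due` are hypothetical, and the caller is assumed to supply `now` from any cheap counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type, not an Open MPI structure. */
struct tick_gate {
    uint64_t last;     /* counter value when the gate last fired */
    uint64_t interval; /* minimum counter delta between firings */
};

/* Return true if at least `interval` counter units have passed since the
 * gate last fired, and record `now` as the new firing point.
 * The subtraction is unsigned, so if the counter appears to go backwards
 * the difference wraps to a large value and the gate fires early instead
 * of stalling. */
static bool tick_gate_due(struct tick_gate *g, uint64_t now)
{
    if (now - g->last < g->interval) {
        return false;
    }
    g->last = now;
    return true;
}
```

Because only a subtraction and a comparison are done per call, the cost is dominated by reading the counter, which is why low overhead matters more than accuracy for this use.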
...
...
- 
OPAL_HAVE_CLOCK_GETTIME
 If the clock_gettime function is provided by the OS, the value is 1. Otherwise, the value is 0. This macro is defined in $build_dir/opal/include/opal_config.h.
- 
OPAL_TIMER_MONOTONIC
 If the opal_sys_timer_get_cycles function always returns monotonically increasing values in a node, the value is 1. Otherwise, the value is 0. This macro is first defined as 1 in opal/include/opal/sys/timer.h and is redefined as 0 in opal/include/opal/sys/*/timer.h for some architectures.
- 
OPAL_HAVE_SYS_TIMER_GET_CYCLES
 If the opal_sys_timer_get_cycles function is implemented for the architecture, the value is 1. Otherwise, the value is 0. This macro is defined in opal/include/opal/sys/*/timer.h.
- 
OPAL_HAVE_SYS_TIMER_IS_MONOTONIC
 If the opal_sys_timer_is_monotonic function is implemented for the architecture, the value is 1. For some architectures, this macro is defined as 1 (with the architecture-dependent opal_sys_timer_is_monotonic function) in opal/include/opal/sys/*/timer.h. For other architectures, this macro is defined as 1 (with the opal_sys_timer_is_monotonic function which returns the value of OPAL_TIMER_MONOTONIC) in opal/include/opal/sys/timer.h.
These macros are currently used in opal/mca/timer/linux/.
- 
opal_sys_timer_get_cycles
 This function returns a cycle count. The cycle count may not be the cycle count of the CPU itself, if there is another sufficiently close counter with better behavior characteristics (like the Time Base counter on many Power/PowerPC platforms). This function is defined only if the value of the OPAL_HAVE_SYS_TIMER_GET_CYCLES macro is 1.
- 
opal_sys_timer_freq
 This function returns the frequency of the cycle counter in use, NOT the frequency of the main CPU. This function is currently defined only for arm64.
- 
opal_sys_timer_is_monotonic
 This function returns whether the opal_sys_timer_get_cycles function returns monotonic time. This function is always defined because the default function is defined in opal/include/opal/sys/timer.h.
These functions are defined in opal/include/opal/sys/*/timer.h if available.
These functions are currently used in opal/mca/timer/linux/.
- 
OPAL_TIMER_CYCLE_NATIVE
 If the opal_timer_base_get_cycles function is implemented directly using an architecture-dependent cycle counter or computed from some other data (such as a high-resolution timer), the value is 1. Otherwise, the value is 0.
- 
OPAL_TIMER_CYCLE_SUPPORTED
 If the opal_timer_base_get_cycles function is implemented for the OS, the value is 1. Otherwise, the value is 0.
- 
OPAL_TIMER_USEC_NATIVE
 ...
- 
OPAL_TIMER_USEC_SUPPORTED
 If the opal_timer_base_get_usec function is implemented for the OS, the value is 1. Otherwise, the value is 0.
These macros are defined in opal/mca/timer/*/timer_*.h.
- 
opal_timer_base_get_cycles
 This function returns a cycle count. The cycle count may not be the cycle count of the CPU itself, if there is another sufficiently close counter with better behavior characteristics (like the Time Base counter on many Power/PowerPC platforms).
- 
opal_timer_base_get_usec
 This function returns the current time in microseconds.
- 
opal_timer_base_get_freq
 This function returns the frequency of the cycle counter in use, NOT the frequency of the main CPU.
These functions are defined in opal/mca/timer/*/timer_*.h (as inline) or opal/mca/timer/*/timer_*_component.c (as non-inline) if available.
- 
opal_timer_linux_get_cycles_clock_gettime
 ...
- 
opal_timer_linux_get_usec_clock_gettime
 ...
- 
opal_timer_linux_get_cycles_sys_timer
 ...
- 
opal_timer_linux_get_usec_sys_timer
 ...
- 
mca_timer_base_monotonic
 ...
- Originally, MPI_WTIME was implemented using gettimeofday.
- In commit ee75c45ec5, it was changed to use opal_timer_base_get_usec if OPAL_TIMER_USEC_NATIVE is 1. At that point, OPAL_TIMER_USEC_NATIVE for Linux was 0.
- In PR #285, OPAL_TIMER_USEC_NATIVE for Linux was changed to OPAL_HAVE_SYS_TIMER_GET_CYCLES and MPI_WTIME was changed to use opal_timer_base_get_cycles if OPAL_TIMER_CYCLE_NATIVE is 1. This change broke MPI_WTIME in the case where the CPU frequency changes during MPI program execution.
- In issue #3003, the problem was reported.
- In PR #3184, MPI_WTIME was changed to use gettimeofday as a workaround.
- In PR #3201, MPI_WTIME was changed to use clock_gettime on Linux.