Header-only measurement library based on std::chrono and the RDTSC instruction. You can choose a thread-safe or recursion-safe implementation. Human-readable or CSV reporting is available.
Optional QueryPerformanceCounter support (MSVC only). Additional backends and reporters can be defined.
Easy to use, high precision, and usable in a live environment.
Requirements: C++11, CMake for the unit tests
MIT License, see LICENSE.txt
#include "dmeasure/dmeasure.h"

uint64_t MyFunction()
{
    DMEASURE(MyFunction);
    // or DMEASURE_S("title+with-special characters");
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < 123456; ++i)
    {
        DMEASURE(Loop);
        // or DMEASURE_RDTSC(Loop); if the precision is important
        sum += i;
    }
    return sum;
}

int main()
{
    MyFunction();
    ...
    dmeasure::PrintMeasure();
    return 0;
}

output:
------------------------------------- cpp times --------------------------------------
Name                                         Calls        Total (ns)      Average (ns)
--------------------------------------------------------------------------------------
MyFunction                                       1         4'294'900         4'294'900
Loop                                        123456         2'137'100                17
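
To make the Calls, Total and Average columns concrete, the following is a minimal sketch of a scope-based timer built on std::chrono::high_resolution_clock. It is not dmeasure's implementation: `SketchRecord`, `SketchScope`, `LoopRecord`, `AverageNs` and `SumWithSketch` are hypothetical names, and the sketch is neither thread safe nor recursion safe. The average is assumed to be an integer division of the total by the call count, which is consistent with the Loop row above (2'137'100 / 123456 rounds down to 17).

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical record type: one per measured place in the code.
struct SketchRecord
{
    const char* name;
    uint64_t calls;
    uint64_t totalNs;
};

// Hypothetical RAII scope: starts timing in the constructor and
// adds the elapsed time to the record in the destructor.
class SketchScope
{
public:
    typedef std::chrono::high_resolution_clock Clock;

    explicit SketchScope(SketchRecord* record)
        : m_record(record), m_start(Clock::now()) {}

    ~SketchScope()
    {
        const Clock::duration elapsed = Clock::now() - m_start;
        m_record->calls += 1;
        m_record->totalNs += static_cast<uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
    }

    SketchScope(const SketchScope&) = delete;
    SketchScope& operator=(const SketchScope&) = delete;

private:
    SketchRecord* m_record;
    Clock::time_point m_start;
};

// The record must outlive every scope that uses it, hence the function-local static.
inline SketchRecord& LoopRecord()
{
    static SketchRecord record{"Loop", 0, 0};
    return record;
}

// Assumed definition of the "Average (ns)" column: integer division, 0 if never called.
inline uint64_t AverageNs(const SketchRecord& record)
{
    return record.calls == 0 ? 0 : record.totalNs / record.calls;
}

// Same shape as the loop in MyFunction above, with one scope per iteration.
inline uint64_t SumWithSketch(uint64_t n)
{
    uint64_t sum = 0;
    for (uint64_t i = 0; i < n; ++i)
    {
        SketchScope scope(&LoopRecord());
        sum += i;
    }
    return sum;
}
```

After calling `SumWithSketch(123456)`, `LoopRecord().calls` is 123456 and `LoopRecord().totalNs` includes the overhead of the timer itself, which is why very short scopes report a larger average than the measured statement really costs.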
If you want to test your code in an isolated environment, use the TimeIt function:
double x = 0.0;
double runtimeInSec = dmeasure::TimeIt([&x](){ x = sin(x); }, 5, 123456).GetMin();
std::cout << "runtime of sin(x) = " << dmeasure::MeasureUtils::TimeToStr(runtimeInSec) << std::endl;

If your function is recursive and/or called from multiple threads, you must use another form:
void MyFunction()
{
    // the second parameter:
    // RSafe: recursion safe,
    // TSafe: thread safe,
    // TRSafe: thread and recursion safe implementation
    DMEASURE_SAFE_S("Measure with thread and recursion safe implementation", TRSafe);
    ....
}

or the same in expanded form:
void MyFunction()
{
    // the "static" is important!
    static dmeasure::Measure::TRSafe::MeasureRecord record("Measure with scope");
    {
        dmeasure::Measure::TRSafe::Scope scope(&record);
        ....
    }
}

If the measurement title is generated dynamically, the expanded form is always recommended:
void MyFunction(int i)
{
    // GetDynamicRecord is very slow; do not call it from time-critical code!
    const std::string dynamicGeneratedTitle = std::string("MyFunction_") + std::to_string(i);
    const auto record = dmeasure::Measure::TSafe::GetDynamicRecord(dynamicGeneratedTitle.c_str());
    for (int j = 0; j < i; ++j)
    {
        dmeasure::Measure::TSafe::Scope scope(record);
        ...
    }
}

Currently three measurement techniques are supported: CppMeasure, QPCMeasure and RdtscMeasure.
You can choose one of them by defining DMEASURE_TYPE before including measure.h;
the default is CppMeasure.
CppMeasure is based on std::chrono::high_resolution_clock and is available with any C++11 compiler.
As you can see in measure.h, it is the default measurement technique,
but you can use the CppMeasure functions explicitly if you want:
void MyFunction()
{
    DMEASURE_CPP(MyFunctionTime);
    ....
}

or
void MyFunction()
{
    static dmeasure::CppMeasure::TRSafe::MeasureRecord record("Measure with scope");
    {
        dmeasure::CppMeasure::TRSafe::Scope scope(&record);
        ....
    }
}

RdtscMeasure is the lowest-level measurement service,
based on the processor's Read Time-Stamp Counter instruction,
with a typical frequency of 2-4 GHz.
This is the fastest option, but it is only available on x86/x64 platforms.
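
RDTSC returns a raw tick count rather than a time, so a conversion factor is needed before ticks can be reported in nanoseconds. The sketch below shows one way to read the counter and to estimate its frequency against std::chrono::steady_clock. `ReadTicks`, `EstimateTicksPerSecond` and `SKETCH_HAS_RDTSC` are hypothetical names, not part of dmeasure, and the source does not describe how RdtscMeasure calibrates itself. `__rdtsc()` is the compiler intrinsic from `<intrin.h>` (MSVC) or `<x86intrin.h>` (GCC/Clang); on other platforms the sketch falls back to steady_clock ticks so that it still compiles.

```cpp
#include <chrono>
#include <cstdint>

// Select the header that declares the __rdtsc() intrinsic, if there is one.
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define SKETCH_HAS_RDTSC 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <x86intrin.h>
#define SKETCH_HAS_RDTSC 1
#else
#define SKETCH_HAS_RDTSC 0
#endif

// Hypothetical helper: raw tick count from the time-stamp counter.
inline uint64_t ReadTicks()
{
#if SKETCH_HAS_RDTSC
    return __rdtsc();
#else
    // Fallback for non-x86 targets: use steady_clock ticks instead.
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: estimate how many ticks elapse per second by
// busy-waiting for a short, known interval measured with steady_clock.
inline double EstimateTicksPerSecond()
{
    typedef std::chrono::steady_clock Clock;

    const Clock::time_point t0 = Clock::now();
    const uint64_t c0 = ReadTicks();
    while (Clock::now() - t0 < std::chrono::milliseconds(20))
    {
        // spin: sleeping could let the thread migrate or the CPU change state
    }
    const uint64_t c1 = ReadTicks();
    const Clock::time_point t1 = Clock::now();

    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(c1 - c0) / seconds;
}
```

A tick difference can then be converted with `ticks * 1e9 / EstimateTicksPerSecond()` to get nanoseconds. The estimate is only as good as the calibration interval, and it assumes the counter runs at a constant rate.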
void MyFunction()
{
    DMEASURE_RDTSC(MyFunctionTime);
    ....
}

or
void MyFunction()
{
    static dmeasure::RdtscMeasure::TRSafe::MeasureRecord record("Measure with scope");
    {
        dmeasure::RdtscMeasure::TRSafe::Scope scope(&record);
        ....
    }
}

If you are using Visual Studio, you can use QPCMeasure,
which is based on QueryPerformanceCounter and QueryPerformanceFrequency.
The typical frequency of the QPC is 10 MHz. Every function has a QPC equivalent:
void MyFunction()
{
    DMEASURE_QPC(MyFunctionTime);
    // or DMEASURE_QPC_S("title+with-special characters");
    ....
}

or
void MyFunction()
{
    static dmeasure::QPCMeasure::TRSafe::MeasureRecord record("Measure with scope");
    {
        dmeasure::QPCMeasure::TRSafe::Scope scope(&record);
        ....
    }
}

- Measurement is never perfectly accurate.
- The measuring code always slows the program down a little, although this is usually insignificant.
- The result of the measurement depends on many factors: the type of processor, the state of the cache, other programs, the behaviour of the operating system, the speed of the memory, and so on.
- In many cases, speed depends on memory access rather than processor speed.
- Always close all other applications before measuring.
- Always measure optimised code.
- Before measuring, turn off the computer's power-saving mode.
- If your processor has "Efficient-cores", they may be slower than the regular cores, which can distort the measurement.
- Some processors vary their clock frequency, which can also distort the measurement.
- In a multithreaded environment, using the non-thread-safe functions is acceptable in most cases, because collisions are very rare (this is the smallest problem).
- In a multithreaded environment, the measuring code measures the total runtime of the function, not just the time it was actively running.
- If your function is recursive, you MUST use the recursion-safe (or thread- and recursion-safe) functions; the regular functions do NOT work!
- Usually it is not the specific numerical values that matter, but their ratios and which code fragments are the most costly.
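
The recursion note above can be illustrated with a small sketch. When a scope timer is re-entered recursively, every nested call adds its own inclusive time to the same record, so the time of the inner calls is counted several times. One common remedy is a depth counter that lets only the outermost scope add its elapsed time. This is an assumption about the kind of problem involved, not a description of how dmeasure's RSafe implementation works; all names below are hypothetical, and a fake tick counter replaces the real clock so that the numbers are deterministic.

```cpp
#include <cstdint>

// Fake clock, advanced explicitly by the "work" below (illustration only).
static uint64_t g_ticks = 0;

// Hypothetical record shared by all scopes of one measured function.
struct Record
{
    uint64_t calls = 0;
    uint64_t total = 0;
    int depth = 0; // used only by the recursion-safe scope
};

// Naive scope: every level adds its own inclusive elapsed time.
struct NaiveScope
{
    Record& record;
    uint64_t start;

    explicit NaiveScope(Record& r) : record(r), start(g_ticks) {}
    ~NaiveScope()
    {
        record.calls += 1;
        record.total += g_ticks - start;
    }
};

// Recursion-safe scope: only the outermost level adds elapsed time.
struct RecursionSafeScope
{
    Record& record;
    uint64_t start;

    explicit RecursionSafeScope(Record& r) : record(r), start(g_ticks)
    {
        ++record.depth;
    }
    ~RecursionSafeScope()
    {
        record.calls += 1;
        if (--record.depth == 0)
        {
            record.total += g_ticks - start;
        }
    }
};

// Recursive function doing one tick of "work" per level.
template <typename Scope>
void Recurse(Record& record, int levels)
{
    Scope scope(record);
    g_ticks += 1;
    if (levels > 1)
    {
        Recurse<Scope>(record, levels - 1);
    }
}

// Runs the recursion with a fresh record and returns what was measured.
template <typename Scope>
Record MeasureRecursion(int levels)
{
    Record record;
    Recurse<Scope>(record, levels);
    return record;
}
```

With four levels of recursion and one tick of work per level, `MeasureRecursion<NaiveScope>(4).total` is 10 (4 + 3 + 2 + 1), while `MeasureRecursion<RecursionSafeScope>(4).total` is 4, the number of ticks that actually elapsed. Both variants count 4 calls.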