Releases: jmuehlig/perf-cpp

v0.13.0

28 Mar 22:16

  • Header Restructuring: Headers have been reorganized into counter/, sample/, metric/, analyzer/, and util/ subdirectories and renamed from .h to .hpp. The previous .h headers remain as forwarding includes with deprecation notices and will be removed in v1.0.
  • Breaking: EventCounter::add() and start() (including Sampler::start() and all multi-thread/core variants) now return void instead of bool. Errors are communicated via exceptions; the return values were unused.
  • Compile Flag for AUX Buffer Support: Added PERFCPP_NO_SAMPLE_AUX compile flag to disable auxiliary buffer sampling on systems with Linux kernels older than 5.5 that lack PERF_SAMPLE_AUX support. Thanks to @rconnorlawson.
  • Perf File Export: Fixed bugs in the perf data format used when writing samples to a file, so that the file can be read via perf report and perf mem report.
  • NMI Watchdog Detection: Hardware counter detection now accounts for the NMI watchdog permanently occupying one hardware PMU counter, fixing an overestimated number of available counters on systems with the watchdog enabled.
  • RAPL Power Metrics: Added built-in watts-pkg, watts-cores, and watts-ram metrics for measuring power consumption via RAPL energy counters (see the documentation).
  • Conan Package: Added Conan 2.x package recipe for easier integration.
  • Config Setter Naming: Config setters no longer use the is_ prefix (e.g., pinned(bool) instead of is_pinned(bool)). The old is_pinned(bool) and is_debug(bool) setters are deprecated and will be removed in v1.0.
  • SampleResult CSV Export: Added SampleResult::to_csv() returning a std::string, complementing the existing file-based overload.
  • Per-Element Results: Added result_of_thread(thread_id), result_of_process(process_id), and result_of_core(core_id) to query individual results from MultiThreadEventCounter, MultiProcessEventCounter, and MultiCoreEventCounter. Process and core variants return std::optional<CounterResult> since the ID may not be present.
  • Documentation: Rewrote and restructured all documentation pages for consistency, conciseness, and correctness. Documentation is now hosted at jmuehlig.github.io/perf-cpp.

v0.12.5

22 Dec 08:48

  • Bugfix: The library could not be compiled against specific Linux kernels (see #10).
  • Symbol Translation: Improved translation from instruction pointer to symbol.

v0.12.4

28 Oct 19:18

  • Bugfix: The library crashed when events loaded from an external CSV file contained whitespace (see #8). Thanks to @Liteom.
  • Bugfix: The library could not be compiled against specific Linux kernels that do not provide PERF_MEM_LVLNUM_UNC and PERF_MEM_SNOOPX_PEER (see #7). Thanks to @Raphalex46 for pointing this out.
  • Perf Data Export: Samples can now be written as perf data files using Sampler::to_perf_file(), enabling analysis with standard perf ecosystem tools like perf report (see the documentation). Note that this feature is experimental.

v0.12.3

22 Aug 17:07

This update simplifies the handling of counter definitions by introducing a default instance.

  • Default Counter Definitions: Supplying a user-defined perf::CounterDefinition to each perf::EventCounter or perf::Sampler is no longer required. If none is provided, a default instance is used automatically. Custom definitions now extend the default set of events instead of duplicating them.

v0.12.2

19 Jul 07:53

  • Metric Functions: Metrics now support built-in functions such as ratio(A, B) and sum(A, B, C, ...), enabling more expressive and reusable formulas (see the documentation).
  • Optimized Compile-time Event Injection: The generated runtime event registration class is now only created if it does not already exist, reducing unnecessary recompilation.
  • Improved Live Event Accuracy: Live event values now account for partial runtime durations via time scaling, improving accuracy when counters were not active for the full measurement window.

v0.12.1

29 Jun 17:44

  • Automatic Event Discovery on ARM: Hardware event types are now automatically detected on ARM architectures when initializing a perf::CounterDefinition instance.
  • Hardware Counter Introspection: The number of available physical performance counters per logical core, along with the number of events each counter can multiplex, is now determined automatically when creating a perf::EventCounter.
  • Recursive and Scientific Metrics: Metric expressions can now reference other metrics recursively. Support for scientific notation (e.g., 1e5) in formula-based metrics has also been added.

v0.12.0

25 Jun 06:35

This release expands symbolic analysis capabilities, introduces FlameGraph generation, and improves hardware event management through both runtime and compile-time support.

  • Symbol Resolution: Instruction pointers captured during sampling can now be resolved to function names using perf::SymbolResolver (see the documentation).
  • FlameGraph Export: Sampling data can be converted into formats compatible with visualization tools such as Brendan Gregg's FlameGraph, Speedscope, and flamegraph.com using perf::analyzer::FlameGraphGenerator (see the documentation).
  • Built-in Event Definitions: A set of x86-specific hardware events is now bundled in events/x86 and can be loaded at runtime using perf::CounterDefinition. This serves as an alternative to the make perf-list target.
  • Compile-time Event Injection: Processor-specific event definitions can now be embedded directly at build time by configuring CMake with -DGEN_PROCESSOR_EVENTS=1. These are immediately available via perf::CounterDefinition (see the documentation).
  • Automatic Event Discovery: Additional event types, including RAPL energy counters and AMD IOMMU events, are now automatically detected during the creation of a perf::CounterDefinition instance (issue #6).

v0.11.1

27 May 20:06

  • Unified the behaviour of the time and timestamp fields in the sampling API, removing discrepancies between the two.

v0.11.0

26 May 17:50

This version rolls out a redesigned sampling API.
Recorded data are now grouped into dedicated sub-structures (such as Metadata, InstructionExecution, and DataAccess) inside perf::Sample (see the sampling documentation).

The previous flat API is still available but deprecated and will be removed in v0.12.

  • New Sampling Interface: Sample data is organized into clearly separated sections, which also expose additional AMD IBS fields that are not surfaced by the perf_event_open records.
  • Explicit Latency Attributes: Vendor-specific latency signals (cache-access latency on Intel, cache-miss latency on AMD) are now surfaced as distinct fields.
  • Heterogeneous-core Support: Sampling can target multiple PMU domains (e.g., cpu_core and cpu_atom) on hybrid Intel processors.

v0.10.0

15 Feb 10:28

  • New feature: The auxiliary event is now added automatically when required by Intel hardware (see the documentation).
  • New feature: The Memory Access Analyzer lets users describe complex data objects and maps sampled memory addresses onto them to report latency and access information (see the documentation).
  • If the configured number of pages for the sampling buffer is invalid, it is now aligned automatically to a valid size, i.e., a power of two plus one page for the header.
  • New feature: Sampled data can be copied from the mmap-ed perf buffer into an application-level buffer whenever the perf buffer is close to full (see the documentation).