Commit 7f25ad5

Merge pull request cms-sw#42665 from Dr15Jones/AllocMonitor
Added AllocMonitor facility
2 parents 7faa1e9 + 34d452b commit 7f25ad5

19 files changed, +1684 -0 lines changed
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
<use name="FWCore/Utilities"/>
<export>
  <lib name="1"/>
</export>

PerfTools/AllocMonitor/README.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# PerfTools/AllocMonitor Description

## Introduction

This package works with the PerfTools/AllocMonitorPreload package to provide a general facility to watch allocations and deallocations.
This is accomplished by using LD_PRELOAD with libPerfToolsAllocMonitorPreload.so and registering a class inheriting from `AllocMonitorBase`
with `AllocMonitorRegistry`. The preloaded library puts in proxies for the C and C++ allocation methods (and forwards the calls to the
original job methods). These proxies communicate with `AllocMonitorRegistry` which, in turn, calls methods of the registered monitors.

## Extending

To add a new monitor, one inherits from `cms::perftools::AllocMonitorBase` and overrides the `allocCalled` and
`deallocCalled` methods. Both methods must be thread safe, since they are called concurrently in a multi-threaded program.

- `AllocMonitorBase::allocCalled(size_t iRequestedSize, size_t iActualSize)` : `iRequestedSize` is the number of bytes being requested by the allocation call. `iActualSize` is the actual number of bytes returned by the allocator. These can be different because of alignment constraints (e.g. asking for 1 byte but all allocations must be aligned on a particular memory boundary) or internal details of the allocator.

- `AllocMonitorBase::deallocCalled(size_t iActualSize)` : `iActualSize` is the actual size returned when the associated allocation was made. NOTE: the glibc extended interface does not provide a way to find the requested size based on the address returned from an allocation; it only provides the actual size.

When implementing `allocCalled` and `deallocCalled` it is perfectly fine to do allocations/deallocations. The facility
guarantees that those internal allocations will not cause any callbacks to be sent to any active monitors.
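
The guarantee above can be pictured with a small re-entrancy guard. The following is a minimal sketch, not the facility's implementation: it assumes a per-thread flag (the `thread_local` storage is an assumption), and `reportAllocation`, `monitorCallback`, and `callbacksSeen` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {
  // Assumption: one flag per thread; true means callbacks are delivered on this thread.
  inline bool& isRunning() {
    thread_local bool running = true;
    return running;
  }

  // RAII guard: switches callbacks off for its lifetime, then restores the previous state.
  class Guard {
  public:
    explicit Guard(bool& iFlag) noexcept : address_(&iFlag), original_(iFlag) { *address_ = false; }
    ~Guard() { *address_ = original_; }
    Guard(Guard const&) = delete;
    Guard& operator=(Guard const&) = delete;

    // True if callbacks were enabled when this guard was created.
    bool running() const noexcept { return original_; }

  private:
    bool* address_;
    bool original_;
  };

  // Number of times the hypothetical monitor callback ran.
  inline int callbacksSeen = 0;

  void reportAllocation(std::size_t iSize);

  // Hypothetical monitor callback that itself triggers another "allocation".
  inline void monitorCallback(std::size_t) {
    ++callbacksSeen;
    reportAllocation(16);  // nested allocation made from inside the callback
  }

  // Stands in for an allocation proxy: notifies the monitor unless already inside a callback.
  inline void reportAllocation(std::size_t iSize) {
    Guard g(isRunning());
    if (g.running()) {
      monitorCallback(iSize);
    }
  }
}  // namespace sketch
```

Calling `sketch::reportAllocation(64)` once runs the callback exactly once: the nested call made inside the callback finds the flag already cleared and is skipped, and the flag is restored when the outer guard goes out of scope.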

To add a monitor to the facility, one must access the registry by calling the static method
`cms::perftools::AllocMonitorRegistry::instance()` and then call the member function
`T* createAndRegisterMonitor(ARGS&&... iArgs)`. The function will internally create a monitor of type `T` (being careful
to not cause callbacks during the allocation) and pass the arguments `iArgs` to the constructor.

The monitor is owned by the registry and should not be deleted by any other code. If one needs to control the lifetime
of the monitor, one can call `cms::perftools::AllocMonitorRegistry::deregisterMonitor` to have the monitor removed from
the callback list and be deleted (again, without the deallocation causing any callbacks).
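
The create, register, and deregister flow described above can be sketched outside CMSSW as follows. This is an illustration under stated assumptions, not the package's code: `MonitorBase` and `ToyRegistry` are stand-ins that only mirror the shape of `AllocMonitorBase` and `AllocMonitorRegistry`, `CountingMonitor` and its `iMinSize` argument are hypothetical, and `notifyAlloc`/`notifyDealloc` stand in for the preloaded proxies.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {
  // Stand-in mirroring the two pure virtual callbacks of cms::perftools::AllocMonitorBase.
  class MonitorBase {
  public:
    virtual ~MonitorBase() = default;
    virtual void allocCalled(std::size_t iRequestedSize, std::size_t iActualSize) = 0;
    virtual void deallocCalled(std::size_t iActualSize) = 0;
  };

  // Hypothetical monitor: counts calls and sums requested bytes for requests of at least iMinSize.
  class CountingMonitor : public MonitorBase {
  public:
    explicit CountingMonitor(std::size_t iMinSize) : minSize_(iMinSize) {}

    void allocCalled(std::size_t iRequestedSize, std::size_t) final {
      if (iRequestedSize >= minSize_) {
        nAllocations_.fetch_add(1, std::memory_order_relaxed);
        requested_.fetch_add(iRequestedSize, std::memory_order_relaxed);
      }
    }
    void deallocCalled(std::size_t) final { nDeallocations_.fetch_add(1, std::memory_order_relaxed); }

    std::size_t nAllocations() const { return nAllocations_.load(); }
    std::size_t nDeallocations() const { return nDeallocations_.load(); }
    std::size_t requested() const { return requested_.load(); }

  private:
    std::size_t minSize_;
    std::atomic<std::size_t> nAllocations_{0};
    std::atomic<std::size_t> nDeallocations_{0};
    std::atomic<std::size_t> requested_{0};
  };

  // Toy registry: owns its monitors and forwards constructor arguments, like the real interface.
  class ToyRegistry {
  public:
    template <typename T, typename... ARGS>
    T* createAndRegisterMonitor(ARGS&&... iArgs) {
      auto m = std::make_unique<T>(std::forward<ARGS>(iArgs)...);
      auto p = m.get();
      monitors_.push_back(std::move(m));
      return p;
    }

    // Removes the monitor from the callback list and deletes it.
    void deregisterMonitor(MonitorBase* iMonitor) {
      auto it = std::find_if(
          monitors_.begin(), monitors_.end(), [iMonitor](auto const& m) { return m.get() == iMonitor; });
      if (it != monitors_.end()) {
        monitors_.erase(it);
      }
    }

    // Stand-ins for what the preloaded proxies would trigger.
    void notifyAlloc(std::size_t iRequested, std::size_t iActual) {
      for (auto& m : monitors_) {
        m->allocCalled(iRequested, iActual);
      }
    }
    void notifyDealloc(std::size_t iActual) {
      for (auto& m : monitors_) {
        m->deallocCalled(iActual);
      }
    }

    std::size_t size() const { return monitors_.size(); }

  private:
    std::vector<std::unique_ptr<MonitorBase>> monitors_;
  };
}  // namespace sketch
```

With a `sketch::ToyRegistry registry`, the call `registry.createAndRegisterMonitor<sketch::CountingMonitor>(8)` returns a non-owning pointer; the caller keeps it only to read results or to pass it to `deregisterMonitor`, never to `delete` it.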

## General usage

To use the facility, one needs to use LD_PRELOAD to load in the memory proxies before the application runs, e.g.
```
LD_PRELOAD=libPerfToolsAllocMonitorPreload.so cmsRun some_config_cfg.py
```

Internally, the program needs to register a monitor with the facility. When using `cmsRun` this can most easily be done
by loading a Service which sets up a monitor. If one fails to do the LD_PRELOAD, then when the monitor is registered, the
facility will throw an exception.

It is also possible to use LD_PRELOAD to load another library which auto registers a monitor even before the program
begins. See [PerfTools/MaxMemoryPreload](../MaxMemoryPreload/README.md) for an example.

## Services

### SimpleAllocMonitor
This service registers a monitor when the service is created (after python parsing is finished but before any modules
have been loaded into cmsRun) and reports its accumulated information when the service is destroyed (services are the
last plugins to be destroyed by cmsRun). The monitor reports
- Total amount of bytes requested by all allocation calls
- The maximum amount of _used_ (i.e. actual size) allocated memory that was in use by the job at one time.
- Number of calls made to allocation functions while the monitor was running.
- Number of calls made to deallocation functions while the monitor was running.

This service is multi-thread safe. Note that when run multi-threaded the maximum reported value will vary from job to job.

### EventProcessingAllocMonitor
This service registers a monitor at the end of beginJob (after all modules have been loaded and setup) and reports its accumulated information at the beginning of endJob (after the event loop has finished but before any cleanup is done). This can be useful in understanding how memory is being used during the event loop. The monitor reports
- Total amount of bytes requested by all allocation calls during the event loop
- The maximum amount of _used_ (i.e. actual size) allocated memory that was in use in the event loop at one time.
- The amount of _used_ memory allocated during the loop that has yet to be reclaimed by calling deallocation.
- Number of calls made to allocation functions during the event loop.
- Number of calls made to deallocation functions during the event loop.

This service is multi-thread safe. Note that when run multi-threaded the maximum reported value will vary from job to job.
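
The "maximum in use at one time" figure can be kept lock-free with an atomic running total and a compare-and-exchange loop on the maximum. The sketch below illustrates that technique only; `PeakTracker` and its member names are hypothetical, and it assumes every deallocation it sees matches an allocation it also saw (otherwise the running total would wrap around).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

namespace sketch {
  // Hypothetical helper tracking bytes currently in use and the peak reached so far.
  class PeakTracker {
  public:
    void onAlloc(std::size_t iActual) {
      // fetch_add returns the previous value, so add iActual to get the new running total.
      auto now = present_.fetch_add(iActual, std::memory_order_acq_rel) + iActual;

      auto max = max_.load(std::memory_order_relaxed);
      // On failure compare_exchange_strong reloads max with the current value,
      // so the loop ends once this thread wins or another thread stored a larger peak.
      while (now > max) {
        if (max_.compare_exchange_strong(max, now, std::memory_order_acq_rel)) {
          break;
        }
      }
    }

    // Assumption: iActual was previously passed to onAlloc.
    void onDealloc(std::size_t iActual) { present_.fetch_sub(iActual, std::memory_order_acq_rel); }

    std::size_t present() const { return present_.load(std::memory_order_acquire); }
    std::size_t peak() const { return max_.load(std::memory_order_acquire); }

  private:
    std::atomic<std::size_t> present_{0};
    std::atomic<std::size_t> max_{0};
  };
}  // namespace sketch
```

Because threads interleave differently from run to run, the running total seen at each allocation differs too, which is one way to understand why the reported maximum varies between multi-threaded jobs.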

### HistogrammingAllocMonitor
This service registers a monitor when the service is created (after python parsing is finished but before any modules
have been loaded into cmsRun) and reports its accumulated information when the service is destroyed (services are the
last plugins to be destroyed by cmsRun). The monitor histograms the values into bins of number of bytes where each
bin is a power of 2 larger than the previous. The histograms made are
- Amount of bytes requested by all allocation calls
- Amount of bytes actually used by all allocation calls
- Amount of bytes actually returned by all deallocation calls

This service is multi-thread safe. Note that when run multi-threaded the maximum reported value will vary from job to job.
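
One way to picture power-of-2 binning is to use the position of the highest set bit of the size as the bin index. The sketch below assumes a particular edge convention (bin 0 holds size 0, bin k holds sizes in [2^(k-1), 2^k)), which may differ from the service's actual bin edges; `binIndex` and `PowerOfTwoHistogram` are hypothetical names, and the class is single-threaded for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

namespace sketch {
  // Bin 0 holds size 0; bin k (k >= 1) holds sizes in [2^(k-1), 2^k).
  constexpr std::size_t binIndex(std::size_t iSize) {
    std::size_t bin = 0;
    while (iSize != 0) {
      ++bin;
      iSize >>= 1;
    }
    return bin;
  }

  // Hypothetical histogram with one counter per power-of-2 bin (not thread safe).
  class PowerOfTwoHistogram {
  public:
    void fill(std::size_t iSize) { ++bins_[binIndex(iSize)]; }
    std::size_t count(std::size_t iBin) const { return bins_[iBin]; }

  private:
    // One bin for size 0 plus one bin per bit of size_t.
    std::array<std::size_t, std::numeric_limits<std::size_t>::digits + 1> bins_{};
  };
}  // namespace sketch
```

For example, sizes 5 and 7 both land in bin 3 (covering 4 to 7 bytes), while size 8 lands in bin 4.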
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
#ifndef AllocMonitor_interface_AllocMonitorBase_h
#define AllocMonitor_interface_AllocMonitorBase_h
// -*- C++ -*-
//
// Package: AllocMonitor/interface
// Class : AllocMonitorBase
//
/**\class AllocMonitorBase AllocMonitorBase.h "AllocMonitorBase.h"

 Description: Base class for extensions that monitor allocations

 Usage:
    The class is required to be thread safe as all member functions
    will be called concurrently when used in a multi-threaded program.

    If allocations are done within the methods, no callbacks will be
    generated as the underlying system will temporarily suspend such
    calls on the thread running the method.

*/
//
// Original Author: Christopher Jones
// Created: Mon, 21 Aug 2023 14:03:34 GMT
//

// system include files
#include <stddef.h>  //size_t

// user include files

// forward declarations

namespace cms::perftools {

  class AllocMonitorBase {
  public:
    AllocMonitorBase();
    virtual ~AllocMonitorBase();

    AllocMonitorBase(const AllocMonitorBase&) = delete;  // stop default
    AllocMonitorBase(AllocMonitorBase&&) = delete;       // stop default
    AllocMonitorBase& operator=(const AllocMonitorBase&) = delete;  // stop default
    AllocMonitorBase& operator=(AllocMonitorBase&&) = delete;       // stop default

    // ---------- member functions ---------------------------
    virtual void allocCalled(size_t iRequestedSize, size_t iActualSize) = 0;
    virtual void deallocCalled(size_t iActualSize) = 0;
  };
}  // namespace cms::perftools
#endif
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
#ifndef PerfTools_AllocMonitor_AllocMonitorRegistry_h
#define PerfTools_AllocMonitor_AllocMonitorRegistry_h
// -*- C++ -*-
//
// Package: PerfTools/AllocMonitor
// Class : AllocMonitorRegistry
//
/**\class AllocMonitorRegistry AllocMonitorRegistry.h "AllocMonitorRegistry.h"

 Description: Singleton that owns the registered monitors and forwards allocation and deallocation callbacks to them

 Usage:
    Obtain the registry with instance(), add a monitor with createAndRegisterMonitor<T>(args...) and remove (and delete) it with deregisterMonitor().

*/
//
// Original Author: Christopher Jones
// Created: Mon, 21 Aug 2023 14:12:54 GMT
//

// system include files
#include <memory>
#include <vector>
#include <malloc.h>
#include <stdlib.h>

// user include files
#include "PerfTools/AllocMonitor/interface/AllocMonitorBase.h"

// forward declarations

namespace cms::perftools {
  class AllocTester;

  class AllocMonitorRegistry {
  public:
    ~AllocMonitorRegistry();

    AllocMonitorRegistry(AllocMonitorRegistry&&) = delete;       // stop default
    AllocMonitorRegistry(const AllocMonitorRegistry&) = delete;  // stop default
    AllocMonitorRegistry& operator=(const AllocMonitorRegistry&) = delete;  // stop default
    AllocMonitorRegistry& operator=(AllocMonitorRegistry&&) = delete;       // stop default

    // ---------- static member functions --------------------
    static AllocMonitorRegistry& instance();
    static bool necessaryLibraryWasPreloaded();

    // ---------- member functions ---------------------------

    //The functions are not thread safe
    template <typename T, typename... ARGS>
    T* createAndRegisterMonitor(ARGS&&... iArgs);
    void deregisterMonitor(AllocMonitorBase*);

  private:
    friend void* ::malloc(size_t) noexcept;
    friend void* ::calloc(size_t, size_t) noexcept;
    friend void* ::realloc(void*, size_t) noexcept;
    friend void* ::aligned_alloc(size_t, size_t) noexcept;
    friend void ::free(void*) noexcept;

    friend void* ::operator new(std::size_t size);
    friend void* ::operator new[](std::size_t size);
    friend void* ::operator new(std::size_t count, std::align_val_t al);
    friend void* ::operator new[](std::size_t count, std::align_val_t al);
    friend void* ::operator new(std::size_t count, const std::nothrow_t& tag) noexcept;
    friend void* ::operator new[](std::size_t count, const std::nothrow_t& tag) noexcept;
    friend void* ::operator new(std::size_t count, std::align_val_t al, const std::nothrow_t&) noexcept;
    friend void* ::operator new[](std::size_t count, std::align_val_t al, const std::nothrow_t&) noexcept;

    friend void ::operator delete(void* ptr) noexcept;
    friend void ::operator delete[](void* ptr) noexcept;
    friend void ::operator delete(void* ptr, std::align_val_t al) noexcept;
    friend void ::operator delete[](void* ptr, std::align_val_t al) noexcept;
    friend void ::operator delete(void* ptr, std::size_t sz) noexcept;
    friend void ::operator delete[](void* ptr, std::size_t sz) noexcept;
    friend void ::operator delete(void* ptr, std::size_t sz, std::align_val_t al) noexcept;
    friend void ::operator delete[](void* ptr, std::size_t sz, std::align_val_t al) noexcept;
    friend void ::operator delete(void* ptr, const std::nothrow_t& tag) noexcept;
    friend void ::operator delete[](void* ptr, const std::nothrow_t& tag) noexcept;
    friend void ::operator delete(void* ptr, std::align_val_t al, const std::nothrow_t& tag) noexcept;
    friend void ::operator delete[](void* ptr, std::align_val_t al, const std::nothrow_t& tag) noexcept;

    friend class AllocTester;

    // ---------- member data --------------------------------
    void start();
    bool& isRunning();

    struct Guard {
      explicit Guard(bool& iOriginal) noexcept : address_(&iOriginal), original_(iOriginal) { *address_ = false; }
      ~Guard() { *address_ = original_; }

      bool running() const noexcept { return original_; }

      Guard(Guard const&) = delete;
      Guard(Guard&&) = delete;
      Guard& operator=(Guard const&) = delete;
      Guard& operator=(Guard&&) = delete;

    private:
      bool* address_;
      bool original_;
    };

    Guard makeGuard() { return Guard(isRunning()); }

    void allocCalled_(size_t, size_t);
    void deallocCalled_(size_t);

    template <typename ALLOC, typename ACT>
    auto allocCalled(size_t iRequested, ALLOC iAlloc, ACT iGetActual) {
      [[maybe_unused]] Guard g = makeGuard();
      auto a = iAlloc();
      if (g.running()) {
        allocCalled_(iRequested, iGetActual(a));
      }
      return a;
    }
    template <typename DEALLOC, typename ACT>
    void deallocCalled(DEALLOC iDealloc, ACT iGetActual) {
      [[maybe_unused]] Guard g = makeGuard();
      if (g.running()) {
        deallocCalled_(iGetActual());
      }
      iDealloc();
    }

    AllocMonitorRegistry();
    std::vector<std::unique_ptr<AllocMonitorBase>> monitors_;
  };

  template <typename T, typename... ARGS>
  T* AllocMonitorRegistry::createAndRegisterMonitor(ARGS&&... iArgs) {
    [[maybe_unused]] Guard guard = makeGuard();
    start();

    auto m = std::make_unique<T>(std::forward<ARGS>(iArgs)...);
    auto p = m.get();
    monitors_.push_back(std::move(m));
    return p;
  }
}  // namespace cms::perftools
#endif
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
<use name="FWCore/MessageLogger"/>
<use name="FWCore/ServiceRegistry"/>
<use name="PerfTools/AllocMonitor"/>
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
// -*- C++ -*-
//
// Package: PerfTools/AllocMonitor
// Class : EventProcessingAllocMonitor
//
// Implementation:
// [Notes on implementation]
//
// Original Author: Christopher Jones
// Created: Mon, 21 Aug 2023 20:31:57 GMT
//

// system include files
#include <atomic>

// user include files
#include "PerfTools/AllocMonitor/interface/AllocMonitorBase.h"
#include "PerfTools/AllocMonitor/interface/AllocMonitorRegistry.h"
#include "FWCore/ServiceRegistry/interface/ServiceRegistry.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"
#include "FWCore/ServiceRegistry/interface/ServiceMaker.h"

namespace {
  class MonitorAdaptor : public cms::perftools::AllocMonitorBase {
  public:
    void performanceReport() {
      started_.store(false, std::memory_order_release);

      auto finalRequested = requested_.load(std::memory_order_acquire);
      auto maxActual = maxActual_.load(std::memory_order_acquire);
      auto present = presentActual_.load(std::memory_order_acquire);
      auto allocs = nAllocations_.load(std::memory_order_acquire);
      auto deallocs = nDeallocations_.load(std::memory_order_acquire);

      edm::LogSystem("EventProcessingAllocMonitor")
          << "Event Processing Memory Report"
          << "\n total additional memory requested: " << finalRequested
          << "\n max additional memory used: " << maxActual
          << "\n total additional memory not deallocated: " << present << "\n # allocations calls: " << allocs
          << "\n # deallocations calls: " << deallocs;
    }

    void start() { started_.store(true, std::memory_order_release); }

  private:
    void allocCalled(size_t iRequested, size_t iActual) final {
      if (not started_.load(std::memory_order_acquire)) {
        return;
      }
      nAllocations_.fetch_add(1, std::memory_order_acq_rel);
      requested_.fetch_add(iRequested, std::memory_order_acq_rel);

      //returns previous value
      auto a = presentActual_.fetch_add(iActual, std::memory_order_acq_rel);
      a += iActual;

      auto max = maxActual_.load(std::memory_order_relaxed);
      while (a > max) {
        if (maxActual_.compare_exchange_strong(max, a, std::memory_order_acq_rel)) {
          break;
        }
      }
    }
    void deallocCalled(size_t iActual) final {
      if (not started_.load(std::memory_order_acquire)) {
        return;
      }
      nDeallocations_.fetch_add(1, std::memory_order_acq_rel);
      auto present = presentActual_.load(std::memory_order_acquire);
      if (present >= iActual) {
        presentActual_.fetch_sub(iActual, std::memory_order_acq_rel);
      }
    }

    std::atomic<size_t> requested_ = 0;
    std::atomic<size_t> presentActual_ = 0;
    std::atomic<size_t> maxActual_ = 0;
    std::atomic<size_t> nAllocations_ = 0;
    std::atomic<size_t> nDeallocations_ = 0;

    std::atomic<bool> started_ = false;
  };

}  // namespace

class EventProcessingAllocMonitor {
public:
  EventProcessingAllocMonitor(edm::ParameterSet const& iPS, edm::ActivityRegistry& iAR) {
    auto adaptor = cms::perftools::AllocMonitorRegistry::instance().createAndRegisterMonitor<MonitorAdaptor>();

    iAR.postBeginJobSignal_.connect([adaptor]() { adaptor->start(); });
    iAR.preEndJobSignal_.connect([adaptor]() {
      adaptor->performanceReport();
      cms::perftools::AllocMonitorRegistry::instance().deregisterMonitor(adaptor);
    });
  }
};

DEFINE_FWK_SERVICE(EventProcessingAllocMonitor);
