
Commit daffd53

refactor(profiling): document heap profile sampling (#12483)
The heap profiler does statistical sampling of allocations. There is no explanation in the code (or elsewhere) of how the sampling in the profiler works or why the chosen method is justified. We want to know whether it's fair and whether our reported numbers accurately represent the real heap size and the relative portion of the heap taken by different objects of varying sizes. Prior to working on this code, I was particularly concerned about the weighting of sampled values.

The method this profiler uses differs from the methods used by either the Go profiler or tcmalloc, which seem to generally do a good job. I've done some testing which seems to indicate that the weighting we do is actually pretty good. So this commit documents _why_ it's okay. My model of sampling borrows heavily from tcmalloc's documentation, which I think does the best job of describing how it ought to work out of any resource I've found so far.

I also added a comment describing how the next sampling point is chosen, since it might not be obvious right away from looking at the code how `heap_tracker_next_sample_size`'s math relates to the described sampling method.
1 parent fd86212 commit daffd53

File tree

1 file changed: +67 -3 lines changed


ddtrace/profiling/collector/_memalloc_heap.c

Lines changed: 67 additions & 3 deletions
@@ -7,15 +7,68 @@
 #include "_memalloc_reentrant.h"
 #include "_memalloc_tb.h"

+/*
+How heap profiler sampling works:
+
+This is mostly derived from
+https://github.com/google/tcmalloc/blob/master/docs/sampling.md#detailed-treatment-of-weighting-weighting
+
+We want to explain memory used by the program. We can't track every
+allocation with reasonable overhead, so we sample. We'd like the profile to
+represent what's taking up the most memory. We'd like to see large live
+allocations, and to see when many small allocations in some part of the code
+add up to a lot of memory usage. So, we choose to sample based on bytes
+allocated. We basically want every byte allocated to have the same
+probability of being represented in the profile. Assume we want to sample,
+on average, one out of every R bytes allocated. Call R the "sampling
+interval". In a simplified world where every allocation is 1 byte, we can
+just do a 1/R coin toss for every allocation. This can be simplified by
+observing that the interval between samples done this way follows a
+geometric distribution with average R. We can draw from a geometric
+distribution to pick the next sample point. For computational simplicity, we
+use an exponential distribution, which is essentially the limit of the
+geometric distribution if we were to divide each byte into smaller and
+smaller sub-bytes. We set a target for sampling, T, drawn from the
+exponential distribution with average R. We count the number of bytes
+allocated, C. For each allocation, we increment C by the size of the
+allocation, and when C >= T, we take a sample, reset C to 0, and re-draw T.
+
+If we reported just the sampled allocations' sizes, we would significantly
+misrepresent the actual heap size. We're probably going to hit some small
+allocations with our sampling, and reporting their actual size would
+under-represent the size of the heap. Each sampled allocation represents
+roughly R bytes of actual allocated memory. We want to weight our samples
+accordingly, and account for the fact that large allocations are more likely
+to be sampled than small allocations.
+
+The math for weighting is described in more detail in the tcmalloc docs.
+Basically, any sampled allocation should get an average weight of R, our
+sampling interval. However, this would under-weight allocations larger than
+R bytes. When we pick the next sampling point, it's probably going to be in
+the middle of an allocation. Bytes of the sampled allocation past that point
+are going to be skipped by our sampling method, since we re-draw the target
+_after_ the allocation. We can correct for this by looking at how big the
+allocation was, and how much it would drive the counter C past the target T.
+The formula W = R + (C - T) expresses this, where C is the counter including
+the sampled allocation. If the allocation was large, we are likely to have
+significantly exceeded T, so the weight will be larger. Conversely, if the
+allocation was small, C - T will likely be small, so the allocation gets
+less weight, and as we get closer to our hypothetical 1-byte allocations
+we'll get closer to a weight of R for each allocation. The current code
+simplifies this a bit: we can also express the weight as C + (R - T), note
+that on average T should equal R, and just drop the (R - T) term, using C as
+the weight. We might want to use the full formula if more testing shows the
+simplification to be too inaccurate.
+*/
+
 typedef struct
 {
-    /* Granularity of the heap profiler in bytes */
+    /* Heap profiler sampling interval */
     uint64_t sample_size;
-    /* Current sample size of the heap profiler in bytes */
+    /* Next heap sample target, in bytes allocated */
     uint64_t current_sample_size;
     /* Tracked allocations */
     traceback_array_t allocs;
-    /* Allocated memory counter in bytes */
+    /* Bytes allocated since the last sample was collected */
     uint64_t allocated_memory;
     /* True if the heap tracker is frozen */
     bool frozen;
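The sampling and weighting scheme described in the comment above can be checked empirically. The following minimal, self-contained C sketch is not part of this commit: the constant SAMPLING_INTERVAL, the helper next_sample_target, and the synthetic allocation mix are all invented for illustration. It simulates drawing a target T from an exponential distribution with mean R, accumulating bytes in a counter C, and, whenever C >= T, recording a sample weighted by C (the simplified form of W = R + (C - T)).

/* sim.c: standalone simulation of byte-based heap sampling.
   Illustrative only; compile with e.g. cc sim.c -lm. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define SAMPLING_INTERVAL 1024.0 /* R: average bytes between samples */

/* Draw the next sampling target from an exponential distribution with
   mean SAMPLING_INTERVAL. */
static double
next_sample_target(void)
{
    double q = (double)rand() / ((double)RAND_MAX + 1); /* [0, 1) */
    return -log(1 - q) * SAMPLING_INTERVAL;
}

int
main(void)
{
    srand(42);
    double counter = 0;                   /* C: bytes since last sample */
    double target = next_sample_target(); /* T: next sampling point */
    double total_allocated = 0;           /* ground truth */
    double total_weight = 0;              /* what the profiler would report */

    /* Simulate many small allocations with an occasional large one. */
    for (int i = 0; i < 1000000; i++) {
        double size = (i % 1000 == 0) ? 65536 : 32;
        total_allocated += size;
        counter += size;
        if (counter >= target) {
            /* Sample taken: weight it by C, reset, re-draw T. */
            total_weight += counter;
            counter = 0;
            target = next_sample_target();
        }
    }

    printf("actual bytes:   %.0f\n", total_allocated);
    printf("reported bytes: %.0f (%.2f%% of actual)\n",
           total_weight, 100.0 * total_weight / total_allocated);
    return 0;
}

With enough allocations, the reported total should land close to the actual total for this mix of sizes, which is the fairness property the comment argues for.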
@@ -78,6 +131,12 @@ memheap_init()
 static uint32_t
 heap_tracker_next_sample_size(uint32_t sample_size)
 {
+    /* We want to draw a sampling target from an exponential distribution
+       with average sample_size. We use the standard technique of inverse
+       transform sampling, where we take uniform randomness, which is easy
+       to get, and transform it by the inverse of the cumulative distribution
+       function for the distribution we want to sample.
+       See https://en.wikipedia.org/wiki/Inverse_transform_sampling. */
     /* Get a value between [0, 1[ */
     double q = (double)rand() / ((double)RAND_MAX + 1);
     /* Get a value between ]-inf, 0[, more likely close to 0 */
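As a side note on the inverse-transform technique mentioned in the comment added above, here is a tiny standalone sketch. It is not the file's actual code; the function name and constants are made up. It only demonstrates the standard transform: if U is uniform on [0, 1), then -mean * log(1 - U) is exponentially distributed with the given mean, because the exponential CDF is F(x) = 1 - exp(-x / mean) and the transform applies its inverse.

/* exp_draw.c: illustrative inverse transform sampling check (cc exp_draw.c -lm). */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static double
draw_exponential(double mean)
{
    double u = (double)rand() / ((double)RAND_MAX + 1); /* uniform on [0, 1) */
    return -mean * log(1 - u);                          /* exponential, mean `mean` */
}

int
main(void)
{
    srand(7);
    double mean = 512 * 1024; /* e.g. a hypothetical 512 KiB sampling interval */
    double sum = 0;
    int n = 1000000;
    for (int i = 0; i < n; i++)
        sum += draw_exponential(mean);
    /* The empirical average should be close to the requested mean. */
    printf("requested mean: %.0f, empirical mean: %.0f\n", mean, sum / n);
    return 0;
}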
@@ -245,6 +304,11 @@ memalloc_heap_track(uint16_t max_nframe, void* ptr, size_t size, PyMemAllocatorD
         return false;
     }

+    /* The weight of the allocation is described above, but briefly: it's the
+       count of bytes allocated since the last sample, including this one,
+       which will tend to be larger for large allocations and smaller for
+       small allocations, and close to the average sampling interval, so that
+       the sum of sampled live allocations stays close to the actual heap
+       size. */
     traceback_t* tb = memalloc_get_traceback(max_nframe, ptr, global_heap_tracker.allocated_memory, domain);
     if (tb) {
         if (global_heap_tracker.frozen)
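A purely illustrative worked example of the weighting described in the comment above (the numbers are invented for this note, not taken from the code): suppose the sampling interval R is 1 MiB and the current target T was drawn at about 1 MiB. If 0.9 MiB of small allocations have already accumulated in the counter C and a 4 MiB allocation arrives, C jumps to about 4.9 MiB, the sample triggers, and the allocation is reported with a weight of about 4.9 MiB, roughly its own size plus the unsampled bytes it stands in for. If instead a 64-byte allocation is the one that nudges C just past T, it is reported with a weight of about 1 MiB, close to R, so it represents an interval's worth of small allocations rather than just its own 64 bytes.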
