# Benchmarking memory usage in R

Profiling memory in R has never been a trivial task.
In this post, I would like to emphasize that the currently popular methods are quite inaccurate and should therefore be used with caution. More importantly, they should not be used to draw conclusions about the actual memory usage of R functions.

The root cause of the inaccuracy of many memory profiling tools in R is that they measure only memory allocated by R (including by R's own C code). They do not take into account memory allocated directly in C, for example with `malloc`.

## Memory allocation in R

The following example should make this clear.

The R chunk below is the content of the `memtest.R` file.
```r
code = "
  int nx = LENGTH(x);
  // temporary working memory, allocated either through R or directly in C
  double *y = (double*)(
    LOGICAL(r_alloc)[0] ?
      R_alloc(nx, sizeof(*y)) : // allocated by R's C
      malloc(nx * sizeof(*y))   // allocated by C
  );
  if (y == NULL)
    Rf_error(\"malloc failed\");
  double *xp = REAL(x);
  // populate y
  for (int i=0; i<nx; i++)
    y[i] = xp[i];
  // do something with y
  for (int i=1; i<nx; i++)
    y[i] = y[i-1]+y[i];
  // sum double array to ensure compiler won't optimize it away
  double sum = 0.0;
  for (int i=0; i<nx; i++)
    sum += y[i];
  SEXP res = PROTECT(Rf_allocVector(REALSXP, 1));
  REAL(res)[0] = sum;
  // memory from R_alloc is reclaimed by R, memory from malloc must be freed
  if (!LOGICAL(r_alloc)[0])
    free(y);
  UNPROTECT(1);
  return res;
"
funx = inline::cfunction(signature(x="numeric", r_alloc="logical"), code, language="C")
set.seed(108)
x = rnorm(1e8)
```

## Check equal

First, we will ensure that the results are the same, regardless of whether we allocate temporary working memory using R or C:

```sh
Rscript -e 'source("memtest.R"); funx(x, r_alloc=TRUE)'
#[1] 1.160649e+12
Rscript -e 'source("memtest.R"); funx(x, r_alloc=FALSE)'
#[1] 1.160649e+12
```

## Memory benchmark using `bench`

Next, we will use the currently most popular package for profiling memory, `bench`:

```sh
Rscript -e 'source("memtest.R"); bench::mark(funx(x, r_alloc=TRUE))'
## A tibble: 1 × 13
#  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr>    <bch>   <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 funx(x, r_al… 577ms    577ms      1.73     763MB     1.73     1     1      577ms
Rscript -e 'source("memtest.R"); bench::mark(funx(x, r_alloc=FALSE))'
## A tibble: 1 × 13
#  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr>    <bch>   <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 funx(x, r_al… 589ms    589ms      1.70        0B     0        1     0      589ms
```

As we can see in the output of the `mark` function, `mem_alloc` is reported as 0B when we use `malloc`, while for `R_alloc` it is 763MB. This difference should serve as a warning. It arises because `bench::mark` tracks only allocations made through R's memory allocator and does not account for memory allocated directly through C functions such as `malloc` or `calloc`. Anyone who intends to use the `mark` function to draw conclusions about memory usage must therefore also examine the source code of the function being benchmarked.
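
To illustrate what tracking at the level of R's allocator looks like, here is a minimal sketch using base R's `utils::Rprofmem()`, which logs allocations that go through R's own allocator. It is only an illustration of R-level tracking, not a description of how `bench` is implemented. It assumes an R build with memory profiling enabled (checked with `capabilities("profmem")`), and the names `log_file`, `v` and `alloc_log` are hypothetical.

```r
# Sketch: log R-level allocations with utils::Rprofmem().
# Assumption: this R build was compiled with memory profiling support.
if (capabilities("profmem")) {
  log_file = tempfile(fileext = ".out")
  # report only large-vector allocations bigger than 1e6 bytes
  utils::Rprofmem(log_file, threshold = 1e6)
  v = numeric(1e6)           # about 8 MB, allocated by R
  utils::Rprofmem(NULL)      # stop logging
  alloc_log = readLines(log_file)
  print(alloc_log)
} else {
  alloc_log = character()
  message("this R build does not support Rprofmem()")
}
```

A buffer obtained with `malloc()` never passes through R's allocator, so it cannot show up in a log of this kind.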

It is worth noting that `?mark` documents this limitation:

> `mem_alloc` - `bench_bytes` Total amount of memory allocated by R while running the expression. Memory allocated outside the R heap, e.g. by `malloc()` or `new` directly is not tracked, take care to avoid misinterpreting the results if running code that may do this.

Unfortunately, many people are not aware of this and publish memory usage benchmarks believing they are accurate.

## Memory benchmark using `cgmemtime`

Lastly, we will measure memory with an external tool, [cgmemtime](https://github.com/gsauthof/cgmemtime), proposed by Matt Dowle in 2014 during his work on the [2B rows data.frame grouping benchmark](https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping).

> `cgmemtime` measures the high-water RSS+CACHE memory usage of a process and its descendant processes.

```sh
./cgmemtime Rscript -e 'source("memtest.R"); funx(x, r_alloc=TRUE)'
#child_RSS_high: 1641808 KiB
#group_mem_high: 1626264 KiB
./cgmemtime Rscript -e 'source("memtest.R"); funx(x, r_alloc=FALSE)'
#child_RSS_high: 1641096 KiB
#group_mem_high: 1625820 KiB
```

While `cgmemtime` reports very accurate memory usage statistics, it cannot directly measure the memory usage of an individual function call in isolation, because it tracks the memory footprint of the entire process (and its child processes).
To estimate the memory usage of the `funx()` call in this simple example, we can first measure the R process without calling `funx()`:

```sh
./cgmemtime Rscript -e 'source("memtest.R");'
#child_RSS_high: 860884 KiB
#group_mem_high: 843844 KiB
```

Then we subtract this baseline from the peak memory usage measured when `funx()` is executed (here the `child_RSS_high` of the `malloc` run), dividing by 1024 to convert KiB to MiB:

```r
(1641096-860884)/1024
#[1] 761.9258
```
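
As a sanity check on these numbers, here is a back-of-the-envelope sketch that is not part of the original measurements. The temporary buffer `y` holds one 8-byte double per element of `x`, so it should occupy roughly `1e8 * 8` bytes. The snippet compares that expectation with the baseline-subtracted `child_RSS_high` of both runs. The variable names are my own, and the figures are copied from the `cgmemtime` output above.

```r
# expected size of the temporary buffer `y`, in MiB (8 bytes per double)
expected_mib = 1e8 * 8 / 1024^2
# child_RSS_high of the run without funx(), in KiB
baseline_kib = 860884
# baseline-subtracted child_RSS_high of both runs, converted from KiB to MiB
measured_mib = (c(R_alloc = 1641808, malloc = 1641096) - baseline_kib) / 1024
print(expected_mib)
print(measured_mib)
# relative deviation of each measurement from the expected buffer size
print(round(measured_mib / expected_mib - 1, 4))
```

Both variants come out within a fraction of a percent of the expected 762.9 MiB, which is also what `bench` reports for `R_alloc` (763MB). The two allocation paths therefore use essentially the same amount of memory, even though `bench` reports 0B for one of them.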

## Thank you

I hope this post will help people be a bit more skeptical when reading memory benchmarks of R code.

```
R version 4.5.0 (2025-04-11)
Platform: x86_64-redhat-linux-gnu
Running under: Fedora Linux 42 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP; LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] bench_1.1.4 inline_0.3.21

loaded via a namespace (and not attached):
[1] compiler_4.5.0 cli_3.6.4 pillar_1.10.2 glue_1.8.0
[5] vctrs_0.6.5 lifecycle_1.0.4 rlang_1.1.6
```