@the-mikedavis

This is a somewhat small set of patches to prometheus_text_format which aim to reduce garbage creation during registry formatting. Reducing garbage creation drives down the cost to the VM of scraping large registries - both in terms of peak memory allocation and also the work that the garbage collector must do.

With these changes I see a reduction in allocation reported by tprof in a stress test of one of RabbitMQ's most expensive registries. In a test against single-instance RabbitMQ brokers on EC2 instances, this saves a noticeable amount of peak memory and reduces CPU utilization significantly.

tprof testing instructions
  1. Clone https://github.com/rabbitmq/rabbitmq-server
  2. cd rabbitmq-server
  3. make deps
  4. make run-broker
  5. In another terminal in the rabbitmq-server repo, run sbin/rabbitmqctl import_definitions path/to/100k-classic-queues.json, pointing to this definitions file.
  6. In the shell from the make run-broker terminal, start tprof tracing for new processes: tprof:start(#{type => call_memory}), tprof:enable_trace(new), tprof:set_pattern('_', '_', '_').
  7. In another terminal scrape the expensive endpoint: curl -v localhost:15692/metrics/per-object --output /dev/null
  8. When that's done, collect and format the sample: tprof:format(tprof:inspect(tprof:collect())).

To test this change, Ctrl-C twice out of make run-broker, cd deps/prometheus and check out this branch. Then rm -rf ebin in that directory, cd ../../ and repeat steps 4, 6, 7 and 8 (skipping the definitions import).
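The tprof commands from steps 6 and 8, in a form that can be pasted into the broker shell:

    %% Step 6: start call_memory tracing for newly spawned processes.
    tprof:start(#{type => call_memory}),
    tprof:enable_trace(new),
    tprof:set_pattern('_', '_', '_').
    %% Step 8: after the scrape completes, collect and format the sample.
    tprof:format(tprof:inspect(tprof:collect())).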


Registry collection tprof measurement before this change...
****** Process <0.301089.0>  --  100.00% of total *** 
FUNCTION                                                                                   CALLS      WORDS    PER CALL  [    %]
... removed everything less than 1% ...
prometheus_text_format:render_labels/1                                                   2308195    1944642        0.84  [ 1.01]
erlang:atom_to_binary/2                                                                   651584    2375647        3.65  [ 1.23]
prometheus_rabbitmq_core_metrics_collector:'-emit_queue_info/3-fun-0-'/3                  100000    2500000       25.00  [ 1.29]
prometheus_model_helpers:counter_metric/2                                                 301325    3615900       12.00  [ 1.87]
prometheus_text_format:'-render_labels/1-fun-0-'/2                                        321434    4178642       13.00  [ 2.16]
prometheus_rabbitmq_core_metrics_collector:'-collect_metrics/2-lc$^1/1-0-'/2             2300145    4400076        1.91  [ 2.28]
prometheus_model_helpers:'-metrics_from_tuples/2-lc$^0/1-0-'/2                           2308456    4616300        2.00  [ 2.39]
lists:'-filter/2-lc$^0/1-0-'/2                                                           2408461    4816304        2.00  [ 2.49]
erlang:integer_to_binary/1                                                               2206892    6620701        3.00  [ 3.43]
prometheus_rabbitmq_core_metrics_collector:label/1                                       2200038   11000022        5.00  [ 5.69]
prometheus_rabbitmq_core_metrics_collector:'-collect_metrics/2-lc$^0/1-1-'/2             2300145   11500190        5.00  [ 5.95]
prometheus_text_format:'-emit_mf_metrics/2-fun-0-'/3                                     2308150   11541419        5.00  [ 5.97]
prometheus_model_helpers:gauge_metric/2                                                  2006812   24081744       12.00  [12.47]
prometheus_text_format:has_special_char/1                                               23475329   24147190        1.03  [12.50]
prometheus_text_format:render_series/3                                                   2308200   32511401       14.09  [16.83]
ets:match_object/2                                                                            19   38406095  2021373.42  [19.88]
                                                                                                  193184463              [100.0]

Registry collection tprof measurement after this change...
****** Process <0.401000.0>  --  99.99% of total *** 
FUNCTION                                                                                  CALLS      WORDS    PER CALL  [    %]
... removed everything less than 1% ...
prometheus_model_helpers:label_pair/1                                                    429393    1717572        4.00  [ 1.16]
prometheus_text_format:render_labels/1                                                  2308195    1944642        0.84  [ 1.32]
erlang:atom_to_binary/2                                                                  651584    2375647        3.65  [ 1.61]
prometheus_rabbitmq_core_metrics_collector:'-emit_queue_info/3-fun-0-'/3                 100000    2500000       25.00  [ 1.69]
prometheus_model_helpers:counter_metric/2                                                301325    3615900       12.00  [ 2.45]
prometheus_text_format:'-render_labels/1-fun-0-'/2                                       321434    4178642       13.00  [ 2.83]
prometheus_rabbitmq_core_metrics_collector:'-collect_metrics/2-lc$^1/1-0-'/2            2300145    4400076        1.91  [ 2.98]
prometheus_model_helpers:'-metrics_from_tuples/2-lc$^0/1-0-'/2                          2308456    4616300        2.00  [ 3.13]
lists:'-filter/2-lc$^0/1-0-'/2                                                          2408461    4816304        2.00  [ 3.26]
erlang:integer_to_binary/1                                                              2206892    6620705        3.00  [ 4.49]
prometheus_rabbitmq_core_metrics_collector:label/1                                      2200038   11000022        5.00  [ 7.45]
prometheus_rabbitmq_core_metrics_collector:'-collect_metrics/2-lc$^0/1-1-'/2            2300145   11500190        5.00  [ 7.79]
prometheus_text_format:render_series/4                                                  2308200   11541000        5.00  [ 7.82]
prometheus_text_format:render_value/2                                                   2308200   11543618        5.00  [ 7.82]
prometheus_model_helpers:gauge_metric/2                                                 2006812   24081744       12.00  [16.32]
ets:match_object/2                                                                           19   38406095  2021373.42  [26.02]
                                                                                                 147597866              [100.0]

So with this change, the Cowboy request process in charge of this endpoint allocates 147_597_866 words instead of 193_184_463, a reduction of 45_586_597 words or 23.6%.

Stress-testing on EC2...

On EC2 I have two m7g.xlarge instances running RabbitMQ: galactica, which carries this change, and kestrel, which uses prometheus at v5.1.1 (the latest version RabbitMQ has adopted). A third instance curls these instances at an interval of two seconds with this script:

#! /usr/bin/env bash

N=600
SLEEP=2
for i in $(seq 1 $N)
do
  echo "Sleeping ${SLEEP}s... ($i / $N)"
  sleep $SLEEP
  echo "Ask for metrics from $1... ($i / $N)"
  curl -s "http://$1:15692/metrics/per-object" --output /dev/null &
done

wait

This asynchronously fires off a scrape request every two seconds for twenty minutes. The third node runs this script against both galactica and kestrel at the same time. The third node also scrapes these nodes' node_exporter metrics and the RabbitMQ prometheus endpoint for Erlang allocator metrics.

kestrel (baseline)

[Grafana screenshot: instance-wide memory usage (grafana-kestrel-mem)]
[Grafana screenshot: instance-wide CPU usage (grafana-kestrel-cpu)]
[Grafana screenshot: Erlang allocators (grafana-kestrel-erlang-alloc)]

galactica (this branch)

[Grafana screenshot: instance-wide memory usage (grafana-galactica-mem)]
[Grafana screenshot: instance-wide CPU usage (grafana-galactica-cpu)]
[Grafana screenshot: Erlang allocators (grafana-galactica-erlang-alloc)]

We can see kestrel (baseline) pinned consistently at around 95% CPU usage, hovering at around 9-10 GB of instance-wide memory usage with the VM aware of 3.5-4.5 GB, while galactica (this branch) sits at around 50% CPU usage and 7.5-8.5 GB of instance-wide memory, with the VM tracking around 2-3 GB.

While the peak memory usage is reduced nicely, the main benefit is that the CPU is loaded much less than before - I assume because less garbage collection is being performed.

`prometheus_text_format:has_special_char/1` is called very often when
a registry contains many metrics with label pairs. We can use
`binary:match/2` to search within a label binary for the special
characters (newline, backslash and double-quote) without allocation.

The old code, which used binary matching syntax, creates a match context
every time the function is called (except when it recurses into itself -
then the match context is reused). A match context allocates 5 words on
the process heap when it is created. When matching very many binaries
this adds up to a noticeable amount of short-lived garbage.

In comparison `binary:match/2` with a precompiled match pattern does not
allocate. The BIF for it is also very well optimized, using `memchr`
since OTP 22.
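A minimal sketch of the technique, assuming the compiled pattern is built once and passed to the check (the names here are illustrative; how the patch's `has_special_char/1` caches the pattern may differ):

    %% Compile the pattern for the three special characters once so it can
    %% be reused for every label value that is checked.
    special_chars_pattern() ->
        binary:compile_pattern([<<"\n">>, <<"\\">>, <<"\"">>]).

    %% binary:match/2 with a precompiled pattern searches the binary without
    %% allocating a match context on the process heap.
    has_special_char(LabelValue, Pattern) when is_binary(LabelValue) ->
        binary:match(LabelValue, Pattern) =/= nomatch.
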
The formatting callback for a registry can build each metrics family as
a single binary in order to reduce garbage. This mainly involves passing
the accumulator binary through all functions that append to it.
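For example, the accumulator threading looks roughly like this (function and variable names here are illustrative, not the exact code in the module):

    %% The accumulator binary is passed into each append step and the grown
    %% binary is returned, so a whole family of series ends up in one binary.
    render_series_list(Acc0, Name, Series) ->
        lists:foldl(
          fun({LabelString, Value}, Acc) ->
                  <<Acc/binary, Name/binary, "{", LabelString/binary, "} ",
                    Value/binary, "\n">>
          end, Acc0, Series).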

It's more efficient to append to the resulting binary than to allocate
smaller binaries and then append them. For example:

    <<Blob/binary, Name/binary, "_", Suffix/binary>>.
    %% versus
    Combined = <<Name/binary, "_", Suffix/binary>>,
    <<Blob/binary, Combined/binary>>.

The first expression generates less garbage than the second. A good
example of this was the `add_brackets/1` function. Compiler inlining
does not turn the second expression (above) into the first,
unfortunately, so with automatic inlining we would still pay the cost of
creating a binary with brackets and then copying that into the larger
blob, rather than appending its contents directly. This change manually
inlines `add_brackets/1` into its caller `render_series/4`.
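A simplified before/after sketch (the real `render_series` functions take different arguments; this only shows where the intermediate binary disappears):

    %% Before: add_brackets/1 builds an intermediate bracketed binary which
    %% is then copied into the accumulator.
    add_brackets(LString) ->
        <<"{", LString/binary, "}">>.

    render_series_old(Acc, Name, LString, Value) ->
        <<Acc/binary, Name/binary, (add_brackets(LString))/binary, " ",
          Value/binary, "\n">>.

    %% After: the brackets are appended directly into the accumulator, so no
    %% intermediate binary is created.
    render_series_new(Acc, Name, LString, Value) ->
        <<Acc/binary, Name/binary, "{", LString/binary, "} ",
          Value/binary, "\n">>.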

This change also converts some list strings into binaries. Especially
for ASCII, binary strings are _far_ more compact than list strings. A
list needs two words per ASCII character - one for the character and one
for the tail pointer. So it's like UTF-32 but worse, basically UTF-128 on
a 64-bit machine. ASCII or UTF-8 text in a binary takes one byte per
character in the binary's array, plus a word or two of metadata. E.g.
`<<"hello">>` allocates three words while `"hello"` allocates ten.
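One way to check this from an Erlang shell: `erts_debug:size/1` reports a term's size in heap words (figures assume a 64-bit emulator).

    1> erts_debug:size("hello").
    10
    2> erts_debug:size(<<"hello">>).
    3
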
Building on the work in the parent commit: now that the data being
passed to the `ram_file` is a binary, we can instead build the entire
output gradually within the process. With the `ram_file` we pay I/O
overhead from writing to and then reading from it, since `ram_file` is a
port - all data is passed between the VM and the port driver. The memory
consumed by a port driver is also invisible to the VM's allocators, so
large port driver resource usage should be avoided where possible.

Instead this change refactors the `registry_collect_callback` to fold
over collectors and build up an accumulator. The `create_mf` callback's
return of `ok` forces us to store the accumulator rather than pass it in
and return it. This is a little less hygienic, but it is more efficient
than passing data in and out of a port.

This also introduces a function `format_into/3` which can use this
folding function directly. This can be used to avoid collecting the
entire response in one binary. Instead the response can be streamed
with `cowboy_req:stream_body/3` for example.
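A sketch of what streaming could look like, assuming `format_into/3` takes a registry, a fun that is invoked with each formatted chunk plus an accumulator, and an initial accumulator (the exact signature is defined by this patch and may differ):

    stream_metrics(Req0) ->
        Req = cowboy_req:stream_reply(
                200,
                #{<<"content-type">> => <<"text/plain; version=0.0.4">>},
                Req0),
        %% Assumed argument order for format_into/3: registry, fold fun,
        %% initial accumulator.
        prometheus_text_format:format_into(
          default,
          fun(Chunk, ReqAcc) ->
                  ok = cowboy_req:stream_body(Chunk, nofin, ReqAcc),
                  ReqAcc
          end,
          Req),
        cowboy_req:stream_body(<<>>, fin, Req),
        Req.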