Reduce garbage generation in prometheus_text_format
#196
This is a somewhat small set of patches to `prometheus_text_format` which aim to reduce garbage creation during registry formatting. Reducing garbage creation drives down the cost to the VM of scraping large registries, both in terms of peak memory allocation and the work that the garbage collector must do.

With these changes I see a reduction in allocation reported by `tprof` in a stress test of one of RabbitMQ's most expensive registries. In a test against single-instance RabbitMQ brokers on EC2 instances this saves a noticeable amount of peak memory and reduces CPU utilization significantly.

tprof testing instructions
1. Clone https://github.com/rabbitmq/rabbitmq-server
2. `cd rabbitmq-server`
3. `make deps`
4. `make run-broker`
5. In the `rabbitmq-server` repo, run `sbin/rabbitmqctl import_definitions path/to/100k-classic-queues.json`, with the path pointing to the `100k-classic-queues.json` definitions file.
6. In the `make run-broker` terminal, start `tprof` tracing for new processes: `tprof:start(#{type => call_memory}), tprof:enable_trace(new), tprof:set_pattern('_', '_', '_').`
7. `curl -v localhost:15692/metrics/per-object --output /dev/null`
8. `tprof:format(tprof:inspect(tprof:collect())).`

To test this change, press Ctrl+C twice to exit `make run-broker`, `cd deps/prometheus` and check out this branch. Then `rm -rf ebin` in that directory, `cd ../../` and repeat steps 4, 6, 7 and 8 (skipping the definitions import).

Registry collection tprof measurement before this change...
Registry collection tprof measurement after this change...
So with this change, the Cowboy request process in charge of this endpoint allocates `147_597_866` words instead of `193_184_463`, a reduction of `45_586_597` words or 23.6%.

Stress-testing on EC2...
On EC2 I have two `m7g.xlarge` instances running RabbitMQ: `galactica`, which carries this change, and `kestrel`, which uses `prometheus` at v5.1.1 (the latest version RabbitMQ has adopted). A third instance `curl`s these instances with a script that asynchronously fires off a scrape request every two seconds for twenty minutes. The third node runs this script against both `galactica` and `kestrel` at the same time. It also scrapes these nodes' `node_exporter` metrics and the RabbitMQ prometheus endpoint for Erlang allocator metrics.

`kestrel` (baseline)
Instance-wide memory usage
Instance-wide CPU usage
Erlang allocators
`galactica` (this branch)
Instance-wide memory usage
Instance-wide CPU usage
Erlang allocators
We can see `kestrel` (baseline) pinned at around 95% CPU usage consistently, hovering at around 9-10 GB instance-wide memory usage, with the VM aware of 3.5-4.5 GB of usage. `galactica` (this branch) sits at 50% CPU usage, with around 7.5-8.5 GB instance-wide memory and the VM tracking around 2-3 GB of memory.

While the peak memory usage is reduced nicely, the main benefit is that the CPU is loaded much less than before, I assume from performing less garbage collection.
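
For anyone wanting to reproduce a similar load, here is a minimal sketch of a scrape loop with the behavior described for the EC2 stress test: one request every two seconds for twenty minutes, without waiting for earlier requests to finish. This is not the script used for the measurements above. The function names `scrape_once` and `scrape_loop` are hypothetical, and using the instance names as hostnames with the `/metrics/per-object` endpoint on port 15692 is an assumption carried over from the local tprof instructions.

```shell
#!/bin/sh
# Illustrative sketch only, not the script used for the measurements above.

# Fetch the endpoint once and discard the body; only the load on the
# broker matters for this kind of test.
scrape_once() {
  curl --silent --output /dev/null "$1"
}

# scrape_loop URL INTERVAL DURATION
# Starts one scrape every INTERVAL seconds for DURATION seconds.
# Each request runs in the background, so a slow response does not
# delay the next request.
scrape_loop() {
  url=$1
  interval=$2
  duration=$3
  remaining=$(( duration / interval ))
  while [ "$remaining" -gt 0 ]; do
    scrape_once "$url" &
    sleep "$interval"
    remaining=$(( remaining - 1 ))
  done
  wait   # let in-flight requests finish before returning
}

# Example invocation (assumed hostnames and endpoint), both targets at once:
#   scrape_loop "http://galactica:15692/metrics/per-object" 2 1200 &
#   scrape_loop "http://kestrel:15692/metrics/per-object" 2 1200 &
#   wait
```

Running each request in the background is what makes the loop asynchronous: if a scrape takes longer than the interval, requests overlap, which is the kind of pressure that exposes per-scrape allocation cost.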