fcmp++: daemon sets max arenas glibc will use #228
j-berman wants to merge 2 commits into seraphis-migration:fcmp++-stage
Conversation
I thought the high memory usage was inside the Rust code. Does changing …
So the Rust side is doing the allocations, but I believe the issue is also that, because the calls from C++ are multithreaded, the allocator on the Rust side is using multiple arenas. If the C++ side were not multithreaded, I would not expect this change to make a difference.
The Rust side uses the system allocator, which is glibc's in @SNeedlewoods' case (and the expected default for most Linux systems). Thus, by default, this call also affects the Rust side. Any system using a different memory allocator on the Rust side is not expected to hit this same issue; however, if the Rust side ends up using a different memory allocator, whatever decisions that allocator makes could resurface the issue on that system.
There is also a case for doing this on the C++ side, to avoid needing another dependency to call glibc's …
So is the allocation the Rust side is doing a small number of very large chunks, which is why each arena is getting outsized? If so, perhaps this approach might be better suited: pop-os/cosmic-bg#73. It ostensibly sets a maximum value for arena usage, after which large allocations use …
Without having done any testing myself, and without any real evidence to back my opinion up yet, it feels wrong to decrease the number of arenas from the default. I fear it may have negative performance impacts for everything that isn't FCMP++ allocations, due to a higher rate of contention. It honestly might be fine on a 2-thread system: if any thread is looking for an arena to use, the max number of other threads holding arenas is 1, so each thread will on average only have to retry 0.5 times to get an arena. However, on a Ryzen Threadripper 3990X with 128 processing threads, this change might actually become a bottleneck.
Perhaps it would be better to switch to just using mmap/VirtualAlloc for those big allocations in the FCMP++ code (with a fallback to standard allocations on unsupported platforms). That way there's less chance of unintended performance side effects on existing C++ code, and no need to constantly keep this change in mind when trying to optimize memory usage in the future.
I will look into both of the above ideas some more (I did explore setting a lower threshold in my investigation, but will take another pass at it). My thoughts right now: the Rust side is doing a large number of small allocations, not a small number of large allocations. That will make the above ideas trickier to implement effectively than anticipated. See this idea by @kayabaNerve to use one large allocation for all the inner vecs, which would make it simpler. I think the perf concerns are overstated/misunderstood: this PR sets max arenas equal to n threads, which the man page essentially recommends to boost performance here:
I think there may be some valid concern that this solution isn't portable, and that other systems may see a similar issue depending on their allocator's behavior.
I also suggested the idea of using a single allocation per thread for the Rust code, either static or dynamic, and writing a bespoke allocator over increments of it per batch verification. That way the system only sees n large allocations. Without using another allocator, without writing one, and without going through the fcmp++ codebase allocation by allocation, tuning is the available option. One can make sure Rust is affected by having Rust explicitly use libc's allocator and call mallopt on its end; the Rust-maintained libc bindings shouldn't be a concern.
Personally, I think this was overkill compared to just tuning the setting on the C++ side, for a few reasons:
I think the above is reasoning enough not to add another dependency on the Rust side, even if Rust-maintained, nor to add the extra overhead of calling it over the FFI and modifying the system allocator on the Rust side.
Unfortunately, setting even … You can observe the increasing memory usage by observing …
This change may not be necessary thanks to this drastic improvement to the FCMP++ lib's memory usage: kayabaNerve/monero-oxide#4. Will follow up.
This PR is still necessary for systems using glibc. Unfortunately, we received a report of an OOM on a machine with 4 cores and 8 GB of RAM with the improved FCMP++ lib linked above, and without this change. That OOM is unfortunately unsurprising because, by default, glibc can use up to … I included further rationale in the NWLB channel here.
This setting ensures the daemon won't use more memory than expected when batch verifying FCMP++ txs. This is the crucial setting necessary for @SNeedlewoods to avoid the consistent OOMs when syncing, as noted here.
What was happening: @SNeedlewoods' 2-thread machine with 8 GB of RAM must have been using more than 10 arenas, which would mean the allocator was holding onto over 8 GB of memory instead of releasing it back to the OS, even though the memory used for batch verifying FCMP++ txs was freed properly.
This PR ensures a 2-thread Linux machine using glibc's allocator will use a max of 2 arenas, so that no more memory gets allocated and cached than expected.
Hopefully the comments in the PR / links are also helpful in explaining the underlying reasoning.