Conversation

@ndr-ds (Contributor) commented Sep 11, 2025

Motivation

After benchmarking for a while, some of our shards start OOMing. Inspecting the memory profiles suggests that the culprit is specifically the ValueCache for blocks.

Proposal

Lower the size of the cache so that it fills up and starts evicting entries before the VM runs out of memory.
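
For illustration, below is a minimal, self-contained sketch of the behaviour this relies on: a cache with a fixed capacity that evicts its oldest entry once the capacity is reached, so memory stays bounded instead of growing until the process OOMs. It does not use the repository's actual ValueCache; the BoundedCache type, its FIFO eviction policy, and the main example are illustrative assumptions, not the real implementation.

use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Illustrative bounded cache: once `capacity` entries are stored,
/// inserting a new key evicts the oldest one (FIFO). The real
/// `ValueCache` may use a different eviction policy; this only shows
/// why a smaller capacity bounds memory instead of growing until OOM.
struct BoundedCache<K: Hash + Eq + Clone, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>,
}

impl<K: Hash + Eq + Clone, V> BoundedCache<K, V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn insert(&mut self, key: K, value: V) {
        if !self.map.contains_key(&key) && self.map.len() >= self.capacity {
            // Evict the oldest entry so the total size stays bounded.
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        if self.map.insert(key.clone(), value).is_none() {
            self.order.push_back(key);
        }
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.map.get(key)
    }
}

fn main() {
    // With a capacity of 5_000, memory is bounded to roughly
    // 5_000 entries times the average entry size instead of growing forever.
    let mut cache: BoundedCache<u64, Vec<u8>> = BoundedCache::new(5_000);
    for i in 0..10_000u64 {
        cache.insert(i, vec![0u8; 1024]);
    }
    assert!(cache.get(&0).is_none()); // the oldest entries were evicted
    assert!(cache.get(&9_999).is_some());
}

With a capacity of 5_000, the cache's footprint is bounded by roughly 5_000 times the average entry size, which is the trade-off discussed in the review comments below.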

Test Plan

Deploy a network with this code, benchmark it for a while, and make sure that memory usage stays flat.

Release Plan

  • Nothing to do / These changes follow the usual release cycle.

This was referenced Sep 11, 2025
@ndr-ds ndr-ds marked this pull request as ready for review September 11, 2025 16:12
@ndr-ds ndr-ds requested review from Twey, afck, deuszx and ma2bd September 11, 2025 16:12
value_cache::ValueCache,
};

/// Maximum number of block entries kept in the `ValueCache`. Lowered from
/// 10_000 so the cache starts evicting before shards run out of memory.
const BLOCK_CACHE_SIZE: usize = 5_000;
Contributor commented:

If it really OOMed with 10_000 I'd do more than just halve it. I'd go down to 1_000 or even 100 maybe?

Contributor Author (@ndr-ds) commented:

In principle I agree, but the more we lower this, the more we seem to be penalized in performance... So 5k seems like a good compromise.

use super::ValueCache;

/// Test cache size for unit tests.
const TEST_CACHE_SIZE: usize = 1000;
Contributor commented:

Why not 10? Does it really make a difference in any of the tests? Otherwise it will just make the tests slower.

Contributor Author (@ndr-ds) commented:

True, 10 should be fine. Keep in mind that the only slowdown from a larger cache is the initial memory allocation; in a long-running test a larger cache would actually make things faster, since more values would be cached.
For these tests that trade-off doesn't seem worth it, though, so 10 indeed makes more sense.

@deuszx (Contributor) commented Sep 11, 2025

We already have a size-based cache in the repository: BlockCache in the linera-service exporter. It uses a quick_cache Cache with a custom weighter, BlockCacheWeighter.

Maybe as a follow-up, though.
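
For reference, here is a minimal std-only sketch of that idea: each entry contributes a weight (here its size in bytes), and insertion evicts old entries until the total weight fits a budget. The WeightedCache type, its FIFO eviction, and the byte-length weighter are illustrative assumptions; this does not reproduce the API of quick_cache or of the exporter's BlockCache/BlockCacheWeighter.

use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Illustrative weight-bounded cache: entries are evicted (oldest first)
/// until the total weight fits under `max_weight`. A real implementation
/// (e.g. `quick_cache` with a custom weighter) uses a smarter policy;
/// this only shows how a byte budget differs from an entry-count budget.
struct WeightedCache<K: Hash + Eq + Clone> {
    max_weight: usize,
    total_weight: usize,
    map: HashMap<K, (Vec<u8>, usize)>, // value bytes plus its cached weight
    order: VecDeque<K>,
}

impl<K: Hash + Eq + Clone> WeightedCache<K> {
    fn new(max_weight: usize) -> Self {
        Self { max_weight, total_weight: 0, map: HashMap::new(), order: VecDeque::new() }
    }

    // The "weighter": the cost of an entry, here its serialized length in bytes.
    fn weight(value: &[u8]) -> usize {
        value.len()
    }

    fn insert(&mut self, key: K, value: Vec<u8>) {
        let w = Self::weight(&value);
        // Replace an existing entry, if any, and release its weight.
        if let Some((_, old_w)) = self.map.remove(&key) {
            self.total_weight -= old_w;
            self.order.retain(|k| k != &key);
        }
        // Evict the oldest entries until the new one fits within the byte budget.
        while self.total_weight + w > self.max_weight {
            match self.order.pop_front() {
                Some(oldest) => {
                    if let Some((_, old_w)) = self.map.remove(&oldest) {
                        self.total_weight -= old_w;
                    }
                }
                None => break, // a single oversized entry is stored anyway
            }
        }
        self.total_weight += w;
        self.order.push_back(key.clone());
        self.map.insert(key, (value, w));
    }
}

fn main() {
    // A 1 MiB budget bounds memory by bytes, not by entry count.
    let mut cache = WeightedCache::new(1 << 20);
    cache.insert("block-1".to_string(), vec![0u8; 600 * 1024]);
    cache.insert("block-2".to_string(), vec![0u8; 600 * 1024]); // evicts block-1
    assert!(cache.total_weight <= 1 << 20);
}

Compared to an entry-count limit like BLOCK_CACHE_SIZE, a weight budget keeps the memory ceiling independent of how large individual blocks happen to be.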

@ndr-ds ndr-ds force-pushed the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch from 73a8b47 to d7b6485 Compare September 12, 2025 14:45
@ndr-ds ndr-ds force-pushed the 09-03-continuous_memory_profiling branch from 38ab4bf to 7a2e2e9 Compare September 12, 2025 14:45
@ndr-ds ndr-ds changed the base branch from 09-03-continuous_memory_profiling to graphite-base/4544 September 15, 2025 13:38
@ndr-ds ndr-ds force-pushed the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch from d7b6485 to 62be5ef Compare September 15, 2025 13:38
@ndr-ds ndr-ds changed the base branch from graphite-base/4544 to 09-15-add_memory_profiling_dashboard September 15, 2025 13:38
@ndr-ds ndr-ds force-pushed the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch from 62be5ef to f768ed5 Compare September 16, 2025 03:05
@ndr-ds ndr-ds force-pushed the 09-15-add_memory_profiling_dashboard branch from 15d2858 to 987f950 Compare September 16, 2025 03:05
@ndr-ds ndr-ds force-pushed the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch from f768ed5 to 9f0c9c4 Compare September 16, 2025 13:01
@ndr-ds ndr-ds changed the base branch from 09-15-add_memory_profiling_dashboard to graphite-base/4544 September 16, 2025 17:24
@ndr-ds ndr-ds force-pushed the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch from 9f0c9c4 to 5bd75ef Compare September 16, 2025 17:28
@ndr-ds ndr-ds changed the base branch from graphite-base/4544 to 09-15-add_memory_profiling_dashboard September 16, 2025 17:29
@ndr-ds ndr-ds force-pushed the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch from 5bd75ef to fe923dd Compare September 16, 2025 17:39
@ndr-ds ndr-ds force-pushed the 09-15-add_memory_profiling_dashboard branch from d61401d to 9e25a21 Compare September 16, 2025 17:39
@ndr-ds ndr-ds force-pushed the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch from fe923dd to 8be9c4e Compare September 16, 2025 17:56
@ndr-ds ndr-ds force-pushed the 09-15-add_memory_profiling_dashboard branch from 9e25a21 to 3ab6fa5 Compare September 16, 2025 17:56
@ndr-ds ndr-ds force-pushed the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch from 8be9c4e to 91a76d1 Compare September 16, 2025 17:58
@ndr-ds ndr-ds force-pushed the 09-15-add_memory_profiling_dashboard branch from 3ab6fa5 to 513abd9 Compare September 16, 2025 17:58
@graphite-app graphite-app bot changed the base branch from 09-15-add_memory_profiling_dashboard to graphite-base/4544 September 16, 2025 21:23
@ndr-ds ndr-ds force-pushed the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch from 91a76d1 to 56ecc30 Compare September 16, 2025 22:06
@graphite-app graphite-app bot changed the base branch from graphite-base/4544 to main September 16, 2025 22:06
@graphite-app (bot) commented Sep 16, 2025

Merge activity

  • Sep 16, 10:06 PM UTC: Graphite rebased this pull request, because this pull request is set to merge when ready.

@ndr-ds ndr-ds added this pull request to the merge queue Sep 16, 2025
Merged via the queue into main with commit 41f8794 Sep 17, 2025
62 of 84 checks passed
@ndr-ds ndr-ds deleted the 09-09-decrease_block_cache_size_increase_execution_state_cache_size branch September 17, 2025 00:00
ma2bd pushed a commit that referenced this pull request Sep 21, 2025