Refactor replay buffer to use KV buffer #147
Conversation
Overall LGTM.
But we can think more about how eviction should work in the replay buffer - which is, of course, outside the scope of this PR.
src/forge/actors/replay_buffer.py
Outdated
await self._add(episode)

async def _add(self, episode) -> None:
    key = f"rb_ep_{await self.store.numel()}_{uuid.uuid4().hex}"
What's the point of `await self.store.numel()`? Also, this may be expensive.
Re: recovery and determinism, maybe you could use uuid5 or something like highway hash. But I am not sure how important determinism is.
Maybe you can add a counter to the ReplayBuffer class, then derive the key using uuid5 and/or highway hash and/or your favorite hash from the following three pieces of information:
- the counter
- the rank of the current worker
- the content of the value

This will generally avoid duplicate keys even if you have episodes with the same content coming in (see the sketch below).
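A minimal sketch of that scheme, assuming the buffer keeps a monotonically increasing counter and knows its worker rank (all names below are illustrative, not code from this PR):

```python
import uuid

# Fixed namespace for uuid5; any constant UUID works.
_KEY_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")

def derive_key(counter: int, rank: int, value_bytes: bytes) -> str:
    # Deterministic: the same (counter, rank, content) always yields the same
    # key, while counter + rank keep identical contents from colliding.
    name = f"{counter}:{rank}:{value_bytes.hex()}"
    return f"rb_ep_{uuid.uuid5(_KEY_NAMESPACE, name).hex}"

# Example: key for the 7th episode produced by worker rank 0.
key = derive_key(counter=7, rank=0, value_bytes=b"serialized episode")
```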
I don't think we need determinism at this stage. Let's keep things simple. I have dropped the `await self.store.numel()` to make things more efficient. Thanks for pointing this out.
src/forge/actors/replay_buffer.py
Outdated
keys = await self.store.keys()

# TODO: Make this more efficient
@joecummings Do we still need this TODO?
pass


class StoreInterface(ABC):
For now, pare this down to the exact APIs we will be using in the ReplayBuffer - no more, no less. We can always update the interface later.
I’ve already pared the interface down to only the APIs we need. My concern is that methods like `numel` and `delete` are essential for the replay buffer’s functionality (e.g., eviction, checking buffer size) but aren’t yet implemented in TorchStore. If we remove these from the interface, the buffer implementation won’t be able to operate consistently.
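For reference, a sketch of what a pared-down interface limited to those operations could look like (illustrative only; the exact signatures in the PR may differ):

```python
from abc import ABC, abstractmethod
from typing import Any

class StoreInterface(ABC):
    """Minimal async KV-store contract used by the replay buffer."""

    @abstractmethod
    async def put(self, key: str, value: Any) -> None: ...

    @abstractmethod
    async def get(self, key: str) -> Any: ...

    @abstractmethod
    async def keys(self) -> list[str]: ...

    @abstractmethod
    async def delete(self, key: str) -> None: ...

    @abstractmethod
    async def numel(self) -> int: ...
```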
src/forge/actors/replay_buffer.py
Outdated
async def _evict(self, curr_policy_version: int) -> None:
    keys = await self.store.keys()
    for key in keys:
        episode = await self.store.get(key)
This is fine for now, but leave a comment that we could store each key as a uuid + the policy version and make this more efficient.
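A sketch of that idea, with a hypothetical key format and helper names (not code from this PR):

```python
import uuid

def make_key(policy_version: int) -> str:
    # Encode the policy version in the key itself, e.g. "rb_ep_v3_<hex>".
    return f"rb_ep_v{policy_version}_{uuid.uuid4().hex}"

def key_policy_version(key: str) -> int:
    # "rb_ep_v<version>_<hex>" -> <version>
    return int(key.split("_")[2].lstrip("v"))

async def evict(store, curr_policy_version: int, max_policy_age: int) -> None:
    # Eviction can now filter on key strings alone, without a store.get() per episode.
    for key in await store.keys():
        if curr_policy_version - key_policy_version(key) > max_policy_age:
            await store.delete(key)
```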
Yes. Added. Once torchstore supports fetching by prefix, this will be much easier.
Coming soon!
    for trajectory in self.buffer
    if (curr_policy_version - trajectory.policy_version) <= self.max_policy_age
]

async def _evict(self, curr_policy_version: int) -> None:
We control this internal method so you can pass in the keys from above, which we already calculated.
Actually, we have to re-fetch the keys after eviction because `_evict` may delete some entries. We fetch once before `_evict` to know what to check for eviction, then fetch again after to ensure we only sample from the remaining keys. This prevents trying to access keys that no longer exist.
I think we need to reconsider the whole eviction logic. See my previous point re: concurrency.
@endpoint
async def setup(self) -> None:
    self.buffer: list = []
def __post_init__(self):
@joecummings I changed `setup` to `__post_init__` because I found that `setup` is not called in many of the scripts where we use ReplayBuffer, and if it is not called, things may not be initialized correctly (e.g., the sampler). Let me know if this causes any concerns.
LGTM, Approval 🚀
Note: This is by no means criticism, but I want to point out that the logic here is strictly incorrect when run concurrently (see comments); we can leave that to a future PR.
src/forge/actors/replay_buffer.py
Outdated
total_samples = self.dp_size * bsz

# Evict old episodes
self._evict(curr_policy_version)
await self._evict(curr_policy_version)

if total_samples > len(self.buffer):
total_available = await self.store.numel()
if total_samples > total_available:
    return None

keys = await self.store.keys()

# TODO: Make this more efficient
As a general comment: calling `_evict()` before getting keys is not a reliable way to ensure we don't get outdated policies, since we have several await points between `_evict()` and `keys()`. Unless you want to put an async lock on self.store, which you probably don't. This is beyond the scope of this PR, though; please add a TODO here. cc @joecummings
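For illustration, a minimal sketch of the async-lock option mentioned above (names are assumptions; this is not the PR's implementation):

```python
import asyncio

# One lock guarding the store: holding it across eviction and key listing
# means no other task can add or delete entries between the two awaits.
_store_lock = asyncio.Lock()

async def evict_then_list_keys(store, evict, curr_policy_version: int) -> list[str]:
    async with _store_lock:
        await evict(curr_policy_version)
        return await store.keys()
```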
Even if nothing is concurrent at all at this point, I think we should at least keep in mind that we will need to support concurrency in the very near future. Also, it's not necessarily harder to write a concurrently correct program, albeit we do need to be more careful.
Good point. Will add this TODO before landing.
Added
src/forge/actors/replay_buffer.py
Outdated
class ReplayBuffer(ForgeActor):
    """Simple in-memory replay buffer implementation."""

    store: StoreInterface
nit: maybe call this the backend?
wdyt @LucasLLC ?
Changed `store` -> `backend`.
src/forge/actors/replay_buffer.py
Outdated
@endpoint
async def state_dict(self) -> dict[str, Any]:
    keys = await self.store.keys()
    episodes = [(k, await self.store.get(k)) for k in keys]
This is not ideal IMO - is there a way we could dump / serialize the contents of the store?
This is really cool! Transparently, my concern is that this adds a lot of complexity, which is risky when we don't have multi-node e2e running yet.
Timing-wise, it'll be better to shelve this for now and revisit it once we have something running (maybe even in the next two weeks). We'll inevitably hit bottlenecks, but that'll motivate our longer-term design and implementation.
A simple single-node key-value (KV) store implementation of StoreInterface.
This acts as a temporary backend for the replay buffer until torchstore
supports the full set of operations we need (delete, pop, keys, numel, etc.).
Can you explain the consistency semantics and the thread safety of the interface? For example: once a `put` finishes, will any future `get` always see the effect of that `put`?
The current `KVStore` implementation is a simple in-memory dictionary with async methods, intended as a temporary backend until `torchstore` is ready. It does not provide thread safety or strong consistency guarantees in the presence of concurrent access. Specifically:
- If multiple coroutines access or modify the store concurrently, race conditions may occur (e.g., a `get` may see stale or missing data if a `delete` or `put` happens at the same time).
- In a single-threaded `asyncio` event loop, as long as each operation is awaited, the store behaves as expected: once a `put` completes, subsequent `get`s will see the new value.
- However, if the store is accessed from multiple threads, or if multiple async tasks interleave operations without awaiting, consistency is not guaranteed.

The plan is to switch to torchstore once the key APIs like `delete` and `numel` are ready, which should provide proper concurrency and consistency guarantees.
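A minimal sketch of the kind of in-memory backend described above (illustrative; not necessarily the exact `KVStore` implementation in the PR):

```python
from typing import Any

class KVStore:
    """Single-process, in-memory KV store; no locking, no cross-thread safety."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    async def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    async def get(self, key: str) -> Any:
        return self._data[key]

    async def delete(self, key: str) -> None:
        del self._data[key]

    async def keys(self) -> list[str]:
        return list(self._data)

    async def numel(self) -> int:
        return len(self._data)
```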
This PR refactors the replay buffer storage to a KV-based data structure. We can’t integrate torchstore yet since the necessary APIs aren’t implemented (numel, delete, etc.), but this refactor will let us switch the backend easily once torchstore is ready.
- `StoreInterface` added for KV-store abstraction and further integration of torchstore
- `KVStore` added as a temporary KV-store backend
- Tests for `KVStore` added in `test_kv_store.py`
- `ReplayBuffer` refactored to use `StoreInterface` instead of a local list
- `test_replay_buffer.py` and `test_toy_rl.py` updated accordingly
- `ReplayBuffer` usage updated in `apps.grpo.main`, `apps.rl.main`, `apps.toy_rl.main`
Test