Persistence to MongoDB slows quite a lot #2895

avinoam134 · 2024-12-29T13:35:01Z

Hey all! I'm integrating langgraph into my server with persistence to MongoDB using langgraph's new default AsyncMongoDBSaver. However, some stress tests I ran show that it increases my latency quite significantly compared to a MemorySaver.
Running a single user request (i.e. one full graph run from START to END) takes ~3 seconds with the checkpointer and ~2 seconds with a MemorySaver, which seems reasonable to me.
However, this scales badly: running 20 user requests (i.e. 20 full graph runs) concurrently with the checkpointer raises latency to ~9.5 seconds, while the MemorySaver stays at ~2.5 seconds.
It's worth noting that each of my graph runs goes through ~5 "super-steps" until completion (depending on the user prompt, navigation through nodes may vary a little).
Is this latency considered normal? If not, can you help me figure out why this is happening?
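
To make the 1-user vs. 20-user comparison above reproducible, here is a minimal timing harness sketch. It uses only the standard library; `fake_graph_run` is a hypothetical stand-in for `await graph.ainvoke(...)` and its 50 ms sleep is an arbitrary assumption, so the printed numbers say nothing about any real checkpointer.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable


async def timed(run_once: Callable[[], Awaitable[object]]) -> float:
    """Await one run and return its latency in seconds."""
    start = time.perf_counter()
    await run_once()
    return time.perf_counter() - start


async def measure(run_once: Callable[[], Awaitable[object]], concurrency: int) -> dict:
    """Start `concurrency` runs at once and collect per-run and wall-clock timings."""
    wall_start = time.perf_counter()
    latencies = await asyncio.gather(*(timed(run_once) for _ in range(concurrency)))
    return {
        "concurrency": concurrency,
        "wall_s": time.perf_counter() - wall_start,
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
        "latencies": list(latencies),
    }


async def fake_graph_run() -> None:
    # Hypothetical placeholder: swap in your own `await graph.ainvoke(inputs, config)`.
    await asyncio.sleep(0.05)


if __name__ == "__main__":
    for n in (1, 20):
        result = asyncio.run(measure(fake_graph_run, n))
        print(f"{n:>2} concurrent: mean={result['mean_s']:.3f}s max={result['max_s']:.3f}s")
```

Running the same harness once per checkpointer (and once with none) separates the cost of the graph itself from the cost of persistence.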

I tried to rule out DB locks as the cause by creating a collection per thread ID, so that each user request has its memory persisted to its own collection (and hence avoids contention across the entire DB), but this barely improved the latency, from ~9.5 sec to ~9 sec.
Also, it's worth mentioning that the DB is used solely for the langgraph runs, and the only endpoints triggered in the tests are ones that just call graph.ainvoke(config) with no other tasks. So nothing else slows things down except the graph runs.

I also have a suspicion as to why this is happening and would like some confirmation:
As I understand it, langgraph writes an intermediate checkpoint for every super-step the graph passes through. If that's the case, could this explain the issue? If so, what are those intermediate writes for, aside from protection against unexpected server shutdowns mid-run (which isn't significant to me)? Is there a way to disable them and write only at the end of the graph run?
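
The hypothesis above can be illustrated with a toy model. This is not LangGraph's implementation: the write latency, compute time, and the semaphore standing in for a limited pool of DB connections are all assumptions chosen only to show how per-super-step writes could add up under concurrency.

```python
import asyncio
import time

# All constants are illustrative assumptions, not measurements.
WRITE_LATENCY_S = 0.02  # assumed round trip for one checkpoint write
STEP_COMPUTE_S = 0.01   # assumed node work per super-step
SUPER_STEPS = 5         # the post reports ~5 super-steps per run


async def write_checkpoint(db: asyncio.Semaphore) -> None:
    # The semaphore models a bounded number of concurrent DB operations.
    async with db:
        await asyncio.sleep(WRITE_LATENCY_S)


async def run_graph(write_every_step: bool, db: asyncio.Semaphore) -> int:
    """Simulate one graph run and return how many writes it issued."""
    writes = 0
    for _ in range(SUPER_STEPS):
        await asyncio.sleep(STEP_COMPUTE_S)
        if write_every_step:
            await write_checkpoint(db)
            writes += 1
    if not write_every_step:
        await write_checkpoint(db)  # a single write at the end of the run
        writes += 1
    return writes


async def simulate(users: int, write_every_step: bool, pool_size: int = 4):
    """Run `users` simulated graphs concurrently; return (elapsed seconds, total writes)."""
    db = asyncio.Semaphore(pool_size)
    start = time.perf_counter()
    writes = await asyncio.gather(
        *(run_graph(write_every_step, db) for _ in range(users))
    )
    return time.perf_counter() - start, sum(writes)


if __name__ == "__main__":
    for mode in (True, False):
        elapsed, total = asyncio.run(simulate(20, mode))
        label = "every super-step" if mode else "end of run only"
        print(f"{label}: {total} writes in {elapsed:.2f}s")
```

In this model, 20 users issue 100 writes when writing every super-step and 20 writes when writing once at the end, so any queueing at the database is multiplied accordingly. Whether the real checkpointer behaves this way is exactly the open question.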

saminahbab · 2025-03-23T19:35:31Z

I have noticed this too, significant slowing down.

aymentil · 2025-04-03T08:44:04Z

Same problem for me too: a significant latency increase for a graph of similar size to yours. For each graph execution I save 31 checkpoints and 130 intermediate checkpoints. I don't know if there is a way to see how many checkpoints are saved when using the InMemorySaver, to compare.
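
One way to compare write counts across savers is to wrap the saver's write methods with counters before compiling the graph. The sketch below is an assumption-laden illustration: `count_calls` and `DummySaver` are hypothetical names, and the method names `put`, `aput`, `put_writes`, and `aput_writes` are taken from the checkpointer base interface. Patching an instance this way assumes the saver allows attribute assignment.

```python
import asyncio
from collections import Counter
from typing import Iterable


def count_calls(obj: object, method_names: Iterable[str]) -> Counter:
    """Wrap the named methods on `obj` in place and return a live call counter."""
    counts: Counter = Counter()

    def make_wrapper(name, method):
        def wrapper(*args, **kwargs):
            counts[name] += 1
            # For async methods this returns the coroutine, which the caller awaits.
            return method(*args, **kwargs)
        return wrapper

    for name in method_names:
        method = getattr(obj, name, None)
        if callable(method):
            setattr(obj, name, make_wrapper(name, method))
    return counts


class DummySaver:
    """Hypothetical stand-in for a real checkpointer, used only for the demo."""

    def put(self, checkpoint):
        return None

    async def aput(self, checkpoint):
        return None


if __name__ == "__main__":
    saver = DummySaver()
    counts = count_calls(saver, ["put", "aput", "put_writes", "aput_writes"])
    saver.put({})
    asyncio.run(saver.aput({}))
    print(dict(counts))
```

Applying the same wrapper to an InMemorySaver instance and to a MongoDB saver instance for an identical run would show whether both receive the same number of writes.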

caseyclements · 2025-05-30T14:40:31Z

Hi @avinoam134, @saminahbab, @aymentil. If you provide some example code, we will investigate.

aymentil · 2025-06-03T09:21:16Z

Hi @caseyclements

Regarding the code snippet: we're using the langgraph-checkpoint-mongodb package. At graph compilation time we pass an instance of AsyncMongoDBSaver, i.e. graph.compile(checkpointer=AsyncMongoDBSaver()).
Note: We're not using a native MongoDB instance; we're using an emulated one through Azure (Azure Cosmos DB for MongoDB (vCore)), on version 7.0 of MongoDB.

caseyclements · 2025-06-13T16:47:00Z

Using bench/fanout_to_subgraph.py from langgraph's benchmarking suite as an example, I set up a test to compare the sync and async versions of MongoDBSaver with those of InMemorySaver and PostgresSaver.

I cannot speak about Azure's Cosmos DB performance, but persistence to MongoDB performs as well as Postgres.

I've opened a draft pull request (#149) in our monorepo. It is a draft because we have not set up infrastructure for Postgres in our CI, and likely will not, but you can look at the code and run it yourself to investigate.

tests/integration_tests/test_fanout_to_subgraph.py::test_sync

Begin test_sync
mongodb: 8.9449 seconds
postgres: 10.8356 seconds
in_memory: 3.4794 seconds
PASSED

tests/integration_tests/test_fanout_to_subgraph.py::test_async

Begin test_async
mongodb_async: 9.7151 seconds
postgres_async: 16.1918 seconds
in_memory_async: 2.6644 seconds
PASSED
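
For a quick read of the numbers above, the following sketch divides each timing by the in-memory baseline of the same test. The timings are copied from the output above; the helper name `slowdowns` is only illustrative.

```python
# Timings (seconds) copied from the test output above.
timings = {
    "sync": {"mongodb": 8.9449, "postgres": 10.8356, "in_memory": 3.4794},
    "async": {
        "mongodb_async": 9.7151,
        "postgres_async": 16.1918,
        "in_memory_async": 2.6644,
    },
}


def slowdowns(results: dict, baseline_key: str) -> dict:
    """Return each backend's time as a multiple of the baseline backend's time."""
    baseline = results[baseline_key]
    return {
        name: seconds / baseline
        for name, seconds in results.items()
        if name != baseline_key
    }


sync_ratios = slowdowns(timings["sync"], "in_memory")
async_ratios = slowdowns(timings["async"], "in_memory_async")

if __name__ == "__main__":
    for label, ratios in (("sync", sync_ratios), ("async", async_ratios)):
        for name, ratio in ratios.items():
            print(f"{label:5s} {name:15s} {ratio:.2f}x in-memory")
```

On this single run, MongoDB comes out at roughly 2.6x (sync) and 3.6x (async) the in-memory time, and Postgres at roughly 3.1x and 6.1x.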

aymentil · 2025-07-09T14:51:19Z

Hi @caseyclements

Using a LangGraph graph that I'm working on, I created a benchmark in which I ask a set of questions over a certain number of runs, with checkpointing via Redis, MongoDB, and PostgreSQL. I focused mainly on two metrics: latency and storage efficiency. Here are the findings:

Overall Performance Comparison

Database	Mean Total Time (s)	Median Total Time (s)	Std Dev Total Time (s)
PostgreSQL	17.18	17.04	6.57
MongoDB	17.83	17.17	7.63
Redis	18.81	18.64	6.06

Checkpoint Time Comparison

Database	Mean Checkpoint Time (s)	Median Checkpoint Time (s)	Std Dev Checkpoint Time (s)
PostgreSQL	16.88	16.75	6.51
MongoDB	17.54	16.90	7.61
Redis	17.59	17.51	5.93

Storage Growth Comparison

Database	Initial Size	Final Size	Growth
MongoDB	0.00 MB	5.82 MB	5.82 MB
PostgreSQL	10.86 MB	18.91 MB	8.04 MB
Redis	42.36 MB	77.14 MB	36.47 MB
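
As a cross-check on the tables above, this sketch derives two things from the reported values: the share of mean total time spent in checkpointing, and whether each Growth figure equals Final Size minus Initial Size. The values are copied from the tables; the function names are illustrative only.

```python
# Mean times (seconds) copied from the first two tables.
mean_total = {"PostgreSQL": 17.18, "MongoDB": 17.83, "Redis": 18.81}
mean_checkpoint = {"PostgreSQL": 16.88, "MongoDB": 17.54, "Redis": 17.59}

# (initial MB, final MB, reported growth MB) copied from the storage table.
storage = {
    "MongoDB": (0.00, 5.82, 5.82),
    "PostgreSQL": (10.86, 18.91, 8.04),
    "Redis": (42.36, 77.14, 36.47),
}


def checkpoint_share() -> dict:
    """Fraction of mean total time attributed to checkpointing, per database."""
    return {db: mean_checkpoint[db] / mean_total[db] for db in mean_total}


def growth_gaps() -> dict:
    """(final - initial) minus the reported growth, in MB, per database."""
    return {
        db: round((final - initial) - reported, 2)
        for db, (initial, final, reported) in storage.items()
    }


shares = checkpoint_share()
gaps = growth_gaps()

if __name__ == "__main__":
    for db, share in shares.items():
        print(f"{db}: {share:.1%} of mean total time is checkpoint time")
    for db, gap in gaps.items():
        print(f"{db}: final - initial differs from reported growth by {gap:+.2f} MB")
```

By these figures, checkpointing accounts for roughly 93% to 98% of mean total time on every backend. The MongoDB and PostgreSQL growth values match the size difference to within rounding, while the Redis row differs by about 1.69 MB; the tables alone do not explain that gap.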
