upsert performance degradation #9685
Replies: 2 comments
Hi @dimitryshamonin,

You didn't specify which version you're running; versions prior to v25.1 had some issues with posting list cache management. But regardless, the slowdown in mutation performance is not surprising as the node count in your graph increases. Each mutation must achieve Raft consensus within its group (majority acknowledgment plus WAL writes across replicas), so more nodes per group means more network round trips and disk syncs per proposal. Additionally, with more groups, mutations touching predicates on different groups require cross-group distributed transaction coordination, adding further latency. Because Dgraph guarantees strong consistency, this is the tradeoff.

I do see one thing in your cache settings that might move the needle: at the moment you're only giving the posting list cache 4GB (the first part of your 10,45,45 partitioning of 40GB).

Also, since you have the cores, I'd double the numcompactors. Sustained writes will stall if level 0 (L0) fills up faster than the compactors can drain it.

Hope this helps. And please post your results (good or bad) here so we can follow along.
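To make the two suggestions above concrete, here is an illustrative sketch of the alpha flags involved. The exact percentage split (50,25,25) and compactor count are assumptions for illustration, not recommendations, and this assumes the `--badger` superflag passes the `numcompactors` option through to Badger the way it does for other Badger options; check both against the docs for your Dgraph version.

```shell
# Sketch only: shift more of the 40GB cache budget to the posting list
# cache (the first number in the percentage triple, which is currently 10%
# = 4GB), and raise the Badger compactor count on a 24-core node.
dgraph alpha \
  --cache "size-mb=40960; percentage=50,25,25" \
  --badger "numcompactors=8"
```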
Hi @matthewmcneely,

We plan to run the next load tests this Friday; I'll update with the results. We have tried changing the cache distribution, but it does not seem to change much. We usually see about one hour of RPS gains, with spikes after predicate movement, and then constant degradation. Like this:

Strange that these timings are almost identical across more than 20 test uploads with different group/badger settings. The performance is fine for our task, but it would be good to know if you can squeeze some extra RPS in case of an SLA change.

P.S. We are using 25.3

Greetings!
In our project we handle around 300-400 million objects (around 20 keys each), about 50 GB of compressed data on disk.
We have a Dgraph cluster of 12 Alphas in 4 groups (each node: 24 CPU, 60+ GB RAM).
A data update takes about 5-6 hours (which is fine), however we notice heavy write performance degradation over the upload period (from ~20,000 objects/s to ~5,000):
Update itself is handled with upserts - batches of 500 objects in 8 parallel streams.
Single mutation looks like this:
Current alpha config:
However, tinkering with the settings does not seem to affect the RPS much.
The cluster logs don't contain anything interesting except compactions.
Is this behavior normal or are we missing something?