Streams running with RAM Disk (tmpfs) #8948
-
Hi, we are very keen on using RMQ Streams and like all of its features as well as its performance, but we encountered some hiccups during host failures of the underlying storage system that apparently affect RMQ Stream latency drastically (15-20 seconds of delay). Thank you
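A rough way to observe this from the client side is to publish with publisher confirms and time each confirm. Below is a minimal sketch using pika; the broker address, credentials and the stream name `test-stream` are assumptions, not details of our actual setup.

```python
# Minimal sketch: time publisher confirms against a stream queue.
# Storage stalls on the broker show up here as slow confirms.
import time

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Streams are declared as durable queues of type "stream".
channel.queue_declare(
    queue="test-stream",
    durable=True,
    arguments={"x-queue-type": "stream"},
)

# With confirms enabled, basic_publish blocks until the broker confirms.
channel.confirm_delivery()

for i in range(1000):
    start = time.monotonic()
    channel.basic_publish(
        exchange="",
        routing_key="test-stream",
        body=f"message-{i}".encode(),
        properties=pika.BasicProperties(delivery_mode=2),
    )
    elapsed = time.monotonic() - start
    if elapsed > 1.0:
        print(f"confirm for message {i} took {elapsed:.1f}s")

connection.close()
```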
-
The underlying Raft library is Ra (https://github.com/rabbitmq/ra), which persists its data to disk. Using a RAM disk is a good option for an "in-memory" Streams solution.
-
RabbitMQ 4.0 will remove in-memory-only entities because, in our 16+ years of experience, they cause way more problems than they solve.
-
Thanks for the answers and the link to the Ra library. So, if I got that right, when a client writes a message to an RMQ Stream, every node executes the following:
So let's say the disk is unresponsive for ~5 seconds or more; I would expect the following behavior:
If this only happens on one out of three nodes, it should not affect the writing client, right? And a side question: are there any limits for this cache? Say the disk is unresponsive for ~30 seconds during peak load (worst case?)
-
Thanks again for your replies, I think I got some more insights. To answer your cache question: it's documented here: https://github.com/rabbitmq/ra/blob/main/docs/internals/INTERNALS.md#wals-ets-tables
To give you some more details about our environment (I should have added that in the first place): we will conduct some more failure tests with vSAN and RAM disks, and I will come back with the results if you are interested.
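For those failure tests, one simple post-mortem check (just a sketch, not something we have run yet) is to publish sequence-numbered messages during the fault injection and afterwards read the whole stream back from the first offset, looking for gaps. The stream name `failure-test-stream` and the plain-integer message bodies below are assumptions.

```python
# Minimal sketch: after a failure test, replay the stream from its first
# offset and check the published sequence numbers for gaps.
import pika

EXPECTED = 10_000  # how many sequence-numbered messages were published

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Stream consumers require a consumer prefetch (QoS) setting.
channel.basic_qos(prefetch_count=100)

seen = set()
# "x-stream-offset": "first" starts reading at the beginning of the stream.
for method, properties, body in channel.consume(
    "failure-test-stream",
    arguments={"x-stream-offset": "first"},
    inactivity_timeout=10,
):
    if method is None:  # no message for 10s: assume we reached the end
        break
    seen.add(int(body))
    channel.basic_ack(delivery_tag=method.delivery_tag)

missing = sorted(set(range(EXPECTED)) - seen)
print(f"received {len(seen)} of {EXPECTED}; first missing: {missing[:20]}")

connection.close()
```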
-
Modern RabbitMQ 3.x features, most notably quorum queues and streams, and as of RabbitMQ 4.0, virtually every subsystem, are not designed with transient storage in mind.
Data safety features of modern queue and stream types, node restarts, the upgrade process and tooling: none of these are designed to tolerate a node suddenly losing all of its data and all prior knowledge about the rest of the cluster.
Upgrades and even node restarts will fail with RAM-based storage (at some point whatever provides the RAM-backed filesystem volume also has to be restarted, right?).
Both the Cluster Formation and Clustering documentation guides describe a scenario where a node is reset or restored after failure, and…