Some ideas about cloud storage basd message system based on Apache Iggy #1707

evanwave · 2025-04-17T15:13:10Z

evanwave
Apr 17, 2025

Hi, I'd like to share more about my use cases and patterns and some thoughts and ideas about Apache Iggy.

We have many clusters distributed across different regions, with each region having its own cluster. Within each cluster, we run various functions (mostly WASM functions, along with some workloads on each Kubernetes pod) that interact with the cluster. A typical stream processing workflow involves multiple connectors consuming data from various external systems (such as MQTT devices, Elasticsearch, CDC data, backend systems, etc.) into our cluster. This data then passes through many functions, possibly forming a DAG, and finally sends messages to external systems, primarily S3.

Our system characteristics or requirements are:

Low latency within the cluster: Function-to-function communication relies on an event queue, and most functions run in the same cluster. Therefore, the latency between functions should be minimal. Additionally, we have some functions that communicate across regions, but in these cases, latency is not a significant concern.
Persisting into cloud storage: Most of our data eventually goes to cloud storage (S3) for further analysis. We have a dedicated process to compact this stream data into an Iceberg table.
Flexible geo-replication: Our system consists of many clusters, mostly running in some IoT gateways where networks aren't interconnected. However, they can still access public clouds like S3 and our metadata server. We avoid a global MQ cluster to keep things simple, so we need to set up geo-replication for data sharing between clusters. This is a significant challenge for us, and we aim to use S3 to facilitate geo-replication. Any data sent to S3 can be shared by all clusters.

Our thoughts:

We are impressed by Apache Iggy's low latency (microsecond-level delay is remarkable). We want to use Apache Iggy to handle low latency within a single cluster, possibly running an Iggy cluster in each of our clusters.
We aim to persist our data into S3, similar to the Diskless Topic proposal by Apache Kafka https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics . We don't want tiered storage like Kafka's; instead, we want to flush our data every 200-800 ms to S3. Kafka's tiered storage isn't suitable for this. Another pain point with tiered storage is that it tiers data by partition. However, we have many topics and a very short flush interval, which impacts performance. We aim to combine multiple partitions into one file (or more, with control over the number of parallel files) and flush them all at once, similar to a WAL file. This issue is also present with the connector, so a simple S3 connector isn't sufficient for our needs.
However, the latency of Diskless Topic might still be too high (200-800 ms is too high for function-to-function communication). Therefore, we want to first persist the data to the local disk in each cluster and then asynchronously flush it to S3, similar to what AutoMQ does: https://www.automq.com/blog/principle-analysis-how-automq-implements-high-performance-wal-based-on-raw-devices

Based on these thoughts, we have some ideas:

We can maintain a WAL storage in Apache Iggy, similar to the Bookkeeper Journal Log, combining many partitions into one WAL file and flushing them together. This allows us to achieve microsecond latency for function-to-function communication.
Iggy can then asynchronously flush the data to S3 every 200-800 ms, achieving sub-second latency for inter-cluster data exchange. For each flush, we can organize data, including compacting topics, merging data from the same partition, and trimming topics (deleting messages already consumed).
The WAL storage can act as a write cache, enabling different Iggy nodes within the same cluster to share this cache, avoiding the need to download data from S3 while consuming. We can also have a read cache for each node to improve read performance.
We will have a global metadata server to communicate and coordinate all this stuff like topic assignment, routing, etc.

I believe a cloud storage-based MQ would be very helpful for this case, and it's becoming a trend in the industry. Many MQ vendors have already implemented it. It would be better if we consider S3 from the start. Also, I'm not thinking that all these features be implemented within Iggy. We can let Iggy focus on its core functions, but I hope it can provide interfaces to support these additional features.

What do you think? I'm really excited to hear your thoughts! Thanks!

spetz · 2025-04-18T11:18:55Z

spetz
Apr 18, 2025
Collaborator

Hey Evan,

Thank you for the very detailed input. As of now, we do have the basic support for S3 backups/archiver, and although, we did consider implementing some sort of tiered storage already in the past, actually quite similar to your use case (persisting the data onto the disk to achieve low latency, and then flushing it into the object storage in the cloud), we decided to halt it, until we complete the most important features, such as ongoing zero-copy (de)serialization, io_uring along with the shared-nothing design, and finally, clustering support based on VSR.

One of the main reasons that makes Iggy stand out across the other tools within the message streaming infrastructure space, is its performance, low latency, and the overall efficiency. We're already able to achieve the microsecond range for p999+ tail latencies (depending on the use case), and this will only get better with the upcoming improvements.

Speaking on behalf of the core team, we do believe, that the need for the extremely fast message streaming infrastructure, where pushing the gigabytes of data per second, within the microsecond tail latency range, will matter a lot in the near future.

If we were to focus mostly on the cloud storage at this point (which is actually quite a challenge to implement properly and would require at least the partial redesign of the existing storage layer), I can't think of many reasons why someone would someone use Iggy instead of the existing Kafka API-compatible tools (where some of these solely focus on this particular feature). The tiered/diskless storage will most likely happen at some point, but at least for now, it's not on our roadmap, as it doesn't give us much of an advantage.

For example, flushing the data could be even done with the usage of connectors, but then being able to read the data, would be a different story. One would have to redesign the storage to support the remote reads, metadata, efficient batching, etc.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some ideas about cloud storage basd message system based on Apache Iggy #1707

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Some ideas about cloud storage basd message system based on Apache Iggy #1707

Uh oh!

evanwave Apr 17, 2025

Replies: 1 comment

Uh oh!

spetz Apr 18, 2025 Collaborator

evanwave
Apr 17, 2025

spetz
Apr 18, 2025
Collaborator