Replies: 1 comment
Thanks for the questions!
In the Kafka protocol, clients periodically issue fetch requests with a maximum wait time and a minimum number of bytes to receive (essentially a "long poll"). The broker makes an initial request to storage asking for data; if it finds sufficient data, it is returned to the client immediately. Otherwise the broker waits and issues further requests to storage after 50% of the remaining time has expired.
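The loop above can be sketched roughly like this. This is a minimal, synchronous illustration of the described algorithm, not the actual broker code; `fetch_from_storage`, the function names, and the byte-slice storage stand-in are all hypothetical:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for a storage engine query: returns whatever
// bytes are currently available for a topic partition.
fn fetch_from_storage(available: &[u8]) -> Vec<u8> {
    available.to_vec()
}

/// Broker-side long poll: answer immediately if `min_bytes` are
/// available, otherwise re-query storage after 50% of the remaining
/// wait time, until `max_wait` expires.
fn long_poll(available: &[u8], min_bytes: usize, max_wait: Duration) -> Vec<u8> {
    let deadline = Instant::now() + max_wait;
    loop {
        let data = fetch_from_storage(available);
        let remaining = deadline.saturating_duration_since(Instant::now());
        if data.len() >= min_bytes || remaining.is_zero() {
            // Enough data, or the client's wait budget is spent:
            // return whatever we have.
            return data;
        }
        // Wait for half of the remaining time before asking storage again.
        std::thread::sleep(remaining / 2);
    }
}
```

The halving means the broker makes only a handful of storage round trips per fetch, while still bounding how stale a response can be relative to the client's wait budget.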
At present all topic partitions are stored in the same table. I plan to use table partitioning, essentially creating a shard for each topic partition. For topics that also have a message schema, I am considering partitioning those tables too and using the message schema to form the DDL of the table: if the message schema has a field called "first_name" that is a string, then the PostgreSQL table created would have a column called "first_name" that is varchar. Essentially, a Kafka sink that writes to a partitioned table using the topic's schema.
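A sketch of that schema-to-DDL mapping, under loudly stated assumptions: the `FieldType` enum, the `partition` column, and the generated SQL shape are all illustrative, not the planned implementation:

```rust
// Hypothetical schema field representation: a stand-in for whatever
// schema format (e.g. Avro or JSON Schema) the topic actually uses.
enum FieldType {
    String,
    Int,
}

struct Field {
    name: String,
    ty: FieldType,
}

/// Derive partitioned-table DDL from a topic's message schema:
/// each schema field becomes a column, and the table is partitioned
/// by the Kafka partition number so each topic partition is a shard.
fn ddl_for_topic(topic: &str, fields: &[Field]) -> String {
    let columns: Vec<String> = fields
        .iter()
        .map(|f| {
            let sql_type = match f.ty {
                FieldType::String => "varchar",
                FieldType::Int => "integer",
            };
            format!("{} {}", f.name, sql_type)
        })
        .collect();
    format!(
        "CREATE TABLE {} (partition integer, {}) PARTITION BY LIST (partition)",
        topic,
        columns.join(", ")
    )
}
```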
The underlying S3 and memory storage are provided by object_store. All the storage engines present the same Kafka(esque) API.
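The shared Kafka(esque) API across engines could look something like the trait below. This is a guess at the shape, not the project's actual trait; the names `Storage`, `produce`, and `fetch`, and the in-memory map, are all illustrative:

```rust
use std::collections::HashMap;

/// Hypothetical common storage API implemented by each engine
/// (memory, S3 via object_store, PostgreSQL).
trait Storage {
    /// Append a record to a topic partition, returning its offset.
    fn produce(&mut self, topic: &str, partition: i32, record: Vec<u8>) -> i64;
    /// Fetch all records from the given offset onwards.
    fn fetch(&self, topic: &str, partition: i32, offset: i64) -> Vec<Vec<u8>>;
}

/// Toy in-memory engine: one append-only log per (topic, partition).
#[derive(Default)]
struct MemoryStorage {
    logs: HashMap<(String, i32), Vec<Vec<u8>>>,
}

impl Storage for MemoryStorage {
    fn produce(&mut self, topic: &str, partition: i32, record: Vec<u8>) -> i64 {
        let log = self.logs.entry((topic.to_owned(), partition)).or_default();
        log.push(record);
        // The offset is simply the record's position in the log.
        (log.len() - 1) as i64
    }

    fn fetch(&self, topic: &str, partition: i32, offset: i64) -> Vec<Vec<u8>> {
        self.logs
            .get(&(topic.to_owned(), partition))
            .map(|log| log[offset as usize..].to_vec())
            .unwrap_or_default()
    }
}
```

Keeping the engines behind one trait like this is what lets the broker treat S3, memory, and PostgreSQL interchangeably.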
No plan, right now... but that itch gets bigger every time I need to use the Java CLI to do something. It is possible that itch might develop into some simple Rust CLI utilities to create topics and so on, which could eventually morph into a client. I'll write up some longer-form responses to your questions when I have a bit more time. SeaQL looks very cool! I would be happy to chat more about that!
Congrats, I've been looking for a pure Rust streaming solution for a while!
I've just read it briefly and am particularly interested in the S3 backend. The broker is completely stateless, so how does it subscribe to new messages? By polling the metadata of the S3 object periodically? I would love to learn about the design / algorithm around latency.
As for the Postgres backend, I can imagine how one would implement it, but it would still be good to know the design specifics. Does it put shards into different tables?
Second to last, how does the in-memory store compare with Redis Streams?
Lastly, do you plan to make a native Rust client for it, potentially using a simpler protocol? I would really like to avoid librdkafka.
I would like to try it out and maybe integrate this into https://github.com/SeaQL/sea-streamer/