Replies: 1 comment
Thanks for the questions!
In the Kafka protocol, clients periodically issue fetch requests with a maximum wait time and a minimum number of bytes to receive (essentially a "long poll"). The broker makes an initial request to storage asking for data; if it finds sufficient data, it is returned to the client immediately. Otherwise the broker waits and issues further requests to storage after 50% of the remaining time has expired.
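The loop above can be sketched roughly like this. This is a minimal, synchronous illustration of the described algorithm, not the actual broker code; `fetch_from_storage`, the function names, and the byte-slice storage stand-in are all hypothetical:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for a storage engine query: returns whatever
// bytes are currently available for a topic partition.
fn fetch_from_storage(available: &[u8]) -> Vec<u8> {
    available.to_vec()
}

/// Broker-side long poll: answer immediately if `min_bytes` are
/// available, otherwise re-query storage after 50% of the remaining
/// wait time, until `max_wait` expires.
fn long_poll(available: &[u8], min_bytes: usize, max_wait: Duration) -> Vec<u8> {
    let deadline = Instant::now() + max_wait;
    loop {
        let data = fetch_from_storage(available);
        let remaining = deadline.saturating_duration_since(Instant::now());
        if data.len() >= min_bytes || remaining.is_zero() {
            // Enough data, or the client's wait budget is spent:
            // return whatever we have.
            return data;
        }
        // Wait for half of the remaining time before asking storage again.
        std::thread::sleep(remaining / 2);
    }
}
```

The halving means the broker makes only a handful of storage round trips per fetch, while still bounding how stale a response can be relative to the client's wait budget.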
At present all topic partitions are stored in the same table. I plan to use table partitioning, essentially creating a shard for each topic partition. For topics that also have a message schema, I am considering partitioning those tables too and using the message schema to form the DDL of the table: if the message schema has a field called "first_name" that is a string, then the PostgreSQL table created would have a column called "first_name" that is varchar. Essentially, a Kafka sink that writes to a partitioned table using the topic's schema.
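A sketch of that schema-to-DDL mapping, under loudly stated assumptions: the `FieldType` enum, the `partition` column, and the generated SQL shape are all illustrative, not the planned implementation:

```rust
// Hypothetical schema field representation: a stand-in for whatever
// schema format (e.g. Avro or JSON Schema) the topic actually uses.
enum FieldType {
    String,
    Int,
}

struct Field {
    name: String,
    ty: FieldType,
}

/// Derive partitioned-table DDL from a topic's message schema:
/// each schema field becomes a column, and the table is partitioned
/// by the Kafka partition number so each topic partition is a shard.
fn ddl_for_topic(topic: &str, fields: &[Field]) -> String {
    let columns: Vec<String> = fields
        .iter()
        .map(|f| {
            let sql_type = match f.ty {
                FieldType::String => "varchar",
                FieldType::Int => "integer",
            };
            format!("{} {}", f.name, sql_type)
        })
        .collect();
    format!(
        "CREATE TABLE {} (partition integer, {}) PARTITION BY LIST (partition)",
        topic,
        columns.join(", ")
    )
}
```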
The underlying S3 and memory storage are provided by object_store. All the storage engines present the same Kafka(esque) API.
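The shared Kafka(esque) API across engines could look something like the trait below. This is a guess at the shape, not the project's actual trait; the names `Storage`, `produce`, and `fetch`, and the in-memory map, are all illustrative:

```rust
use std::collections::HashMap;

/// Hypothetical common storage API implemented by each engine
/// (memory, S3 via object_store, PostgreSQL).
trait Storage {
    /// Append a record to a topic partition, returning its offset.
    fn produce(&mut self, topic: &str, partition: i32, record: Vec<u8>) -> i64;
    /// Fetch all records from the given offset onwards.
    fn fetch(&self, topic: &str, partition: i32, offset: i64) -> Vec<Vec<u8>>;
}

/// Toy in-memory engine: one append-only log per (topic, partition).
#[derive(Default)]
struct MemoryStorage {
    logs: HashMap<(String, i32), Vec<Vec<u8>>>,
}

impl Storage for MemoryStorage {
    fn produce(&mut self, topic: &str, partition: i32, record: Vec<u8>) -> i64 {
        let log = self.logs.entry((topic.to_owned(), partition)).or_default();
        log.push(record);
        // The offset is simply the record's position in the log.
        (log.len() - 1) as i64
    }

    fn fetch(&self, topic: &str, partition: i32, offset: i64) -> Vec<Vec<u8>> {
        self.logs
            .get(&(topic.to_owned(), partition))
            .map(|log| log[offset as usize..].to_vec())
            .unwrap_or_default()
    }
}
```

Keeping the engines behind one trait like this is what lets the broker treat S3, memory, and PostgreSQL interchangeably.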
No plan, right now... but that itch gets bigger every time I need to use the Java CLI to do something. It is possible that itch might develop into some simple Rust CLI utilities to create topics and so on, which could eventually morph into a client. I'll write up some longer-form responses to your questions when I have a bit more time. SeaQL looks very cool! I would be happy to chat more about that!
Congrats, I've been looking for a pure Rust streaming solution for a while!
I've just read it briefly and am particularly interested in the S3 backend. The broker is completely stateless, so how does it subscribe to new messages? By polling the metadata of the S3 object periodically? I would love to learn about the design / algorithm around latency.
As for the Postgres backend, I can imagine how one would implement it, but it would still be good to know the design specifics. Does it put shards into different tables?
Second to last, how does the in-memory store compare with Redis Streams?
Lastly, do you plan to make a native Rust client for it, potentially using a simpler protocol? I would really like to avoid librdkafka.
I would like to try it out and maybe integrate this into https://github.com/SeaQL/sea-streamer/