[Ingress-Kafka] Refactor to use ingestion-client #3975
Conversation
tillrohrmann
left a comment
Thanks a lot for updating this PR @muhamadazmy. I think we are really close to merging this PR. I left a few minor questions/comments. The main one is whether there can be a problem when running a mixed v1.5 and v1.6 cluster, and what happens to the legacy consumer task that tries to get the dedup information from a v1.5 node.
```rust
debug!(
    "Error while looking up latest dedup info for {partition_id}: {err} .. retrying"
);
```
Should the level escalate if it happens repeatedly?
```rust
/// fallback to retry every 2 seconds.
pub connection_retry_policy: RetryPolicy,

/// # Request Batch Size
```
When should I tweak this value as a user? What's the impact on the system?
Hey @muhamadazmy, glad to see work on Kafka ingress. Is there a bench setup you have (or one that is already in the restate repo) that I can use to measure peak throughput & system behaviour in my deployment of Restate (cluster)?
Dear @v1gnesh, just to clarify: PR #3976 only introduces the foundational piece (the ingestion client). The actual performance improvements to Kafka ingestion are implemented in this PR (#3975), which makes use of the new ingestion client. That said, I'm happy to explain how I measure throughput.

Test setup: I prepare a Kafka topic containing exactly 1 million invocations. This allows me to reset the consumer group ID and repeatedly measure throughput under different configurations. Once the Kafka subscription is created, I monitor throughput in Grafana. The most relevant metrics are: …

Note that the new ingestion client is disabled by default since it's still a new feature. To enable it explicitly, you need to start the restate server with …
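For readers who want to reproduce a comparable setup, here is a minimal sketch of how such a bench topic could be pre-filled using rdkafka. The broker address, topic name, and payload shape are illustrative placeholders, not the setup described above:

```rust
use std::time::Duration;

use rdkafka::ClientConfig;
use rdkafka::producer::{FutureProducer, FutureRecord};

// Hypothetical helper: pre-fill a test topic with 1 million messages so the
// consumer group can be reset and ingestion throughput re-measured under
// different configurations. Broker and topic names are placeholders.
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .create()?;

    for i in 0..1_000_000u64 {
        let key = i.to_string();
        let payload = format!(r#"{{"invocation": {i}}}"#);
        // Awaiting every delivery keeps the sketch simple; a real load
        // generator would pipeline sends instead of waiting one by one.
        producer
            .send(
                FutureRecord::to("ingress-bench").key(&key).payload(&payload),
                Duration::from_secs(5),
            )
            .await
            .map_err(|(err, _msg)| err)?;
    }
    Ok(())
}
```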
tillrohrmann
left a comment
Thanks for updating this PR @muhamadazmy. The changes look good to me. I've left a comment for a possible follow-up where we could reconsider whether to use u128 for ProducerId::Producer, and a comment about the usage of silent modifications, which I believe are dangerous to use and might result in incorrect behavior.
| "failed to query latest dedup \ | ||
| sequence number for '{legacy_producer_id}': {status:?}" |
Maybe include node_id.
```rust
fn dedup_producer_id(
    subscription: &SubscriptionId,
    consumer_group: &str,
    topic: &str,
    partition: i32,
) -> u128 {
    let mut hasher = xxhash_rust::xxh3::Xxh3::new();

    subscription.hash(&mut hasher);
    '\0'.hash(&mut hasher);
    consumer_group.hash(&mut hasher);
    '\0'.hash(&mut hasher);
    topic.hash(&mut hasher);
    '\0'.hash(&mut hasher);
    partition.hash(&mut hasher);

    hasher.digest128()
}
```
Wondering whether u128 is actually overkill for ProducerId::Producer. How many producers could there be, and could we generate the Kafka producer id based on the Schema (e.g. having a counter value that counts the number of subscriptions, with every new subscription getting the next counter value)? If we didn't hash, then decreasing the producer id value to a u64 should be more than sufficient. Of course, one would have to handle the different partitions, which could be done by using a few bits of the u64 for the partition (I guess that 16 bits should be more than enough for the number of partitions). See the sketch below.
Nothing we have to address right now, but maybe something to double check before creating the v1.6 release.
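A minimal sketch of the alternative this comment describes (not what the PR implements): pack a deterministic, schema-derived subscription counter and the Kafka partition into a u64. The 48/16 bit split is an assumption for illustration.

```rust
// Sketch only: assumes a deterministic subscription counter derived from the
// Schema. 48 bits for the subscription counter, 16 bits for the Kafka partition.
fn packed_producer_id(subscription_counter: u64, partition: i32) -> u64 {
    assert!(subscription_counter < (1u64 << 48), "subscription counter must fit in 48 bits");
    assert!((0..1 << 16).contains(&partition), "partition must fit in 16 bits");
    (subscription_counter << 16) | partition as u64
}
```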
```rust
pub enum ProducerId {
    Partition(PartitionId),
    Other(ByteString),
    Producer(U128),
}
```
Wondering whether u128 is really needed or whether u64 could be sufficient if we generated a deterministic subscription counter from the Schema, for example. Maybe something to consider before we release things.
The idea behind using u128 is also to allow external producer ids in case we create a public ingestion API in the future. This way the user can use unique names for their external producer, and we can still calculate deterministic producer ids with a virtually infinite collision-free space, without the need to track producer ids.
Thanks for taking the time for a detailed response, @muhamadazmy. I had tested a Restate cluster with 3 container nodes in Kubernetes. So I started messing with the rdkafka config params, and just when the subscription was created, the network activity was high (as I was fiddling with the initial request bytes kind of param), then it went back to ~60.
@v1gnesh If you would like to experiment with this new ingestion mechanism, you can always build the restate server binary from this branch directly.
Thanks a lot for updating this PR @muhamadazmy. Nice work! LGTM. +1 for merging after addressing my minor comments.
```rust
if let Some(key) = msg.key() {
    headers.push(Header::new(
        "kafka.key",
        &*base64::prelude::BASE64_URL_SAFE.encode(key),
```
Why are we base64 encoding the key when we require the key to be valid utf-8 here?

restate/crates/ingress-kafka/src/builder.rs, line 181 (57ff349):
```rust
std::str::from_utf8(&key)
```
This is retained as is from the legacy ingestion. Mainly to avoid unnecessary changes between the two versions.
```rust
// this future will be aborted when the partition is no longer needed, so any exit is a failure
if let Err(err) = self.run_inner().await {
    _ = self.failed.send(err);
}
```
The comment does not seem to be aligned with what is happening here.
run_inner() will only return on error, but I see why this can be confusing. I will change it so it always signals a failure when it returns, even with an Ok().
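A rough sketch of what that adjusted flow could look like (the `failed` sender and the error type are assumptions based on the snippet above):

```rust
// Sketch: the task is only ever stopped by being aborted, so even an Ok(())
// return from the consumption loop is reported as a failure.
let err = match self.run_inner().await {
    Err(err) => err,
    Ok(()) => anyhow::anyhow!("consumption loop exited unexpectedly"),
};
let _ = self.failed.send(err);
```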
| "failed to query legacy dedup \ | ||
| sequence number for producer '{legacy_producer_id}'" |
Maybe add the node_id to the error message as well.
```rust
debug!(
    restate.subscription.id = %self.builder.subscription().id(),
    messaging.consumer.group.name = self.consumer_group_id,
    "Starting topic '{}' partition '{}' consumption loop",
    self.topic_partition.0,
    self.topic_partition.1
);

let legacy_dedup_offset = self.legacy_dedup_offset().await;
debug!(
    topic=%self.topic_partition.0, kafka_partition=%self.topic_partition.1,
    consumer_group=%self.consumer_group_id,
    "Legacy dedup offset: {legacy_dedup_offset:?}",
);
```
Does it make sense to #[instrument()] run_inner and expose the commonly used fields as part of the span instead of adding them to every log statement?
Yup, indeed
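A minimal sketch of what that could look like with tracing's `#[instrument]` attribute; the struct shape and the `Result` type are assumptions taken from the snippets above:

```rust
use tracing::{debug, instrument};

// Hypothetical minimal shape of the consumer task, just for illustration.
struct PartitionConsumer {
    topic_partition: (String, i32),
    consumer_group_id: String,
}

impl PartitionConsumer {
    // The span carries the common fields once, so events emitted inside
    // run_inner no longer need to repeat topic/partition/consumer group.
    #[instrument(
        level = "debug",
        skip(self),
        fields(
            topic = %self.topic_partition.0,
            kafka_partition = self.topic_partition.1,
            consumer_group = %self.consumer_group_id,
        )
    )]
    async fn run_inner(&mut self) -> anyhow::Result<()> {
        debug!("Starting consumption loop");
        // ... actual consumption logic ...
        Ok(())
    }
}
```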
```rust
kafka_partition=%self.topic_partition.1,
consumer_group=%self.consumer_group_id,
offset=%offset,
"store kafka offset",
```
nit
| "store kafka offset", | |
| "Store kafka offset", |
```rust
kafka_partition=%self.topic_partition.1,
consumer_group=%self.consumer_group_id,
offset=%offset,
"skipping kafka message (dedup)"
```
Usually we start log statements with a capital letter.
Will make sure to follow the convention :)
```rust
self.consumer.store_offset(&self.topic_partition.0, self.topic_partition.1, offset)?;
},
Some(received) = consumer_stream.next() => {
    let msg = received?;
```
If we are exiting the consumption task here, does it make sense to try to drain the inflight messages before exiting, or wouldn't it matter because the offset wouldn't be stored on the Kafka broker?
If the consumer stream fails to receive the next message, it's probably due to a network error, so there's a big chance we would also fail to update the committed offset. But even if some inflight messages have already been ingested, they will get automatically deduped on the next task restart.
crates/types/src/config/common.rs (outdated)
```rust
/// # Experimental Kafka batch ingestion
///
/// Use the new experimental kafka ingestion path which leverage batching
```
```suggestion
/// Use the new experimental kafka ingestion path which leverages batching
```
crates/worker/src/partition/mod.rs (outdated)
```rust
if let Some(lsn) = &self.status.last_applied_log_lsn {
    self.last_applied_log_lsn_watch.send_replace(*lsn);
}
```
Why aren't we directly using lsn?
hmm 😅 I can't remember exactly, but obviously a mistake
Summary: Refactor ingress-kafka to leverage the `ingestion-client` implementation. This replaces the previous direct write to bifrost, which allows:
- Batching, which increases throughput
- The PP becomes the sole writer of its logs (WIP restatedev#3965)
Stack created with Sapling. Best reviewed with ReviewStack.
- Shuffler #4024
- IngestionClient for invocation and state mgmt #3980
- [Ingress-Kafka] Refactor to use ingestion-client #3975 (this PR)