Performance investigations for slow cruise control rebalancing #7167

peterschmitzer · 2022-08-08T06:56:53Z

We are currently practicing a scale-out of our Kafka cluster in our preprod environment, and for an unknown reason the rebalancing performs poorly and takes a lot of time. Unfortunately, we have not found much documentation about performance considerations.

We are running strimzi/kafka:0.28.0-kafka-3.0.0

The cluster held 4500 partitions, with 30 replicas per topic and a replication factor of 3, on 3 brokers when I scaled up to 4 brokers.
After applying the KafkaRebalance with mode "add-brokers", it proposed moving 1000 replicas and 82MB. The whole process took more than one hour to complete. In our other cluster, where we wanted to move only a little more data, it seemed to take forever (there we stopped the rebalancing).

Does anyone have experience with poor performance when applying a rebalance proposal? What might be the root causes of the issues we see? In the past (with older Strimzi and Kafka versions) we did a rebalance in a PoC and moved 80GB of data, which completed in 15 minutes, so we are a little confused.

scholzj · 2022-08-08T08:09:47Z

Maintainer

Maybe @kyguy or @ppatierno might have some ideas. But keep in mind that performance is heavily dependent on your infrastructure and might not be easily reproducible.

4 replies

kyguy Aug 8, 2022
Collaborator

Hi @peterschmitzer, happy to take a closer look! Could you share your Kafka and KafkaRebalance custom resources and the Cruise Control pod logs from the rebalancing?

peterschmitzer Aug 9, 2022
Author

cruise-control (1).txt
events.txt
kafka-rcd.txt
kafkarebalance.txt

Thanks a lot for the offer! Please find the requested information attached and let me know if I can supply further information.

peterschmitzer Aug 23, 2022
Author

Hi @kyguy, have you had some time to look at our config? I am wondering whether I could use kafka-reassign-partitions.sh as a mitigation to move the partitions manually. Is that a valid approach in the short term?

kyguy Aug 23, 2022
Collaborator

Hey @peterschmitzer, apologies for the delay! I noticed that the KafkaRebalance custom resource has the following configuration settings:

kind: KafkaRebalance
metadata:
...
spec:
  concurrentIntraBrokerPartitionMovements: 0
  concurrentLeaderMovements: 0
  concurrentPartitionMovementsPerBroker: 0
  replicationThrottle: 0
  skipHardGoalCheck: false

Is there any particular reason that concurrentLeaderMovements, concurrentPartitionMovementsPerBroker and replicationThrottle are set to 0? These settings can slow down the rebalancing process and are likely the reason the rebalancing is taking so long! Have you tried rebalancing without specifying them, so that the operator uses the larger default values?
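
For illustration, a KafkaRebalance that leaves those tuning fields out might look roughly like the sketch below. The resource name, the cluster label value and the broker ID are assumptions made for the example and are not taken from the attached files; the mode is the "add-brokers" mode mentioned in the original post. It is worth checking the field names against the documentation for the Strimzi version in use.

```yaml
# Sketch only: name, cluster label and broker ID are hypothetical placeholders.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-rebalance                 # hypothetical resource name
  labels:
    strimzi.io/cluster: my-cluster   # hypothetical; must match the name of the Kafka resource
spec:
  mode: add-brokers
  brokers: [3]                       # assumed ID of the newly added broker
  # concurrentIntraBrokerPartitionMovements, concurrentLeaderMovements,
  # concurrentPartitionMovementsPerBroker and replicationThrottle are
  # intentionally omitted so that the default values apply.
```

Leaving the fields out entirely, rather than setting them to 0, is the point of the sketch: it lets the defaults take effect instead of explicit zero values.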

peterschmitzer · 2022-08-08T13:19:35Z

Author

It may be worth mentioning that reassigning partitions "manually" with the kafka-reassign-partitions.sh tool that ships with Kafka worked flawlessly and was very fast. So we doubt the slowness has anything to do with our underlying infrastructure.

1 reply

scholzj Aug 8, 2022
Maintainer

Well, I'm not sure that is the case. It can impact how the reassignment quotas might be set, etc.
