Description
Search before reporting
- I searched in the issues and found nothing similar.
Read release policy
- I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.
User environment
Broker: apachepulsar/pulsar-all: 4.0.7
Helm chart: https://github.com/apache/pulsar-helm-chart/tree/pulsar-4.0.1
Issue Description
Replication for some topics intermittently stops during normal operation, and the replication backlog keeps growing.
The issue is observed in two main cases:
1. A sudden spike in the publish rate to a topic (for example, a steady rate of 5 messages per second followed by a burst of 5,000 messages per second lasting about 5 minutes).
2. External infrastructure issues, such as frequent broker restarts or resource-related problems (for example, high iowait on BookKeeper).
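At the rates given in case 1, the burst alone adds 5,000 × 300 = 1,500,000 messages, about 1,000 times the 5 × 300 = 1,500 messages the topic would receive at its steady rate over the same five minutes. The sketch below is a hypothetical helper for driving a load generator when trying to reproduce case 1. It only builds a per-second publish-rate schedule and is not part of Pulsar; the one-minute warm-up and cool-down at the steady rate are assumptions added for illustration.

```python
from typing import List


def burst_schedule(baseline_rate: int = 5, burst_rate: int = 5000,
                   warmup_s: int = 60, burst_s: int = 300,
                   cooldown_s: int = 60) -> List[int]:
    """Return a hypothetical per-second publish rate (messages/s) for case 1.

    Steady rate for `warmup_s` seconds, then the burst for `burst_s` seconds,
    then back to the steady rate for `cooldown_s` seconds.
    """
    return ([baseline_rate] * warmup_s
            + [burst_rate] * burst_s
            + [baseline_rate] * cooldown_s)


schedule = burst_schedule()
print(len(schedule), sum(schedule))  # 420 1500600
```

A producer loop would send `schedule[i]` messages during second `i`; keeping the schedule separate from the client code makes it easy to vary the burst size and duration between runs.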
The issue occurs at the topic level. We have both partitioned and non-partitioned topics: the problem is observed with non-partitioned topics, and in some cases replication got stuck for only a single partition of a partitioned topic.
The only way to restore replication is to disable replication for the namespace and then re-enable it.
Error messages
No error or warning messages.
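Because nothing is reported when replication stops, one way to spot an affected topic is to compare two snapshots of its stats taken some time apart. The sketch below is an illustration, not part of Pulsar: it assumes each snapshot is the parsed JSON output of `pulsar-admin topics stats`, and that the per-remote-cluster entries under `replication` expose `replicationBacklog` and `msgRateOut` (check these field names against your broker version). The function name `stuck_replicators` and the `min_growth` threshold are hypothetical.

```python
from typing import Any, Dict, List


def stuck_replicators(before: Dict[str, Any], after: Dict[str, Any],
                      min_growth: int = 1) -> List[str]:
    """Return remote clusters whose replication backlog grew while nothing went out.

    `before` and `after` are two topic-stats snapshots (parsed JSON). The field
    names `replication`, `replicationBacklog` and `msgRateOut` are assumptions.
    """
    suspects = []
    prev_by_cluster = before.get("replication", {})
    for cluster, now in after.get("replication", {}).items():
        prev = prev_by_cluster.get(cluster)
        if prev is None:
            # No earlier snapshot for this remote cluster, so growth is unknown.
            continue
        growth = now.get("replicationBacklog", 0) - prev.get("replicationBacklog", 0)
        idle = now.get("msgRateOut", 0.0) == 0.0
        if growth >= min_growth and idle:
            suspects.append(cluster)
    return sorted(suspects)


# cluster-b's replicator sends nothing while its backlog grows; cluster-c is draining.
before = {"replication": {
    "cluster-b": {"replicationBacklog": 1200, "msgRateOut": 0.0},
    "cluster-c": {"replicationBacklog": 900, "msgRateOut": 450.0},
}}
after = {"replication": {
    "cluster-b": {"replicationBacklog": 5400, "msgRateOut": 0.0},
    "cluster-c": {"replicationBacklog": 300, "msgRateOut": 410.0},
}}
print(stuck_replicators(before, after))  # ['cluster-b']
```

Requiring both backlog growth and a zero outbound rate avoids flagging a replicator that is merely behind but still draining, as in the burst scenario of case 1.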
Reproducing the issue
Cannot reproduce using a standalone cluster.
Additional information
https://apache-pulsar.slack.com/archives/C078TGY9R29/p1764167528705519
2 Clusters:
- 5 Bookies
- 4 Brokers
- 3 Proxies
Replication: 2/2/2
Are you willing to submit a PR?
- I'm willing to submit a PR!