
Idempotent producer doesn't recover after a partition is temporarily unwritable #2519

@eghilt

Description


During a couple of rebalances, some partitions became temporarily unwritable because the number of available replicas fell below the required minimum. Applications using idempotent producers that sent messages to the affected topics were unable to recover and kept discarding messages, even after the replica problem was resolved on the Kafka server side. These client applications only recovered after a restart.
Applications without idempotence enabled recovered immediately once the server-side issue was fixed, without a restart.

Error logs:

  • reason='Local: Inconsistent state' code='Local_Inconsistent'
  • reason='Local: Purged in queue' code='Local_PurgeQueue'
  • reason='Broker: Broker received an out of order sequence number' code='OutOfOrderSequenceNumber'
  • "Unable to reconstruct MessageSet"

The "Local_PurgeQueue" and "Unable to reconstruct MessageSet" errors stopped after a few minutes, but "Local_Inconsistent" and "OutOfOrderSequenceNumber" continued until the instance was restarted.

Confluent.Kafka Version=2.5
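To make the "OutOfOrderSequenceNumber" error above more concrete, here is a simplified Python model of the per-partition sequence check a broker applies to idempotent producers. It is a sketch under stated assumptions: it ignores producer epochs, sequence wrap-around, and the broker's cache of recent batches used for duplicate detection, and `PartitionSequenceState` and `append` are hypothetical names. It is not the broker's or librdkafka's actual code and does not diagnose this issue. It only shows how a persistent error can arise: if a batch's sequence numbers are assigned but never reach the broker, every later batch from the same producer id is rejected, while a new producer instance (for example after a restart) gets a fresh producer id and starts cleanly.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PartitionSequenceState:
    """Hypothetical, simplified model of one partition's idempotence check.

    Assumptions: ignores producer epochs, sequence wrap-around and the
    broker's cache of recent batches used to detect duplicates.
    """

    # producer id -> last sequence number accepted on this partition
    last_seq: Dict[int, int] = field(default_factory=dict)

    def append(self, producer_id: int, first_seq: int, count: int) -> str:
        # The next batch must start exactly one past the last accepted sequence.
        expected = self.last_seq.get(producer_id, -1) + 1
        if first_seq != expected:
            # Any gap (or rewind) is rejected and state is left unchanged,
            # so every later batch from this producer id is rejected too.
            return "OutOfOrderSequenceNumber"
        self.last_seq[producer_id] = first_seq + count - 1
        return "OK"


if __name__ == "__main__":
    state = PartitionSequenceState()
    print(state.append(1000, 0, 5))   # OK (sequences 0-4)
    # Sequences 5-9 are assumed lost on the client and never sent.
    print(state.append(1000, 10, 5))  # OutOfOrderSequenceNumber
    print(state.append(1000, 15, 5))  # OutOfOrderSequenceNumber
    # A new producer instance gets a new producer id and restarts at 0.
    print(state.append(2000, 0, 5))   # OK
```

The detail worth noting is that a rejected batch leaves the expected sequence unchanged, which is why, in this model, the error persists rather than clearing on its own.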
