fix delivery tag collision in pubsub transport #2487

Open
jgogstad wants to merge 3 commits into celery:main from jgogstad:jgogstad/fix/fix_deliery_tag_collision

@jgogstad jgogstad commented Mar 10, 2026

This PR fixes two bugs in the GCP Pub/Sub transport:

  1. ack deadline extension doesn't honor request size limits
  2. Pub/Sub redeliveries corrupt the QoS internal state, causing exceptions during ack

The first one is straightforward: the ModifyAckDeadline request body length limit is 512 KiB, so apply batching when extending; see https://docs.cloud.google.com/pubsub/quotas#resource_limits
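
A minimal sketch of that kind of batching, not the code in this PR. The helper name `batch_ack_ids`, the per-id overhead, and the safety margin are assumptions for illustration; only the 512 KiB figure comes from the quota page.

```python
from typing import Iterable, Iterator, List

# Documented ModifyAckDeadline request body limit.
MAX_REQUEST_BYTES = 512 * 1024
# Assumptions: rough per-ack_id framing cost and headroom for the other
# request fields (subscription name, deadline). Tune for the real encoding.
OVERHEAD_PER_ID = 8
SAFETY_MARGIN = 4096


def batch_ack_ids(
    ack_ids: Iterable[str],
    max_bytes: int = MAX_REQUEST_BYTES - SAFETY_MARGIN,
) -> Iterator[List[str]]:
    """Yield lists of ack_ids whose estimated size stays under max_bytes."""
    batch: List[str] = []
    size = 0
    for ack_id in ack_ids:
        cost = len(ack_id.encode("utf-8")) + OVERHEAD_PER_ID
        # Close the current batch before it would exceed the budget.
        if batch and size + cost > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(ack_id)
        size += cost
    if batch:
        yield batch
```

Each yielded batch would then go into its own ModifyAckDeadline request instead of sending every ack id in one call.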

For the second one: Pub/Sub can deliver the same message multiple times, via ack deadline expiry, explicit nack, or its at-least-once delivery guarantee under concurrent consumers. Each delivery has the same delivery_tag, which is embedded in the payload at publish time (Channel._inplace_augment_message), but a different ack_id (assigned by Pub/Sub per delivery). Kombu's virtual transport uses delivery_tag as the key in QoS._delivered, so when the same message is consumed more than once, the new entry overwrites the previous one and the earlier ack_id is lost. basic_ack then either acknowledges the wrong ack_id (which is fine, it's the same message) or raises KeyError. The latter happens when _flush has already removed the entry and basic_ack tries to read it.
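
To make the collision concrete, here is a toy model of the bookkeeping. It is not kombu code: `delivered`, `on_pull`, and this `basic_ack` are hypothetical stand-ins that only mimic a dict keyed by delivery_tag.

```python
# Toy stand-in for QoS._delivered: delivery_tag -> record for that delivery.
delivered = {}


def on_pull(delivery_tag: str, ack_id: str) -> None:
    """Record a delivery under its delivery_tag (overwrites on redelivery)."""
    delivered[delivery_tag] = {"ack_id": ack_id}


def basic_ack(delivery_tag: str) -> str:
    """Look up and remove the record, returning the ack_id to acknowledge."""
    return delivered.pop(delivery_tag)["ack_id"]


# The same message is delivered twice: same delivery_tag, different ack_id.
on_pull("tag-1", "ack-A")
on_pull("tag-1", "ack-B")  # overwrites the first entry, "ack-A" is lost

first = basic_ack("tag-1")  # acknowledges "ack-B", the wrong delivery

try:
    basic_ack("tag-1")  # the entry is already gone
    second_failed = False
except KeyError:
    second_failed = True
```

The first ack picks up the ack_id of the later delivery, and the second ack for the same tag fails with KeyError because the single shared entry was already removed.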

My proposed fix for the second one is to generate a unique delivery_tag on pull. This aligns with how the RabbitMQ transport behaves (a unique delivery_tag for each delivery). Maintainers, please verify and weigh in though, I don't know this code base well.
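
Roughly what "unique delivery_tag on pull" could look like, as a sketch rather than the actual diff. The helper name `assign_delivery_tag` and the use of `uuid4` are assumptions, and the payload layout (delivery_tag under `properties`) is assumed from how the virtual transport augments messages.

```python
import uuid


def assign_delivery_tag(payload: dict) -> dict:
    """Return a copy of payload with a delivery_tag unique to this delivery.

    The publish-time tag is replaced, so two deliveries of the same
    message no longer share a key in the QoS bookkeeping.
    """
    properties = dict(payload.get("properties", {}))
    properties["delivery_tag"] = str(uuid.uuid4())
    # Shallow copy: the caller's payload dict is left untouched.
    return {**payload, "properties": properties}


# Two pulls of the same published message get distinct tags.
published = {"body": "hello", "properties": {"delivery_tag": "publish-time-tag"}}
first_delivery = assign_delivery_tag(published)
second_delivery = assign_delivery_tag(published)
```

With distinct tags, each delivery gets its own entry and its own ack_id, so acking one delivery cannot remove or overwrite the state of another.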

We hit the second issue ourselves: we nacked a couple of messages while running with autoretry_for = (Exception,). When basic_ack raised exceptions because the internal state was corrupted, it triggered the retry path, which published more messages, which raised more exceptions on ack, and so on (this is how we ended up hitting the ModifyAckDeadline limit).

I'm pretty sure there's also a bug where a nack doesn't remove the ack_id from the list of ack ids being extended. It's not a problem per se, since Pub/Sub just ignores it. Not handled in this PR.

2 participants