fix delivery tag collision in pubsub transport #2487
Open
jgogstad wants to merge 3 commits into celery:main from
Conversation
This PR fixes two bugs in the GCP Pub/Sub transport:

1. The first is straightforward: the `ModifyAckDeadline` request body is limited to 512 KiB, so apply batching when extending ack deadlines (see https://docs.cloud.google.com/pubsub/quotas#resource_limits).
2. The second: Pub/Sub can deliver the same message multiple times, via ack deadline expiry, explicit nack, or its at-least-once delivery guarantee under concurrent consumers. Each delivery carries the same `delivery_tag`, embedded in the payload at publish time (`Channel._inplace_augment_message`), but a different `ack_id` (assigned by Pub/Sub per delivery). Kombu's virtual transport uses `delivery_tag` as the key in `QoS._delivered`, so when the same message is consumed more than once, the new entry overwrites the previous one, losing its `ack_id`. This causes `basic_ack` to either acknowledge the wrong `ack_id` (harmless, since it is the same message) or raise `KeyError`; the latter happens when `_flush` removes the entry before `basic_ack` tries to read it.

I propose fixing the second bug by generating a unique `delivery_tag` on pull. This aligns with how the RabbitMQ transport behaves (a unique `delivery_tag` for each delivery). Maintainers, please verify and weigh in; I don't know this code base well.

For context, we hit the second issue after nacking a couple of messages while running with `autoretry_for = (Exception,)`. When `basic_ack` threw exceptions because of the corrupted internal state, it triggered the retry path, which published more messages, which raised more exceptions on ack, and so on; that is how we ended up hitting the `ModifyAckDeadline` limit.

I'm also fairly sure there is a bug where a nack does not remove the `ack_id` from the list of ack ids being extended. It is not a problem per se, since Pub/Sub will just ignore the stale id, but it is not handled in this PR.
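To illustrate the first fix, here is a minimal sketch (not the PR's actual code) of batching `ack_id`s so each `ModifyAckDeadline` request body stays under the 512 KiB limit; the function name and the `overhead` reserve are assumptions:

```python
# Sketch: chunk ack_ids so each ModifyAckDeadline request body
# stays under the documented 512 KiB request-size limit.
MAX_REQUEST_BYTES = 512 * 1024

def batch_ack_ids(ack_ids, overhead=1024):
    """Yield lists of ack_ids whose combined encoded size stays under
    the request limit; `overhead` reserves room for the rest of the
    request body (a guessed constant, not from the real API)."""
    batch, size = [], overhead
    for ack_id in ack_ids:
        ack_len = len(ack_id.encode("utf-8"))
        # Flush the current batch before this id would push it over.
        if batch and size + ack_len > MAX_REQUEST_BYTES:
            yield batch
            batch, size = [], overhead
        batch.append(ack_id)
        size += ack_len
    if batch:
        yield batch
```

Each yielded batch would then go into its own `ModifyAckDeadline` request.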
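The second bug's overwrite mechanism can be shown with a toy model: a plain dict stands in for `QoS._delivered`, keyed by `delivery_tag` as described above (the `on_message` helper and the entry shape are illustrative, not kombu's real internals):

```python
# Toy model of the collision: in-flight messages keyed by delivery_tag.
delivered = {}  # stand-in for QoS._delivered

def on_message(delivery_tag, ack_id):
    # A redelivery of the same message reuses the embedded delivery_tag,
    # so this assignment overwrites the first entry and its ack_id is lost.
    delivered[delivery_tag] = {"ack_id": ack_id}

on_message("tag-1", "ack-id-A")   # first delivery
on_message("tag-1", "ack-id-B")   # redelivery: same tag, new ack_id
# Only one entry survives: "ack-id-A" can no longer be acked, and if
# something (e.g. _flush) removes "tag-1", a later ack raises KeyError.
```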
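And a sketch of the proposed direction for the fix, assigning a fresh `delivery_tag` per delivery at pull time (the `uuid4` tag source and the `on_pull` helper are assumptions for illustration; the actual PR may differ):

```python
# Sketch of the proposed fix: a unique delivery_tag per delivery,
# mirroring the RabbitMQ transport's unique-tag-per-delivery behaviour.
from uuid import uuid4

def on_pull(payload, ack_id, delivered):
    delivery_tag = str(uuid4())  # fresh tag on every pull
    payload.setdefault("properties", {})["delivery_tag"] = delivery_tag
    delivered[delivery_tag] = {"ack_id": ack_id}
    return delivery_tag

delivered = {}
t1 = on_pull({}, "ack-id-A", delivered)  # first delivery
t2 = on_pull({}, "ack-id-B", delivered)  # redelivery of the same message
# Two deliveries now keep distinct entries, so each ack_id stays reachable.
```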