-
Notifications
You must be signed in to change notification settings - Fork 133
Description
Description
We had investigated a weird issue happening on the channel level. On several channels we have identified that it is possible to have duplicate message ids.
The result of this is something like
# example
[
{global_id: 1, message_id: 1, channel: "channel_name", data: ""},
{global_id: 2, message_id: 2, channel: "channel_name", data: ""},
{global_id: 3, message_id: 3, channel: "channel_name", data: ""},
{global_id: 57, message_id: 3, channel: "channel_name", data: ""},
{global_id: 4, message_id: 4, channel: "channel_name", data: ""},
{global_id: 58, message_id: 4, channel: "channel_name", data: ""},
{global_id: 59, message_id: 5, channel: "channel_name", data: ""}
]To reach that state you have to reset the backlog_id_key (message_id) key without resetting the backlog_key (channel backlog) itself.
The only way message_id gets reset as far as I know is through expiration inside the transactional LUA script but in many cases we didn't expect this to expire at all since new messages where added daily to the channel (default expiration is 7.days).
The code that affects us the most right now is the following
https://github.com/discourse/message_bus/blob/main/lib/message_bus/backends/redis.rb#L228-L238
since it gets the first of the items having the same score without taking into account the global_id order. In the usual case you will always have 1 item there, but in our case it happens to be 2, so it's very possible that we can consume stale data.
Questions
- Are there other ways to reset or expire just the
backlog_id_keyin message bus channel lifecycle? That could explain this inconsistent state of the channel? - Did you ever face anything similar? The LUA script is nearly identical through the gem versions.
- Is it reasonable to handle this edge case and always return the last item for a
message_idbased on theglobal_idon theredis.zrangebyscore backlog_key, message_id, message_idwith a PR?
Versions
redis: 6.x
redis gem: 4.8.1
message_bus: 3.3.8 (we are upgrading to 4.3.8 since having duplicates results in infinite loops on the old version 🙃 )