Replies: 10 comments 24 replies
-
I will convert this issue to a GitHub discussion. Currently GitHub will automatically close and lock the issue even though your question will be transferred and responded to elsewhere. This is to let you know that we do not intend to ignore this but this is how the current GitHub conversion mechanism makes it seem for the users :( |
Beta Was this translation helpful? Give feedback.
-
Publishing under such condition — constant server restarts — can entirely depend on what the client does with confirmations (acknowledgements) and whether those acknowledgements even get to the client (and how that's handled). Only a traffic capture can tell what actually reaches both ends. |
Beta Was this translation helpful? Give feedback.
-
Note message offsets may not be contiguous (offset tracking information is stored in the stream itself with entries that can have their own offset). Subscribing from 601 and getting 602+ can be a normal situation, because 601 happens to be an offset tracking entry. What is not normal is if you subscribe from the beginning and see that you get a message with offset 601 at some point. But this must be all on the same stream :-) From what you're seeing, it seems it's the case (subscribing at 601, getting a message with offset 602, subscribing on the same stream from the beginning, seeing a message at 601). Can you confirm this? Do you have a program or even better some steps to reproduce? |
Beta Was this translation helpful? Give feedback.
-
I can confirm that problem. I encountered it when working on my library: https://gitlab.com/wrobell/rbfly To replicate with the library, please checkout master and apply this patch to insert some print statements for convenience
Run script to receive data from a stream:
Publish data to to a stream, one by one (no batching)
Now, restart RabbitMQ from time to time. Both scripts will reconnect and the receiving stream data script will output first message after reconnection
Offset 30598 was requested, but message with offset 30599 was received instead. You can kill both scripts now. Run stream data receiving script again, but requesting offset 30598
I was able to receive the message with requested offset this time. I will try to get some data with Wireshark next. |
Beta Was this translation helpful? Give feedback.
-
Another try and this is what I see in the receiving script output
I would expect to receive message with offset 47419 and data b'0000047419' first. This is what I captured with Wireshark (my annotations prefixed with
Please note that message content, published messages ids and offsets of delivered messages are the same numbers because there was no data in the stream initially. To summarize:
According to the above Wireshark log, RabbitMQ does not send message with offset 47419 at all. |
Beta Was this translation helpful? Give feedback.
-
Anything in the server logs?
On Mon, 13 Dec 2021 at 21:35, wrobell ***@***.***> wrote:
Another try and this is what I see in the receiving script output
INFO:rbfly.streams.client:(re)subscribed to stream, name=test-perf-stream, subscription id=1, offset=Offset.OFFSET(47419)
DeliveredMessage(timestamp=1639427463, offset=47420, data=b'0000047420')
I would expect to receive message with offset 47419 and data b'0000047419'
first.
This is what I captured with Wireshark (my annotations prefixed with ###)
$ tshark -f 'tcp port 5552' -i lo -E header=yes -E separator=, -E quote=d -T fields -e frame.time -e tcp.port -e data
### receiver: subscribe to stream, offset type 4, offset value 47419 (0004000000000000b93b); request fails due to stream not available yet
"Dec 13, 2021 20:31:01.814753031 GMT","49490,5552","0000002b0007000100000005010010746573742d706572662d73747265616d0004000000000000b93b000a00000000"
"Dec 13, 2021 20:31:01.814777896 GMT","5552,49490",
"Dec 13, 2021 20:31:01.814903121 GMT","5552,49490","0000000a80070001000000050006"
"Dec 13, 2021 20:31:01.814905933 GMT","49490,5552",
### receiver: subscribe again after 1 sec (please note that no confirmation yet)
"Dec 13, 2021 20:31:02.814958733 GMT","49490,5552","0000002b0007000100000006010010746573742d706572662d73747265616d0004000000000000b93b000a00000000"
"Dec 13, 2021 20:31:02.814972048 GMT","5552,49490",
### sender: declare publisher and query publisher sequence
"Dec 13, 2021 20:31:02.816074927 GMT","49488,5552","0000003b000100010000000601001e7075626c69736865725f746573742d706572662d73747265616d5f3030310010746573742d706572662d73747265616d"
"Dec 13, 2021 20:31:02.816082230 GMT","5552,49488",
"Dec 13, 2021 20:31:02.816398723 GMT","5552,49488","0000000a80010001000000060001"
"Dec 13, 2021 20:31:02.816402141 GMT","49488,5552",
"Dec 13, 2021 20:31:02.816547091 GMT","49488,5552","0000003a0005000100000007001e7075626c69736865725f746573742d706572662d73747265616d5f3030310010746573742d706572662d73747265616d"
"Dec 13, 2021 20:31:02.816551266 GMT","5552,49488",
"Dec 13, 2021 20:31:03.700074953 GMT","5552,49488","0000001280050001000000070001000000000000b93b"
"Dec 13, 2021 20:31:03.700083639 GMT","49488,5552",
### sender: publish message with data b'0000047419' and receive confirmation
"Dec 13, 2021 20:31:03.700412876 GMT","49488,5552","0000001f000200010100000001000000000000b93c0000000a30303030303437343139"
"Dec 13, 2021 20:31:03.700416874 GMT","5552,49488",
"Dec 13, 2021 20:31:03.700720573 GMT","5552,49488","00000011000300010100000001000000000000b93c"
### sender: publish message with data b'0000047420' and receive confirmation
"Dec 13, 2021 20:31:03.700815770 GMT","49488,5552","0000001f000200010100000001000000000000b93d0000000a30303030303437343230"
"Dec 13, 2021 20:31:03.700820718 GMT","5552,49488",
"Dec 13, 2021 20:31:03.701007842 GMT","5552,49488","00000011000300010100000001000000000000b93d"
### sender: publish message with data b'0000047421' (no confirmation yet)
"Dec 13, 2021 20:31:03.701149756 GMT","49488,5552","0000001f000200010100000001000000000000b93e0000000a30303030303437343231"
"Dec 13, 2021 20:31:03.701154745 GMT","5552,49488",
### receiver: confirmation of stream subscription request received
"Dec 13, 2021 20:31:03.701288192 GMT","5552,49490","0000000a80070001000000060001"
"Dec 13, 2021 20:31:03.701294639 GMT","49490,5552",
### receiver: delivery of message with data b'0000047420' with offset 47420 (000000000000b93c)
"Dec 13, 2021 20:31:03.701347988 GMT","5552,49490","000000430008000101"
"Dec 13, 2021 20:31:03.701351149 GMT","49490,5552",
"Dec 13, 2021 20:31:03.701427738 GMT","5552,49490","50000001000000010000017db57dda140000000000000004000000000000b93c8d9465e40000000e00000028000000000000000a30303030303437343230"
"Dec 13, 2021 20:31:03.701431578 GMT","49490,5552",
### sender: receives publish confirmation for message with data b'0000047421'
"Dec 13, 2021 20:31:03.701453101 GMT","5552,49488","00000011000300010100000001000000000000b93e"
### receiver: start of delivery of message with data b'0000047421'
"Dec 13, 2021 20:31:03.701525264 GMT","5552,49490","000000430008000101"
"Dec 13, 2021 20:31:03.701528726 GMT","49490,5552",
### sender: publish message with data b'0000047422' (no confirmation yet)
"Dec 13, 2021 20:31:03.701562330 GMT","49488,5552","0000001f000200010100000001000000000000b93f0000000a30303030303437343232"
"Dec 13, 2021 20:31:03.701568709 GMT","5552,49488",
### receiver: continue receiving delivery of message with data b'0000047421'
"Dec 13, 2021 20:31:03.701573405 GMT","5552,49490","50000001000000010000017db57dda150000000000000004000000000000b93dfa9355720000000e00000028000000000000000a30303030303437343231"
Please note that message content, published messages ids and offsets of
delivered messages are the same numbers because there was no data in the
stream initially.
To summarize:
1. Sender subscribes to the stream and requests message with offset
47419.
2. Receiver declares publisher and sends messages with data
b'0000047419', b'0000047420' for which confirmation is received.
3. Receiver sends message with data b'0000047421'.
4. Sender receives stream subscription confirmation.
5. Sender receives message with offset 47420 instead of 47419.
According to the above Wireshark log, RabbitMQ does not send message with
offset 47419 at all.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#3885 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJAHFATWYLSGRS3JCG3NI3UQZRIRANCNFSM5J5GYOZA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
*Karl Nilsson*
|
Beta Was this translation helpful? Give feedback.
-
I think this can also happen on application start, where publisher and subscriber applications start at the same time. |
Beta Was this translation helpful? Give feedback.
-
I tried to run your sample but did not succeed. I had to apply the patch manually, install the
I'm setting up a similar test with the Java client and see if I can reproduce. |
Beta Was this translation helpful? Give feedback.
-
OK, even though I cannot reproduce with the Java client, we may have a lead. It's likely there's a bug on the server side, but you should still be careful with the code of the Python client. If I understand correctly this line, you re-subscribe at The Java client enforces at-least-once semantics, by re-subscribing at the last dispatched offset (if offset tracking is not enabled). This is likely to trigger duplicates, but avoid losing messages. This difference between the 2 clients explain why I can't reproduce with the Java client: it asks for an offset that exists, so the server does not jump at the end of the stream. Nevertheless, this is likely we have something to fix on the server side, thanks for the thorough analysis. |
Beta Was this translation helpful? Give feedback.
-
Really it should be down to the application to decide which offset it wants
to resume at but if it is invisible to the consumer handler and you know
all offsets of the previous chunk are still available then + 1 is fine.
On Sat, 15 Jan 2022 at 09:49, wrobell ***@***.***> wrote:
I built a generic Unix package
<https://github.com/rabbitmq/rabbitmq-server/files/7861401/rabbitmq-server-generic-unix-3.9.12%2B4.g238dc7b.zip>
with a fix <rabbitmq/osiris#65>, can you try it
out? (it's a tar.xz file inside a zip file, because GitHub does not allow
to enclose ZIP files)
I have done some manual tests and so far the problem is gone. Let's see
how an automated one goes overnight.
I cannot reproduce the problem with version of RabbitMQ above. Thanks for
the fix.
Just to clarify. Does it mean, that I can keep last offset + 1 in my
library? What is the recommendation?
—
Reply to this email directly, view it on GitHub
<#3885 (reply in thread)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJAHFESQGN6UZKFWRJD2PTUWE7LPANCNFSM5J5GYOZA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you commented.Message ID:
***@***.***
com>
--
*Karl Nilsson*
|
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
How to reproduce
Usually above scenario works. But after one of the restarts, the receiving client might not receive a message. For example, it resubscribes with offset 601, but receives message with offset 602, so sequence of received messages is
Once all messages are published and I run new instance of client reading from offset 0, the new instance will receive all messages, i.e.
I can replicate this issue every time with a bit of patience, usually after 3-5 restarts of the broker on my laptop.
I cannot replicate this issue when there is no publishing client, i.e. subscribe to a stream with offset 0 and read all messages while RabbitMQ broker is restarted from time to time.
If publisher sends data in batches, i.e. 10 messages per request, then client might not receive whole batch of messages.
Also, the requested (skipped) and received messages have the same timestamp.
RabbitMQ 3.9.11.
Beta Was this translation helpful? Give feedback.
All reactions