Replies: 6 comments
-
@spjoe Can you provide the code to reproduce the issue as a Maven or Gradle project? Thanks.
-
And full broker logs at debug level from all nodes. Cheers.
-
Converting to a discussion because we don't have all the information needed to investigate a report like this.
-
These messages may or may not have made it to any of the nodes. We have seen node failure detection on Kubernetes take 30-60 seconds, so this could be something very similar to #9209 but in a different place.
-
@spjoe I noticed the publisher code does not check the confirmation status. The client library calls the confirmation callback after 30 seconds if a message has not been confirmed. The application can choose to republish the message in such a case. You should either print the payload only if the message is actually confirmed or set the confirm timeout to something much longer. You can set it to
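A minimal sketch of the first option, printing a payload only when it was actually confirmed and keeping the rest for republishing. It uses only the Java standard library: `ConfirmAwarePublisher`, `onConfirmation` and `drainRepublishQueue` are hypothetical names, and the `isConfirmed` flag stands in for whatever status the client's confirmation callback reports (in the stream Java client this is, as far as I know, read from the status object passed to the callback, but treat that as an assumption and check the client documentation).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: call onConfirmation(...) from the producer's
// confirmation callback instead of printing the payload unconditionally.
public class ConfirmAwarePublisher {
    private final List<String> confirmed = new ArrayList<>();
    private final List<String> toRepublish = new ArrayList<>();

    // isConfirmed is an assumed stand-in for the status reported by the client.
    // A callback fired by the confirm timeout would arrive here with false.
    public synchronized void onConfirmation(String payload, boolean isConfirmed) {
        if (isConfirmed) {
            confirmed.add(payload);
            System.out.println(payload); // print only truly confirmed messages
        } else {
            toRepublish.add(payload);    // keep it so the application can resend
        }
    }

    // Returns the payloads that were not confirmed and clears the internal list.
    public synchronized List<String> drainRepublishQueue() {
        List<String> pending = new ArrayList<>(toRepublish);
        toRepublish.clear();
        return pending;
    }

    // Snapshot of the payloads confirmed so far.
    public synchronized List<String> confirmed() {
        return new ArrayList<>(confirmed);
    }

    public static void main(String[] args) {
        ConfirmAwarePublisher publisher = new ConfirmAwarePublisher();
        publisher.onConfirmation("msg-1", true);
        publisher.onConfirmation("msg-2", false);
        System.out.println("to republish: " + publisher.drainRepublishQueue());
    }
}
```

With this in place the producer log lists only messages the broker acknowledged, so a sequence number missing on the consumer side can be compared against it directly.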
-
BTW, 100 KB is a rather small value for stream segments; 10 MB would be more reasonable for a 100-MB-max-length stream.
-
Describe the bug
We noticed that messages are lost when we do fail-over testing with streams.
Note: If we do a similar fail-over test with quorum queues, no messages are missing.
Reproduction steps
ps-rabbitmq-stream-client.zip
A small client application was written to reproduce this issue. The application consists of a consumer and a producer. The producer sends messages in the following format:
The consumer application uses the timestamp to calculate a latency. The sequence number is used to identify gaps and duplication of messages.
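The consumer-side check described above can be sketched roughly as follows. This is not the code from the attached project: the class name `SequenceChecker`, its method names, and the assumption that each message carries a numeric sequence number and a send timestamp in milliseconds are all illustrative, since the actual message format is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer-side helper: computes latency per message and
// records gaps and duplicates based on the sequence number.
public class SequenceChecker {
    private long expectedNext = -1; // next expected sequence; -1 until the first message
    private final List<String> events = new ArrayList<>();

    // Returns the latency in milliseconds and records any gap or duplicate.
    public long onMessage(long sequence, long sentAtMillis, long receivedAtMillis) {
        if (expectedNext >= 0) {
            if (sequence > expectedNext) {
                // Sequence numbers expectedNext..sequence-1 never arrived (so far).
                events.add("GAP " + expectedNext + ".." + (sequence - 1));
            } else if (sequence < expectedNext) {
                // Already seen, or delivered late after a gap was recorded.
                events.add("DUPLICATE_OR_REORDERED " + sequence);
            }
        }
        expectedNext = Math.max(expectedNext, sequence + 1);
        return receivedAtMillis - sentAtMillis;
    }

    public List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        SequenceChecker checker = new SequenceChecker();
        long now = System.currentTimeMillis();
        checker.onMessage(1, now - 40, now);
        checker.onMessage(2, now - 40, now);
        checker.onMessage(5, now - 40, now); // 3 and 4 are missing
        System.out.println(checker.events());
    }
}
```

Note that the latency is only meaningful if the producer and consumer clocks are synchronized, and that a recorded gap may later be filled by late deliveries or republished messages.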
The producer prints every confirmed message to stdout.
Every time a message is received, the consumer prints the following:
An example output we have on the consumer side is the following:
This example shows that at first the messages arrive with a delay of ~40 seconds (messages from 5199 until 5998) and then there is a gap of 3298 messages (5998-6296). With the given publish rate of 100 messages per second, this amounts to around 33 seconds' worth of lost messages.
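The conversion from a gap in sequence numbers to lost publishing time is a simple division. A small sketch of it, with the hypothetical helper name `secondsLost` and the figures taken from the paragraph above:

```java
// Hypothetical helper: converts a count of missing messages into the
// amount of publishing time it represents at a constant publish rate.
public class GapEstimate {
    static double secondsLost(long missingMessages, double publishRatePerSecond) {
        return missingMessages / publishRatePerSecond;
    }

    public static void main(String[] args) {
        // 3298 missing messages at 100 messages per second: roughly 33 seconds
        System.out.println(secondsLost(3298, 100.0));
    }
}
```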
The corresponding producer side:
This shows that all messages are confirmed but the confirms are out of order.
Expected behavior
No message is lost.
Additional context
No response