-
Notifications
You must be signed in to change notification settings - Fork 1.7k
GH-3276: Support async retry with @RetryableTopic #3523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-3276: Support async retry with @RetryableTopic #3523
Conversation
|
Thank you, @chickenchickenlove , for such a great investigation and the crack on the possible solution! |
|
Sounds good, @artembilan 👍 My Understanding 1
My Understanding 2
Which understanding is closer to what you described? |
|
The
So, I assume that your impl deals already with a I'm not sure what is And as you see, the |
|
Thanks for your description. 🙇♂️ FYI, From now on, function calls flow is : |
|
@artembilan , For example, we have
In that case,
Initially, only the However, once async failures occurs, a Thus, it can not be proper workaround. How about adding a I imagined adding a method like What are your thoughts on this approach? |
If so, it seems to work well. I made a new commit and pushed it to current branch. Would you consider this to be the right direction? If so, there are a few additional points we may need to consider 😀 I think that these are needed!
|
|
Hi, @artembilan ! When you have time, could you take a look? 🙇♂️ |
artembilan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what is going on now.
Thank you for digging further!
Now I know what is the power of single-threaded processors and how to handle its behavior.
So, I have only one simple concern about public API and after that it would be great to mention this feature in the docs.
Re. error handling of the error handler.
See invokeErrorHandler():
try {
handled = this.commonErrorHandler.handleOne(rte, cRecord, this.consumer,
KafkaMessageListenerContainer.this.thisOrParentContainer);
}
catch (Exception ex) {
this.logger.error(ex, "ErrorHandler threw unexpected exception");
}
So, that should be enough for now to close all the gaps we have so far.
spring-kafka/src/main/java/org/springframework/kafka/listener/MessageListener.java
Outdated
Show resolved
Hide resolved
|
@artembilan , sorry to bother you.
I can't understand your comments 😢. |
...rc/main/java/org/springframework/kafka/listener/adapter/MessagingMessageListenerAdapter.java
Outdated
Show resolved
Hide resolved
| @FunctionalInterface | ||
| public interface MessageListener<K, V> extends GenericMessageListener<ConsumerRecord<K, V>> { | ||
|
|
||
| default void setCallbackForAsyncFailureQueue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have you renamed method to Queue?
What was wrong with a Callback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes to setCallbackForAsyncFail(...)!
...rc/main/java/org/springframework/kafka/listener/adapter/MessagingMessageListenerAdapter.java
Outdated
Show resolved
Hide resolved
...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java
Outdated
Show resolved
Hide resolved
artembilan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for so lengthy review, but in my feeling we are close for merge, so I'd like to clear up things around.
Thank you for incredible work you've done with this flaky use-case!
...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java
Outdated
Show resolved
Hide resolved
...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java
Outdated
Show resolved
Hide resolved
...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java
Outdated
Show resolved
Hide resolved
...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java
Outdated
Show resolved
Hide resolved
...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java
Outdated
Show resolved
Hide resolved
...fka/src/test/java/org/springframework/kafka/retrytopic/AsyncMonoRetryTopicScenarioTests.java
Outdated
Show resolved
Hide resolved
...fka/src/test/java/org/springframework/kafka/retrytopic/AsyncMonoRetryTopicScenarioTests.java
Outdated
Show resolved
Hide resolved
...fka/src/test/java/org/springframework/kafka/retrytopic/AsyncMonoRetryTopicScenarioTests.java
Show resolved
Hide resolved
...fka/src/test/java/org/springframework/kafka/retrytopic/AsyncMonoRetryTopicScenarioTests.java
Outdated
Show resolved
Hide resolved
...rg/springframework/kafka/retrytopic/AsyncMonoFutureRetryTopicClassLevelIntegrationTests.java
Show resolved
Hide resolved
...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java
Show resolved
Hide resolved
| this.genericListener instanceof KafkaBackoffAwareMessageListenerAdapter<?, ?>) { | ||
| KafkaBackoffAwareMessageListenerAdapter<K, V> genListener = | ||
| (KafkaBackoffAwareMessageListenerAdapter<K, V>) this.genericListener; | ||
| if (genListener.getDelegate() instanceof RecordMessagingMessageListenerAdapter<K, V>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed to extract pattern variable: https://www.baeldung.com/java-pattern-matching-instanceof
| @@ -1,5 +1,5 @@ | |||
| /* | |||
| * Copyright 2015-2019 the original author or authors. | |||
| * Copyright 2015-2024 the original author or authors. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This apparently has to be reverted as well.
I mean you revert, but don't run check locally.
| public void onMessage(ConsumerRecord<K, V> data, Consumer<?, ?> consumer) { | ||
| onMessage(data, null, consumer); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: all the changes in this class have to be reverted.
...rc/main/java/org/springframework/kafka/listener/adapter/MessagingMessageListenerAdapter.java
Show resolved
Hide resolved
| weeded.incrementAndGet(); | ||
| } | ||
| }); | ||
| assertThat(weeded.get()).isEqualTo(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How this reflection stuff is related to the logic we would like to cover with this tests?
I thought we are pursuing a goal to verify that async failures are retried.
How does KafkaAdmin interaction affects an expected behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AsyncCompletableFutureRetryTopicClassLevelIntegrationTests demonstrates that the async retry feature works well in RetryTopicClassLevelIntegrationTests. It means the synchronous condition. (ConsumerRecord succeed or fail one by one)
So, I did copy and paste RetryTopicClassLevelIntegrationTests and replace KafkaListener's return type to Mono<?> or CompletableFuture<?>.
In the RetryTopicClassLevelIntegrationTests, weeded seems to demonstrate that some new topics should be created.

As a result of admin.newTopics(), instance weededTopics has two types (TopicForRetryable, NewTopic).
I think that the writer of RetryTopicClassLevelIntegrationTests want to verify whether expected NewTopics are created.
If there are no major issues, I would like to follow the assumptions of RetryTopicClassLevelIntegrationTests as well.
This is because I believe that AsyncRetry should also satisfy the synchronous retry condition where offsets are processed sequentially, given that AsyncRetry may or may not process offsets in sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see how this weeded are relevant to async handling.
That is really not a part of retry topic behavior, but rather its configuration.
So, having same verifications in two different places is redundant.
...ngframework/kafka/retrytopic/AsyncCompletableFutureRetryTopicClassLevelIntegrationTests.java
Outdated
Show resolved
Hide resolved
...java/org/springframework/kafka/retrytopic/AsyncCompletableFutureRetryTopicScenarioTests.java
Outdated
Show resolved
Hide resolved
...fka/src/test/java/org/springframework/kafka/retrytopic/AsyncMonoRetryTopicScenarioTests.java
Outdated
Show resolved
Hide resolved
8b2aebb to
645d7ce
Compare
...java/org/springframework/kafka/retrytopic/AsyncCompletableFutureRetryTopicScenarioTests.java
Outdated
Show resolved
Hide resolved
|
@artembilan , I have a question! Don't we need test codes for this situation? 🤔
I didn't wrote any test codes for this situation. Do you think I should write it? What are your thoughts? |
Sorry, I'm not totally sure in your question. |
...rc/main/java/org/springframework/kafka/listener/adapter/MessagingMessageListenerAdapter.java
Show resolved
Hide resolved
...rc/main/java/org/springframework/kafka/listener/adapter/MessagingMessageListenerAdapter.java
Outdated
Show resolved
Hide resolved
Sorry for lack of explanation 🙇♂️ For your information, However, in the |
That's exactly what I meant about checking against |
| this.observationEnabled = this.containerProperties.isObservationEnabled(); | ||
|
|
||
| if (!AopUtils.isAopProxy(this.genericListener) && | ||
| this.genericListener instanceof AbstractDelegatingMessageListenerAdapter<?>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I think your previous solution was correct.
We have to set callback only in case of a KafkaBackoffAwareMessageListenerAdapter which is indeed an indicator that we are in retry topic scenario.
This way we won't break all other use-cases where such an error handling would be unexpected.
As per your advice, I'm glad I debugged the AsyncListener test code. When you have time, Please take another look! 🙇♂️ |
artembilan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulling locally for final review and possible merge.
|
Merged as 9eafa61. thank you for awesome contribution as always! So, I've accepted those new tests as well, although they are indeed some kind of slow. A couple remarks about contribution overall:
See more info here: https://cbea.ms/git-commit/ |
|
Thanks for your time and awesome review, always! 🙇♂️ |
|
So, yeah, we really got some failing tests already from this story: |
|
I see 🥲. |

Motivation:
From now on, when the listener end point method returns
CompletableFuture<?>andMono<?>successfully and bothCompletableFuture<?>andMono<?>fail to be completed due to unexpected error, these failure record never retried even if a developer set@Retryabletopic.If we supports async retry with @RetryableTopic, we expect that spring-kafka's will be improved.
Changes
ConsumerRecordwhen bothCompletableFuture<?>andMono<?>was failed.Idea
The Kafka's
Consumerhas soft lock, so multiple threads can not access the consumer simultaneously.It means that the thread in
ThreadExecutorwhich handleCompletableFutureorMonocannot access the consumer.my idea is very simple.
the
KafkaMessageListenerContainerhasConcurrentLinkedDequeas its field.Then, when
MessagingMessageListenerContaineraddasyncSuccess()andasyncFailure()toCompletableFuture<?>andMono<?>as hooks.If these future object failed to be completed, future object will call
MessagingMessageListenerContainer.asyncFailure().When
MessagingMessageListenerContainer.asyncFailure()is called,FailedRecordTuplewill be put toConcurrentLinkedDeque.Then, next loop,
KafkaMessageListenerContainerwill callhandleAsyncFailure().then,
KafkaMessageListenerContainersfollowsFailedRecordTracker's logic.Result
========== Previous Context(below) ========
Background
I investigate how to support
@RetryableTopicfor async.I think it is difficult task for me to solve it. 😓
Anyway, challenges are a good thing, so I dug a little deeper. 😅
First of all, i want to discuss with the reviewers about which direction i should take for the implementation.
I implemented an ErrorHandler that sends messages to
RetryandDLTtopics when an error occurs, passing it as a callback toMessagingMessageListenerAdapterand inserting it intoMessagingMessageListeneerAdapter#asyncFailure()to send messages to Retry and DLT topics. Although this method has some issues, it can achieve the intended purpose.You can check successful test cases below.
shouldRetryFirstFutureTopic
shouldRetryFirstMonoTopic
shouldRetrySecondFutureTopic
shouldRetrySecondMonoTopic
shouldRetryThirdFutureTopicWithTimeout
shouldRetryThirdMonoTopicWithTimeout
shouldRetryFourthFutureTopicWithNoDlt
shouldRetryFourthMonoTopicWithNoDlt
shouldGoStraightToDltInFuture
shouldGoStraightToDltInMono
However, i realized that it is hard for

spring-kafkato supportback-offto async kafka listener endpoint.The
Backoff timeis determined whenKafkaMessageListenerContainercallsinvokeErroHandler().At that time,
FailedRecordis computed byThread.However, it is hard to imagine that
CompletableFutureandMonoalways will be executed in same thread.it always will affects
nextBackOff.Thus,
spring-kafkacannot supportBackOff Retryin async mode.IMHO, I think that we have 3 options.
1.
@RetryableTopicdoes not supportbackofffeatures inasyncmode.in this case, new kafka header is needed to distinguish between
syncandasync.the other one is that always use minimum backoff time \to avoid
KafkaBackoffException2. Introduce micrometer/context-propagation to support
backofffeatures in async mode.For example, https://spring.io/blog/2023/03/28/context-propagation-with-project-reactor-1-the-basics.
I believe that we can infer proper
nextBackOffTimeby using context-propagation to store and restore state in eachthread.3. Big refactoring.
But i have no idea. maybe
SyncKafkaMessageListenerContainerandAsyncKafkaMessageListenerContainer?Thanks for reading this description.
What do you think? Please let me know your opinion.
Thanks in advance! 🙇♂️