@rodrigo-molina
Contributor

Type of change

  • Enhancement / new feature

Description

Currently, the Kafka Connect connector autoRestart configuration has a fixed max backoff cap of 60 minutes.

This PR introduces a maxBackoffMinutes property to make the cap configurable. If not set, it defaults to 60 minutes, preserving backward compatibility.
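To illustrate the intended effect, here is a minimal sketch of how a configurable cap could bound the restart backoff. It is not the code from this PR: the class `AutoRestartBackoffSketch`, the method `nextBackoffMinutes`, and the quadratic `n * n + n` growth are assumptions made for illustration. Only the `maxBackoffMinutes` name and the 60-minute default come from the description above.

```java
/** Hypothetical helper for illustration only; not the Strimzi implementation. */
final class AutoRestartBackoffSketch {
    /** Cap used when maxBackoffMinutes is not set (the 60-minute default described in this PR). */
    static final int DEFAULT_MAX_BACKOFF_MINUTES = 60;

    /**
     * Returns the minutes to wait before the next automatic restart.
     *
     * @param previousRestarts  number of automatic restarts already performed (must be >= 0)
     * @param maxBackoffMinutes configured cap in minutes, or null to use the default
     */
    static int nextBackoffMinutes(int previousRestarts, Integer maxBackoffMinutes) {
        if (previousRestarts < 0) {
            throw new IllegalArgumentException("previousRestarts must not be negative");
        }
        int cap = maxBackoffMinutes != null ? maxBackoffMinutes : DEFAULT_MAX_BACKOFF_MINUTES;
        // Assumed growth curve: n * n + n minutes; computed as long to avoid int overflow.
        long uncapped = (long) previousRestarts * previousRestarts + previousRestarts;
        return (int) Math.min(uncapped, cap);
    }

    public static void main(String[] args) {
        // Compare the default cap with a shorter, hypothetical 15-minute cap.
        for (int n = 0; n <= 8; n++) {
            System.out.printf("restarts=%d default=%d min, cap15=%d min%n",
                    n, nextBackoffMinutes(n, null), nextBackoffMinutes(n, 15));
        }
    }
}
```

Under these assumptions, the default cap is first reached at the ninth restart (n = 8, where 72 minutes is clamped to 60), while a 15-minute cap already applies from the fifth restart (n = 4).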

Checklist

  • Write tests
  • Make sure all tests pass <= I’m getting some connection timeouts in Minikube tests with make all, but they pass when run individually.
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

Add support for `maxBackoffMinutes` property to configure the maximum
backoff cap for automatic restarts of failed connectors and tasks.
Default remains 60 minutes if not set.

Signed-off-by: rodrigo-molina <[email protected]>
@rodrigo-molina rodrigo-molina force-pushed the feat/max-backoff-for-auto-restart-kafka-connect-connector branch from a297e6d to 72c8219 Compare September 29, 2025 16:03
@codecov

codecov bot commented Sep 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.60%. Comparing base (d613305) to head (72c8219).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11946      +/-   ##
============================================
+ Coverage     67.59%   67.60%   +0.01%     
- Complexity     7102     7103       +1     
============================================
  Files           574      574              
  Lines         28179    28185       +6     
  Branches       3199     3199              
============================================
+ Hits          19047    19054       +7     
+ Misses         7805     7802       -3     
- Partials       1327     1329       +2     
Files with missing lines Coverage Δ
...strimzi/api/kafka/model/connector/AutoRestart.java 52.63% <100.00%> (+12.63%) ⬆️
...ter/operator/assembly/AbstractConnectOperator.java 78.58% <100.00%> (+0.05%) ⬆️

... and 3 files with indirect coverage changes

Member

@scholzj scholzj left a comment

Thanks for the PR. However, this should probably have a proposal first, given that it is an API change with an unclear use case.

@rodrigo-molina
Contributor Author

Hey @scholzj,

Thanks for the quick reply! Sure thing, what’s the preferred channel for the proposal? GitHub Issues, Discussions, or can we continue here in the PR?

The main motivation for this change is to shorten the maximum retry interval so that it aligns with the retry policies we use on our platform. With today's AutoRestart behavior, a faulty connector whose underlying problem clears after a couple of hours may still wait up to an hour before being restarted. A configurable maxBackoffMinutes would let us tune the restart cadence and experiment with different failure patterns.

I also found it interesting that connector resiliency needs to be handled outside of Kafka Connect itself. We are seeing some Kafka Connect tasks give up even with "forever retries", that is, with errors.retry.timeout=-1. I still need to dig deeper into this behavior.

@scholzj
Member

scholzj commented Sep 29, 2025

@rodrigo-molina Sorry, I forgot the details. The proposals can be opened here as a PR: https://github.com/strimzi/proposals ... this is where they will be discussed and voted on. There is a template with a structure outline. There are also many other proposals you can check out to see what they do. I guess in this case it does not need too much stuff on the implementation details. But it should cover the API, the use case(s), etc.

@rodrigo-molina
Contributor Author

Thanks for the detail.

@ppatierno
Member

@rodrigo-molina thank you for taking care of this! Can you please move this PR to draft to make it clear that it is not open for review yet, since the proposal still needs to be written and discussed?

@rodrigo-molina rodrigo-molina marked this pull request as draft September 30, 2025 07:26