This repository was archived by the owner on Jul 30, 2025. It is now read-only.

@ballard26
Contributor

Errors such as OperationNotAttempted and InvalidTxnState are common when running transactions against clusters whose nodes are being started and stopped at random. This PR makes the producer attempt to recover from these errors.
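
For illustration, the kind of recovery described above could look roughly like the sketch below. It is not the code from this PR: the `TxnProducer` interface, the sentinel errors, `runTxn`, and the `flakyProducer` test double are all hypothetical names, and the sketch assumes that aborting the current transaction and starting a new one is an acceptable way to recover.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the two errors named in the
// PR description. A real client library exposes its own error values.
var (
	ErrOperationNotAttempted = errors.New("OperationNotAttempted")
	ErrInvalidTxnState       = errors.New("InvalidTxnState")
)

// TxnProducer is a hypothetical, minimal view of a transactional producer.
type TxnProducer interface {
	Begin() error
	Produce(msg string) error
	Commit() error
	Abort() error
}

// recoverable reports whether err is one of the errors we assume can be
// recovered from by aborting and retrying the transaction.
func recoverable(err error) bool {
	return errors.Is(err, ErrOperationNotAttempted) || errors.Is(err, ErrInvalidTxnState)
}

// tryTxn runs a single begin/produce/commit cycle.
func tryTxn(p TxnProducer, msgs []string) error {
	if err := p.Begin(); err != nil {
		return err
	}
	for _, m := range msgs {
		if err := p.Produce(m); err != nil {
			return err
		}
	}
	return p.Commit()
}

// runTxn retries the whole transaction up to maxAttempts times. It returns
// the number of recoverable errors it hit, so callers can count them.
func runTxn(p TxnProducer, msgs []string, maxAttempts int) (recovered int, err error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = tryTxn(p, msgs)
		if err == nil {
			return recovered, nil
		}
		if !recoverable(err) {
			return recovered, err
		}
		// Best-effort abort so the next attempt starts from a clean state.
		_ = p.Abort()
		recovered++
	}
	return recovered, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// flakyProducer is a test double: its first failCommits commits fail with
// ErrInvalidTxnState, mimicking a cluster whose nodes are being restarted.
type flakyProducer struct {
	failCommits int
	pending     []string
	committed   []string
}

func (f *flakyProducer) Begin() error           { f.pending = nil; return nil }
func (f *flakyProducer) Produce(m string) error { f.pending = append(f.pending, m); return nil }
func (f *flakyProducer) Abort() error           { f.pending = nil; return nil }
func (f *flakyProducer) Commit() error {
	if f.failCommits > 0 {
		f.failCommits--
		return ErrInvalidTxnState
	}
	f.committed = append(f.committed, f.pending...)
	f.pending = nil
	return nil
}
```

Calling `runTxn(&flakyProducer{failCommits: 2}, []string{"a", "b"}, 5)` would succeed on the third attempt and report two recovered errors. Returning the count rather than swallowing the errors keeps them visible to whoever reports producer status.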

@ballard26 ballard26 requested a review from jcsp December 1, 2022 04:49
@ballard26 ballard26 force-pushed the repeater-txn-error-handling branch from 965e389 to 18aca5e Compare December 1, 2022 04:50
@jcsp
Contributor

jcsp commented Dec 1, 2022

Can you add error counters in the producer status struct?

I suspect there will be some transaction tests that expect no such errors, and some that will expect errors.

As I understand it, the transaction code should cope gracefully with intentional node shutdowns (where leaderships are transferred away first), but is allowed to hit an error when nodes unexpectedly stop. So each test case should know which kind of restarts it is doing, and be tolerant or intolerant of errors accordingly.
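
A rough sketch of what this could look like, assuming a Go status struct: the `ProducerStatus` fields, `RestartKind`, and `CheckStatus` below are hypothetical names for illustration, not the project's actual types. The idea is that the producer counts recovered errors per error name, and each test states which kind of restart it performs so the check can be strict or tolerant.

```go
package main

import "fmt"

// ProducerStatus is a hypothetical status struct. Only TxnErrors matters
// here; the other fields are placeholders.
type ProducerStatus struct {
	Sent      int64            `json:"sent"`
	Acked     int64            `json:"acked"`
	TxnErrors map[string]int64 `json:"txn_errors"` // recovered errors, keyed by error name
}

// RecordTxnError increments the counter for one recovered error.
func (s *ProducerStatus) RecordTxnError(name string) {
	if s.TxnErrors == nil {
		s.TxnErrors = make(map[string]int64)
	}
	s.TxnErrors[name]++
}

// TotalTxnErrors sums the counters across all error names.
func (s *ProducerStatus) TotalTxnErrors() int64 {
	var total int64
	for _, n := range s.TxnErrors {
		total += n
	}
	return total
}

// RestartKind describes what a test does to the cluster.
type RestartKind int

const (
	// GracefulRestart: leaderships are transferred away before shutdown,
	// so the test expects no transaction errors.
	GracefulRestart RestartKind = iota
	// AbruptStop: nodes stop unexpectedly, so errors are tolerated.
	AbruptStop
)

// CheckStatus returns an error if the counters violate the expectation
// for the given restart kind.
func CheckStatus(s *ProducerStatus, kind RestartKind) error {
	if total := s.TotalTxnErrors(); kind == GracefulRestart && total > 0 {
		return fmt.Errorf("expected no transaction errors after graceful restarts, saw %d", total)
	}
	return nil
}
```

A test doing graceful restarts would call `CheckStatus(status, GracefulRestart)` and fail on any counted error, while a test that kills nodes would pass `AbruptStop` and only inspect the counters for reporting.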

@ballard26
Contributor Author

Yeah, I'll add an error counter. I'll talk to Bharath about whether this behavior is always expected. If not, I'll add a flag to enable the new behavior.
