Commit f715924

kingherc authored and jfreden committed
ReplicationOperation should fail gracefully (elastic#115341)
Problem: finishAsFailed could be called asynchronously in the middle of operations like runPostReplicationActions, which try to sync the translog. finishAsFailed immediately triggers the failure of the resultListener, which releases the index shard primary operation permit. This means that runPostReplicationActions may try to sync the translog without an operation permit.

Solution: We refactor the infrastructure of ReplicationOperation regarding pendingActions and the resultListener, replacing them with a RefCountingListener. This way, if there are async failures, they are aggregated, and the result listener is called once, after all mid-way operations are done.

For the specific error we got in issue elastic#97183, this means that a call to onNoLongerPrimary (which can happen if we fail to fail a replica shard or mark it as stale) will not immediately release the primary operation permit, and the assertion in the translog sync will be honored.

Fixes elastic#97183
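The ref-counting idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from this commit and does not use Elasticsearch's RefCountingListener API; the class RefCountedCompletion, its methods, and the event strings are hypothetical names chosen for illustration. The sketch assumes that a failure is only recorded when it occurs, and that the final callback (which stands in for releasing the primary operation permit) runs once, after every in-flight step has released its reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical illustration only; not Elasticsearch's RefCountingListener. */
public class RefCountedCompletionSketch {

    /** Invokes the delegate once, after all acquired references are released. */
    static final class RefCountedCompletion {
        // Starts at 1: the owner holds an initial reference until it is done.
        private final AtomicInteger refs = new AtomicInteger(1);
        private final AtomicReference<Exception> failure = new AtomicReference<>();
        private final Consumer<Exception> delegate; // receives null on success

        RefCountedCompletion(Consumer<Exception> delegate) {
            this.delegate = delegate;
        }

        /** Registers one in-flight step; running the result releases it. */
        Runnable acquire() {
            refs.incrementAndGet();
            return this::release;
        }

        /** Records a failure without completing; later failures are suppressed. */
        void fail(Exception e) {
            if (!failure.compareAndSet(null, e)) {
                Exception first = failure.get();
                if (first != e) {
                    first.addSuppressed(e);
                }
            }
        }

        /** Drops one reference; the last release triggers the delegate. */
        void release() {
            if (refs.decrementAndGet() == 0) {
                delegate.accept(failure.get());
            }
        }
    }

    public static List<String> demo() {
        List<String> events = new ArrayList<>();
        RefCountedCompletion completion = new RefCountedCompletion(e ->
            events.add(e == null ? "permit released: success"
                                 : "permit released: " + e.getMessage()));

        // A post-replication step is in flight when the failure arrives.
        Runnable postReplicationStep = completion.acquire();
        completion.fail(new IllegalStateException("no longer primary"));
        events.add("failure recorded");

        // The step still runs while the (simulated) permit is held.
        events.add("translog sync runs under permit");
        postReplicationStep.run();

        completion.release(); // owner drops its initial reference
        return events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch the failure is recorded before the simulated translog sync, yet the callback fires only after the last reference is dropped, which mirrors the ordering the commit message describes.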
1 parent 5c85ef3 commit f715924

2 files changed: +214 -195 lines changed