fix(replica): fix incorrect error code when secondary replica disk status is abnormal by limowang · Pull Request #2387 · apache/incubator-pegasus

limowang · 2026-03-17T02:40:33Z

…atus is abnormal

empiredan · 2026-03-17T08:54:14Z

@limowang Thank you very much for helping fix this issue! Please add more detail to the description: the symptoms of the bug, the root cause, and the fix that was made. This information will later be included in the commit message, so please make sure it is accurate.

empiredan · 2026-03-17T09:03:28Z

src/replica/replica_2pc.cpp

+            response_client_write(request, disk_status_to_error_code(_dir_node->status));
+        } else {
+            // Secondary replica disk is abnormal but primary is OK
+            response_client_write(request, ERR_REPLICATION_FAILURE);


The ERR_REPLICATION_FAILURE error code appears to have been reserved early on, and I could not find any history of it being used. Was it chosen here simply to reuse an existing error code to indicate that the issue occurred during replication to a secondary replica?

As I understand it, your original intention was to use ERR_REPLICATION_FAILURE to distinguish whether the problem is on the primary or the secondary. However, based on the code, the error here is quite clear: it is a disk issue on either the primary or the secondary. In fact, replication to the secondary has not even started at this point.

For faster diagnosis, would it make sense to replace it with a disk-related error code such as ERR_DISK_INSUFFICIENT or ERR_DISK_IO_ERROR? That way we can quickly tell that a machine in the cluster has a disk problem (running out of space or encountering I/O errors), which can typically be confirmed quickly through monitoring metrics.
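To make the suggestion concrete, here is a minimal, self-contained sketch of what returning disk-related error codes in both branches could look like. It is not the actual Pegasus code: the `disk_status` and `error_code` enums are simplified stand-ins, `disk_status_to_error_code` is re-implemented here only for illustration, and `pick_write_error` together with its `secondary` parameter is hypothetical (the diff does not show how the secondary's disk status is obtained).

```cpp
// Simplified stand-ins for the real Pegasus types (assumptions for this sketch).
enum class disk_status { NORMAL, SPACE_INSUFFICIENT, IO_ERROR };
enum class error_code { ERR_OK, ERR_DISK_INSUFFICIENT, ERR_DISK_IO_ERROR };

// Illustrative mapping from a disk status to a disk-related error code.
// The name mirrors the function called in the diff; the body is an assumption.
error_code disk_status_to_error_code(disk_status status)
{
    switch (status) {
    case disk_status::SPACE_INSUFFICIENT:
        return error_code::ERR_DISK_INSUFFICIENT;
    case disk_status::IO_ERROR:
        return error_code::ERR_DISK_IO_ERROR;
    case disk_status::NORMAL:
        return error_code::ERR_OK;
    }
    return error_code::ERR_OK;
}

// Hypothetical helper: choose the error returned to the client before any
// replication happens. The primary's disk is checked first; otherwise the
// secondary's disk status decides, so the client always sees a disk error
// rather than a generic replication failure.
error_code pick_write_error(disk_status primary, disk_status secondary)
{
    if (primary != disk_status::NORMAL) {
        return disk_status_to_error_code(primary);
    }
    return disk_status_to_error_code(secondary);
}
```

With this shape, a write rejected because a secondary's disk is full would surface as ERR_DISK_INSUFFICIENT. The trade-off is that the code alone no longer says which replica is affected, so that detail would have to come from logs or metrics.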

limowang added 2 commits March 17, 2026 10:32

fix(replica): fix incorrect error code when secondary replica disk st…

04f11e4

…atus is abnormal

fix(replica): fix incorrect error code when secondary replica disk st…

515697b

…atus is abnormal

github-actions bot added java-client cpp labels Mar 17, 2026

limowang requested review from acelyc111 and empiredan March 17, 2026 04:38

empiredan reviewed Mar 17, 2026

limowang wants to merge 2 commits into apache:master from limowang:fix/disk_abnormal
