fix(replica): fix incorrect error code when secondary replica disk status is abnormal #2387

Open
limowang wants to merge 2 commits into apache:master from limowang:fix/disk_abnormal

@limowang
Collaborator

@empiredan
Contributor

@limowang Thank you very much for helping fix this issue! Please add more precise details to the description, including what the symptoms of the bug are, what the root cause analysis shows, and what fixes were made. This information will later be included in the commit message, so please ensure it is accurate.

response_client_write(request, disk_status_to_error_code(_dir_node->status));
} else {
// Secondary replica disk is abnormal but primary is OK
response_client_write(request, ERR_REPLICATION_FAILURE);
Contributor

The ERR_REPLICATION_FAILURE error code appears to have been reserved early on, and I could not find any history of it being used. Is the reason for using it here simply to reuse an existing error code to indicate that the issue occurred during replication to a secondary replica?

As I understand it, your original intention was to use ERR_REPLICATION_FAILURE to distinguish whether the problem is on the primary or the secondary. However, based on the code, the error here is quite clear: it is a disk issue on either the primary or the secondary. In fact, replication to the secondary has not even started at this point.

For faster issue diagnosis, would it make sense to replace it with disk-related error codes such as ERR_DISK_INSUFFICIENT and ERR_DISK_IO_ERROR? This way, we can quickly identify that a machine in the cluster has a disk problem (either running out of space or encountering I/O errors), which can typically be detected quickly through monitoring metrics.
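
To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It is not the actual Pegasus code. The `disk_status` and `error_code` enums, `sketch_disk_status_to_error_code()`, and `pick_write_error()` are hypothetical stand-ins written for illustration. Only the names ERR_DISK_INSUFFICIENT and ERR_DISK_IO_ERROR come from this discussion, and how the primary actually learns the secondary's disk status is left out.

```cpp
// Hypothetical stand-ins for illustration only; the real definitions in the
// code base may use different names and values.
enum class disk_status { NORMAL, SPACE_INSUFFICIENT, IO_ERROR };
enum class error_code { ERR_OK, ERR_DISK_INSUFFICIENT, ERR_DISK_IO_ERROR };

// Assumed mapping, in the spirit of disk_status_to_error_code() in the diff.
inline error_code sketch_disk_status_to_error_code(disk_status status)
{
    switch (status) {
    case disk_status::SPACE_INSUFFICIENT:
        return error_code::ERR_DISK_INSUFFICIENT;
    case disk_status::IO_ERROR:
        return error_code::ERR_DISK_IO_ERROR;
    case disk_status::NORMAL:
        break;
    }
    return error_code::ERR_OK;
}

// Choose the error returned to the client before replication starts.
// A primary disk problem is reported first; otherwise the secondary's disk
// problem is reported with the same disk-related codes, instead of a generic
// replication failure.
inline error_code pick_write_error(disk_status primary, disk_status secondary)
{
    if (primary != disk_status::NORMAL) {
        return sketch_disk_status_to_error_code(primary);
    }
    return sketch_disk_status_to_error_code(secondary);
}
```

With this shape, `pick_write_error(disk_status::NORMAL, disk_status::IO_ERROR)` yields `ERR_DISK_IO_ERROR`, so the client-visible error already points at a disk problem somewhere in the replica group. The trade-off is that the code alone no longer says which node is affected, so that would have to come from logs or metrics.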

2 participants