feat(replica): support querying replica status via RESTful API #2377
Open
empiredan wants to merge 7 commits into apache:master
Sometimes we need to know the current status of a replica. For example,
during offline partition split, after new partitions are generated locally,
we need to start the replica server to load the new partitions. Only after
confirming that all partition data has been successfully loaded can we
rebuild the metadata and recover the Pegasus cluster. However, there is
currently no reliable way to verify that all partition data has finished loading.
There are two possible approaches:

1. Check the replica server logs.
   For example, if we find "load replica successfully", we assume the partition
   has been loaded successfully; if we find "load replica failed", we assume the
   loading failed.
   However, log files are automatically cleaned up once their size or count
   exceeds certain thresholds. When there are a large number of partitions, the
   relevant logs might already have been removed before we even start checking
   whether the partitions were loaded successfully.
2. Wait for a fixed period of time.
   This approach is also impractical because we do not know when a partition
   starts loading or how long it will take to load. At the same time, we cannot
   wait indefinitely.
If we could directly query the current status of a replica, such as whether it is
still loading or already serving, this problem would be much easier to solve.
Therefore, this PR introduces a RESTful API to query the current status of a
replica.
Since the HTTP service is started before partition data loading begins, it is
possible to query the replica status from the replica server while partitions are
being loaded.
An example usage of the RESTful API:
If the partition is currently loading, the replica server will return the following
response in JSON format:
The currently supported statuses include:
- LOADING: the replica is being loaded;
- NOT_FOUND: the replica does not exist;
- CREATING: the replica is being created;
- SERVING: the replica is serving;
- CLOSING: the replica is being closed;
- CLOSED: the replica has been closed;
- UNKNOWN: the replica is in an unknown status.
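
As an illustration of how a recovery script might use these statuses, here is a minimal client-side sketch in Python that polls until every replica reports SERVING. It is not part of this PR. The `wait_until_serving` helper, the injected `fetch_status` callable (which would wrap the actual HTTP request) and the replica identifiers such as "1.0" are all hypothetical. Treating only LOADING and CREATING as "still in progress" is also an assumption; a real script may need to tolerate NOT_FOUND for a while.

```python
import enum
import time
from typing import Callable, Dict, Iterable


class ReplicaStatus(enum.Enum):
    """The statuses listed in this PR description."""
    LOADING = "LOADING"
    NOT_FOUND = "NOT_FOUND"
    CREATING = "CREATING"
    SERVING = "SERVING"
    CLOSING = "CLOSING"
    CLOSED = "CLOSED"
    UNKNOWN = "UNKNOWN"


# Assumption: only these statuses can still progress to SERVING by waiting.
IN_PROGRESS = {ReplicaStatus.LOADING, ReplicaStatus.CREATING}


def wait_until_serving(
    fetch_status: Callable[[str], str],
    replicas: Iterable[str],
    timeout_s: float = 600.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, ReplicaStatus]:
    """Poll each replica until all are SERVING.

    fetch_status is a hypothetical callable that returns the status string
    for one replica, e.g. by calling the RESTful API and parsing the JSON.
    Raises RuntimeError on a status outside IN_PROGRESS/SERVING and
    TimeoutError if the deadline passes while replicas are still pending.
    """
    pending = set(replicas)
    deadline = clock() + timeout_s
    latest: Dict[str, ReplicaStatus] = {}
    while pending:
        # Iterate over a sorted copy so that pending can be modified safely.
        for replica in sorted(pending):
            status = ReplicaStatus(fetch_status(replica))
            latest[replica] = status
            if status is ReplicaStatus.SERVING:
                pending.discard(replica)
            elif status not in IN_PROGRESS:
                raise RuntimeError(f"replica {replica} is {status.value}")
        if pending:
            if clock() >= deadline:
                raise TimeoutError(f"still not serving: {sorted(pending)}")
            sleep(interval_s)
    return latest
```

Injecting `fetch_status`, `sleep` and `clock` keeps the loop independent of the exact URL and response layout, and lets it be exercised without a running replica server.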