
feat(replica): support querying replica status via RESTful API#2377

Open
empiredan wants to merge 7 commits into apache:master from
empiredan:restful-api-get-replica-status


empiredan commented Mar 4, 2026

Sometimes we need to know the current status of a replica. For example,
during offline partition split, after new partitions are generated locally,
we need to start the replica server to load the new partitions. Only after
confirming that all partition data has been successfully loaded can we
rebuild the metadata and recover the Pegasus cluster. However, there is
currently no reliable way to verify that all partition data has finished loading.

There are two possible approaches:

  1. Check the replica server logs.
    For example, if we find "load replica successfully", we assume the partition
    has been loaded successfully; if we find "load replica failed", we assume the
    loading failed.
    However, log files are automatically cleaned up once their size or count
    exceeds certain thresholds. When there are a large number of partitions, the
    relevant logs might already have been removed before we even start checking
    whether the partitions were loaded successfully.

  2. Wait for a fixed period of time.
    This approach is also impractical because we do not know when a partition
    starts loading or how long it will take to load. At the same time, we cannot wait
    indefinitely.

If we could directly query the current status of a replica (for example, whether
it is still loading or already serving), this problem would be much easier to
solve. Therefore, this PR introduces a RESTful API to query the current status
of a replica.

Since the HTTP service is started before partition data loading begins, it is
possible to query the replica status from the replica server while partitions are
being loaded.

An example usage of the RESTful API:

GET http://1.2.3.4:34801/replica/status?app_id=1&partition_index=2
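The request identifies a replica by the table's `app_id` and its `partition_index`, both passed as query parameters. When a tool has to check many partitions, it may help to build these URLs programmatically. The sketch below does that; `build_replica_status_url` is a hypothetical helper written for illustration (it is not part of this PR), and the host and port are simply the values from the example request.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of this PR): builds the URL used to query the
// status of partition `partition_index` of table `app_id` on one replica server.
std::string build_replica_status_url(const std::string &host,
                                     std::uint16_t port,
                                     std::int32_t app_id,
                                     std::int32_t partition_index)
{
    return "http://" + host + ":" + std::to_string(port) +
           "/replica/status?app_id=" + std::to_string(app_id) +
           "&partition_index=" + std::to_string(partition_index);
}
```

For example, `build_replica_status_url("1.2.3.4", 34801, 1, 2)` yields exactly the URL in the example request.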

If the partition is currently loading, the replica server will return the following
response in JSON format:

{"status": "LOADING"}
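A client then needs to pull the `status` field out of the response body. The following is a deliberately minimal sketch, assuming the body is a flat JSON object like the one above whose `status` value contains no escaped quotes; `extract_status` is a hypothetical name, and a real tool would be better served by a proper JSON parser.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Naive extraction of the "status" field from a body such as
// {"status": "LOADING"}. Assumes a flat object and a string value without
// escaped quotes; returns std::nullopt if the field cannot be found.
std::optional<std::string> extract_status(const std::string &body)
{
    const std::string key = "\"status\"";
    std::size_t pos = body.find(key);
    if (pos == std::string::npos) {
        return std::nullopt;
    }
    // Skip to the ':' separating the key from its value.
    pos = body.find(':', pos + key.size());
    if (pos == std::string::npos) {
        return std::nullopt;
    }
    // The value is the text between the next pair of double quotes.
    const std::size_t open = body.find('"', pos + 1);
    if (open == std::string::npos) {
        return std::nullopt;
    }
    const std::size_t close = body.find('"', open + 1);
    if (close == std::string::npos) {
        return std::nullopt;
    }
    return body.substr(open + 1, close - open - 1);
}
```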

The currently supported statuses are:

  • LOADING: the replica is being loaded;
  • NOT_FOUND: the replica does not exist;
  • CREATING: the replica is being created;
  • SERVING: the replica is serving;
  • CLOSING: the replica is being closed;
  • CLOSED: the replica has been closed;
  • UNKNOWN: the replica is in an unknown status.
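For the offline partition split scenario described above, a recovery tool could poll every partition until each one reports `SERVING`, giving up after a bounded number of rounds because waiting indefinitely is not an option. The sketch below assumes C++17 and is not code from this PR: `replica_status`, `parse_replica_status`, `status_fetcher` and `wait_until_all_serving` are hypothetical names, and the HTTP call is abstracted behind a callback (for instance, one that issues the GET request shown earlier). Only `SERVING` is treated as "loaded"; how to react to `NOT_FOUND`, `CLOSED` or `UNKNOWN` is left to the caller, which receives the partitions that never reached `SERVING`.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical client-side mirror of the statuses returned by /replica/status.
enum class replica_status { LOADING, NOT_FOUND, CREATING, SERVING, CLOSING, CLOSED, UNKNOWN };

// Maps the "status" string of a response to the enum; anything unrecognized
// is treated as UNKNOWN.
replica_status parse_replica_status(const std::string &s)
{
    if (s == "LOADING") return replica_status::LOADING;
    if (s == "NOT_FOUND") return replica_status::NOT_FOUND;
    if (s == "CREATING") return replica_status::CREATING;
    if (s == "SERVING") return replica_status::SERVING;
    if (s == "CLOSING") return replica_status::CLOSING;
    if (s == "CLOSED") return replica_status::CLOSED;
    return replica_status::UNKNOWN;
}

// Hypothetical callback that returns the "status" string reported for one
// partition, e.g. by sending GET /replica/status?app_id=...&partition_index=...
using status_fetcher =
    std::function<std::string(std::int32_t app_id, std::int32_t partition_index)>;

// Polls partitions [0, partition_count) of table `app_id` for at most
// `max_rounds` rounds, sleeping `interval` between rounds so the wait is
// bounded. Partitions already reported as SERVING are not queried again.
// Returns the partitions that never reached SERVING (empty means all loaded).
std::vector<std::int32_t> wait_until_all_serving(std::int32_t app_id,
                                                 std::int32_t partition_count,
                                                 const status_fetcher &fetch,
                                                 int max_rounds,
                                                 std::chrono::milliseconds interval)
{
    std::vector<std::int32_t> pending;
    for (std::int32_t pidx = 0; pidx < partition_count; ++pidx) {
        pending.push_back(pidx);
    }
    for (int round = 0; round < max_rounds && !pending.empty(); ++round) {
        if (round > 0) {
            std::this_thread::sleep_for(interval);
        }
        std::vector<std::int32_t> still_pending;
        for (std::int32_t pidx : pending) {
            if (parse_replica_status(fetch(app_id, pidx)) != replica_status::SERVING) {
                still_pending.push_back(pidx);
            }
        }
        pending.swap(still_pending);
    }
    return pending;
}
```

Skipping partitions that have already reported `SERVING` assumes a replica stays serving once it gets there; re-checking every partition in a final round would be the more conservative choice before rebuilding the metadata.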

github-actions bot added the cpp label Mar 4, 2026
empiredan marked this pull request as ready for review Mar 6, 2026, 10:18