| 1 | +:description: This section describes how to monitor a database's availability with the help of the rafted status check |
| 2 | +[role=label--new-5.24] |
| 3 | +== Rafted Status Check |
| 4 | + |
| 5 | +Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which monitors whether a rafted database can replicate. In most cases, being able to replicate means being able to write to the database. The procedure also |
| 6 | +shows which members are up to date and can participate in a successful replication, so it helps to determine the fault tolerance of a rafted database. Finally, it identifies the leader of the raft group. |
| 7 | + |
| 8 | +[NOTE] |
| 9 | +==== |
| 10 | +The member on which the procedure is called replicates a `status check entry` in the same raft group as the transactions, and verifies that the entry can be replicated and applied. |
| 11 | +
| 12 | +Since the entry is not applied to the transaction state machine, the database is not guaranteed to be write available even when the status check reports that |
| 13 | +it can replicate. However, a successful check does indicate that the raft group is healthy, which in most cases means that the database is write available. |
| 14 | +==== |
| 15 | + |
| 16 | +=== Syntax |
| 17 | + |
| 18 | +[source, shell] |
| 19 | +---- |
| 20 | +CALL dbms.cluster.statusCheck(databases :: LIST<STRING>, timeoutMilliseconds = null :: INTEGER) |
| 21 | +---- |
| 22 | + |
| 23 | +* *databases:* the list of databases for which the status check should run. Providing an empty list will run the |
| 24 | +status check for all *rafted* databases on that server. |
| 25 | +* *timeoutMilliseconds:* the maximum time, in milliseconds, that the replication may take. The default value is 1000 milliseconds. If replication takes longer than this timeout, the procedure reports that |
| 26 | +replication is unsuccessful. |
| 27 | + |
| 28 | + |
| 29 | +The procedure returns one row for each raft group member of each requested database. Each row consists of: |
| 30 | + |
| 31 | +* *database:* the database for which the `status check entry` was replicated. |
| 32 | +* *serverId:* the server id of each raft group member, which did or did not participate in a successful replication of the `status check entry`. |
| 33 | +* *serverName:* the server name of each raft group member. |
| 34 | +* *address:* the bolt address of each raft group member. |
| 35 | +* *replicationSuccessful:* indicates whether the server on which the procedure is run can replicate an entry in raft. It is `TRUE` if this server managed to replicate the `status check entry` to a majority of raft members within the given timeout, and `FALSE` |
| 36 | +if it failed to replicate within the timeout. The value is the same in every row for a given database. A failed replication |
| 37 | +can mean either that there is a real issue in the cluster (for example, no leader) or simply that this server is too far behind in raft and therefore can't replicate. |
| 38 | +* *memberStatus:* shows the status of each raft group member, which is one of `APPLYING`, `REPLICATING`, or `UNAVAILABLE`. `APPLYING` means that the raft group member has raft running and is actively applying entries, including transactions. |
| 39 | +`REPLICATING` means that the member can participate in replication but can't apply entries. This state is uncommon, but may occur while waiting for the database to start and accept transactions. |
| 40 | +* *recognisedLeader:* shows the server id of the perceived leader of each raft group member. |
| 41 | +* *recognisedLeaderTerm:* shows the term of the perceived leader of each raft group member. If the raft group members report different leaders, the one with the highest term should be trusted. |
| 42 | +* *requester:* is `TRUE` for the server on which the procedure is run, and `FALSE` on the remaining servers. |
| 43 | +* *error:* contains the error message, if there is one. An example of an error is that one or more of the requested databases do not exist on the requester. |
| 44 | + |
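The rule for `recognisedLeader` and `recognisedLeaderTerm` (trust the leader reported with the highest term) could be applied to the returned rows roughly as follows. This is only a sketch: it assumes the rows have already been fetched and converted to Python dictionaries keyed by the column names listed above, and the function name `trusted_leader` and the sample data are hypothetical.

```python
from typing import Optional


def trusted_leader(rows: list[dict], database: str) -> Optional[str]:
    """Return the leader id reported with the highest term for one database.

    Assumes each row is a dict keyed by the column names of the procedure's
    result. Rows without a leader or a term are skipped.
    """
    best_term = None
    best_leader = None
    for row in rows:
        if row.get("database") != database:
            continue
        leader = row.get("recognisedLeader")
        term = row.get("recognisedLeaderTerm")
        if leader is None or term is None:
            continue
        # Keep the leader seen with the strictly highest term so far.
        if best_term is None or term > best_term:
            best_term, best_leader = term, leader
    return best_leader


# Hypothetical rows: member "c" is behind and still reports an older leader.
example_rows = [
    {"database": "neo4j", "serverId": "a", "recognisedLeader": "a", "recognisedLeaderTerm": 7},
    {"database": "neo4j", "serverId": "b", "recognisedLeader": "a", "recognisedLeaderTerm": 7},
    {"database": "neo4j", "serverId": "c", "recognisedLeader": "c", "recognisedLeaderTerm": 5},
]

print(trusted_leader(example_rows, "neo4j"))  # prints: a
```

If no member of the database reports a leader, the function returns `None`, which a caller might treat as "no leader currently known".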
| 45 | +In general, the `replicationSuccessful` field can be used to determine overall write availability, whereas the `memberStatus` field can be checked to see whether the database is fault tolerant. |
| 46 | + |
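One possible way to turn `replicationSuccessful` and `memberStatus` into a short health summary is sketched below. It is not part of the procedure and rests on assumptions that should be checked against your deployment: rows are Python dictionaries keyed by the column names above, the result contains one row per raft group member, a member counts as able to participate if its status is `APPLYING` or `REPLICATING`, and a majority is more than half of the members. The name `summarise` and the sample data are hypothetical.

```python
from typing import Optional


def summarise(rows: list[dict], database: str) -> Optional[dict]:
    """Summarise replication ability and spare capacity for one database.

    Assumptions (not guaranteed by the procedure itself): one row per raft
    group member, and members in APPLYING or REPLICATING can take part in
    replication.
    """
    members = [r for r in rows if r.get("database") == database]
    if not members:
        return None

    # The column carries the same value in every row, so requiring TRUE in
    # all rows is just a defensive way of reading it.
    can_replicate = all(r.get("replicationSuccessful") is True for r in members)

    participating = sum(
        1 for r in members if r.get("memberStatus") in ("APPLYING", "REPLICATING")
    )
    majority = len(members) // 2 + 1

    return {
        "can_replicate": can_replicate,
        "participating": participating,
        "majority": majority,
        # How many participating members could be lost while a majority remains.
        "tolerable_failures": max(participating - majority, 0),
    }


# Hypothetical three-member group with one unavailable member.
degraded_rows = [
    {"database": "neo4j", "serverId": "a", "replicationSuccessful": True, "memberStatus": "APPLYING"},
    {"database": "neo4j", "serverId": "b", "replicationSuccessful": True, "memberStatus": "APPLYING"},
    {"database": "neo4j", "serverId": "c", "replicationSuccessful": True, "memberStatus": "UNAVAILABLE"},
]

summary = summarise(degraded_rows, "neo4j")
print(summary["can_replicate"], summary["tolerable_failures"])  # prints: True 0
```

In this sample the database can still replicate, but it has no spare members left, so losing one more member would break the majority.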