Commit b07a7b5

Addressing review comments
1 parent d93280b commit b07a7b5

1 file changed

+18
-24
lines changed

modules/ROOT/pages/clustering/monitoring/status-check.adoc

Lines changed: 18 additions & 24 deletions
Original file line number | Diff line number | Diff line change
@@ -1,16 +1,15 @@
1-
:description: This section describes how to monitor a database's availability with the help of the rafted status check
2-
[role=label--new-5.24]
3-
== Rafted Status Check
1+
:description: This section describes how to monitor a database's availability with the help of the cluster status check
2+
[role=label--new-5.24 label--enterprise-edition]
3+
[[database-status-check]]
4+
== Cluster Status Check
45

5-
Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which can be used to monitor the ability to replicate in rafted databases, which in most cases means being able to write to the database. It can also
6-
be used to check which members are up-to-date and can participate in a successful replication. Therefore, it is useful in determining the fault-tolerance of a rafted database as well. A third and final function is to determine the leader of the raft group.
6+
Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which can be used to monitor the ability to replicate in clustered databases, which in most cases means being able to write to the database. You can also use the procedure to check which members are up-to-date and can participate in a successful replication. Therefore, it is useful in determining the fault-tolerance of a clustered database as well. A third and final function is to determine the leader of the cluster.
77

88
[NOTE]
99
====
10-
The member on which the procedure is called replicates a `status check entry` in the same raft group as the transactions, and verifies that the entry can be replicated and applied.
10+
The member on which the procedure is called replicates a dummy transaction in the same cluster as the real transactions, and verifies that it can be replicated and applied.
1111
12-
Since the entry is not applied to the transaction state machine, it's not guaranteed that the database is write available even though the status check reports that
13-
it can replicate. However, it tells that the raft group is healthy and in most cases that means that the database is write available.
12+
Since the status check doesn't replicate an actual transaction, it's not guaranteed that the database is write available even though the status check reports that it can replicate. Apart from replication there are other stops in the write path that can potentially block a transaction from being applied, e.g. issues in the database. However, it tells that the cluster is healthy and in most cases that means that the database is write available.
1413
====
1514

1615
=== Syntax
@@ -20,27 +19,22 @@ it can replicate. However, it tells that the raft group is healthy and in most c
2019
CALL dbms.cluster.statusCheck(databases :: LIST<STRING>, timeoutMilliseconds = null :: INTEGER)
2120
----
2221

23-
* *databases:* the list of databases for which the status check should run. Providing an empty list will run the
24-
status check for all *rafted* databases on that server.
25-
* *timeoutMilliseconds:* specifies how long the replication may take. Default value is 1000 milliseconds. If replication takes longer than this timeout, it will return that
26-
replication is unsuccessful.
22+
* *databases:* the list of databases for which the status check should run. Providing an empty list will run the status check for all *clustered* databases on that server, i.e. the status check won't run on singles or secondaries.
23+
* *timeoutMilliseconds:* specifies how long the replication may take. Default value is 1000 milliseconds. If replication takes longer than this timeout, it will return that replication is unsuccessful.
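
A minimal sketch of how a client might assemble the call from the two parameters described above. The helper name `build_status_check_call` and the query parameter names `$databases` and `$timeoutMs` are hypothetical; only the procedure name and its two arguments come from the documentation. The timeout argument is left out when none is given, on the assumption that the procedure then falls back to its documented default.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_status_check_call(
    databases: List[str], timeout_ms: Optional[int] = None
) -> Tuple[str, Dict[str, Any]]:
    """Return a (cypher, parameters) pair for dbms.cluster.statusCheck().

    An empty `databases` list asks for all clustered databases on the server.
    """
    params: Dict[str, Any] = {"databases": list(databases)}
    if timeout_ms is None:
        # Assumption: omitting the argument lets the procedure use its default.
        return "CALL dbms.cluster.statusCheck($databases)", params
    params["timeoutMs"] = timeout_ms
    return "CALL dbms.cluster.statusCheck($databases, $timeoutMs)", params
```

The returned pair can be handed to whichever client library runs the query; passing values as parameters avoids building the database list into the query string.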
2724

2825

29-
The procedure returns a row for all raft group members of all the requested databases where each row consists of:
26+
The procedure returns a row for all primary members of all the requested databases where each row consists of:
3027

3128
* *database:* the database for which the `status check entry` was replicated.
32-
* *serverId:* the server id of each raft group member, which did or did not participate in a successful replication of the `status check entry`.
33-
* *serverName:* the server name of each raft group member.
34-
* *address:* the bolt address of each raft group member.
35-
* *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate an entry in raft. Is `TRUE` if this server managed to replicate the `status check entry` to a majority of raft members within the given timeout. `FALSE`
36-
if it failed to replicate within the timeout. The value is the same column-wise. A failed replication
37-
can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in raft, and can't therefore replicate.
38-
* *memberStatus:* shows the status of each raft group member. It can either be `APPLYING`, `REPLICATING` or `UNAVAILABLE`. `APPLYING` means that the raft group member has raft running and is actively applying entries, including transactions.
39-
`REPLICATING` means that the member can participate in replicating, but can't apply. This state is uncommon, but may happen while waiting for the database to start and accept transactions.
40-
* *recognisedLeader:* shows the server id of the perceived leader of each raft group member.
41-
* *recognisedLeaderTerm:* shows the term of the perceived leader of each raft group member. If the raft group members report different leaders, the one with the highest term should be trusted.
29+
* *serverId:* the server id of each primary member, which did or did not participate in a successful replication of the `status check entry`.
30+
* *serverName:* the server name of each primary member.
31+
* *address:* the bolt address of each primary member.
32+
* *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate a transaction. Is `TRUE` if this server managed to replicate the dummy transaction to a majority of raft members within the given timeout. `FALSE` if it failed to replicate within the timeout. The value is the same column-wise. A failed replication can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in apply, and can't therefore replicate.
33+
* *memberStatus:* shows the status of each primary member. It can either be `APPLYING`, `REPLICATING` or `UNAVAILABLE`. `APPLYING` means that the member can replicate and is actively applying transactions. `REPLICATING` means that the member can participate in replicating, but can't apply. This state is uncommon, but may happen while waiting for the database to start and accept transactions.
34+
* *recognisedLeader:* shows the server id of the perceived leader of each primary member.
35+
* *recognisedLeaderTerm:* shows the term of the perceived leader of each primary member. If the members report different leaders, the one with the highest term should be trusted.
4236
* *requester:* is `TRUE` for the server on which the procedure is run, and `FALSE` on the remaining servers.
43-
* *error:* contains the error message if there is one. An example of an error is that one of more of the requested databases doesn't exist on the requester.
37+
* *error:* contains the error message if there is one. An example of an error is that one or more of the requested databases doesn't exist on the requester.
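
The rule for `recognisedLeaderTerm` (trust the leader reported with the highest term) can be sketched as below. This is an illustration only: it assumes the result rows have already been fetched into a list of dictionaries keyed by the column names listed above, and the function name `trusted_leader` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def trusted_leader(rows: List[Dict[str, Any]], database: str) -> Optional[str]:
    """Return the server id of the leader reported with the highest term.

    Rows without a leader or term (for example, unavailable members) are skipped.
    Returns None if no member of the database reports a leader.
    """
    best_term = None
    best_leader = None
    for row in rows:
        if row.get("database") != database:
            continue
        leader = row.get("recognisedLeader")
        term = row.get("recognisedLeaderTerm")
        if leader is None or term is None:
            continue
        if best_term is None or term > best_term:
            best_term, best_leader = term, leader
    return best_leader
```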
4438

4539
In general the `replicationSuccessful` field can be used to determine overall write-availability, whereas the `memberStatus` field can be checked in order to see whether the database is fault-tolerant or not.
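
As a rough illustration of the paragraph above, the sketch below summarises the rows for one database. It rests on assumptions that are not stated in the documentation: rows are dictionaries keyed by the column names, every primary member appears as a row, members in `APPLYING` or `REPLICATING` count as able to take part in replication, and fault tolerance is measured as the number of such members beyond a simple majority. The name `summarise_status` is hypothetical.

```python
from typing import Any, Dict, List

# Assumption: both states can take part in replicating an entry.
PARTICIPATING = {"APPLYING", "REPLICATING"}


def summarise_status(rows: List[Dict[str, Any]], database: str) -> Dict[str, Any]:
    """Summarise write availability and spare capacity for one database."""
    members = [r for r in rows if r.get("database") == database]
    if not members:
        raise ValueError(f"no rows returned for database {database!r}")
    majority = len(members) // 2 + 1
    participating = sum(
        1 for r in members if r.get("memberStatus") in PARTICIPATING
    )
    return {
        # The value is the same in every row, so all() is only a safeguard.
        "replication_ok": all(
            r.get("replicationSuccessful") is True for r in members
        ),
        "participating": participating,
        "majority": majority,
        # Members that could be lost while a majority can still replicate.
        "tolerated_failures": max(participating - majority, 0),
    }
```

A `tolerated_failures` of zero together with `replication_ok` being true would suggest the database can currently replicate but has no spare members left, which is worth alerting on under these assumptions.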
4640
