Commit 9f6313a

Edit status check (#1836)
Minor improvements to the status check page, mainly aligning the formatting closer to the other monitoring pages:

- Use tables for the arguments and return values
- Change title to 'Monitor replication status'
- Add an example of sample output
2 parents 3159e09 + e8f816a commit 9f6313a

2 files changed (+72, -34 lines)

modules/ROOT/pages/clustering/index.adoc

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ This chapter describes the following:
 ** xref:clustering/monitoring/show-servers-monitoring.adoc[Monitor servers] -- The tools available for monitoring the servers in a cluster.
 ** xref:clustering/monitoring/show-databases-monitoring.adoc[Monitor databases] -- The tools available for monitoring the databases in a cluster.
 ** xref:clustering/monitoring/endpoints.adoc[Monitor cluster endpoints for status information] -- The endpoints and semantics of endpoints used to monitor the health of the cluster.
-** xref:clustering/monitoring/status-check.adoc[Cluster status check] label:new[Introduced in 5.24] -- The procedure that checks which databases are up-to-date and can participate in a successful replication.
+** xref:clustering/monitoring/status-check.adoc[Monitor replication status] label:new[Introduced in 5.24] -- The procedure to monitor which members of a clustered database are up-to-date and can participate in a successful replication.
 * xref:clustering/disaster-recovery.adoc[Disaster recovery] -- How to recover a cluster in the event of a disaster.
 * xref:clustering/settings.adoc[Settings reference] -- A summary of the most important cluster settings.
 * xref:clustering/server-syntax.adoc[Server commands reference] -- Reference of Cypher administrative commands to add and manage servers.
Lines changed: 71 additions & 33 deletions
@@ -1,13 +1,14 @@
 :description: This section describes how to monitor a database's availability with the help of the cluster status check procedure.
 
 :page-role: enterprise-edition new-5.24
-[[cluster-status-check]]
-= Cluster status check
+[[monitoring-replication]]
+= Monitor replication status
 
-Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which can be used to monitor the ability to replicate in clustered databases, which in most cases means being able to write to the database.
-You can also use the procedure to check which members are up-to-date and can participate in a successful replication.
-Therefore, it is useful in determining the fault-tolerance of a clustered database as well.
-A third and final function is to determine the leader of the cluster.
+Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which can be used to monitor the ability to replicate in clustered databases.
+In most cases this means a clustered database is write available.
+The procedure identifies which members of a clustered database are up-to-date and can participate in successful replication.
+Therefore, it is useful in determining the fault tolerance of a clustered database.
+Additionally, you can use the procedure to identify the leader of a clustered database within the cluster.
 
 [NOTE]
 ====
@@ -18,45 +19,62 @@ Apart from replication there are other stops in the write path that can potentia
 However, it tells that the cluster is healthy and in most cases that means that the database is write available.
 ====
 
-[[procedure-syntax]]
-== Syntax
+[[cluster-status-check]]
+== Cluster status check
 
+*Syntax:*
 [source, shell]
 ----
 CALL dbms.cluster.statusCheck(databases :: LIST<STRING>, timeoutMilliseconds = null :: INTEGER)
 ----
 
-* *databases:* the list of databases for which the status check should run.
-Providing an empty list runs the status check for all *clustered* databases on that server, i.e. the status check won't run on singles or secondaries.
-* *timeoutMilliseconds:* specifies how long the replication may take.
-Default value is 1000 milliseconds.
-If replication takes longer than this timeout, it will return that replication is unsuccessful.
+*Arguments:*
+
+[options="header", cols="m,a,a"]
+|===
+| Name | Type | Description
+| databases | List<String> | Databases for which the status check should run.
+Providing an empty list runs the status check for all *clustered* databases on that server, i.e. it won't run on singles or secondaries.
+| timeoutMilliseconds | Integer | How long to allow for replication, before returning it was unsuccessful.
+Default value is 1000 milliseconds.
+|===
 
+*Returns:*
 
 The procedure returns a row for all primary members of all the requested databases where each row consists of:
 
-* *database:* the database for which the `status check entry` was replicated.
-* *serverId:* the server id of each primary member, which did or did not participate in a successful replication of the `status check entry`.
-* *serverName:* the server name of each primary member.
-* *address:* the Bolt address of each primary member.
-* *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate a transaction.
-+
-** `TRUE` -- if this server managed to replicate the dummy transaction to a majority of cluster members within the given timeout.
-** `FALSE` -- if it failed to replicate within the timeout.
+[options="header", cols="m,a,a"]
+|===
+| Name | Type | Description
+| database | String | The database for which a `status check entry` was replicated.
+| serverId | String | The UUID of the server, which did or did not participate in a successful replication of the `status check entry`.
+| serverName | String | The friendly name of the server, or its UUID if no name is set.
+| address | String | The address of the Bolt port for the server.
+| replicationSuccessful | Boolean | Indicates if the server (on which the procedure is run) can replicate a transaction.
+| memberStatus | String | The status of each primary member.
+| recognisedLeader | String | The server id of the perceived leader of each primary member.
+| recognisedLeaderTerm | Integer | The term of the perceived leader of each primary member.
+If the members report different leaders, the one with the highest term should be trusted.
+| requester | Boolean | Whether a server is the requester or not.
+| error | String | Contains the error message if one is present.
+An example of an error is that one or more of the requested databases do not exist on the requester.
+|===
+
+=== Possible values of `replicationSuccessful`
+* `TRUE` -- if this server managed to replicate the dummy transaction to a majority of cluster members within the given timeout.
+* `FALSE` -- if it failed to replicate within the timeout.
 The value is the same column-wise.
-A failed replication can either mean a real issue in the cluster (e.g., no leader) or that this server is too far behind in apply and can't replicate.
-* *memberStatus:* shows the status of each primary member.
-It can be `APPLYING`, `REPLICATING`, or `UNAVAILABLE`.
-+
-** `APPLYING` means that the member can replicate and is actively applying transactions.
-** `REPLICATING` means that the member can participate in replicating, but can't apply.
+A failed replication can either indicate a real issue in the cluster (e.g., no leader) or that this server is too far behind in applying updates and can't replicate.
+
+=== Possible values of `memberStatus`
+* `APPLYING` means that the member can replicate and is actively applying transactions.
+* `REPLICATING` means that the member can participate in replicating, but can't apply.
 This state is uncommon, but may happen while waiting for the database to start and accept transactions.
-* *recognisedLeader:* shows the server id of the perceived leader of each primary member.
-* *recognisedLeaderTerm:* shows the term of the perceived leader of each primary member.
-If the members report different leaders, the one with the highest term should be trusted.
-* *requester:* is `TRUE` for the server on which the procedure is run, and `FALSE` on the remaining servers.
-* *error:* contains the error message if there is one.
-An example of an error is that one or more of the requested databases doesn't exist on the requester.
+* `UNAVAILABLE` means that the member is either too far behind the leader or unreachable.
+
+=== Possible values of `requester`
+* `TRUE` -- for the server on which the procedure is run.
+* `FALSE` -- on the remaining servers.
 
 In general, you can use the `replicationSuccessful` field to determine overall write-availability, whereas the `memberStatus` field can be checked in order to see whether the database is fault-tolerant or not.
 
@@ -74,4 +92,24 @@ Lastly, `UNAVAILABLE` members are either too far behind or unreachable.
 They are unhealthy and cannot add to the fault-tolerance.
 ====
 
+[[status-check-example]]
+== Example
+
+=== Running the status check
+When running the cluster status check against a server, expect similar output to the following:
+
+[source,queryresults,role=noplay]
+----
++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| database | serverId | serverName | address | replicationSuccessful | memberStatus | recognisedLeader | recognisedLeaderTerm | requester | error |
++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| "neo4j" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "localhost:7682" | TRUE | "APPLYING" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | 4 | FALSE | NULL |
+| "neo4j" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "localhost:7681" | TRUE | "APPLYING" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | 4 | TRUE | NULL |
+| "neo4j" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "localhost:7683" | TRUE | "APPLYING" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | 4 | FALSE | NULL |
+| "system" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "localhost:7681" | TRUE | "APPLYING" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | 1 | TRUE | NULL |
+| "system" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "localhost:7683" | TRUE | "APPLYING" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | 1 | FALSE | NULL |
+| "system" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "localhost:7682" | TRUE | "APPLYING" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | 1 | FALSE | NULL |
++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+----
+
 
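The rules spelled out in the diff above (trust the leader reported with the highest `recognisedLeaderTerm`, read write availability from `replicationSuccessful`, and judge fault tolerance from `memberStatus`) can be applied mechanically to the procedure's output. The following is a minimal Python sketch, not part of this commit, that parses the sample `queryresults` rows from the new Example section and summarizes each database. The helper names `parse_table` and `summarize` are hypothetical. Two points are assumptions rather than documented behavior: that `APPLYING` and `REPLICATING` members both count as able to take part in replication, and that the number of tolerated failures is the healthy members minus a Raft-style majority (`n // 2 + 1`) of the primaries listed. Rows with a non-null `error` are simply skipped.

```python
from collections import defaultdict

# Sample output copied from the Example section of the status check page
# (border lines omitted; only the header and data rows are needed).
SAMPLE = """
| database | serverId | serverName | address | replicationSuccessful | memberStatus | recognisedLeader | recognisedLeaderTerm | requester | error |
| "neo4j" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "localhost:7682" | TRUE | "APPLYING" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | 4 | FALSE | NULL |
| "neo4j" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "localhost:7681" | TRUE | "APPLYING" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | 4 | TRUE | NULL |
| "neo4j" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "localhost:7683" | TRUE | "APPLYING" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | 4 | FALSE | NULL |
| "system" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "localhost:7681" | TRUE | "APPLYING" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | 1 | TRUE | NULL |
| "system" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "localhost:7683" | TRUE | "APPLYING" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | 1 | FALSE | NULL |
| "system" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "localhost:7682" | TRUE | "APPLYING" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | 1 | FALSE | NULL |
"""

# Assumption: both statuses can take part in replication; UNAVAILABLE cannot.
HEALTHY = {"APPLYING", "REPLICATING"}


def parse_value(cell):
    """Convert one queryresults cell into a Python value."""
    if cell == "NULL":
        return None
    if cell in ("TRUE", "FALSE"):
        return cell == "TRUE"
    if cell.startswith('"') and cell.endswith('"'):
        return cell[1:-1]
    return int(cell)  # e.g. recognisedLeaderTerm


def _cells(line):
    """Split a '| a | b |' row into its stripped cell texts."""
    return [c.strip() for c in line.strip("|").split("|")]


def parse_table(text):
    """Turn the header and data rows into one dict per row."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    header = _cells(lines[0])
    return [dict(zip(header, map(parse_value, _cells(ln)))) for ln in lines[1:]]


def summarize(rows):
    """Apply the documented reading rules to each database's rows."""
    by_db = defaultdict(list)
    for row in rows:
        if row["error"] is None:  # this sketch skips rows that carry an error
            by_db[row["database"]].append(row)
    summary = {}
    for db, members in by_db.items():
        # Documented rule: if members disagree, trust the highest term.
        best = max(members, key=lambda r: r["recognisedLeaderTerm"])
        # replicationSuccessful reflects whether the requester could replicate.
        writable = all(r["replicationSuccessful"] for r in members)
        healthy = sum(r["memberStatus"] in HEALTHY for r in members)
        majority = len(members) // 2 + 1  # assumption: Raft-style majority
        summary[db] = {
            "leader": best["recognisedLeader"],
            "writeAvailable": writable,
            "toleratedFailures": max(healthy - majority, 0),
        }
    return summary


if __name__ == "__main__":
    for db, info in summarize(parse_table(SAMPLE)).items():
        print(db, info)
```

For the sample, both `neo4j` and `system` should report write availability and one tolerated failure, since all three primaries are `APPLYING`. When querying a live cluster, the official Neo4j Python driver's `Record.data()` returns a dict keyed by column name, so records from `CALL dbms.cluster.statusCheck([])` could likely be passed to `summarize` directly, skipping the text parsing.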