|
| 1 | +:description: This page describes how to manage a Neo4j cluster on AWS. |
| 2 | +:page-role: enterprise-edition |
| 3 | + |
| 4 | +[[neo4j-cluster-cloud-deployments]] |
| 5 | += Neo4j cluster on self-managed cloud deployments |
| 6 | + |
| 7 | +Before diving into the topic, it is important to understand the basics of Neo4j clustering.
| 8 | + |
| 9 | +A Neo4j cluster consists of a homogeneous pool of servers that collectively run a number of databases.
| 10 | +The servers can operate in two different database-hosting modes: _primary_ and _secondary_. |
| 11 | +A server can simultaneously act as a primary host for one or more databases and as a secondary host for other databases. |
| 12 | + |
| 13 | +For more details on the operational and application aspects of Neo4j clustering, refer to xref::clustering/index.adoc[Clustering in Neo4j].
| 14 | + |
| 15 | +For information on how to manage databases and servers in a cluster, see xref::clustering/databases.adoc[] and xref::clustering/servers.adoc[], respectively.
| 16 | + |
| 17 | + |
| 18 | +== Neo4j cluster on AWS |
| 19 | + |
| 20 | +Neo4j does not provide Amazon Machine Images (AMIs) with a pre-installed version of the product. |
| 21 | +The Neo4j AWS Marketplace listings (and listings on GitHub) use CloudFormation templates that deploy and configure Neo4j dynamically with a shell script. |
| 22 | + |
| 23 | + |
| 24 | +// === Neo4j cluster and auto-scaling groups on AWS |
| 25 | + |
| 26 | + |
| 27 | +=== Removing a (secondary constrained) server from the cluster |
| 28 | + |
| 29 | +Imagine you have a cluster consisting of three primary constrained servers and two secondary constrained servers. |
| 30 | +This means that three servers host databases only in primary mode and the other two host databases only in secondary mode.
| 31 | + |
| 32 | +When performing rolling updates on Amazon Machine Images (AMIs) for secondary servers, it is important to follow a structured approach. |
| 33 | +Rotating AMIs is a common practice in such environments. |
| 34 | + |
| 35 | +However, simply removing secondary servers from the target Network Load Balancer (NLB) one by one does not prevent read requests from being routed to them. |
| 36 | +This occurs because the NLB and Neo4j server-side routing operate independently and do not share awareness of server availability. |
| 37 | + |
| 38 | +To correctly remove a secondary server from the cluster and reintroduce it after the update: |
| 39 | + |
| 40 | +. Remove the server from the NLB to stop traffic routing. |
| 41 | +. Shut down the server before proceeding with the AMI update. |
| 42 | + |
| 43 | + |
| 44 | +The detailed steps are as follows:
| 45 | + |
| 46 | +. Remove the secondary from the AWS NLB. |
| 47 | + This prevents external clients from sending requests to the secondary. |
| 48 | + |
| 49 | +. Since Neo4j's cluster routing (server-side routing) does not use the NLB, you need to ensure that queries are not routed to the secondary server. |
| 50 | +To do this, you have to cleanly shut down the secondary. |
| 51 | + |
| 52 | +.. Run the following query to check that all servers are hosting their assigned databases.
| 53 | +The query should return no results: |
| 54 | ++ |
| 55 | +[source, cypher, role=noplay] |
| 56 | +---- |
| 57 | +SHOW SERVERS YIELD name, hosting, requestedHosting, serverId WHERE requestedHosting <> hosting |
| 58 | +---- |
| 59 | + |
| 60 | +.. Use the following query to check that all databases are in their expected state.
| 61 | +The query should return no results: |
| 62 | ++ |
| 63 | +[source, cypher, role=noplay] |
| 64 | +---- |
| 65 | +SHOW DATABASES YIELD name, address, currentStatus, requestedStatus, statusMessage WHERE currentStatus <> requestedStatus RETURN name, address, currentStatus, requestedStatus, statusMessage |
| 66 | +---- |
| 67 | + |
| 68 | +.. To stop the Neo4j service, run the following command: |
| 69 | ++ |
| 70 | +[source, shell, role=copy] |
| 71 | +---- |
| 72 | +sudo systemctl stop neo4j |
| 73 | +---- |
| 74 | ++ |
| 75 | +To configure how long the shutdown waits for active transactions to either complete or be terminated, you can modify the environment variable `NEO4J_SHUTDOWN_TIMEOUT` using `systemctl edit neo4j.service`
| 76 | +or the setting xref::configuration/configuration-settings.adoc#config_db.shutdown_transaction_end_timeout[`db.shutdown_transaction_end_timeout`] in the _neo4j.conf_ file.
| 77 | ++
| 78 | +By default, `NEO4J_SHUTDOWN_TIMEOUT` is set to 120 seconds and `db.shutdown_transaction_end_timeout` to 10 seconds.
| 79 | ++ |
| 80 | +If the shutdown process exceeds these limits, it is considered failed. |
| 81 | +You may need to increase the values if the system serves long-running transactions. |
| 82 | + |
| 83 | +.. Verify that the shutdown has finished successfully by checking _neo4j.log_ for log messages confirming the shutdown.
| 84 | + |
| 85 | + |
| 86 | +. After the update is complete, start the secondaries again, one at a time.
| 87 | +.. Run `sudo systemctl start neo4j`.
| 88 | +.. Once the server has been restarted, confirm it is running successfully. |
| 89 | ++ |
| 90 | +Run the following query and check that the server has state `Enabled` and health `Available`.
| 91 | ++ |
| 92 | +[source, cypher, role=noplay] |
| 93 | +---- |
| 94 | +SHOW SERVERS WHERE name = [server-id]; |
| 95 | +---- |
| 96 | + |
| 97 | +.. Confirm that the server has started all the databases that it should. |
| 98 | ++ |
| 99 | +This query shows any databases that are not in their expected state:
| 100 | ++ |
| 101 | +[source, cypher, role=noplay] |
| 102 | +---- |
| 103 | +SHOW DATABASES YIELD name, address, currentStatus, requestedStatus, serverID WHERE currentStatus <> requestedStatus AND serverID = [server-id] RETURN name, address, currentStatus, requestedStatus |
| 104 | +---- |
| 105 | + |
| 106 | +. Reattach the secondary to the NLB. |
| 107 | +Once the secondary server is stable and caught up, add it back to the AWS NLB target group. |
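
The two pre-shutdown checks in the procedure above could be scripted so that the Neo4j service is stopped only when both queries return no rows. The following is a minimal sketch, not an official tool. It assumes that `cypher-shell` is installed on the server, that its connection settings are supplied through its usual options or environment variables, and that its plain output format prints a header line before any rows. The function names `expect_no_rows`, `pre_shutdown_checks`, and `stop_secondary` are hypothetical.

```shell
# Sketch only: pre-shutdown checks for a secondary, wrapped as shell functions.
# Assumption: cypher-shell is available; override CYPHER_SHELL to use another path.
CYPHER_SHELL="${CYPHER_SHELL:-cypher-shell}"

# Hypothetical helper: succeed only if the given query returns no rows.
# Returns 1 if rows were returned and 2 if the query itself failed.
expect_no_rows() {
  local out rows
  # Assumption: plain format prints one header line, then one line per row.
  out="$("$CYPHER_SHELL" -d system --format plain "$1")" || return 2
  # Drop the header line; anything left is an unexpected row.
  rows="$(printf '%s\n' "$out" | tail -n +2)"
  if [ -n "$rows" ]; then
    printf 'Query returned rows:\n%s\n' "$rows" >&2
    return 1
  fi
  return 0
}

# Run both checks from the procedure; both must return no rows.
pre_shutdown_checks() {
  expect_no_rows 'SHOW SERVERS YIELD name, hosting, requestedHosting, serverId WHERE requestedHosting <> hosting' &&
  expect_no_rows 'SHOW DATABASES YIELD name, address, currentStatus, requestedStatus, statusMessage WHERE currentStatus <> requestedStatus RETURN name, address, currentStatus, requestedStatus, statusMessage'
}

# Stop the Neo4j service only when both checks pass.
stop_secondary() {
  pre_shutdown_checks && sudo systemctl stop neo4j
}
```

After sourcing these functions on the secondary that has already been removed from the NLB, calling `stop_secondary` runs the checks and then stops the service. If a check fails, the unexpected rows are printed to standard error and the service is left running.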
| 108 | + |
| 109 | + |
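
For servers that handle long-running transactions, the `NEO4J_SHUTDOWN_TIMEOUT` override mentioned in the shutdown step can be written as a systemd drop-in. The sketch below only prints the drop-in text; the helper name `shutdown_timeout_dropin` and the value `300` are illustrative assumptions, not recommended settings.

```shell
# Hypothetical helper: print a systemd drop-in that sets NEO4J_SHUTDOWN_TIMEOUT.
# The argument is the timeout in whole seconds.
shutdown_timeout_dropin() {
  case "${1:-}" in
    # Reject a missing argument or anything that is not a plain integer.
    ''|*[!0-9]*)
      echo 'usage: shutdown_timeout_dropin SECONDS' >&2
      return 1
      ;;
  esac
  printf '[Service]\nEnvironment="NEO4J_SHUTDOWN_TIMEOUT=%s"\n' "$1"
}

# Example: print an override that raises the timeout to 300 seconds.
shutdown_timeout_dropin 300
```

The printed text can be pasted into the editor opened by `sudo systemctl edit neo4j.service`. The new value applies the next time the service is started.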