
Commit 12ffafb

Merge pull request #53 from graphistry/dev/fix-cluster-example
Fix NFS example for cluster deployment (allow followers to ingest datasets/files)
2 parents: 1f8c349 + 98d6877

File tree

1 file changed (+46, -24 lines)


docs/install/cluster.md

Lines changed: 46 additions & 24 deletions
@@ -2,9 +2,9 @@

**Note**: *This deployment configuration is currently **experimental** and subject to future updates.*

- This document offers step-by-step instructions for deploying **Graphistry** in a multinode environment using Docker Compose. In this architecture, a **leader** node handles dataset ingestion and manages the single PostgreSQL instance, while **follower** nodes can visualize graphs too using the shared datasets. Currently, only the leader node has permission to upload datasets and files (data ingestion), but future updates will allow follower nodes to also perform dataset and file uploads (data ingestion).
+ This document provides step-by-step instructions for deploying **Graphistry** in a multinode environment using Docker Compose. In this architecture, both **Leader** and **Follower** nodes can ingest datasets and files, with all nodes sharing the single **PostgreSQL** instance on the **Leader** node. As a result, every node has equal access to dataset ingestion and visualization.

- The leader and followers will share datasets using a **Distributed File System**, for example, using the Network File System (NFS) protocol. This setup allows all nodes to access the same dataset directory. This configuration ensures that **Graphistry** can be deployed across multiple machines, each with different GPU configuration profiles (some with more powerful GPUs, enabling multi-GPU on multinode setups), while keeping the dataset storage centralized and synchronized.
+ The leader and followers will share datasets using a **Distributed File System**, for example the **Network File System (NFS)** protocol. This setup allows all nodes to access the same dataset directory, so **Graphistry** can be deployed across multiple machines, each with a different **GPU** configuration profile (some with more powerful GPUs, enabling **multi-GPU** on multinode setups), while keeping dataset storage centralized and synchronized.

This deployment mode is flexible and can be used both in **on-premises** clusters and in the **cloud**. For example, it should be possible to use **Amazon Machine Images (AMIs)** from the [Graphistry AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-ppbjy2nny7xzk?sr=0-1&ref_=beagle&applicationId=AWSMPContessa), assigning Amazon VMs created from those images to the **leader** and **follower** roles. This allows for scalable and customizable cloud-based deployments with the same multinode architecture.

@@ -13,7 +13,7 @@ This deployment mode is flexible and can be used both in **on-premises** cluster
1. **Leader Node**: Handles dataset ingestion and PostgreSQL write operations, and exposes the required PostgreSQL ports.
2. **Follower Nodes**: Connect to the PostgreSQL instance on the leader and can visualize graphs using the shared datasets. However, they do not have their own attached PostgreSQL instance.
3. **Shared Dataset**: All nodes will access the dataset directory using a **Distributed File System**. This ensures that the leader and followers use the same dataset, maintaining consistency across all nodes.
- 4. **PostgreSQL**: The PostgreSQL instance on the leader node is used by all nodes in the cluster for querying. The **Nexus** service, which provides the main dashboard for Graphistry, on the **Leader** node is responsible for managing access to the PostgreSQL database. The **Nexus** services on the **follower** nodes will use the PostgreSQL instance of the **Leader**.
+ 4. **PostgreSQL**: The PostgreSQL instance on the **Leader** node is used by all nodes for querying. The **Nexus** service (the main Graphistry dashboard) on the **Leader** manages access to the database, and the **Nexus** services on the **Follower** nodes connect to that same instance. Both **Leader** and **Follower** nodes can therefore perform actions such as user sign-ups and settings changes through their own **Nexus** dashboards, with the changes applied system-wide for consistency across all nodes (a minimal connectivity check is sketched after this list).
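
As a quick way to verify this arrangement, a follower should be able to reach the leader's PostgreSQL port before its own services are started. A minimal sketch, assuming the leader's IP is `192.168.0.10` and PostgreSQL is exposed on the default port `5432` (both are examples; adjust to your deployment):

```bash
# Hypothetical connectivity check, run from a follower node
# (pg_isready ships with the postgresql-client package on Ubuntu/Debian)
pg_isready -h 192.168.0.10 -p 5432
```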

## Configuration File: `cluster.env`

@@ -64,26 +64,44 @@ NFS will be used to share the dataset directory between nodes. Follow the steps

#### On the Leader Node (Main Machine)

- 1. **Create directories for PostgreSQL data and backups**:
+ 1. **Install NFS server**:
+
+ On the leader node, install the NFS server software:

```bash
+ sudo apt install nfs-kernel-server
+ ```
+
+ This will install the necessary software for serving NFS shares to the follower nodes.
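
As an optional check after this step, you can confirm the NFS server came up. A minimal sketch, assuming a systemd-based Ubuntu host (the service name may differ on other distributions):

```bash
# Hypothetical verification that the NFS server is running
sudo systemctl status nfs-kernel-server
```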
+
+ 2. **Create directories for PostgreSQL and shared data**:
+
+ ```bash
+ # These directories will store PostgreSQL data and backups
mkdir -p /mnt/data/shared/postgresql_data
mkdir -p /mnt/data/shared/postgres_backups
- ```

- These directories will hold the PostgreSQL data and backups, which will be shared with follower nodes.
+ # Create the shared directories for uploads, files, and datasets
+ mkdir -p /mnt/data/shared/uploads /mnt/data/shared/files /mnt/data/shared/datasets
+ ```

- 2. **Install NFS server**:
+ 3. **Set appropriate permissions on the shared directory**:

- On the leader node, install the NFS server software:
+ The shared directory must be readable and writable by the NFS clients: the remote follower nodes need to read, write, and modify files in it in order to ingest datasets and files. You are responsible for verifying and configuring this access. For instance, you may need to apply the following changes to make the shared directory accessible to NFS clients:

```bash
- sudo apt install nfs-kernel-server
+ # Set permissions to allow full access (read, write, execute) for all users
+ sudo chmod -R 777 /mnt/data/shared/
+
+ # Change ownership to 'nobody:nogroup' for NFS access
+ sudo chown -R nobody:nogroup /mnt/data/shared/
```

- This will install the necessary software for serving NFS shares to the follower nodes.
+ This will allow all users and processes (including the remote follower instances) to read and write to the shared directory, ensuring they can ingest datasets and files. You can adjust these permissions later based on your security requirements.

- 3. **Configure NFS exports**:
+ *Notice: The shared directory permissions above are provided as an example. Please ensure the settings align with your security policies.*
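
If world-writable permissions are too broad for your environment, a dedicated group is one possible tightening. This is only a sketch under assumptions not in the original document: the group name `graphistry-share` and GID `3000` are hypothetical, and the same GID must exist for the relevant service users on every node for NFS permission checks to work:

```bash
# Hypothetical stricter alternative to chmod 777 (group name and GID are examples)
sudo groupadd --gid 3000 graphistry-share
sudo chown -R root:graphistry-share /mnt/data/shared/
# 2770: owner and group get full access, setgid keeps new files in the group, others get none
sudo chmod -R 2770 /mnt/data/shared/
```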
+
+ 4. **Configure NFS exports**:

Edit the `/etc/exports` file to specify which directories should be shared and with what permissions. The following configuration allows the follower node (with IP `192.168.0.20`) to mount the shared directory with read/write permissions.

@@ -94,14 +112,17 @@ NFS will be used to share the dataset directory between nodes. Follow the steps
Add the following line to export the shared dataset directory:

```bash
- /mnt/data/shared/ 192.168.0.20(rw,sync,no_subtree_check)
+ /mnt/data/shared/ 192.168.0.20(rw,sync,no_subtree_check,no_root_squash)
```

- `rw`: Allows read and write access.
- `sync`: Ensures that changes are written to disk before responding to the client.
- `no_subtree_check`: Disables subtree checking to improve performance.
+ - `no_root_squash`: Retains root access for the client’s root user on the shared directory, which can be necessary for certain tasks but should be used with caution due to the elevated permissions.
+
+ *Notice: The NFS export configuration above is provided as an example. Please ensure the settings align with your security policies.*

- 4. **Export the NFS share** and restart the NFS server to apply the changes:
+ 5. **Export the NFS share** and restart the NFS server to apply the changes:

```bash
sudo exportfs -a
@@ -110,22 +131,22 @@ NFS will be used to share the dataset directory between nodes. Follow the steps

#### On the Follower Node (Secondary Machine)

- 1. **Create a directory to mount the NFS share**:
+ 1. **Install NFS client**:
+
+ On the follower node, install the NFS client software to mount the NFS share:

```bash
- mkdir -p /home/user1/mnt/data/shared/
+ sudo apt install nfs-common
```

- This is where the shared dataset will be mounted on the follower node.
-
- 2. **Install NFS client**:
-
- On the follower node, install the NFS client software to mount the NFS share:
+ 2. **Create a directory to mount the NFS share**:

```bash
- sudo apt install nfs-common
+ mkdir -p /home/user1/mnt/data/shared/
```

+ This is where the shared dataset will be mounted on the follower node.
+
3. **Mount the shared NFS directory**:

Mount the directory shared by the leader node to the local directory on the follower node:
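
A minimal sketch of this mount, assuming the leader's IP is `192.168.0.10` (as in the PyGraphistry example below), that it exports `/mnt/data/shared/` as configured above, and that the mount point created in the previous step is used:

```bash
# Optionally confirm the export is visible from the follower first
showmount -e 192.168.0.10

# Mount the leader's shared directory onto the follower's local mount point (example addresses/paths)
sudo mount -t nfs 192.168.0.10:/mnt/data/shared/ /home/user1/mnt/data/shared/
```

For a persistent mount across reboots, the equivalent entry would typically also be added to `/etc/fstab`.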
@@ -222,12 +243,13 @@ Once the deployment is complete, you can use the leader node to upload datasets,
* Graphistry JS: https://github.com/graphistry/graphistry-js
* REST API docs: https://hub.graphistry.com/docs/api

- For example, you can interact with the leader node from **PyGraphistry** like this:
+ For example, you can interact with any node from **PyGraphistry** like this:

```python
import graphistry
- leader_address = "192.168.0.10"
- graphistry.register(api=3, protocol="http", server=leader_address, username="user1", password="password1")
+ server_address = "192.168.0.10"  # using the leader
+ # or use the follower: server_address = "192.168.0.20"
+ graphistry.register(api=3, protocol="http", server=server_address, username="user1", password="password1")
...
```
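
Before registering, you can also sanity-check that each node's web endpoint responds over HTTP. A minimal sketch, assuming the example leader and follower addresses used above:

```bash
# Hypothetical reachability check for the Graphistry web endpoint on each node
curl -I http://192.168.0.10/
curl -I http://192.168.0.20/
```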
