-
Notifications
You must be signed in to change notification settings - Fork 2
Setting up: CockroachDB
The federal spending data was migrated from PostgreSQL to CockroachDB, a distributed SQL ("NewSQL") database built on a transactional and strongly-consistent key-value store. When comparing to PostgreSQL, CockroachDB stands out in its built-in scalability, although there's more latency with complex queries. In our case we're using simple filtering and summing operations on a large (and growing) database, so the tradeoff is ok.
The following cluster deployment largely follows guidance by Cockroach Labs, with adjustments to my particular architecture.
- Download the CockroachDB archive and extract the binary.
curl https://binaries.cockroachdb.com/cockroach-v19.2.2.darwin-10.9-amd64.tgz | \
tar -xJ- Copy the binary into
PATHto execute cockroach commands from any shell.
cp -i cockroach-v19.2.2.darwin-10.9-amd64/cockroach /usr/local/binLaunch 4 EC2 instances, each
- m5.xlarge (4 vCPUs and 16GB memory)
- running Ubuntu
- with 750GB root volume
- with an Elastic IP
Create a security group for the CockroachDB cluster, so that the nodes can communicate with each other.
| Type | Protocol | Port Range | Source | Description |
|---|---|---|---|---|
| Custom TCP Rule | TCP | 8080 | < your IP > | access cockroachDB admin UI |
| Custom TCP Rule | TCP | 8080 | IP range of VPC in CIDR notation | load balancer - health check communication |
| Custom TCP Rule | TCP | 26257 | private IP of instance running FedSpend app UI | communication with the application UI |
| Custom TCP Rule | TCP | 26257 | sg-xxx (security group ID) | for node X to talk to node Y |
| Custom TCP Rule | TCP | 7077 | < your IP > | cockroachDB port |
| SSH | TCP | 22 | < your IP > | SSH into machine |
| Type | Protocol | Port Range | Source | Description |
|---|---|---|---|---|
| All traffic | All | All | sg-xxx (security group ID) | for node X to receive data from node Y |
| All traffic | All | All | 0.0.0.0/0 | all download from external websites |
It's necessary for all nodes to be working from the same clock. Amazon Time Sync is a service that facilitates this and needs to be set up on all nodes.
sudo apt update
sudo apt install chrony
# here we'll open a file...
sudo nano /etc/chrony/chrony.conf
# ...and paste the following text above any pool or server statements
# -> server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
sudo /etc/init.d/chrony restart
# verify that chrony is using the 169.254.169.123 IP address
# to synchronize the time
chronyc sources -v
# verify the time synchronization metrics that are reported by chrony
chronyc trackingWe'll be using the AWS load balancing service to distribute client traffic across nodes.
[TO DO] Why LB uses so many IP addresses?
- Open the EC2 console and select Load balancers.
- Click the button Create Load Balancer on the left.
- Create a Network Load Balancer.
- Select a name for the balancer, and add a listener on port 26257.
- Under the 'Availability Zones' section, select the same VPC and subnet as the one containing the instances.
- In Step 3: Configure Routing, select any name for the target group and that they'll receive communication on port 25267. The Health checks protocol should be HTTP port 8080 with path
/health?ready=1. - In Step 4: Register Targets, register instances with the target group just created, specifying port 26257.
Make a note of the IP of the load balancer.
Generation is done locally, then the files transferred to each instance. Below is an outline, the full certificate process is available elsewhere.
- Generate the CA certificate and key.
- Create the certificate and key for the first node.
- Upload the CA certificate, node certificate, key to the first node.
- Delete the local copy of the node certificate and key.
- Repeat steps 2-4 for all other nodes.
- Create a client certificate and key for the
rootuser. Transfer to the machine you'll run workloads from.
Below is an outline, the full startup process is available elsewhere. For each node...
- SSH into the node and install CockroachDB. Copy the binary into the
PATH.
wget -qO- https://binaries.cockroachdb.com/cockroach-v19.2.2.linux-amd64.tgz | tar xvz
sudo cp -i cockroach-v19.2.2.linux-amd64/cockroach /usr/local/bin/- Run the
cockroach startcommand to prime the node to start.
Run the cockroach init command from within any node to complete the startup process and have them join together as a cluster.
Note Although guidance says to run this command locally, this is not work for me!
cockroach init --certs-dir=certs --host=<private IP of node>For secure clusters, only authorized users can access and view the Admin UI. The Admin UI can be accessed from any node in the cluster, via < node public IP >:8080.
- Launch the CockroachDB SQL client from within the node with the
rootuser certificates.
cockroach sql --certs-dir=certs --host=<load balancer IP>- Create a username and password for other users) for an account that will access the Admin UI.
CREATE USER admin WITH PASSWORD '<your-password>';- Open the Admin UI with a web browser and log in!
📬 Create an issue if you have one.
(📝 instructions accurate as of Feb 2020)