Commit 807e834: Merge pull request #170 from arangodb-helper/documentation/starter-architecture (Architecture related docs). Parents: f09e943, 646c2f6.

# ArangoDB Starter Architecture

## What does the Starter do

The ArangoDB Starter is a program used to create ArangoDB database deployments
on bare-metal or virtual machines with ease.
It enables you to create everything from a simple single server instance
to a full-blown cluster with datacenter-to-datacenter replication in under 5 minutes.

The Starter is intended to be used in environments where there is no higher-level
orchestration system (e.g. Kubernetes or DC/OS) available.

## Starter versions

The Starter is a separate process in a binary called `arangodb` (or `arangodb.exe` on Windows).
This binary has its own version number that is independent of the ArangoDB (database)
version.

This means that Starter version `a.b.c` can be used to run deployments
of ArangoDB databases with different versions.
For example, the Starter with version `0.11.2` can be used to create
ArangoDB deployments with ArangoDB version `3.2.<something>` as well
as deployments with ArangoDB version `3.3.<something>`.

It also means that you can update the Starter independently from the ArangoDB
database.

Note that the Starter is also included in all binary ArangoDB packages.

To find the versions of your Starter & ArangoDB database, run the following commands:

```bash
# To get the Starter version
arangodb --version
# To get the ArangoDB database version
arangod --version
```

## Starter deployment modes

The Starter supports 3 different modes of ArangoDB deployments:

1. Single server
1. Active failover
1. Cluster

Note: Datacenter-to-datacenter replication is an option for the `cluster` deployment mode.

You select one of these modes using the `--starter.mode` command line option.

Depending on the mode you've selected, the Starter launches one or more
(`arangod` / `arangosync`) server processes.

No matter which mode you select, the Starter always provides you with
a common directory structure for storing the servers' data, configuration & log files.

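
For illustration, these are the corresponding invocations for each mode. This is a minimal
sketch: the mode values follow the list above (the value `single` for single server mode is
an assumption), `hostA`/`hostB`/`hostC` are placeholders, and the `--starter.join` option
used for the multi-machine modes is explained in the sections below.

```bash
# Single server (a sketch; assumes the mode value `single`)
arangodb --starter.mode=single

# Active failover; the same command is run on every participating machine
arangodb --starter.mode=activefailover --starter.join=hostA:8528,hostB:8528,hostC:8528

# Cluster; the same command is run on every participating machine
arangodb --starter.mode=cluster --starter.join=hostA:8528,hostB:8528,hostC:8528
```
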
57+
## Starter operating modes
58+
59+
The Starter can run as normal processes directly on the host operating system,
60+
or as containers in a docker runtime.
61+
62+
When running as normal process directly on the host operating system,
63+
the Starter launches the servers as child processes and monitors those.
64+
If one of the server processes terminates, a new one is started automatically.
65+
66+
When running in a docker container, the Starter launches the servers
67+
as separate docker containers, that share the volume namespace with
68+
the container that runs the Starter. It monitors those containers
69+
and if one terminates, a new container is launched automatically.
70+
71+
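
For the container case, a rough sketch of launching the Starter itself in Docker is shown
below. The image name `arangodb/arangodb-starter`, the mount paths and the option values are
assumptions for illustration (networking-related options are omitted); mounting the Docker
socket is what lets the Starter create the server containers next to itself.

```bash
# A sketch only: running the Starter in a Docker container.
# Image name, paths and option values are assumptions; adjust to your environment.
docker run -it \
  -v /my/data:/data \
  -v /var/run/docker.sock:/var/run/docker.sock \
  arangodb/arangodb-starter \
  --starter.data-dir=/data \
  --starter.mode=cluster \
  --starter.join=hostA:8528,hostB:8528,hostC:8528
```
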
## Starter data-directory

The Starter uses a single directory with a well-known structure to store
all data for its own configuration & logs, as well as the configuration,
data & logs of all servers it starts.

This data directory is set using the `--starter.data-dir` command line option.
It contains the following files & sub-directories:

- `setup.json`: The configuration of the "cluster of Starters".
  For details see below. DO NOT edit this file.
- `arangodb.log`: The log file of the Starter
- `single<port>`, `agent<port>`, `coordinator<port>`, `dbserver<port>`: directories for
  launched servers. These directories contain, among others, the following files:
  - `apps`: A directory with Foxx applications
  - `data`: A directory with database data
  - `arangod.conf`: The configuration file for the server. Editing this file is possible, but not recommended.
  - `arangod.log`: The log file of the server
  - `arangod_command.txt`: File containing the exact command line of the started server (for debugging purposes only)

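
For example, on a machine running a Starter in cluster mode, the data directory might look
roughly like this (a sketch only; the `<port>` suffixes stand for the ports the servers were
started on):

```bash
# Listing a hypothetical data directory of a Starter in cluster mode.
tree my-data-dir
# my-data-dir
# ├── setup.json
# ├── arangodb.log
# ├── agent<port>
# │   ├── apps
# │   ├── data
# │   ├── arangod.conf
# │   ├── arangod.log
# │   └── arangod_command.txt
# ├── coordinator<port>
# └── dbserver<port>
```
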
## Running on multiple machines

For the `activefailover` & `cluster` modes, it is required to run multiple
Starters, as every Starter will only launch a subset of all servers needed
to form the entire deployment.
For example, in `cluster` mode a Starter will launch a single agent, a single dbserver
and a single coordinator.

It is the responsibility of the user to run the Starter on multiple machines such
that enough servers are started to form the entire deployment.
The minimum number of Starters needed is 3.

The Starters running on those machines need to know about each other's existence.
In order to do so, the Starters form a "cluster" of their own (not to be confused
with the ArangoDB database cluster).
This cluster of Starters is formed from the values given to the `--starter.join`
command line option. You should pass the addresses (`<host>:<port>`) of all Starters.

For example, a typical command line for a cluster deployment looks like this:

```bash
arangodb --starter.mode=cluster --starter.join=hostA:8528,hostB:8528,hostC:8528
# this command is run on hostA, hostB and hostC.
```

The state of the cluster (of Starters) is stored in a configuration file called
`setup.json` in the data directory of every Starter, and the ArangoDB
agency is used to elect a master among all Starters.

The master Starter is responsible for maintaining the list of all Starters
involved in the cluster and their addresses. The slave Starters (all Starters
except the elected master) fetch this list from the master Starter on a regular
basis and store it in their own `setup.json` config files.

Note: The `setup.json` config file MUST NOT be edited manually.

## Running on multiple machines (under the hood)

As mentioned above, when the Starter is used to create an `activefailover`
or `cluster` deployment, it first creates a "cluster" of Starters.

These are the steps taken by the Starters to bootstrap such a deployment
from scratch:

1. All Starters are started (either manually or by some supervisor).
1. All Starters try to read their config from `setup.json`.
   If that file exists and is valid, this bootstrap-from-scratch process
   is aborted and all Starters go directly to the `running` phase described below.
1. All Starters create a unique ID.
1. The list of `--starter.join` arguments is sorted.
1. Each Starter requests the unique ID from the first server in the sorted `--starter.join` list
   and compares the result with its own unique ID.
1. The Starter that finds its own unique ID continues as `bootstrap master`;
   the other Starters continue as `bootstrap slaves`.
1. The `bootstrap master` waits for at least 2 `bootstrap slaves` to join it.
1. The `bootstrap slaves` contact the `bootstrap master` to join its cluster of Starters.
1. Once the `bootstrap master` has received enough (at least 2) requests
   to join its cluster of Starters, it continues with the `running` phase.
1. The `bootstrap slaves` keep asking the `bootstrap master` about its state.
   As soon as they receive confirmation to do so, they also continue with the `running` phase.

In the `running` phase all Starters launch the desired servers and keep monitoring those
servers. Once a functional agency is detected, all Starters will try to become the
`running master` by trying to write their ID to a well-known location in the agency.
The first Starter to succeed in doing so wins this master election.

The `running master` will keep writing its ID to the agency in order to remain
the `running master`. Since this ID is written with a short time-to-live,
other Starters are able to detect when the current `running master` has been stopped
or is no longer responsive. In that case the remaining Starters will perform
another master election to decide who will be the next `running master`.

API requests that involve the state of the cluster of Starters are always answered
by the current `running master`. All other Starters will redirect such requests to
the current `running master`.
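
The lease behind this election can be pictured with the following shell-flavored pseudocode.
It is only a sketch of the mechanism described above: `agency_read` and `agency_write_with_ttl`
are hypothetical helpers (not real commands), the key path is invented, and a real
implementation would claim and renew the key atomically inside the agency.

```bash
# Pseudocode sketch of the running-master lease described above.
# agency_read and agency_write_with_ttl are hypothetical helpers, and the
# key /starter/running-master is an invented example path.
MY_ID="starter-1f2e3d"

while true; do
  owner="$(agency_read /starter/running-master)"
  if [ -z "$owner" ] || [ "$owner" = "$MY_ID" ]; then
    # Claim or renew the lease; the short TTL makes it expire automatically
    # when this Starter stops renewing it. A real implementation would do the
    # read-and-write as one atomic (compare-and-swap) agency transaction.
    agency_write_with_ttl /starter/running-master "$MY_ID" 30
  fi
  sleep 10
done
```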
