
Commit 90608c4

Improve documentation (#1975)
* Update docs and reword some sections
1 parent 9fadd74 commit 90608c4

14 files changed: +158 −66 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -79,6 +79,8 @@ To get this controller running in a local Kubernetes cluster:
   you can set the `BUILD_PLATFORM` env variable `BUILD_PLATFORM="linux/amd64" make rebuild-operator`.
 1. Run `kubectl apply -k ./config/tests/base` to create a new FoundationDB cluster with the operator.
 
+_NOTE_: FoundationDB currently only publishes container images for running on `amd64`/`x64` nodes.
+
 ### Running Locally with nerdctl
 
 Instead of Docker you can also use [nerdctl](https://github.com/containerd/nerdctl) to build and push your images.
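
Given that note, clusters with mixed-architecture node pools may want to pin the FDB pods to `amd64` nodes. A minimal sketch, assuming the operator's `processes.general.podTemplate` override and the well-known `kubernetes.io/arch` node label (the cluster name is illustrative):

```yaml
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: test-cluster
spec:
  version: 7.1.26
  processes:
    general:
      podTemplate:
        spec:
          # Published FoundationDB images only run on amd64/x64 nodes.
          nodeSelector:
            kubernetes.io/arch: amd64
```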

docs/compatibility.md

Lines changed: 0 additions & 1 deletion
@@ -40,7 +40,6 @@ in advance of the upgrade, through whatever process you need to update your
 clusters safely.
 After you updated the operator you should ensure that all clusters are in a reconciled state and all changes are applied.
 
-
 At this point, you can use the `kubectl-fdb` plugin to check your cluster specs for deprecated fields or defaults.
 For more information see the [kubectl-fdb plugin Readme](../kubectl-fdb/Readme.md) and the `deprecation` subcommand.

docs/manual/customization.md

Lines changed: 3 additions & 3 deletions
@@ -326,7 +326,7 @@ The operator will add a special locality to the fdbserver processes called `dns_
 
 ## Using Multiple Namespaces
 
-Our [sample deployment](https://raw.githubusercontent.com/foundationdb/fdb-kubernetes-operator/master/config/samples/deployment.yaml) configures the operator to run in single-namespace mode, where it only manages resources in the namespace where the operator itself is running. If you want a single deployment of the operator to manage your FDB clusters across all of your namespaces, you will need to run it in global mode. Which mode is appropriate will depend on the constraints of your environment.
+Our [sample deployment](../../config/samples/deployment.yaml) configures the operator to run in single-namespace mode, where it only manages resources in the namespace where the operator itself is running. If you want a single deployment of the operator to manage your FDB clusters across all of your namespaces, you will need to run it in global mode. Which mode is appropriate will depend on the constraints of your environment.
 
 ### Single-Namespace Mode
 
@@ -339,7 +339,7 @@ To run the controller in single-namespace mode, you will need to configure the f
 * A service account for the controller
 * The serviceAccountName field in the controller's pod spec
 * A `WATCH_NAMESPACE` environment variable defined in the controller's pod spec or in the arguments of the container command
-* A Role that grants access to the necessary permissions to all of the resources that the controller manages. See the [sample role](https://raw.githubusercontent.com/FoundationDB/fdb-kubernetes-operator/master/config/samples/deployment/rbac_role.yaml) for the list of those permissions.
+* A Role that grants access to the necessary permissions to all of the resources that the controller manages. See the [sample role](../../config/samples/deployment/rbac_role.yaml) for the list of those permissions.
 * A RoleBinding that binds that role to the service account for the controller
 
 The sample deployment provides all of this configuration.
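
As a rough sketch of how the pieces above fit together, here is what the single-namespace variant of the controller's Deployment could look like; the resource names and image tag are illustrative rather than copied from the sample deployment, and for global mode you would omit `WATCH_NAMESPACE` and bind a ClusterRole instead:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fdb-kubernetes-operator-controller-manager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fdb-kubernetes-operator-controller-manager
  template:
    metadata:
      labels:
        app: fdb-kubernetes-operator-controller-manager
    spec:
      # The service account that the Role/RoleBinding grant permissions to.
      serviceAccountName: fdb-kubernetes-operator-controller-manager
      containers:
        - name: manager
          image: foundationdb/fdb-kubernetes-operator:v1.4.1
          env:
            # Restricts the operator to its own namespace (single-namespace mode).
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
```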
@@ -354,7 +354,7 @@ To run the controller in global mode, you will need to configure the following t
 
 * A service account for the controller
 * The serviceAccountName field in the controller's pod spec
-* A ClusterRole that grants access to the necessary permissions to all of the resources that the controller manages. See the [sample role](https://raw.githubusercontent.com/FoundationDB/fdb-kubernetes-operator/master/config/samples/deployment/rbac_role.yaml) for the list of those permissions.
+* A ClusterRole that grants access to the necessary permissions to all of the resources that the controller manages. See the [sample role](../../config/samples/deployment/rbac_role.yaml) for the list of those permissions.
 * A ClusterRoleBinding that binds that role to the service account for the controller
 
 You can build this kind of configuration easily from the sample deployment by changing the following things:

docs/manual/debugging.md

Lines changed: 2 additions & 1 deletion
@@ -93,7 +93,8 @@ Remove [storage-1] from cluster default/sample-cluster with exclude: false and s
 
 **NOTE**: This is a very dangerous operation.
 This will delete the pod and the PVC without checking that the data has been re-replicated.
-You should only due this after checking that the database is available, has not had any data loss, and that the pod is currently not running. You can confirm the first and second check by looking at the cluster status.
+You should only do this after checking that the database is available, has not had any data loss, and that the pod is currently not running.
+You can confirm the first and second check by looking at the cluster status.
 
 ## Exclusions Not Starting Due to Missing Processes
 
docs/manual/fault_domains.md

Lines changed: 28 additions & 14 deletions
@@ -134,8 +134,8 @@ This strategy uses the pod name as the fault domain, which allows each process t
 
 ## Three-Data-Hall Replication
 
-**NOTE**: The support for this redundancy mode is new and might have issues. Please make sure you test this configuration in our test/QA environment.
-The [three-data-hall](https://apple.github.io/foundationdb/configuration.html#single-datacenter-modes) replication can be use to replicate data across three data halls, or availability zones.
+**NOTE**: The support for this redundancy mode is new and might have issues. Please make sure you test this configuration in your test/QA environment.
+The [three-data-hall](https://apple.github.io/foundationdb/configuration.html#single-datacenter-modes) replication can be used to replicate data across three data halls, or availability zones.
 This requires that your fault domains are properly labeled on the Kubernetes nodes.
 Most cloud-providers will use the well-known label [topology.kubernetes.io/zone](https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone) for this.
 When creating a three-data-hall replicated FoundationDBCluster on Kubernetes we have to create 3 `FoundationDBCluster` resources.
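
A hedged sketch of one of those three resources, assuming the cluster spec's `dataHall` and `processGroupIDPrefix` fields and an illustrative hall name `az1` (the other two resources would use `az2` and `az3`):

```yaml
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: sample-cluster-az1
spec:
  version: 7.1.26
  # The data hall this resource is responsible for.
  dataHall: az1
  # Unique per resource, to avoid duplicate process group IDs.
  processGroupIDPrefix: az1
  databaseConfiguration:
    redundancy_mode: three_data_hall
```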
@@ -195,9 +195,9 @@ Operations across the different `FoundationDBCluster` resources are [coordinated
 
 ## Multi-Region Replication
 
-The replication strategies above all describe how data is replicated within a data center.
+The replication strategies above all describe how data is replicated within a data center or a single region.
 They control the `zoneid` field in the cluster's locality.
-If you want to run a cluster across multiple data centers, you can use FoundationDB's multi-region replication.
+If you want to run a cluster across multiple data centers or regions, you can use FoundationDB's multi-region replication.
 This can work with any of the replication strategies above.
 The data center will be a separate fault domain from whatever you provide for the zone.
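
To make the distinction between zones and data centers concrete, here is a hedged sketch of a two-region database configuration; the `regions` block mirrors FoundationDB's region configuration JSON, and the datacenter IDs are illustrative:

```yaml
spec:
  databaseConfiguration:
    usable_regions: 2
    regions:
      # Primary region: processes with locality dcid=dc1.
      - datacenters:
          - id: dc1
            priority: 1
      # Remote region that can take over on failure.
      - datacenters:
          - id: dc2
            priority: 0
```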
@@ -286,15 +286,29 @@ spec:
 
 ## Coordinating Global Operations
 
-When running a FoundationDB cluster that is deployed across multiple Kubernetes clusters, each Kubernetes cluster will have its own instance of the operator working on the processes in its cluster. There will be some operations that cannot be scoped to a single Kubernetes cluster, such as changing the database configuration.
-The operator provides a locking system to ensure that only one instance of the operator can perform these operations at a time. You can enable this locking system by setting `lockOptions.disableLocks = false` in the cluster spec. The locking system is automatically enabled by default for any cluster that has multiple regions in its database configuration, a `zoneCount` greater than 1 in its fault domain configuration, or `redundancyMode` equal to `three_data_hall`.
+When running a FoundationDB cluster that is deployed across multiple Kubernetes clusters, each Kubernetes cluster will have its own instance of the operator working on the processes in its cluster.
+There will be some operations that cannot be scoped to a single Kubernetes cluster, such as changing the database configuration.
+The operator provides a locking system to reduce the risk of those independent operator instances performing the same action at the same time.
+All actions that the operator performs, like changing the configuration or restarting processes, will lead to the same desired state.
+The locking system is only intended to reduce the risk of frequent reoccurring recoveries.
+
+You can enable this locking system by setting `lockOptions.disableLocks = false` in the cluster spec.
+The locking system is automatically enabled by default for any cluster that has multiple regions in its database configuration, a `zoneCount` greater than 1 in its fault domain configuration, or `redundancyMode` equal to `three_data_hall`.
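
For example, a minimal sketch of a spec that opts in explicitly and sets a unique prefix for this Kubernetes cluster (the `dc1` value is illustrative):

```yaml
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: sample-cluster
spec:
  version: 7.1.26
  # Unique per Kubernetes cluster; also identifies this operator
  # instance to the locking system.
  processGroupIDPrefix: dc1
  lockOptions:
    # Setting this to false enables the locking system.
    disableLocks: false
```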
 
 The locking system uses the `processGroupIDPrefix` from the cluster spec to identify a process group of the operator.
 Make sure to set this to a unique value for each Kubernetes cluster, both to support the locking system and to prevent duplicate process group IDs.
 
-This locking system uses the FoundationDB cluster as its data source. This means that if the cluster is unavailable, no instance of the operator will be able to get a lock. If you hit a case where this becomes an issue, you can disable the locking system by setting `lockOptions.disableLocks = true` in the cluster spec.
+This locking system uses the FoundationDB cluster as its data source.
+This means that if the cluster is unavailable, no instance of the operator will be able to get a lock.
+If you hit a case where this becomes an issue, you can disable the locking system by setting `lockOptions.disableLocks = true` in the cluster spec.
 
-In most cases, restarts will be done independently in each Kubernetes cluster, and the locking system will be used to ensure a minimum time between the different restarts and avoid multiple recoveries in a short span of time. During upgrades, however, all instances must be restarted at the same time. The operator will use the locking system to coordinate this. Each instance of the operator will store records indicating what processes it is managing and what version they will be running after the restart. Each instance will then try to acquire a lock and confirm that every process reporting to the cluster is ready for the upgrade. If all processes are prepared, the operator will restart all of them at once. If any instance of the operator is stuck and unable to prepare its processes for the upgrade, the restart will not occur.
+In most cases, restarts will be done independently in each Kubernetes cluster, and the locking system will be used to try to ensure a minimum time between the different restarts and avoid multiple recoveries in a short span of time.
+During upgrades, however, all instances must be restarted at the same time.
+The operator will use the locking system to coordinate this.
+Each instance of the operator will store records indicating what processes it is managing and what version they will be running after the restart.
+Each instance will then try to acquire a lock and confirm that every process reporting to the cluster is ready for the upgrade.
+If all processes are prepared, the operator will restart all of them at once.
+If any instance of the operator is stuck and unable to prepare its processes for the upgrade, the restart will not occur.
 
 ### Deny List
 
@@ -351,16 +365,17 @@ Depending on the requirements the operator can be configured to either prefer or
 The number of coordinators is currently a hardcoded mechanism based on the [following algorithm](https://github.com/FoundationDB/fdb-kubernetes-operator/blob/v0.49.2/api/v1beta1/foundationdbcluster_types.go#L1500-L1508):
 
 ```go
+// DesiredCoordinatorCount returns the number of coordinators to recruit for a cluster.
 func (cluster *FoundationDBCluster) DesiredCoordinatorCount() int {
-	if cluster.Spec.DatabaseConfiguration.UsableRegions > 1 {
-		return 9
-	}
+	if cluster.Spec.DatabaseConfiguration.UsableRegions > 1 || cluster.Spec.DatabaseConfiguration.RedundancyMode == RedundancyModeThreeDataHall {
+		return 9
+	}
 
-	return cluster.MinimumFaultDomains() + cluster.DesiredFaultTolerance()
+	return cluster.MinimumFaultDomains() + cluster.DesiredFaultTolerance()
 }
 ```
 
-For all clusters that use more than one region the operator will recruit 9 coordinators.
+For all clusters that use more than one region or use `three_data_hall`, the operator will recruit 9 coordinators.
 If the number of regions is `1` the number of recruited coordinators depends on the redundancy mode.
 The number of coordinators is chosen based on the fact that the coordinators use a consensus protocol (Paxos) that needs a majority of processes to be up.
 A common pattern in majority-based systems is to run `n * 2 + 1` processes, where `n` defines the failures that should be tolerated. For `triple` redundancy, for example, tolerating `n = 2` failures yields `2 * 2 + 1 = 5` coordinators.
@@ -412,7 +427,6 @@ The operator supports the following classes as coordinators:
 
 FoundationDB clusters that are spread across different DC's or Kubernetes clusters only support the same `coordinatorSelection`.
 The reason behind this is that the coordinator selection is a global process, and different `coordinatorSelection` settings across the `FoundationDBCluster` resources can lead to undefined behaviour or, in the worst case, flapping coordinators.
-There are plans to support this feature in the future.
 
 ## Next
 
docs/manual/getting_started.md

Lines changed: 39 additions & 7 deletions
@@ -19,7 +19,7 @@ You can see logs from the operator by running `kubectl logs -f -l app=fdb-kubern
 
 The example below will cover creating a cluster. All subsequent examples will assume that you have just created this cluster, and will cover an operation on this cluster.
 
-For more information on the fields you can define on the cluster resource, see the [go docs](https://godoc.org/github.com/FoundationDB/fdb-kubernetes-operator/pkg/apis/apps/v1beta2#FoundationDBCluster).
+For more information on the fields you can define on the cluster resource, see the [cluster spec docs](../cluster_spec.md).
 
 For more information on version compatibility, see our [compatibility guide](/docs/compatibility.md).
 
@@ -36,16 +36,29 @@ spec:
   version: 7.1.26
 ```
 
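The manifest above relies on the operator's defaults for the process layout. A hedged sketch of the equivalent spec with the counts written out, assuming the cluster spec's `processCounts` field:

```yaml
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: sample-cluster
spec:
  version: 7.1.26
  # Explicit equivalents of the defaults described below.
  processCounts:
    storage: 3
    log: 4
    stateless: 7
```
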
-This will create a cluster with 3 storage processes, 4 log processes, and 7 stateless processes. Each fdbserver process will be in a separate pod, and the pods will have names of the form `sample-cluster-$role-$n`, where `$n` is the process group ID and `$role` is the role for the process.
+This will create a cluster with 3 storage processes, 4 log processes, and 7 stateless processes.
+Each `fdbserver` process will be in a separate pod, and the pods will have names of the form `sample-cluster-$role-$n`, where `$n` is the process group ID and `$role` is the role for the process.
 
-You can run `kubectl get foundationdbcluster sample-cluster` to check the progress of reconciliation. Once the reconciled generation appears in this output, the cluster should be up and ready. After creating the cluster, you can connect to the cluster by running `kubectl exec -it sample-cluster-log-1 -- fdbcli`.
+You can run `kubectl get foundationdbcluster sample-cluster` to check the progress of reconciliation.
+Once the reconciled generation appears in this output, the cluster should be up and ready.
+After creating the cluster, you can connect to the cluster by running `kubectl exec -it sample-cluster-log-1 -- fdbcli`.
 
-This example requires non-trivial resources, based on what a process will need in a production environment. This means that is too large to run in a local testing environment. It also requires disk I/O features that are not present in Docker for Mac. If you want to run these tests in that kind of environment, you can try bringing in the resource requirements, knobs, and fault domain information from a [local testing example](../../config/samples/cluster.yaml).
+This example requires non-trivial resources, based on what a process will need in a production environment.
+This means that it might be too large to run in a local testing environment.
+It also requires disk I/O features that are not present in Docker for Mac.
+If you want to run these tests in that kind of environment, you can try bringing in the resource requirements, knobs, and fault domain information from a [local testing example](../../config/samples/cluster.yaml).
+
+_NOTE_: FoundationDB currently only supports `amd64`/`x64`.
 
 In addition to the pods, the operator will create a Persistent Volume Claim for any stateful
 processes in the cluster. In this example, each volume will be 128 GB.
 
-By default each pod will have two containers and one init container. The `foundationdb` container will run fdbmonitor and fdbserver, and is the main container for the pod. The `foundationdb-kubernetes-sidecar` container will run a sidecar image designed to help run FDB on Kubernetes. It is responsible for managing the fdbmonitor conf files and providing FDB binaries to the `foundationdb` container. The operator will create a config map that contains a template for the monitor conf file, and the sidecar will interpolate instance-specific fields into the conf and make it available to the fdbmonitor process through a shared volume. The "Upgrading a Cluster" has more detail on we manage binaries. The init container will run the same sidecar image, and will ensure that the initial binaries and dynamic conf are ready before the fdbmonitor process starts.
+By default each pod will have two containers and one init container.
+The `foundationdb` container will run `fdbmonitor` and `fdbserver`, and is the main container for the pod. The `foundationdb-kubernetes-sidecar` container will run a sidecar image designed to help run FDB on Kubernetes.
+It is responsible for managing the `fdbmonitor` conf files and providing FDB binaries to the `foundationdb` container.
+The operator will create a config map that contains a template for the monitor conf file, and the sidecar will interpolate instance-specific fields into the conf and make it available to the fdbmonitor process through a shared volume.
+The "Upgrading a Cluster" section has more detail on how we manage binaries.
+The init container will run the same sidecar image, and will ensure that the initial binaries and dynamic conf are ready before the `fdbmonitor` process starts.
 
 ## Accessing a Cluster
 
@@ -63,6 +76,22 @@ spec:
     template:
       spec:
         restartPolicy: OnFailure
+        initContainers:
+          - name: init-cluster-file
+            image: foundationdb/foundationdb-kubernetes-sidecar:7.1.26-1
+            args:
+              - --init-mode
+              - --input-dir
+              - /mnt/config-volume
+              - --copy-file
+              - cluster-file
+              - --require-not-empty
+              - cluster-file
+            volumeMounts:
+              - name: config-volume
+                mountPath: /mnt/config-volume
+              - name: shared-volume
+                mountPath: /out-dir
         containers:
           - name: fdbcli-status-cronjob
             image: foundationdb/foundationdb:7.1.26
@@ -74,18 +103,21 @@ spec:
             - name: FDB_CLUSTER_FILE
               value: /mnt/config-volume/cluster-file
             volumeMounts:
-              - name: config-volume
+              - name: shared-volume
                 mountPath: /mnt/config-volume
         volumes:
           - name: config-volume
             configMap:
               name: sample-cluster-config
+          - name: shared-volume
+            emptyDir:
+              medium: Memory
 ```
 
 Note that:
 
 * The name of the config map will depend on the name of your cluster.
-* For long-running applications you should ensure that your cluster file is writeable by your application.
+* For long-running applications you should ensure that your cluster file is writable by your application. You can achieve this by using the init container to copy the cluster file into a shared `emptyDir`.
 
 ## Next
 