docs/manual/customization.md: 3 additions & 3 deletions

@@ -326,7 +326,7 @@ The operator will add a special locality to the fdbserver processes called `dns_
## Using Multiple Namespaces

-Our [sample deployment](https://raw.githubusercontent.com/foundationdb/fdb-kubernetes-operator/master/config/samples/deployment.yaml) configures the operator to run in single-namespace mode, where it only manages resources in the namespace where the operator itself is running. If you want a single deployment of the operator to manage your FDB clusters across all of your namespaces, you will need to run it in global mode. Which mode is appropriate will depend on the constraints of your environment.
+Our [sample deployment](../../config/samples/deployment.yaml) configures the operator to run in single-namespace mode, where it only manages resources in the namespace where the operator itself is running. If you want a single deployment of the operator to manage your FDB clusters across all of your namespaces, you will need to run it in global mode. Which mode is appropriate will depend on the constraints of your environment.

### Single-Namespace Mode
@@ -339,7 +339,7 @@ To run the controller in single-namespace mode, you will need to configure the f
* A service account for the controller
* The serviceAccountName field in the controller's pod spec
* A `WATCH_NAMESPACE` environment variable defined in the controller's pod spec or in the arguments of the container command
-* A Role that grants access to the necessary permissions to all of the resources that the controller manages. See the [sample role](https://raw.githubusercontent.com/FoundationDB/fdb-kubernetes-operator/master/config/samples/deployment/rbac_role.yaml) for the list of those permissions.
+* A Role that grants the necessary permissions on all of the resources that the controller manages. See the [sample role](../../config/samples/deployment/rbac_role.yaml) for the list of those permissions.
* A RoleBinding that binds that role to the service account for the controller

The sample deployment provides all of this configuration.
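
To make these pieces concrete, here is a rough sketch of how they fit together in single-namespace mode. The object names, namespace, and image tag are hypothetical placeholders, and the Role itself should carry the permissions from the sample `rbac_role.yaml`; the sample deployment remains the authoritative reference.

```yaml
# Illustrative sketch only; names, namespace, and image tag are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fdb-kubernetes-operator
  namespace: fdb-testing
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fdb-kubernetes-operator
  namespace: fdb-testing
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fdb-kubernetes-operator
  template:
    metadata:
      labels:
        app: fdb-kubernetes-operator
    spec:
      serviceAccountName: fdb-kubernetes-operator
      containers:
        - name: manager
          image: foundationdb/fdb-kubernetes-operator:latest
          env:
            # Restricts the operator to the namespace it is deployed in.
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: fdb-kubernetes-operator
  namespace: fdb-testing
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: fdb-kubernetes-operator-role
subjects:
  - kind: ServiceAccount
    name: fdb-kubernetes-operator
    namespace: fdb-testing
```
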
@@ -354,7 +354,7 @@ To run the controller in global mode, you will need to configure the following t
* A service account for the controller
* The serviceAccountName field in the controller's pod spec
-* A ClusterRole that grants access to the necessary permissions to all of the resources that the controller manages. See the [sample role](https://raw.githubusercontent.com/FoundationDB/fdb-kubernetes-operator/master/config/samples/deployment/rbac_role.yaml) for the list of those permissions.
+* A ClusterRole that grants the necessary permissions on all of the resources that the controller manages. See the [sample role](../../config/samples/deployment/rbac_role.yaml) for the list of those permissions.
* A ClusterRoleBinding that binds that role to the service account for the controller
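
As a rough illustration (the names are again hypothetical), the main difference from the single-namespace sketch above is that the binding is cluster-scoped and the `WATCH_NAMESPACE` variable is omitted:

```yaml
# Illustrative sketch only; the ClusterRole should carry the permissions from
# the sample rbac_role.yaml.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fdb-kubernetes-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fdb-kubernetes-operator-clusterrole
subjects:
  - kind: ServiceAccount
    name: fdb-kubernetes-operator
    namespace: fdb-operators   # namespace where the operator deployment runs
```
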
You can build this kind of configuration easily from the sample deployment by changing the following things:

docs/manual/debugging.md: 2 additions & 1 deletion

@@ -93,7 +93,8 @@ Remove [storage-1] from cluster default/sample-cluster with exclude: false and s
**NOTE**: This is a very dangerous operation.
This will delete the pod and the PVC without checking that the data has been re-replicated.
-You should only due this after checking that the database is available, has not had any data loss, and that the pod is currently not running. You can confirm the first and second check by looking at the cluster status.
+You should only do this after checking that the database is available, has not had any data loss, and that the pod is currently not running.
+You can confirm the first and second checks by looking at the cluster status.

## Exclusions Not Starting Due to Missing Processes

docs/manual/fault_domains.md: 28 additions & 14 deletions

@@ -134,8 +134,8 @@ This strategy uses the pod name as the fault domain, which allows each process t
## Three-Data-Hall Replication

-**NOTE**: The support for this redundancy mode is new and might have issues. Please make sure you test this configuration in our test/QA environment.
-The [three-data-hall](https://apple.github.io/foundationdb/configuration.html#single-datacenter-modes) replication can be use to replicate data across three data halls, or availability zones.
+**NOTE**: The support for this redundancy mode is new and might have issues. Please make sure you test this configuration in your test/QA environment.
+The [three-data-hall](https://apple.github.io/foundationdb/configuration.html#single-datacenter-modes) replication can be used to replicate data across three data halls, or availability zones.
This requires that your fault domains are properly labeled on the Kubernetes nodes.
Most cloud-providers will use the well-known label [topology.kubernetes.io/zone](https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone) for this.
When creating a three-data-hall replicated FoundationDBCluster on Kubernetes we have to create 3 `FoundationDBCluster` resources.
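
As a rough sketch of what one of those three resources might look like, under the assumption that the CRD group/version is `apps.foundationdb.org/v1beta2` and that the field names below match your operator version (check the cluster spec docs and the samples in the repository before relying on this):

```yaml
# Illustrative, incomplete sketch for one data hall; values are placeholders.
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: sample-cluster-az1
spec:
  version: 7.1.26
  # A unique prefix per data hall also keeps process group IDs distinct.
  processGroupIDPrefix: az1
  faultDomain:
    # Should match the node label that identifies the zone / data hall.
    key: topology.kubernetes.io/zone
  databaseConfiguration:
    redundancy_mode: three_data_hall
```

The other two resources would differ in their name, prefix, and the zone they are scheduled into, while sharing the same database configuration.
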
@@ -195,9 +195,9 @@ Operations across the different `FoundationDBCluster` resources are [coordinated
## Multi-Region Replication

-The replication strategies above all describe how data is replicated within a data center.
+The replication strategies above all describe how data is replicated within a data center or a single region.
They control the `zoneid` field in the cluster's locality.
-If you want to run a cluster across multiple data centers, you can use FoundationDB's multi-region replication.
+If you want to run a cluster across multiple data centers or regions, you can use FoundationDB's multi-region replication.
This can work with any of the replication strategies above.
The data center will be a separate fault domain from whatever you provide for the zone.
@@ -286,15 +286,29 @@ spec:
## Coordinating Global Operations

-When running a FoundationDB cluster that is deployed across multiple Kubernetes clusters, each Kubernetes cluster will have its own instance of the operator working on the processes in its cluster. There will be some operations that cannot be scoped to a single Kubernetes cluster, such as changing the database configuration.
-The operator provides a locking system to ensure that only one instance of the operator can perform these operations at a time. You can enable this locking system by setting `lockOptions.disableLocks = false` in the cluster spec. The locking system is automatically enabled by default for any cluster that has multiple regions in its database configuration, a `zoneCount` greater than 1 in its fault domain configuration, or `redundancyMode` equal to `three_data_hall`.
+When running a FoundationDB cluster that is deployed across multiple Kubernetes clusters, each Kubernetes cluster will have its own instance of the operator working on the processes in its cluster.
+There will be some operations that cannot be scoped to a single Kubernetes cluster, such as changing the database configuration.
+The operator provides a locking system to reduce the risk of those independent operator instances performing the same action at the same time.
+All actions that the operator performs, like changing the configuration or restarting processes, will lead to the same desired state.
+The locking system is only intended to reduce the risk of frequent recurring recoveries.
+
+You can enable this locking system by setting `lockOptions.disableLocks = false` in the cluster spec.
+The locking system is automatically enabled by default for any cluster that has multiple regions in its database configuration, a `zoneCount` greater than 1 in its fault domain configuration, or `redundancyMode` equal to `three_data_hall`.

The locking system uses the `processGroupIDPrefix` from the cluster spec to identify a process group of the operator.
Make sure to set this to a unique value for each Kubernetes cluster, both to support the locking system and to prevent duplicate process group IDs.
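
For illustration, the locking-related parts of a cluster spec in one Kubernetes cluster might look like the following sketch (the prefix value and the `apps.foundationdb.org/v1beta2` API version are assumptions to adapt to your setup):

```yaml
# Illustrative excerpt; only the locking-related fields are shown.
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: sample-cluster
spec:
  # Must be unique for each Kubernetes cluster that hosts part of this FDB cluster.
  processGroupIDPrefix: dc1
  lockOptions:
    # Explicitly enable the locking system; it is already enabled by default for
    # multi-region, zoneCount > 1, and three_data_hall clusters.
    disableLocks: false
```
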

-This locking system uses the FoundationDB cluster as its data source. This means that if the cluster is unavailable, no instance of the operator will be able to get a lock. If you hit a case where this becomes an issue, you can disable the locking system by setting `lockOptions.disableLocks = true` in the cluster spec.
+This locking system uses the FoundationDB cluster as its data source.
+This means that if the cluster is unavailable, no instance of the operator will be able to get a lock.
+If you hit a case where this becomes an issue, you can disable the locking system by setting `lockOptions.disableLocks = true` in the cluster spec.

-In most cases, restarts will be done independently in each Kubernetes cluster, and the locking system will be used to ensure a minimum time between the different restarts and avoid multiple recoveries in a short span of time. During upgrades, however, all instances must be restarted at the same time. The operator will use the locking system to coordinate this. Each instance of the operator will store records indicating what processes it is managing and what version they will be running after the restart. Each instance will then try to acquire a lock and confirm that every process reporting to the cluster is ready for the upgrade. If all processes are prepared, the operator will restart all of them at once. If any instance of the operator is stuck and unable to prepare its processes for the upgrade, the restart will not occur.
+In most cases, restarts will be done independently in each Kubernetes cluster, and the locking system will be used to try to ensure a minimum time between the different restarts and avoid multiple recoveries in a short span of time.
+During upgrades, however, all instances must be restarted at the same time.
+The operator will use the locking system to coordinate this.
+Each instance of the operator will store records indicating what processes it is managing and what version they will be running after the restart.
+Each instance will then try to acquire a lock and confirm that every process reporting to the cluster is ready for the upgrade.
+If all processes are prepared, the operator will restart all of them at once.
+If any instance of the operator is stuck and unable to prepare its processes for the upgrade, the restart will not occur.

### Deny List
@@ -351,16 +365,17 @@ Depending on the requirements the operator can be configured to either prefer or
The number of coordinators is currently a hardcoded mechanism based on the [following algorithm](https://github.com/FoundationDB/fdb-kubernetes-operator/blob/v0.49.2/api/v1beta1/foundationdbcluster_types.go#L1500-L1508):
```go
+// DesiredCoordinatorCount returns the number of coordinators to recruit for a cluster.
func (cluster *FoundationDBCluster) DesiredCoordinatorCount() int {
-	if cluster.Spec.DatabaseConfiguration.UsableRegions > 1 {
-		return 9
-	}
+	if cluster.Spec.DatabaseConfiguration.UsableRegions > 1 || cluster.Spec.DatabaseConfiguration.RedundancyMode == RedundancyModeThreeDataHall {
	// ...
}
```

-For all clusters that use more than one region the operator will recruit 9 coordinators.
+For all clusters that use more than one region or use `three_data_hall`, the operator will recruit 9 coordinators.
If the number of regions is `1` the number of recruited coordinators depends on the redundancy mode.
The number of coordinators is chosen based on the fact that the coordinators use a consensus protocol (Paxos) that needs a majority of processes to be up.
A common pattern in majority-based systems is to run `n * 2 + 1` processes, where `n` is the number of failures that should be tolerated.
@@ -412,7 +427,6 @@ The operator supports the following classes as coordinators:
FoundationDB clusters that are spread across different DC's or Kubernetes clusters only support the same `coordinatorSelection`.
The reason behind this is that the coordinator selection is a global process, and different `coordinatorSelection` settings in the `FoundationDBCluster` resources can lead to undefined behaviour or, in the worst case, flapping coordinators.
-There are plans to support this feature in the future.

docs/manual/getting_started.md: 39 additions & 7 deletions

@@ -19,7 +19,7 @@ You can see logs from the operator by running `kubectl logs -f -l app=fdb-kubern
The example below will cover creating a cluster. All subsequent examples will assume that you have just created this cluster, and will cover an operation on this cluster.

-For more information on the fields you can define on the cluster resource, see the [go docs](https://godoc.org/github.com/FoundationDB/fdb-kubernetes-operator/pkg/apis/apps/v1beta2#FoundationDBCluster).
+For more information on the fields you can define on the cluster resource, see the [cluster spec docs](../cluster_spec.md).

For more information on version compatibility, see our [compatibility guide](/docs/compatibility.md).
@@ -36,16 +36,29 @@ spec:
```yaml
# ...
  version: 7.1.26
```

-This will create a cluster with 3 storage processes, 4 log processes, and 7 stateless processes. Each fdbserver process will be in a separate pod, and the pods will have names of the form `sample-cluster-$role-$n`, where `$n` is the process group ID and `$role` is the role for the process.
+This will create a cluster with 3 storage processes, 4 log processes, and 7 stateless processes.
+Each `fdbserver` process will be in a separate pod, and the pods will have names of the form `sample-cluster-$role-$n`, where `$n` is the process group ID and `$role` is the role for the process.

-You can run `kubectl get foundationdbcluster sample-cluster` to check the progress of reconciliation. Once the reconciled generation appears in this output, the cluster should be up and ready. After creating the cluster, you can connect to the cluster by running `kubectl exec -it sample-cluster-log-1 -- fdbcli`.
+You can run `kubectl get foundationdbcluster sample-cluster` to check the progress of reconciliation.
+Once the reconciled generation appears in this output, the cluster should be up and ready.
+After creating the cluster, you can connect to the cluster by running `kubectl exec -it sample-cluster-log-1 -- fdbcli`.

-This example requires non-trivial resources, based on what a process will need in a production environment. This means that is too large to run in a local testing environment. It also requires disk I/O features that are not present in Docker for Mac. If you want to run these tests in that kind of environment, you can try bringing in the resource requirements, knobs, and fault domain information from a [local testing example](../../config/samples/cluster.yaml).
+This example requires non-trivial resources, based on what a process will need in a production environment.
+This means that it might be too large to run in a local testing environment.
+It also requires disk I/O features that are not present in Docker for Mac.
+If you want to run these tests in that kind of environment, you can try bringing in the resource requirements, knobs, and fault domain information from a [local testing example](../../config/samples/cluster.yaml).
+
+_NOTE_: FoundationDB currently only supports `amd64`/`x64`.

In addition to the pods, the operator will create a Persistent Volume Claim for any stateful
processes in the cluster. In this example, each volume will be 128 GB.

-By default each pod will have two containers and one init container. The `foundationdb` container will run fdbmonitor and fdbserver, and is the main container for the pod. The `foundationdb-kubernetes-sidecar` container will run a sidecar image designed to help run FDB on Kubernetes. It is responsible for managing the fdbmonitor conf files and providing FDB binaries to the `foundationdb` container. The operator will create a config map that contains a template for the monitor conf file, and the sidecar will interpolate instance-specific fields into the conf and make it available to the fdbmonitor process through a shared volume. The "Upgrading a Cluster" has more detail on we manage binaries. The init container will run the same sidecar image, and will ensure that the initial binaries and dynamic conf are ready before the fdbmonitor process starts.
+By default each pod will have two containers and one init container.
+The `foundationdb` container will run `fdbmonitor` and `fdbserver`, and is the main container for the pod. The `foundationdb-kubernetes-sidecar` container will run a sidecar image designed to help run FDB on Kubernetes.
+It is responsible for managing the `fdbmonitor` conf files and providing FDB binaries to the `foundationdb` container.
+The operator will create a config map that contains a template for the monitor conf file, and the sidecar will interpolate instance-specific fields into the conf and make it available to the `fdbmonitor` process through a shared volume.
+The "Upgrading a Cluster" section has more detail on how we manage binaries.
+The init container will run the same sidecar image, and will ensure that the initial binaries and dynamic conf are ready before the `fdbmonitor` process starts.
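
The layout described above looks roughly like the following abridged sketch; the init container name and the image tags are assumptions, and the real pod spec generated by the operator includes volumes, arguments, and other details that are omitted here:

```yaml
# Abridged, illustrative view of an operator-generated pod; not a literal spec.
spec:
  initContainers:
    - name: foundationdb-kubernetes-init         # assumed name; runs the sidecar image once
      image: foundationdb/foundationdb-kubernetes-sidecar:7.1.26-1
  containers:
    - name: foundationdb                         # runs fdbmonitor, which starts fdbserver
      image: foundationdb/foundationdb:7.1.26
    - name: foundationdb-kubernetes-sidecar      # keeps the monitor conf and binaries up to date
      image: foundationdb/foundationdb-kubernetes-sidecar:7.1.26-1
```
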
* The name of the config map will depend on the name of your cluster.
-* For long-running applications you should ensure that your cluster file is writeable by your application.
+* For long-running applications you should ensure that your cluster file is writeable by your application. You can achieve this by using an init container and copying the cluster file into a shared `emptyDir`.
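
A sketch of that pattern for a client application is shown below. It assumes the config map is named `sample-cluster-config` and exposes the cluster file under a `cluster-file` key; check the config map the operator actually created for your cluster, since the names depend on the cluster name and operator version. The application image and file paths are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fdb-client-app              # hypothetical client application
spec:
  initContainers:
    - name: copy-cluster-file
      image: busybox
      # Copy the read-only cluster file from the config map into a writeable volume.
      command: ["sh", "-c", "cp /mnt/config-map/cluster-file /var/dynamic-conf/fdb.cluster"]
      volumeMounts:
        - name: config-map
          mountPath: /mnt/config-map
        - name: dynamic-conf
          mountPath: /var/dynamic-conf
  containers:
    - name: app
      image: my-client-image        # hypothetical
      env:
        - name: FDB_CLUSTER_FILE    # standard FoundationDB client environment variable
          value: /var/dynamic-conf/fdb.cluster
      volumeMounts:
        - name: dynamic-conf
          mountPath: /var/dynamic-conf
  volumes:
    - name: config-map
      configMap:
        name: sample-cluster-config # assumed; depends on the cluster name
    - name: dynamic-conf
      emptyDir: {}
```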
0 commit comments