Merged
Changes from 3 commits
79 changes: 66 additions & 13 deletions deploy-manage/deploy/cloud-on-k8s/deploy-eck-on-gke-autopilot.md
@@ -19,25 +19,57 @@
1. It is recommended that each Kubernetes host’s virtual memory kernel settings be modified. Refer to [Virtual memory](virtual-memory.md).
2. It is recommended that {{es}} Pods have an `initContainer` that waits for virtual memory settings to be in place.
3. For Elastic Agent/Beats there are storage limitations to be considered.
4. Ensure you are using a node class that is applicable for your workload by adding a `cloud.google.com/compute-class` label in a `nodeSelector`. Refer to [GKE Autopilot documentation.](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-compute-classes).
4. Ensure you are using a node class that is applicable for your workload by adding a `cloud.google.com/compute-class` label in a `nodeSelector`. Refer to [GKE Autopilot documentation](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-compute-classes).

Check notice on line 22 in deploy-manage/deploy/cloud-on-k8s/deploy-eck-on-gke-autopilot.md (GitHub Actions / preview / vale): Elastic.Acronyms: 'GKE' has no definition.

## Ensuring virtual memory kernel settings [k8s-autopilot-setting-virtual-memory]

If you are intending to run production workloads on GKE Autopilot then `vm.max_map_count` should be set. The recommended way to set this kernel setting on the Autopilot hosts is with a `Daemonset` as described in the [Virtual memory](virtual-memory.md) section. You must be running at least version 1.25 when on the `regular` channel or using the `rapid` channel, which currently runs version 1.27.
If you are intending to run production workloads on GKE Autopilot then `vm.max_map_count` should be set. The recommended way to set this kernel setting on the Autopilot hosts depends on your ECK version:

Check notice on line 26 in deploy-manage/deploy/cloud-on-k8s/deploy-eck-on-gke-autopilot.md (GitHub Actions / preview / vale): Elastic.Acronyms: 'GKE' has no definition.

::::{warning}
Only use the provided `Daemonset` exactly as specified or it could be rejected by the Autopilot control plane.
::::
* {applies_to}`eck: ga 3.0-3.1` [Use a DaemonSet](/deploy-manage/deploy/cloud-on-k8s/virtual-memory.md#k8s_using_a_daemonset_to_set_virtual_memory). You must be running at least version 1.25 when on the `regular` channel or using the `rapid` channel, which currently runs version 1.27.

::::{warning}
Use the provided `Daemonset` exactly as specified, with a `max_map_count` value of `262144`, or it could be rejected by the Autopilot control plane.
::::
* {applies_to}`eck: ga 3.2+` [Use a custom ComputeClass](/deploy-manage/deploy/cloud-on-k8s/virtual-memory.md#k8s_using_a_computeclass_to_set_virtual_memory). Using a custom ComputeClass allows you to set a higher value for `max_map_count` due to limitations on the `DaemonSet`.

## Install the ECK Operator [k8s-autopilot-deploy-the-operator]

Refer to [*Install ECK*](install.md) for more information on installation options.

## Deploy an {{es}} cluster [k8s-autopilot-deploy-elasticsearch]

Create an {{es}} cluster. If you are using the `Daemonset` described in the [Virtual memory](virtual-memory.md) section to set `max_map_count` you can add the `initContainer` below is also used to ensure the setting is set prior to starting {{es}}.
Create an {{es}} cluster. The information that you need to provide in your spec depends on whether you've increased your virtual memory kernel setting, and the method that you used.

::::{tab-set}

:::{tab-item} Using a custom ComputeClass
If you used a custom ComputeClass to set `max_map_count`, then you need to reference the custom ComputeClass as part of your template spec.

```shell subs=true
```yaml subs=true
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: {{version.stack}}
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/compute-class: "elasticsearch"
EOF
```
:::


:::{tab-item} Using a DaemonSet

If you used a DaemonSet to set `max_map_count`, you can add the following `initContainer` to ensure the setting is set prior to starting {{es}}.

Check notice on line 70 in deploy-manage/deploy/cloud-on-k8s/deploy-eck-on-gke-autopilot.md (GitHub Actions / preview / vale): Elastic.Wordiness: Consider using 'before' instead of 'prior to'.

```yaml subs=true
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
@@ -48,23 +80,44 @@
  nodeSets:
  - name: default
    count: 1
    # Only uncomment the below section if you are not using the Daemonset to set max_map_count.
    # config:
    #   node.store.allow_mmap: false
    podTemplate:
      spec:
        # This init container ensures that the `max_map_count` setting has been applied before starting Elasticsearch.
        # This is not required, but is encouraged when using the previously mentioned Daemonset to set max_map_count.
        # This is not required, but is encouraged when using the Daemonset to set max_map_count.
        # Do not use this if setting config.node.store.allow_mmap: false
        initContainers:
        - name: max-map-count-check
          command: ['sh', '-c', "while true; do mmc=$(cat /proc/sys/vm/max_map_count); if [ ${mmc} -eq 262144 ]; then exit 0; fi; sleep 1; done"]
EOF
```
:::
::::
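
The init container check in the DaemonSet tab can also be read as a standalone shell function. The following is only a sketch, not part of the ECK documentation: the function name `wait_for_max_map_count`, the timeout, and the optional file path argument are hypothetical, and it compares with `-ge` rather than the `-eq 262144` test used above, on the assumption that any value at or above the expected one is acceptable.

```shell
# Hypothetical helper (not from the ECK docs): waits until vm.max_map_count
# reaches at least the expected value, or gives up after a timeout.
#   $1 = expected value (for example 262144)
#   $2 = timeout in seconds (assumed default: 60)
#   $3 = file to read (defaults to the real sysctl path; overridable for testing)
wait_for_max_map_count() {
  expected="$1"
  timeout_seconds="${2:-60}"
  sysctl_file="${3:-/proc/sys/vm/max_map_count}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout_seconds" ]; do
    # Treat an unreadable file as 0 so the loop keeps waiting.
    current="$(cat "$sysctl_file" 2>/dev/null || echo 0)"
    if [ "$current" -ge "$expected" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "vm.max_map_count is still below $expected after ${timeout_seconds}s" >&2
  return 1
}

# Example call (commented out so that sourcing this block does not block):
# wait_for_max_map_count 262144 120
```

Unlike the loop in the spec above, this variant returns a non-zero status instead of waiting forever, which makes a missing kernel setting visible as a failed init container rather than a Pod stuck in `Init`.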

### Deploy without custom virtual memory

If you didn't increase your virtual memory, then you need to set `node.store.allow_mmap` to `false`.

```yaml subs=true
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: {{version.stack}}
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
EOF
```
:::
::::

## Deploy a standalone Elastic Agent and/or Beats [k8s-autopilot-deploy-agent-beats]

When running Elastic Agent and Beats within GKE Autopilot there are storage constraints to be considered. No `HostPath` volumes are allowed, which the ECK operator defaults to when unset for both `Deployments` and `Daemonsets`. Instead use [Kubernetes ephemeral volumes](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes).
When running Elastic Agent and Beats within GKE Autopilot there are storage constraints to be considered. No `HostPath` volumes are allowed, which the ECK operator defaults to when unset for both `Deployments` and `DaemonSets`. Instead use [Kubernetes ephemeral volumes](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes).

Check notice on line 120 in deploy-manage/deploy/cloud-on-k8s/deploy-eck-on-gke-autopilot.md (GitHub Actions / preview / vale): Elastic.Acronyms: 'GKE' has no definition.
Contributor


That sentence is a bit difficult to digest. The "when unset" is not clear (what can be set that uses a `hostPath`? Probably the data directory of the Beat or the state directory of Elastic Agent, but the "unset" statement feels weird).

Not sure if this sounds better or reads easier... @pebrc, @shainaraskas, wdyt?

Suggested change
When running Elastic Agent and Beats within GKE Autopilot there are storage constraints to be considered. No `HostPath` volumes are allowed, which the ECK operator defaults to when unset for both `Deployments` and `DaemonSets`. Instead use [Kubernetes ephemeral volumes](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes).
When running {{agent}} and {{beats}} on GKE Autopilot, storage constraints apply. GKE Autopilot does not allow `hostPath` volumes. By default, the ECK operator uses a `hostPath` volume for the data directory when no alternative volume is configured, whether the workload is deployed as a `Deployment` or a `DaemonSet`. To run successfully, you can use [Kubernetes ephemeral volumes](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes) instead.

Collaborator Author


will leave this alone for now as it's outside of the scope of the original issue


Refer to [Recipes to deploy {{es}}, {{kib}}, Elastic Fleet Server and Elastic Agent and/or Beats within GKE Autopilot](https://github.com/elastic/cloud-on-k8s/tree/main/config/recipes/autopilot).

65 changes: 61 additions & 4 deletions deploy-manage/deploy/cloud-on-k8s/virtual-memory.md
@@ -12,10 +12,14 @@

By default, {{es}} uses memory mapping (`mmap`) to efficiently access indices. Default values for virtual address space on Linux distributions can be too low for {{es}} to work properly, which may result in out-of-memory exceptions. This is why [the quickstart example](/deploy-manage/deploy/cloud-on-k8s/elasticsearch-deployment-quickstart.md) disables `mmap` through the `node.store.allow_mmap: false` setting. For production workloads, we recommend that you increase the kernel setting `vm.max_map_count` to `1048576` and leave `node.store.allow_mmap` unset.

The kernel setting `vm.max_map_count=1048576` can be set on the host directly, by a dedicated init container which must be privileged, or a dedicated Daemonset.
The kernel setting `vm.max_map_count=1048576` can be set on the host directly, by a dedicated init container which must be privileged, a dedicated Daemonset, or a dedicated ComputeClass.

:::{important}
For {{es}} version 8.16 and later, set the `vm.max_map_count` kernel setting to `1048576`; for {{es}} version 8.15 and earlier, set `vm.max_map_count` to `262144`.
For {{es}} version 8.16 and later, set the `vm.max_map_count` kernel setting to `1048576`; for {{es}} version 8.15 and earlier, set `vm.max_map_count` to `262144`.

The exception is in GKE Autopilot environments:

Check notice on line 20 in deploy-manage/deploy/cloud-on-k8s/virtual-memory.md (GitHub Actions / preview / vale): Elastic.Acronyms: 'GKE' has no definition.
* {applies_to}`eck: ga 3.0-3.1` `vm.max_map_count` must be set to `262144`.
* {applies_to}`eck: ga 3.2+` Use a custom `ComputeClass`, rather than a `DaemonSet`, to override the kernel setting.
:::

For more information, check the {{es}} documentation on [Virtual memory](/deploy-manage/deploy/self-managed/vm-max-map-count.md).
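
As an illustration of the version rule in the note above, the following shell sketch picks the value for a given {{es}} version. The function name `recommended_max_map_count` is hypothetical and not part of ECK or {{es}}. It only encodes the 8.16 threshold stated above and does not account for the GKE Autopilot exception, where a DaemonSet must use `262144`.

```shell
# Hypothetical helper (not part of ECK): prints the vm.max_map_count value
# recommended above for an Elasticsearch version given as MAJOR.MINOR[.PATCH].
recommended_max_map_count() {
  version="$1"
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"      # text before the next dot
  if [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 16 ]; }; then
    echo 1048576           # 8.16 and later
  else
    echo 262144            # 8.15 and earlier
  fi
}

recommended_max_map_count 8.15.3
recommended_max_map_count 8.16.0
```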
@@ -91,13 +95,14 @@
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['/usr/local/bin/bash', '-e', '-c', 'echo 262144 > /proc/sys/vm/max_map_count']
          command: ['/usr/local/bin/bash', '-e', '-c', 'echo 1048576 > /proc/sys/vm/max_map_count'] <1>
      containers:
        - name: sleep
          image: docker.io/bash:5.2.21
          command: ['sleep', 'infinity']
EOF
```
1. In GKE Autopilot environments, `vm.max_map_count` must be set to 262144 when using a DaemonSet.

Check notice on line 105 in deploy-manage/deploy/cloud-on-k8s/virtual-memory.md (GitHub Actions / preview / vale): Elastic.Acronyms: 'GKE' has no definition.

To run an {{es}} instance that waits for the kernel setting to be in place:

@@ -122,8 +127,60 @@
        # Do not use this if setting config.node.store.allow_mmap: false
        initContainers:
        - name: max-map-count-check
          command: ['sh', '-c', "while true; do mmc=$(cat /proc/sys/vm/max_map_count); if [ ${mmc} -eq 262144 ]; then exit 0; fi; sleep 1; done"]
          command: ['sh', '-c', "while true; do mmc=$(cat /proc/sys/vm/max_map_count); if [ ${mmc} -eq 262144 ]; then exit 0; fi; sleep 1; done"] <1>
EOF
```
1. In GKE Autopilot environments, `vm.max_map_count` must be set to 262144 when using a DaemonSet.

Check notice on line 133 in deploy-manage/deploy/cloud-on-k8s/virtual-memory.md (GitHub Actions / preview / vale): Elastic.Acronyms: 'GKE' has no definition.


## Using a custom ComputeClass to set virtual memory [k8s_using_a_computeclass_to_set_virtual_memory]
```{applies_to}
deployment:
eck: ga 3.2+
```

If you're using [GKE Autopilot](/deploy-manage/deploy/cloud-on-k8s/deploy-eck-on-gke-autopilot.md) to run ECK, then you can use a custom ComputeClass, rather than a DaemonSet, to increase the `vm.max_map_count` setting. This allows you to set a higher value, which is not possible with a DaemonSet.

Check notice on line 142 in deploy-manage/deploy/cloud-on-k8s/virtual-memory.md (GitHub Actions / preview / vale): Elastic.Wordiness: Consider using 'impossible' instead of 'not possible'.
Check notice on line 142 in deploy-manage/deploy/cloud-on-k8s/virtual-memory.md (GitHub Actions / preview / vale): Elastic.Acronyms: 'GKE' has no definition.

1. Create a ComputeClass that changes the host kernel setting on all nodes:

```yaml
cat <<EOF | kubectl apply -f -
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: elasticsearch
spec:
  whenUnsatisfiable: "DoNotScaleUp" <1>
  nodePoolAutoCreation:
    enabled: true
  priorityDefaults: <2>
    nodeSystemConfig:
      linuxNodeConfig:
        sysctls:
          vm.max_map_count: 1048576
  priorities:
  - machineFamily: n2
EOF
```
1. Default since GKE 1.33

Check notice on line 165 in deploy-manage/deploy/cloud-on-k8s/virtual-memory.md (GitHub Actions / preview / vale): Elastic.Wordiness: Consider using 'because' instead of 'since'.
2. `priorityDefaults` is available only since GKE 1.32.1-gke.1729000

2. Create your {{es}} instance using the custom ComputeClass:

```yaml subs=true
cat <<'EOF' | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: {{version.stack}}
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/compute-class: "elasticsearch"
EOF
```