articles/api-management/how-to-self-hosted-gateway-on-kubernetes-in-production.md
32 lines changed: 32 additions & 0 deletions
@@ -32,6 +32,38 @@ The minimum number of replicas suitable for production is three, preferably comb
By default, a self-hosted gateway is deployed with a **RollingUpdate** deployment [strategy](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy). Review the default values and consider explicitly setting the [maxUnavailable](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#max-unavailable) and [maxSurge](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#max-surge) fields, especially when you're using a high replica count.
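
As a minimal sketch of setting these fields explicitly, the fragment below could be merged into the gateway's Deployment spec. The replica count and both values are illustrative assumptions, not recommended settings; tune them to your capacity and rollout tolerance.

```yaml
# Fragment of a Deployment spec (illustrative values only).
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # assumption: never drop below the desired replica count during a rollout
      maxSurge: 25%       # assumption: allow up to 25% extra pods while new ones start
```

With `maxUnavailable: 0`, Kubernetes starts replacement pods before terminating old ones, so the cluster needs spare capacity for the surge.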
## Autoscaling
While we provide [guidance on the minimum number of replicas](#number-of-replicas) for the self-hosted gateway, we recommend that you also use autoscaling so that gateway capacity follows your traffic demand.

There are two ways to autoscale the self-hosted gateway horizontally:
- Autoscale based on resource usage (CPU and memory)
- Autoscale based on the number of requests per second

You can set up autoscaling with native Kubernetes functionality or with [Kubernetes Event-driven Autoscaling (KEDA)](https://keda.sh/). KEDA is a CNCF Incubation project that strives to make application autoscaling simple.
> [!NOTE]
> KEDA is an open-source technology that isn't covered by Azure support. Customers need to operate it themselves.
### Resource-based autoscaling
Kubernetes allows you to autoscale the self-hosted gateway based on resource usage by using a [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/). With it, you [define CPU and memory thresholds](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-resource-metrics) and the minimum and maximum number of replicas to scale between.

Alternatively, use Kubernetes Event-driven Autoscaling (KEDA), which lets you scale workloads based on a [variety of scalers](https://keda.sh/docs/latest/scalers/), including CPU and memory.
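
The following manifest is a minimal sketch of the Horizontal Pod Autoscaler approach. The Deployment name `my-gateway`, the replica bounds, and the utilization thresholds are hypothetical values chosen for illustration.

```yaml
# Sketch only: "my-gateway" is a hypothetical Deployment name; thresholds are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-gateway
  minReplicas: 3          # keeps the production minimum of three replicas
  maxReplicas: 10         # assumption: upper bound for scale-out
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request, averaged across pods
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # percent of the memory request, averaged across pods
```

Utilization targets are computed against container resource requests, so this only works once the gateway container specifies CPU and memory requests.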
> [!TIP]
> If you're already using KEDA to scale other workloads, we recommend using KEDA as a unified app autoscaler. If that isn't the case, we strongly suggest relying on the native Kubernetes functionality through Horizontal Pod Autoscaler.
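
If you take the KEDA route, a CPU-based `ScaledObject` might look like the sketch below. It assumes a recent KEDA 2.x release is installed in the cluster, and the Deployment name `my-gateway` and all numeric values are hypothetical.

```yaml
# Sketch only: assumes KEDA 2.x is installed; "my-gateway" is a hypothetical Deployment name.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-gateway-scaler
spec:
  scaleTargetRef:
    name: my-gateway        # Deployment to scale (the default target kind)
  minReplicaCount: 3        # keeps the production minimum of three replicas
  maxReplicaCount: 10       # assumption: upper bound for scale-out
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"         # illustrative target: 70% of the CPU request
```

As with the Horizontal Pod Autoscaler, the CPU scaler relies on the container declaring resource requests.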
### Traffic-based autoscaling
Kubernetes doesn't provide an out-of-the-box mechanism for traffic-based autoscaling.

Kubernetes Event-driven Autoscaling (KEDA) offers two options that can help with traffic-based autoscaling:
- You can scale based on metrics from a Kubernetes ingress, if they're available in [Prometheus](https://keda.sh/docs/latest/scalers/prometheus/) or [Azure Monitor](https://keda.sh/docs/latest/scalers/azure-monitor/), by using an out-of-the-box scaler.
- You can install the [HTTP add-on](https://github.com/kedacore/http-add-on), which is available in beta and scales based on the number of requests per second.
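
The first option could be sketched with the KEDA Prometheus scaler as shown below. Everything environment-specific here is an assumption: the Prometheus address, the Deployment and service name `my-gateway`, the use of the NGINX ingress controller (whose `nginx_ingress_controller_requests` metric appears in the query), and the threshold.

```yaml
# Sketch only: assumes KEDA 2.x, an in-cluster Prometheus, and the NGINX ingress controller.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-gateway-traffic-scaler
spec:
  scaleTargetRef:
    name: my-gateway        # hypothetical Deployment name
  minReplicaCount: 3        # keeps the production minimum of three replicas
  maxReplicaCount: 10       # assumption: upper bound for scale-out
  triggers:
    - type: prometheus
      metadata:
        # Hypothetical Prometheus endpoint; replace with your own.
        serverAddress: http://prometheus-server.monitoring.svc.cluster.local:9090
        # Requests per second routed by the ingress to the gateway service over the last 2 minutes.
        query: sum(rate(nginx_ingress_controller_requests{service="my-gateway"}[2m]))
        threshold: "100"    # illustrative target: about 100 requests per second per replica
```

KEDA divides the query result by the threshold to derive the desired replica count, so pick a threshold that reflects what a single gateway replica can handle comfortably.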
## Container resources
By default, the YAML file provided in the Azure portal doesn't specify container resource requests.