///////////////////////////////////////////////////////////////////////////////

 Copyright (c) 2021, Oracle and/or its affiliates.
 Licensed under the Universal Permissive License v 1.0 as shown at
 http://oss.oracle.com/licenses/upl.

///////////////////////////////////////////////////////////////////////////////

= Performance Testing

== Performance Testing in Kubernetes

Many customers use Coherence because they want access to data at in-memory speeds. Customers who want the best performance typically embark on performance testing and load testing of their applications. When doing this sort of testing on Kubernetes, it is useful to understand the ways that your Kubernetes environment can impact your test results.

== Where are your Nodes?

When an application has been deployed into Kubernetes, Pods will typically be distributed over many nodes in the Kubernetes cluster.
When deploying into a Kubernetes cluster in the cloud, for example on Oracle OKE, the nodes can be distributed across different availability zones. These zones are effectively different data centers, meaning that the network speed can differ considerably between nodes in different zones.
Performance testing in this sort of environment can be difficult if you use default Pod scheduling. Different test runs could distribute Pods to different nodes, in different zones, and skew results depending on how "far" test clients and servers are from each other.
For example, when testing a simple Coherence `EntryProcessor` invocation in a Kubernetes cluster spread across zones, we saw a 95% response time of 0.1 milliseconds when the client and server were in the same zone. When the client and server were in different zones, the 95% response time could be as high as 0.8 milliseconds. This difference is purely down to the network distance between nodes. Depending on the actual use cases being tested, this difference might not have much impact on overall response times, but for simple operations it can be a significant enough overhead to impact test results.

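One way to see how far apart test clients and servers actually are is to map each Pod to the node it runs on, and each node to its zone label. The snippet below is a minimal sketch, assuming the standard `topology.kubernetes.io/zone` label discussed later and that `kubectl` is pointing at the namespace containing the test Pods:
[source,bash]
----
# List each Pod along with the node it was scheduled on.
kubectl get pods -o wide

# For every node that is running a Pod, print the node name and its zone label.
for node in $(kubectl get pods -o jsonpath='{.items[*].spec.nodeName}' | tr ' ' '\n' | sort -u)
do
  kubectl get node "$node" -L topology.kubernetes.io/zone --no-headers
done
----
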
The solution to the issue described above is to use Pod scheduling to fix the location of the Pods used for the tests. In a cluster like Oracle OKE, this would ensure that all the Pods are scheduled into the same availability zone.

=== Finding Node Zones

This example talks about scheduling Pods to a single availability zone in a Kubernetes cluster in the cloud. Pod scheduling in this way uses Node labels, and in fact any label on the Nodes in your cluster could be used to fix the location of the Pods.

To schedule all the Coherence Pods into a single zone, we first need to know what zones we have and what labels have been used.
The standard Kubernetes Node label for the availability zone is `topology.kubernetes.io/zone` (as documented in the https://kubernetes.io/docs/reference/labels-annotations-taints/[Kubernetes Labels Annotations and Taints] documentation). To slightly confuse the situation, prior to Kubernetes 1.17 the label was `failure-domain.beta.kubernetes.io/zone`, which has now been deprecated. Some Kubernetes clusters, even after 1.17, still use the deprecated label, so you need to know what labels your Nodes have.

Run the following command to list the nodes in a Kubernetes cluster with the value of the two zone labels for each node:
[source,bash]
----
kubectl get nodes -L topology.kubernetes.io/zone,failure-domain.beta.kubernetes.io/zone
----

The output will be something like this:
[source]
----
NAME     STATUS   ROLES   AGE   VERSION   ZONE              ZONE
node-1   Ready    node    66d   v1.19.7   US-ASHBURN-AD-1
node-2   Ready    node    66d   v1.19.7   US-ASHBURN-AD-2
node-3   Ready    node    66d   v1.19.7   US-ASHBURN-AD-3
node-4   Ready    node    66d   v1.19.7   US-ASHBURN-AD-2
node-5   Ready    node    66d   v1.19.7   US-ASHBURN-AD-3
node-6   Ready    node    66d   v1.19.7   US-ASHBURN-AD-1
----
In the output above the first `ZONE` column has values, and the second does not. This means that the zone label used is the first in the label list in our `kubectl` command, i.e., `topology.kubernetes.io/zone`.

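If you prefer to check a single node directly rather than scanning the columns above, you can list all of its labels and look for whichever zone label is present. A minimal check, assuming a node named `node-1` as in the example output:
[source,bash]
----
# Show every label on the node, including whichever zone label the cluster uses.
kubectl get node node-1 --show-labels
----
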
If the nodes had been labeled with the second, deprecated, label in the `kubectl` command's label list, `failure-domain.beta.kubernetes.io/zone`, the output would look like this:
[source]
----
NAME     STATUS   ROLES   AGE   VERSION   ZONE   ZONE
node-1   Ready    node    66d   v1.19.7          US-ASHBURN-AD-1
node-2   Ready    node    66d   v1.19.7          US-ASHBURN-AD-2
node-3   Ready    node    66d   v1.19.7          US-ASHBURN-AD-3
node-4   Ready    node    66d   v1.19.7          US-ASHBURN-AD-2
node-5   Ready    node    66d   v1.19.7          US-ASHBURN-AD-3
node-6   Ready    node    66d   v1.19.7          US-ASHBURN-AD-1
----

From the list of nodes above we can see that there are three zones: `US-ASHBURN-AD-1`, `US-ASHBURN-AD-2` and `US-ASHBURN-AD-3`.
In this example we will schedule all the Pods to zone `US-ASHBURN-AD-1`.

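Before pinning the Pods to that zone, it can be useful to confirm which nodes are labeled with it, and that they have enough capacity for the test cluster. A minimal check, assuming the `topology.kubernetes.io/zone` label identified above:
[source,bash]
----
# List only the nodes in the chosen availability zone.
kubectl get nodes -l topology.kubernetes.io/zone=US-ASHBURN-AD-1
----
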
=== Scheduling Pods of a Coherence Cluster

The `Coherence` CRD supports a number of ways to schedule Pods, as described in the <<other/090_pod_scheduling.adoc,Configure Pod Scheduling>> documentation. Using node labels is the simplest of the scheduling methods.
In this case we need to schedule Pods onto nodes that have the label `topology.kubernetes.io/zone=US-ASHBURN-AD-1`.
In the `Coherence` yaml we use the `nodeSelector` field.

[source,yaml]
.coherence-cluster.yaml
----
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  replicas: 3
  nodeSelector:
    topology.kubernetes.io/zone: US-ASHBURN-AD-1
----

When the yaml above is applied, a cluster of three Pods will be created, all scheduled onto nodes in the `US-ASHBURN-AD-1` availability zone.

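To verify where the Pods actually landed, you can check which node each Pod was scheduled on and cross-check the zone label of those nodes. A minimal sketch, assuming the yaml above was saved as `coherence-cluster.yaml` and applied to the current namespace:
[source,bash]
----
# Create the Coherence cluster from the yaml above.
kubectl apply -f coherence-cluster.yaml

# Show which node each Pod was scheduled on (NODE column).
kubectl get pods -o wide

# Cross-check the zone of each node against the expected availability zone.
kubectl get nodes -L topology.kubernetes.io/zone
----
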
=== Other Performance Factors

Depending on the Kubernetes cluster you are using, there could be various other factors to bear in mind. Many Kubernetes clusters run on virtual machines, which can be poor for repeated performance comparisons unless you know what else might be running on the underlying hardware that the VM is on. If a test run happens at the same time as another VM is consuming a lot of the underlying hardware resources, this can obviously impact the results. Unfortunately bare-metal hardware, the best option for repeated performance tests, is not always available, so it is useful to bear this in mind if there is suddenly a strange outlier in the tests.