---
title: Monitor performance with wrk and btop
weight: 70

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Apply configuration updates

Now that you have your nginx deployments running across Intel and Arm architectures, you can compare performance on each architecture by using wrk to generate load and btop to monitor system performance.

{{% notice Note %}}
This tutorial uses wrk to generate load because it is readily available through the apt and brew package managers. [wrk2](https://github.com/giltene/wrk2) is a modern fork of wrk with additional features. wrk was chosen for this tutorial for its ease of installation, but if you prefer to install and use wrk2 (or another HTTP load generator) for your testing, feel free to do so.
{{% /notice %}}

### Apply performance configuration

The `nginx_util.sh` script includes a `put btop` command that will:

- Apply a performance-optimized nginx configuration to all pods
- Install the btop monitoring tool on all pods for system monitoring
- Restart the pods with the new configuration
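
For reference, the btop-install step the script automates can be sketched with plain kubectl. This is an illustrative approximation, not the script's actual implementation; the pod name is an example only (list yours with `kubectl get pods -n nginx`), and a Debian-based nginx image is assumed:

```shell
# Hypothetical manual equivalent of the btop-install step for a single pod.
# Assumes a Debian-based nginx image; the pod name is an example only.
POD=nginx-arm-deployment-66cb47ddc9-fgmsd
kubectl exec -n nginx "$POD" -- sh -c "apt-get update && apt-get install -y btop"
```

The script simply repeats this for every pod in the namespace.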

1. Run the following command to apply the configuration updates:

```bash
./nginx_util.sh put btop
```

You will see output similar to the following:

```output
Installing btop on all nginx pods...
Installing btop on nginx-amd-deployment-56b547bb47-vgbjj...
✓ btop installed on nginx-amd-deployment-56b547bb47-vgbjj
Installing btop on nginx-arm-deployment-66cb47ddc9-fgmsd...
✓ btop installed on nginx-arm-deployment-66cb47ddc9-fgmsd
Installing btop on nginx-intel-deployment-6f5bff9667-zdrqc...
✓ btop installed on nginx-intel-deployment-6f5bff9667-zdrqc
✅ btop installed on all pods!
```

### Verify configuration updates

2. Check that all pods have restarted with the new configuration:

```bash
kubectl get pods -n nginx
```

You should see all pods with recent restart times.
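
To see those restart times directly, you can ask kubectl for each pod's start timestamp; the column layout below is just one possible choice:

```shell
# Print each pod's name, phase, and start time to confirm a recent restart
kubectl get pods -n nginx \
  -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,STARTED:.status.startTime
```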

{{% notice Note %}}
Because pods are ephemeral, btop needs to be reinstalled if the pods are deleted or restarted. If you get an error saying btop is not found, rerun the `./nginx_util.sh put btop` command to reinstall it.
{{% /notice %}}

### Monitor pod performance

You can now log in to any pod and use btop to monitor system performance. Many variables can affect an individual workload's performance, and btop, like top, is a great first step in understanding them.

{{% notice Note %}}
When running load generation tests from your laptop, local system and network settings may interfere with proper load generation between your machine and the remote cluster services. To mitigate these issues, it is suggested that you install nginx_util.sh (or whichever tool you wish to use) on a [remote Azure instance](https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/azure/) in the same region and zone as your K8s cluster (us-west-2 if you follow these tutorial instructions exactly) for best results. If you aren't seeing at least 70K requests/s to either K8s service endpoint, switching to a better located or tuned system is advised.
{{% /notice %}}

Bringing up two btop terminals, one for each pod, is a convenient way to view performance in real time. To bring up btop on both the Arm and Intel pods:

1. Open a new terminal window or tab.
2. Within the terminal, run the `login arm` command from the nginx utility script to enter the pod:

```bash
# Log in to the Arm pod (replace arm with intel or amd as needed)
./nginx_util.sh login arm
```

3. Once inside the pod, run btop to see real-time system monitoring:

```bash
btop --utf-force
```

4. Repeat from Step 1, but this time use the `login intel` command.

You should now see something similar to the below, that is, one terminal each for Arm and Intel, running btop.

To visualize performance with btop against the Arm and Intel pods via the load balancer service endpoints, you can use the nginx_util.sh wrapper to generate load to both simultaneously:

```bash
./nginx_util.sh wrk both
```

This runs wrk with predefined settings (1 thread, 50 simultaneous connections) to generate load against the K8s architecture-specific endpoints. While it runs (for a default of 30s), you can observe some performance characteristics in the btop outputs.

Of particular interest is memory and CPU resource usage per pod. For Intel, figure 1 shows memory usage for the process, and figure 2 shows total CPU usage. Figures 3 and 4 show the same metrics for Arm.

In addition to the visual metrics, the script also returns runtime results, including requests per second and latencies:

```output
azureuser@gcohen-locust-1:/tmp/1127$ ./nginx_util.sh wrk both
Running wrk against both architectures in parallel...

Intel: wrk -t1 -c50 -d30 http://172.193.227.195/
ARM: wrk -t1 -c50 -d30 http://20.252.73.72/

========================================

INTEL RESULTS:
Running 30s test @ http://172.193.227.195/
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   752.40us    1.03ms  28.95ms   94.01%
    Req/Sec    84.49k    12.14k  103.08k    73.75%
  2528743 requests in 30.10s, 766.88MB read
Requests/sec:  84010.86
Transfer/sec:     25.48MB

ARM RESULTS:
Running 30s test @ http://20.252.73.72/
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   621.56us  565.90us  19.75ms   95.43%
    Req/Sec    87.54k    10.22k  107.96k    82.39%
  2620567 requests in 30.10s, 789.72MB read
Requests/sec:  87062.21
Transfer/sec:     26.24MB

========================================
Both tests completed
```
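
If you want to compare runs numerically rather than eyeballing the text, the `Requests/sec` lines are easy to parse. The snippet below is a small illustrative sketch, not part of nginx_util.sh; the sample values are copied from the run shown above:

```python
import re

def requests_per_sec(wrk_output: str) -> float:
    """Extract the Requests/sec value from a wrk result block."""
    match = re.search(r"Requests/sec:\s*([\d.]+)", wrk_output)
    if match is None:
        raise ValueError("no Requests/sec line found")
    return float(match.group(1))

# Sample values copied from the run shown above
intel = requests_per_sec("Requests/sec:  84010.86")
arm = requests_per_sec("Requests/sec:  87062.21")

# Relative throughput difference of Arm over Intel, in percent
delta_pct = (arm - intel) / intel * 100
print(f"Arm handled {delta_pct:.1f}% more requests/sec than Intel")
# → Arm handled 3.6% more requests/sec than Intel
```

Feeding it the full output blocks above works the same way, since the regex only needs the summary line.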

### Experimenting with wrk

The nginx_util.sh script shows the results of the load generation, as well as the command lines used to generate them:

```output
...
Intel: wrk -t1 -c50 -d30 http://172.193.227.195/
ARM: wrk -t1 -c50 -d30 http://20.252.73.72/
...
```

Feel free to experiment with increasing or decreasing client threads, connections, and durations to better understand the performance characteristics under different scenarios.

For example, to generate load using 500 connections across 4 threads to the Arm service for five minutes (300s), you could use the following command line:

```bash
wrk -t4 -c500 -d300 http://20.252.73.72/
```
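
To map out a performance curve rather than a single data point, you can sweep the connection count in a loop. A minimal sketch, assuming the same example Arm endpoint as above (substitute your own service IP):

```shell
# Sweep connection counts against the Arm endpoint to see how
# throughput and latency change with load
for conns in 50 100 200 400 800; do
  echo "=== ${conns} connections ==="
  wrk -t4 -c"${conns}" -d30 "http://20.252.73.72/"
done
```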

As mentioned earlier, unless your local system is tuned to handle load generation, you may find better traffic generation results by running on a VM. If you aren't seeing at least 70K requests/s to either K8s service endpoint when running `wrk`, switching to a better located or tuned system is advised.

## Next Steps

In this learning path, you learned how to run a sample nginx workload on a dual-architecture (Arm and Intel) Azure Kubernetes Service cluster. Once set up, you learned how to generate load with the wrk utility and monitor runtime metrics with btop. If you wish to continue experimenting with this learning path, some ideas you may wish to explore include:

* What do the performance curves look like between the two architectures as a function of load?
* How do larger instance types scale versus smaller ones?

Most importantly, you now possess the knowledge needed to begin experimenting with your own workloads on Arm-based AKS nodes to identify performance and efficiency opportunities unique to your own environments.