| 1 | + |
| 2 | + |
| 3 | +# Application Observability Code Challenge 2 |
| 4 | + |
| 5 | + |
| 6 | +This is the second puzzler of the Application Observability Code Challenges. |
| 7 | +See the [announcement blog](https://goto.ceesbos.nl/aocc) or the [repository README](../README.md) for more information about the challenges in general. |
| 8 | +Check out the first challenge as well if you have not completed it yet. |
| 9 | + |
| 10 | +> 🚨 **Challenge**: |
| 11 | +> - Run the sample applications |
| 12 | +> - Run the tests to see what happens |
| 13 | +> - Try to find out what happens, make a hypothesis❗ |
| 14 | +> - **Improve the observability** of the applications to **prove the hypothesis** |
| 15 | +> - Optional: fix the problem and **prove it with observability data that it is really fixed** |
| 16 | +> - Optional, but highly appreciated 🙏: Share your findings, insights you learned and potential solution, either as a ['discussion'](https://github.com/cbos/application-observability-code-challenges/discussions) or as a pull request |
| 17 | + |
| 18 | +An online guided environment is available on Killercoda; see [https://killercoda.com/observability-code-challenges](https://goto.ceesbos.nl/aocckk). |
| 19 | + |
| 20 | +## Challenge |
| 21 | + |
| 22 | +## Summary |
| 23 | +This challenge teaches you more about the observability of regular threads and virtual threads, and how to make them more observable. |
| 24 | + |
| 25 | +## Setup in this repository |
| 26 | + |
| 27 | +- The setup is a set of applications that share the same code base but run with different configurations and Java versions. |
| 28 | +- The applications are instrumented using OpenTelemetry auto instrumentation. |
| 29 | +- You can run the applications with Docker or directly. |
| 30 | + |
| 31 | +## Pre-requisites |
| 32 | + |
| 33 | +- Java 21 (if you want to use a lower version, you will need to modify the pom.xml) |
| 34 | +- Docker/Podman |
| 35 | +- Just CLI (optional, but recommended) |
| 36 | +- [K6](https://grafana.com/docs/k6/latest/set-up/install-k6/) for load testing |
| 37 | + |
| 38 | +## Prepare the environment |
| 39 | + |
| 40 | +### Clone the repository |
| 41 | +Clone the repository to your local machine and go to the folder of `challenge-02`. |
| 42 | + |
| 43 | +```shell |
| 44 | +git clone https://github.com/cbos/application-observability-code-challenges |
| 45 | +cd application-observability-code-challenges/challenge-02 |
| 46 | +``` |
| 47 | + |
| 48 | +### Download the OpenTelemetry Java agent jar |
| 49 | + |
| 50 | +```shell |
| 51 | +just download-otel |
| 52 | +``` |
| 53 | +This downloads the OpenTelemetry Java agent jar to the `.otel` directory. |
| 54 | + |
| 55 | +### Build the application |
| 56 | + |
| 57 | +```shell |
| 58 | +just build |
| 59 | +# or if you want to do it manually |
| 60 | +./mvnw clean verify |
| 61 | +``` |
| 62 | + |
| 63 | +### Observability Toolkit or your own stack |
| 64 | +Launch your observability stack or use the Observability Toolkit. |
| 65 | +The sample application assumes that you have an OpenTelemetry endpoint running at `localhost:4318`. |
| 66 | + |
| 67 | +If you don't have any observability tools running, you can run a preconfigured setup with the following commands: |
| 68 | + |
| 69 | +```shell |
| 70 | +git clone https://github.com/cbos/observability-toolkit |
| 71 | +cd observability-toolkit |
| 72 | +just up # or docker-compose up -d |
| 73 | +``` |
| 74 | +Now you can open http://localhost:3000 to open Grafana. |
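Before starting the applications, it can help to confirm that something is listening on the two ports mentioned above (`4318` for the OpenTelemetry endpoint and `3000` for Grafana). The following is a minimal sketch, assuming `bash`, because it relies on bash's `/dev/tcp` redirection. `check_port` is a hypothetical helper and not part of this repository. It only checks that a TCP connection can be opened; it does not verify that the listener really speaks OTLP.

```shell
#!/usr/bin/env bash
# Sketch: report whether a TCP port accepts connections.
# check_port is a hypothetical helper, not part of the repository.
check_port() {
  local host="$1" port="$2"
  # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>.
  # The subshell keeps file descriptor 3 from leaking into the caller.
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Ports taken from the text above: OTLP over HTTP and Grafana.
echo "OpenTelemetry endpoint (localhost:4318): $(check_port localhost 4318)"
echo "Grafana (localhost:3000): $(check_port localhost 3000)"
```

If either line reports `closed`, start your observability stack first, for example with the toolkit commands shown above.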
| 75 | + |
| 76 | +## Run the application setup |
| 77 | + |
| 78 | +The setup consists of 4 applications: |
| 79 | +- `common-backend`: A common backend used by the three other applications |
| 80 | +- `regular-threads`: A version of the application with Java 21 and using regular threads to execute the requests |
| 81 | +- `virtual-threads-21`: A version of the application with Java 21 and using `Virtual Threads` to execute the requests |
| 82 | +- `virtual-threads-25`: A version of the application with Java 25 and using `Virtual Threads` to execute the requests |
| 83 | + |
| 84 | + |
| 85 | + |
| 86 | +To run the setup, you can use the following command: |
| 87 | + |
| 88 | +```shell |
| 89 | +# Run the setup with Docker |
| 90 | +just up |
| 91 | +just down # to stop the Docker containers |
| 92 | +just ps # to see the status of the containers |
| 93 | +just logs <application_name> # to see the logs of a specific application |
| 94 | + |
| 95 | +# Without Just you can use: |
| 96 | +# docker-compose -f docker/docker-compose.yml up -d --build |
| 97 | +# docker-compose -f docker/docker-compose.yml down |
| 98 | +# docker-compose -f docker/docker-compose.yml ps |
| 99 | +# docker-compose -f docker/docker-compose.yml logs <application_name> |
| 100 | +``` |
| 101 | +The applications use OpenTelemetry auto-instrumentation, which provides a foundation for observability. But is it enough for this challenge? |
| 102 | + |
| 103 | +# Test runs with k6 |
| 104 | + |
| 105 | +To get more information about the problems, a set of test scripts is available. |
| 106 | +These test scripts are implemented with [K6](https://grafana.com/oss/k6/), a load testing tool that can also be used for performance testing. |
| 107 | +K6 also produces metrics in OpenTelemetry format. |
| 108 | + |
| 109 | +## Run load scripts |
| 110 | + |
| 111 | +The [first challenge](../challenge-01/README.md) contains a detailed description of how to run the tests and how to view and interpret the output. |
| 112 | +Please read that first, if you are not familiar with K6. |
| 113 | + |
| 114 | +To run the load tests, you can use the following command: |
| 115 | +```shell |
| 116 | +# Put load on the '/' endpoint of the 3 services |
| 117 | +just k6-scenario-1 |
| 118 | +# Or run k6 with the following command, but this does not have metrics pushed: |
| 119 | +# k6 run k6/scenario1.js |
| 120 | + |
| 121 | +# Put load on the /random/<id> endpoint of the 3 services |
| 122 | +just k6-scenario-2 |
| 123 | +``` |
| 124 | +These tests generate load and give you more insight through the Grafana dashboards. |
| 125 | + |
| 126 | +### K6 Load test - Grafana Dashboard |
| 127 | + |
| 128 | +See [http://localhost:3000/d/o11ytk-k6-load-test/k6-load-test](http://localhost:3000/d/o11ytk-k6-load-test/k6-load-test) for the K6 dashboard. |
| 129 | +The first challenge also explains how to use this dashboard. |
| 130 | + |
| 131 | +### Service details - Grafana Dashboard |
| 132 | + |
| 133 | +See [http://localhost:3000/d/o11ytk-service-details/service-details](http://localhost:3000/d/o11ytk-service-details/service-details) for the Service details dashboard. |
| 134 | +The dashboard has details about incoming requests with the RED metrics: rate, errors and duration. |
| 135 | +It also shows a heat map of request duration for incoming requests. |
| 136 | + |
| 137 | +There are more details per endpoint of the service, but that is not relevant for this test. |
| 138 | +And there are details for the outgoing requests, but that is not relevant for this test either. |
| 139 | + |
| 140 | +There is also a section with JVM metrics such as memory usage (heap and non-heap), garbage collection and threads. |
| 141 | +This gives you more information about the behaviour of the service. |
| 142 | +There is also information about Tomcat: the number of threads used specifically for Tomcat request handling. |
| 143 | + |
| 144 | +As you might have noticed already, this challenge is about regular threads and virtual threads, so these sections are relevant for this challenge. |
| 145 | + |
| 146 | +## Run the final scenario |
| 147 | + |
| 148 | +The last scenario is the one you actually need to run. It again puts load on the 3 applications, and you will see the response times of these services go up while the response times of the common backend stay fine and stable. |
| 149 | + |
| 150 | +```shell |
| 151 | +just k6-scenario-3 |
| 152 | +``` |
| 153 | + |
| 154 | +In the setup you can see that the 3 applications are using the common-backend. |
| 155 | +You can see this in the node graph in Grafana as well: |
| 156 | + |
| 157 | +Open Grafana > Explore > Select 'Tempo' as datasource > Select 'Service Graph' tab: |
| 158 | + |
| 159 | + |
| 160 | + |
| 161 | + |
| 162 | +Goals for this challenge: |
| 163 | +1) Check if you can find the Java version in the observability data (logs, metrics, traces) |
| 164 | +2) Can you spot the latency gap between the services in traces? |
| 165 | +3) Can you find the problem based on metrics? |
| 166 | +4) And most important: can you see the difference between the 3 applications `regular-threads`, `virtual-threads-21` and `virtual-threads-25`? |
| 167 | +5) What can be done to make it more observable? |
| 168 | + |