
Commit aa2c00d

Application Observability Code Challenge 01 - step by step revealed
1 parent 1821751 commit aa2c00d

2 files changed: +40 −15 lines changed


content/posts/20250501-challenge-1-step-by-step/index.md

Lines changed: 40 additions & 15 deletions
@@ -1,7 +1,6 @@
 ---
 title: " Challenge 1 - step by step revealed"
-date: 2025-05-01T11:58:54+02:00
-draft: true
+date: 2025-05-02T16:30:54+02:00
 pin: true

 resources:
@@ -25,7 +24,8 @@ summary:

 ![Application Observability Code Challenge 1 step by step revealed](challenge01_revealed.png)

-> [!CAUTION] **Spoiler alert**
+> [!CAUTION]
+> **Spoiler alert**
 > This page contains a step by step explanation of [challenge 1](../aocc-challenge-01).

 At the start of the [Application Observability Code Challenges](../application-observability-code-challenges) I described observability as:
@@ -45,7 +45,7 @@ This is a screenshot of a part of the dashboard showing the request rate, durati
 ![Service metrics rate, duration and threads](service_metrics_rate_duration_threads.png)
 In the blocks I marked when the 3 scripts were executed.

-> [!QUESTION]
+> [!TIP]
 > What can we learn from these metrics?

 ## Request rate panel (on top)
@@ -58,7 +58,7 @@ I did not take the full screenshot, but after this it even goes way up, which ar

 ## JVM Threads
 The thread panel shows a very consistent number of threads. Even with 40 concurrent users during the second test run, the number of threads does not increase much.
-This indicates that at least one thread is not being created per user.
+If a thread had been created per user, there would be many more threads, but that is not the case.

 ## Tomcat threads
 This is also supported by the panel with Tomcat threads.
@@ -70,7 +70,7 @@ And this is also expected, as our application uses `async' message handling. In
 Can we see any special behaviour regarding memory usage? Not really, it behaves normally. The garbage collector kicks in from time to time and is able to free the memory as expected.
 ![Service metrics memory](service_metrics_memory.png)

-> [!QUESTION]
+> [!TIP]
 > What about traces?

 A trace has also been created for each request. What can we learn from those traces?
@@ -85,7 +85,7 @@ This is an example trace:

 Unfortunately we cannot see anything based on this, we can see that it took 2 seconds to process this request, no further details...

-> [!QUESTION]
+> [!TIP]
 > What can we do to improve the observability to get more insight?

 What kind of additional information can we extract from the application to get more information about what is going on?
@@ -185,7 +185,7 @@ public class ManagedAsyncExecutorProvider extends ThreadPoolExecutorProvider {
 ```
 **But wait... we did not see the number of threads increase during the runs. How does the executor handle the tasks? Does it scale properly?

-> [!QUESTION]
+> [!TIP]
 > What can we do to improve the observability of the async executor?

 ## Extending instrumentation with the OpenTelemetry API
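The post registers executor metrics through the OpenTelemetry API. As a JDK-only sketch of the readings those gauges would report (the metric names and the `ExecutorMetrics` class below are illustrative, not taken from the challenge code), `java.util.concurrent.ThreadPoolExecutor` already exposes everything the panels need:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.function.Supplier;

public class ExecutorMetrics {
    // The readings behind the panels: core/max pool size, active threads,
    // created vs completed tasks, and the queue depth. In the post these are
    // registered as OpenTelemetry gauges; here they are plain suppliers.
    static Map<String, Supplier<Long>> gaugesFor(ThreadPoolExecutor pool) {
        Map<String, Supplier<Long>> g = new LinkedHashMap<>();
        g.put("executor.pool.core",       () -> (long) pool.getCorePoolSize());
        g.put("executor.pool.max",        () -> (long) pool.getMaximumPoolSize());
        g.put("executor.threads.active",  () -> (long) pool.getActiveCount());
        g.put("executor.tasks.created",   pool::getTaskCount);
        g.put("executor.tasks.completed", pool::getCompletedTaskCount);
        g.put("executor.queue.size",      () -> (long) pool.getQueue().size());
        return g;
    }
}
```

With the OpenTelemetry API, each supplier would be wired into an asynchronous gauge callback so the values are sampled at every collection interval.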
@@ -283,13 +283,16 @@ The results of the K6 test will not be different, but we will have more metrics
 We can use Grafana's 'Explore' feature or drill down metrics feature as long as we do not add these metrics to a dashboard.

 ![Threadpool](threadpool.png)
+
 In this graph we see that the core size is 4 and the maximum pool size is 40. This is expected.
 The green line shows the number of active threads. It is going up, but not above the core size.

 ![Created and completed tasks](tasks.png)
+
 The total number of tasks created and the number of tasks completed follow each other closely. This is expected, as all requests have been completed.

 ![Tasks in queue](queue.png)
+
 The queue of tasks to be executed is growing, especially during the phase when 40 concurrent users were active. To process the tasks faster, more threads could have been created, but this is not done. Why not?

 ## Why is it not creating more threads?
@@ -313,15 +316,25 @@ Ok ... and what is more in the Java doc ...
 Ah ... this is exactly the behaviour we see. The queue is growing and at these moments the duration of requests is high.
 The core threads continue to process the requests, but no new threads are created.

-> [!QUESTION]
+## How to solve this?
+> [!TIP]
 > How to solve this? Can we prove that our solution solves it?

-There are several solutions to this problem, to name a few:
+There is no one-size-fits-all solution; it depends on what you are trying to achieve. In certain situations it also depends on the architecture of the whole environment and what is expected of that service.
+For all solutions, it is important to think in advance about what you want to achieve, whether the solution fits and, importantly(!), to validate that the solution works as expected.

-- Remove the use of `@ManagedAsync`. This would allow Tomcat to handle each request in a separate thread.
-- Change the way you use async requests.
+Even for this simple setup, there are several solutions. To name a few:
+
+- Just increase the corePoolSize
+  - Just increasing the core size is an easy change.
+  - Consideration: this does not change the behaviour of the solution, it just delays the tipping point.
+- Remove the use of `@ManagedAsync`.
+  - This is also a simple change. Instead of using a separate thread pool to handle the requests, let the Tomcat thread pool handle each request in its own thread.
+  - Consideration: this is a more fundamental change to your application. It changes from async to sync. In this example it is not a problem, but in other cases the impact may be greater.
 - Tune the `ManagedAsyncExecutorProvider' to make it work.
-  -
+  - By tuning the `ManagedAsyncExecutorProvider` we keep the architecture of the application the same; we just make more use of the up and down scaling of the thread pool.
+  - Consideration: this requires more tuning and testing to see if it behaves as expected.

 > [!TIP]
 > You can try some solutions yourself and see what the results are when you run the tests again!
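The Javadoc behaviour described above is easy to reproduce outside the application. A minimal, self-contained sketch (the pool sizes are illustrative, not the values from the challenge code): with an unbounded `LinkedBlockingQueue`, the pool never grows past its core size, no matter how many tasks are waiting.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {
    // core=2, max=8, unbounded queue: extra threads are only created when
    // the queue rejects an offer, and an unbounded queue never rejects one.
    static int runDemo() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 8, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < 20; i++) {
            pool.submit(() -> {
                try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        int size = pool.getPoolSize(); // stays at the core size of 2
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pool size while 20 tasks are pending: " + runDemo());
    }
}
```

This mirrors the graphs: the backlog grows in the queue while the active thread count stays flat at the core size.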
@@ -339,11 +352,12 @@ protected BlockingQueue<Runnable> getWorkQueue() {

 Now we can redeploy the change and run the test again.

-
 ![Threadpool](threadpool_fixed.png)
+
 Now we see the active threads go up above the core size. Once the test is finished, you see after a while the number of threads going down again.

 ![Tasks in queue](queue_fixed.png)
+
 We see that the queue is almost empty the whole time. The metric is a sampled value. Most likely it reached the limit of 2 more often, as more threads have been created.
 But due to the sampling, you cannot see that.
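The effect of a bounded work queue can also be shown in isolation (again with illustrative pool sizes, not the values from `ManagedAsyncExecutorProvider`): once the small queue fills up, the executor starts creating threads up to the maximum pool size.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {
    // core=2, max=8, queue capacity 2: as soon as two tasks are waiting,
    // offer() fails and the pool scales up towards its maximum size.
    static int runDemo() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 8, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        for (int i = 0; i < 10; i++) { // 10 = max (8) + queue (2): nothing is rejected
            pool.submit(() -> {
                try { Thread.sleep(200); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        int size = pool.getPoolSize(); // grew beyond the core size of 2
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pool size while 10 tasks are pending: " + runDemo());
    }
}
```

Scaling back down afterwards relies on the keep-alive time for threads above the core size, which matches the fixed graph where the thread count drops again a while after the test finishes.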

@@ -352,4 +366,15 @@ And when we have a look at the results of K6 we see results as we hoped to see a

 # Summary

-<>
+> #### Observability: Every Engineer’s Job, Not Just Ops’ Problem
+> Observability is more about software engineering than it is about operations.
+> by Martin Thwaites on [The New Stack](https://thenewstack.io/observability-every-engineers-job-not-just-ops-problem/)
+
+I totally agree with this and with what Martin wrote in the article. I also think this code challenge shows that observability is not just an Ops problem.
+As a developer, you are the person who created and maintains the software. You should know how it works and which libraries are used.
+For the important parts of these libraries, observability should be activated or created to make sure you can see the actual behaviour.
+
+Practising your skills will also help you to quickly see what is going on. If you see the graphs for the first time, you need time to understand what they mean.
+If you have seen them quite often, you can quickly spot where they deviate from normal situations and where problems might be.
+
+I have some more challenges in mind, so **keep posted if you want to practice more**. I just need to find time to create the next challenge :smile: