Skip to content

Commit a679b5e

Browse files
committed
added part 3 to profiling workshop
1 parent d824338 commit a679b5e

File tree

2 files changed

+111
-0
lines changed

2 files changed

+111
-0
lines changed

content/en/scenarios/2-debug-problems/3-find-and-fix-slowness.md

Lines changed: 111 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,3 +3,114 @@ title: Find and Fix Application Slowness
33
linkTitle: 5.3 Find and Fix Application Slowness
44
weight: 3
55
---
6+
7+
In this section, we pick up where we left off last time.
8+
We will use the profiling tool to explore our slow span and link it
9+
back to the source code that caused the slowness. We will update the code to improve
10+
the performance of this span, and we will use the APM profiling tools to verify
11+
that our change is successful.
12+
13+
### Exploring the Span
14+
15+
At the end of the previous section, we have identified a span that was taking more
16+
than 5 seconds to complete. We observed that the span showed that it was linked
17+
to 2 or more sampled call stacks. Let's proceed by clicking on the span to expand its details.
18+
19+
![View Stack on Span](../images/view_stack_on_span.png)
20+
21+
When a span does not have CPU call stack data, we would normally just see the span details.
22+
(you can drill into the parent stack to see this in action). When we expand the child span, however,
23+
we have the option to see "CPU Call Stack" data as well.
24+
25+
At the bottom of the call stack, we can see that the thread was busy sleeping:
26+
27+
```
28+
com.splunk.profiling.workshop.ServiceMain$$Lambda$.handle(Unknown Source:0)
29+
com.splunk.profiling.workshop.ServiceMain.lambda$main$(ServiceMain.java:34)
30+
com.splunk.profiling.workshop.DoorGame.getOutcome(DoorGame.java:41)
31+
com.splunk.profiling.workshop.DoorChecker.isWinner(DoorChecker.java:14)
32+
com.splunk.profiling.workshop.DoorChecker.checkDoorTwo(DoorChecker.java:30)
33+
com.splunk.profiling.workshop.DoorChecker.precheck(DoorChecker.java:36)
34+
com.splunk.profiling.workshop.Util.sleep(Util.java:9)
35+
java.util.concurrent.TimeUnit.sleep(Unknown Source:0)
36+
java.lang.Thread.sleep(Unknown Source:0)
37+
java.lang.Thread.sleep(Native Method:0)
38+
```
39+
40+
The call stack tells us a story -- reading from the bottom up, it lets us describe
41+
what is happening inside the service code. A developer, even one unfamiliar with the
42+
source code, should be able to look at this call stack to craft a narrative like:
43+
> We are getting the outcome of a game. We leverage the DoorChecker to
44+
> see if something is the winner, but the check for door two somehow issues
45+
> a precheck() that, for some reason, is deciding to sleep for a long time.
46+
47+
Our workshop application is left intentionally simple -- a real-world service might see the
48+
thread being sampled during a database call or calling into an un-traced external service.
49+
It is also possible that a slow span is executing a complicated business process,
50+
in which case maybe none of the stack traces relate to each other at all.
51+
52+
The longer a method or process is, the greater chance we will have call stacks
53+
sampled during its execution.
54+
55+
### Let's Fix That Bug
56+
57+
By using the profiling tool, we were able to determine that our application is slow
58+
when issuing the `DoorChecker.precheck()` method from inside `DoorChecker.checkDoorTwo()`.
59+
Let's open the `doorgame/src/main/java/com/splunk/profiling/workshop/DoorChecker.java` source file in our editor.
60+
61+
By quickly glancing through the file, we see that there are methods for checking
62+
each door, and all of them call `precheck()`. In a real service, we might be uncomfortable
63+
simply removing the `precheck()` call because there could be unseen/unaccounted side
64+
effects.
65+
66+
Down on line 29 we see the following:
67+
```java
68+
private boolean checkDoorTwo(GameInfo gameInfo) {
69+
precheck(2);
70+
return gameInfo.isWinner(2);
71+
}
72+
73+
private void precheck(int doorNum) {
74+
long extra = (int)Math.pow(70, doorNum);
75+
sleep(300 + extra);
76+
}
77+
```
78+
79+
With our developer hat on, we notice that the door number is zero based, so
80+
the first door is 0, the second is 1, and the 3rd is 2 (this is conventional).
81+
The `extra` value is used as extra/additional sleep time, and it is computed by taking
82+
`70^doorNum` (`Math.pow` performs an exponent calculation). That's odd, because this means:
83+
* door 0 => 70^0 => 1ms
84+
* door 1 => 70^1 => 70ms
85+
* door 2 => 70^2 => 4900ms
86+
87+
We've found the root cause of our slow bug! This also explains why the first two doors
88+
weren't ever very slow.
89+
90+
We have a quick chat with our product manager and team lead, and we agree that the `precheck()`
91+
method must stay but that the extra padding isn't required. Let's remove the `extra` variable
92+
and make `precheck` now read like this:
93+
94+
```java
95+
private void precheck(int doorNum) {
96+
sleep(300);
97+
}
98+
```
99+
100+
Now all doors will have a consistent behavior. Save your work and then rebuild and redeploy the application using the following command:
101+
102+
```
103+
cd workshop/profiling
104+
./4-redeploy-doorgame.sh
105+
```
106+
107+
Once the application has been redeployed successfully, visit The Door Game again to confirm that your fix is in place:
108+
`http://<your IP address>:81`
109+
110+
## What did we accomplish?
111+
112+
* We learned how to view a series of call stacks captured during a Span.
113+
* We learned how the call stack can tell us a story and point us to suspect lines of code.
114+
* We identified the slow code and fixed it to make it faster.
115+
116+
In the next section, we'll explore how to enable the memory profiling component of AlwaysOn Profiling, which can tell us which objects are consuming the most heap memory.
350 KB
Loading

0 commit comments

Comments
 (0)