---
title: Enable CPU Profiling
linkTitle: 5.2 Enable CPU Profiling
weight: 2
---

Let's learn how to enable the CPU profiler, verify its operation, and use the results in Splunk Observability Cloud to find out why our application sometimes runs slowly.

### Update the application configuration

We will need to pass an additional configuration argument to the Splunk OpenTelemetry Java agent in order to enable the profiler. The configuration is [documented here](https://docs.splunk.com/observability/en/gdi/get-data-in/application/java/instrumentation/instrument-java-application.html#activate-alwayson-profiling) in detail, but for now we just need a single setting:

`SPLUNK_PROFILER_ENABLED="true"`

Since our application is deployed in Kubernetes, we can update the Kubernetes manifest file to set this environment variable. Open the `doorgame/doorgame.yaml` file for editing, and ensure the value of the `SPLUNK_PROFILER_ENABLED` environment variable is set to "true":

````
- name: SPLUNK_PROFILER_ENABLED
  value: "true"
````
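To double-check the manifest before redeploying, you can search for the setting (a quick sketch; it assumes you run it from the directory that contains the `doorgame/doorgame.yaml` path used above):

```
grep -n -A 1 "SPLUNK_PROFILER_ENABLED" doorgame/doorgame.yaml
```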

Next, let's redeploy the Door Game application by running the following commands:

```
cd workshop/profiling
./4-redeploy-doorgame.sh
```

After a few minutes, a new pod will be deployed with the updated application settings.
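If you'd like to watch the rollout rather than wait, here is a minimal sketch using `kubectl`; the deployment name `doorgame` and the default namespace are assumptions, so adjust them for your environment:

```
# Wait until the new pod has rolled out, then confirm the env var on the deployment
kubectl rollout status deployment/doorgame
kubectl get deployment doorgame -o jsonpath='{.spec.template.spec.containers[0].env}'
```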
### Confirm operation

To ensure the profiler is enabled, let's review the application logs.
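One way to pull the logs is with `kubectl` (a sketch only; the deployment name `doorgame` and the default namespace are assumptions):

```
# Look for the profiler activation message in the agent's startup output
kubectl logs deployment/doorgame | grep JfrActivator
```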
You should see a line in the application log output that shows the profiler is active:

```
[otel.javaagent 2024-02-05 19:01:12:416 +0000] [main] INFO com.splunk.opentelemetry.profiler.JfrActivator - Profiler is active.
```
This confirms that the profiler is enabled and sending data to the OpenTelemetry collector deployed in our Kubernetes cluster, which in turn sends profiling data to Splunk Observability Cloud.
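If you also want to confirm that a collector is running in the cluster, a quick check might look like this (pod and namespace names vary by installation, so treat it as a sketch):

```
kubectl get pods --all-namespaces | grep -i otel
```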
### Profiling in APM

Visit `http://<your IP address>:81` and play a few more rounds of The Door Game.

Then head on over to Splunk Observability Cloud, click on APM, and click on the `doorgame` service at the bottom of the screen.

In the rightmost column you should see the "AlwaysOn Profiling" card.

Click the card title to go to the AlwaysOn Profiling view. It will look something like this:



By default, we show both the table and [flamegraph](https://www.brendangregg.com/flamegraphs.html). Take some time to explore this view by doing some of the following:

* toggle between flamegraph and table views
* click a table item and notice the change in flamegraph
* navigate the flamegraph by clicking on a stack frame to zoom in, and a parent frame to zoom out
* add a search term like `splunk` or `jetty` to highlight some matching stack frames

We should note that the sample app is greatly underutilized and most of the time is spent waiting for user input and service requests. As a result, the flame graph will look somewhat less interesting than one from a high-volume, real-world production service.
### Traces with Call Stacks
80
+
81
+
Now that we've seen the profiling view, let's go back to the trace list view. We want to find a
82
+
trace that was long enough so that we increase the chance of having sampled call stacks.
83
+
If you haven't already, you should play The Door Game enough to stick with door 3
84
+
(either by choosing it initially and staying, or choosing another door and switching when given the chance).
85
+
You'll notice that it's slow, and it should show up at around 5s in the trace list view:
86
+
87
+

88
+
89
+
Identify the slow trace in the trace list view and click it to view the
90
+
individual trace. In the single trace view, you should see that the innermost span
91
+
`DoorGame.getOutcome` is responsible for the entire slow duration of the span.
92
+
There should be 2 call stacks sampled during the execution of that span.
93
+
94
+

95
+
96
+
If you're up for the challenge, expand the span and explore the Java stack frames on your own
97
+
before we tackle it in the next section.
## What did we accomplish?
We've come a long way already!
103
+
104
+
* We learned how to enable the profiler in the Splunk OpenTelemetry Java instrumentation agent.
105
+
* We learned how to verify in the agent output that the profiler is enabled.
106
+
* We have explored several profiling related workflows in APM:
107
+
* How to navigate to AlwaysOn Profiling from the troubleshooting view
108
+
* How to explore the flamegraph and method call duration table through navigation and filtering
109
+
* How to identify when a span has sampled call stacks associated with it
110
+
111
+
In the next section, we'll explore the profiling data further to determine what's causing the slowness, and then apply a fix to our application to resolve the issue.