
Commit d824338

added part two to the profiling workshop
1 parent 5841629 commit d824338

File tree

7 files changed: +119 -3 lines changed

content/en/scenarios/2-debug-problems/2-enable-cpu-profiling.md

Lines changed: 106 additions & 0 deletions
@@ -3,3 +3,109 @@ title: Enable CPU Profiling
linkTitle: 5.2 Enable CPU Profiling
weight: 2
---

Let's learn how to enable the CPU profiler, verify its operation,
and use the results in Splunk Observability Cloud to find out why our application sometimes runs slowly.

### Update the application configuration

We need to pass an additional configuration argument to the Splunk OpenTelemetry Java agent to
enable the profiler. The configuration is [documented here](https://docs.splunk.com/observability/en/gdi/get-data-in/application/java/instrumentation/instrument-java-application.html#activate-alwayson-profiling)
in detail, but for now we just need a single setting:

`SPLUNK_PROFILER_ENABLED="true"`

Since our application is deployed in Kubernetes, we can update the Kubernetes manifest file to set this environment variable. Open the `doorgame/doorgame.yaml` file for editing, and ensure the value of the `SPLUNK_PROFILER_ENABLED` environment variable is set to "true":

```
- name: SPLUNK_PROFILER_ENABLED
  value: "true"
```
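
As an aside, if you ever run the agent outside of Kubernetes, the same switch can typically be supplied as the JVM system property `splunk.profiler.enabled` instead of an environment variable. A minimal sketch; the agent and application jar paths below are placeholders, not files from this workshop:

```
# Hypothetical paths: adjust to wherever your agent jar and application live.
java -javaagent:./splunk-otel-javaagent.jar \
     -Dsplunk.profiler.enabled=true \
     -jar doorgame.jar
```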
Next, let's redeploy the Door Game application by running the following command:

```
cd workshop/profiling
./4-redeploy-doorgame.sh
```

After a few minutes, a new pod will be deployed with the updated application settings.
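
If you'd prefer to watch the rollout rather than wait, kubectl can report its progress. A quick check, assuming the Deployment is named `doorgame` (the pods carry the `app=doorgame` label, which we also rely on in the next step):

```
# Wait for the new pod to become ready, then list the doorgame pods.
kubectl rollout status deployment/doorgame
kubectl get pods -l app=doorgame
```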

### Confirm operation

To ensure the profiler is enabled, let's review the application logs with the following command:

```
kubectl logs -l app=doorgame --tail=100 | grep JfrActivator
```

You should see a line in the application log output that shows the profiler is active:

```
[otel.javaagent 2024-02-05 19:01:12:416 +0000] [main] INFO com.splunk.opentelemetry.profiler.JfrActivator - Profiler is active.
```

This confirms that the profiler is enabled and sending data to the OpenTelemetry collector deployed in our Kubernetes cluster, which in turn sends profiling data to Splunk Observability Cloud.
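
If you want to double-check that the collector is running and ready to receive this data, you can list its pods. This is a sketch that assumes the collector was installed with the Splunk OpenTelemetry Collector Helm chart, which typically labels its pods `app=splunk-otel-collector`; adjust the label to match your installation:

```
# List the collector pods deployed in the cluster (label may vary by install method).
kubectl get pods -l app=splunk-otel-collector
```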

### Profiling in APM

Visit `http://<your IP address>:81` and play a few more rounds of The Door Game.

Then head on over to Splunk Observability Cloud, click on APM,
and click on the `doorgame` service at the bottom of the screen.

In the rightmost column you should see the "AlwaysOn Profiling"
card that looks similar to this:

![AlwaysOn Profiling](../images/always-on-profiling.png)

Click the card title to go to the AlwaysOn Profiling view. It will look something
like this:

![Flamegraph and table](../images/flamegraph_and_table.png)

By default, we show both the table and the [flamegraph](https://www.brendangregg.com/flamegraphs.html).
Take some time to explore this view by doing some of the following:

* toggle between flamegraph and table views
* click a table item and notice the change in flamegraph
* navigate the flamegraph by clicking on a stack frame to zoom in, and a parent frame to zoom out
* add a search term like `splunk` or `jetty` to highlight some matching stack frames

We should note that the sample app is greatly underutilized and most of the time
is spent waiting for user input and service requests. As a result, the flamegraph
should be somewhat less interesting than that of a high-volume, real-world production service.

### Traces with Call Stacks

Now that we've seen the profiling view, let's go back to the trace list view. We want to find a
trace that was long enough to increase the chance of having sampled call stacks.
If you haven't already, play The Door Game until you end up with door 3
(either by choosing it initially and staying, or choosing another door and switching when given the chance).
You'll notice that it's slow, and it should show up at around 5s in the trace list view:

![Slow Trace](../images/slow_trace.png)

Identify the slow trace in the trace list view and click it to view the
individual trace. In the single trace view, you should see that the innermost span,
`DoorGame.getOutcome`, is responsible for the entire slow duration of the trace.
There should be two call stacks sampled during the execution of that span.

![Span with Stacks](../images/span_with_stacks.png)

If you're up for the challenge, expand the span and explore the Java stack frames on your own
before we tackle it in the next section.

## What did we accomplish?

We've come a long way already!

* We learned how to enable the profiler in the Splunk OpenTelemetry Java instrumentation agent.
* We learned how to verify in the agent output that the profiler is enabled.
* We have explored several profiling-related workflows in APM:
  * How to navigate to AlwaysOn Profiling from the troubleshooting view
  * How to explore the flamegraph and method call duration table through navigation and filtering
  * How to identify when a span has sampled call stacks associated with it

In the next section, we'll explore the profiling data further to determine what's causing the slowness, and then apply a fix to our application to resolve the issue.
4 image files (301 KB, 439 KB, 322 KB, 68.9 KB) — binary content not shown

workshop/profiling/doorgame/doorgame.yaml

Lines changed: 2 additions & 2 deletions
@@ -38,9 +38,9 @@ spec:
         - name: SPLUNK_METRICS_ENDPOINT
           value: "http://$(NODE_IP):9943"
         - name: SPLUNK_PROFILER_ENABLED
-          value: "true"
+          value: "false"
         - name: SPLUNK_PROFILER_MEMORY_ENABLED
-          value: "true"
+          value: "false"
       resources:
         requests:
           cpu: 150m

workshop/profiling/otel/values.yaml

Lines changed: 11 additions & 1 deletion
@@ -15,4 +15,14 @@ agent:
           - batch
           - resourcedetection
           - resource
-          - resource/envname
+          - resource/envname
+        logs:
+          processors:
+            - memory_limiter
+            - k8sattributes
+            - filter/logs
+            - batch
+            - resourcedetection
+            - resource
+            - resource/logs
+            - resource/envname
