For this workshop, we'll be using a microservices-based application. This application is for an online retailer and normally includes more than a dozen services. However, to keep the workshop simple, we'll be focusing on two services used by the retailer as part of their payment processing workflow: the credit check service and the credit processor service.
## Pre-requisites
You will start with an EC2 environment that already has some useful components. You will then:
* Deploy a load generator to send traffic to the services
## Initial Steps
To begin the exercise you will need a **Splunk Observability Cloud** environment that you can send data to. For this environment you'll need:
* The realm (i.e. `us1`)
* An access token
The initial setup can be completed by executing the following steps on the command line of your EC2 instance, which runs Ubuntu 22.04:
These commands are run from the `observability-workshop/workshop/tagging` directory of the workshop repository.
## View your application in Splunk Observability Cloud
Now that the setup is complete, let's confirm that it's sending data to **Splunk Observability Cloud**.
Navigate to **APM**, then use the **Environment** dropdown to select your environment (i.e. `tagging-workshop-name`).
If everything was deployed correctly, you should see `creditprocessorservice` and `creditcheckservice` displayed in the list of services:
Click on Explore on the right-hand side to view the service map. We can see that the `creditcheckservice` makes calls to the `creditprocessorservice`, with an average response time of around 3.5 seconds:
You'll also notice that some traces have errors:
Sort the traces by duration then click on one of the longer running traces. In this example, the trace took five seconds, and we can see that most of the time was spent calling the `/runCreditCheck` operation, which is part of the `creditprocessorservice`.
Currently, we don't have enough details in our traces to understand why some requests finish in a few milliseconds, and others take several seconds. To provide the best possible customer experience, this will be critical for us to understand.
We also don't have enough information to understand why some requests result in errors, and others don't. For example, if we look at one of the error traces, we can see that the error occurs when the `creditprocessorservice` attempts to call another service named `otherservice`. But why do some requests result in a call to `otherservice`, and others don't?
To understand why some requests have errors or slow performance, we'll need to add context to our traces. We'll do this by adding tags.
## What are tags?
A note about terminology before we proceed. While this workshop is about **tags**, OpenTelemetry refers to them as **attributes**, and you'll see both terms used interchangeably.
## Why are tags so important?
Tags are essential for an application to be truly observable. As we saw with our credit check service, some users are having a great experience: fast with no errors. But other users get a slow experience or encounter errors.
Tags add the context to the traces to help us understand why some users get a great experience and others don't. And powerful features in **Splunk Observability Cloud** utilize tags to help you jump quickly to root cause.
Let's add some tags to our traces, so we can find out why some customers receive a poor experience from our application.
## Identify Useful Tags
We'll start by reviewing the code for the `credit_check` function of `creditcheckservice` (which can be found in the `main.py` file):
````
def credit_check():
    ...
````
We can see that this function accepts a **customer number** as an input. This would be helpful to capture as part of a trace. What else would be helpful?
Well, the **credit score** returned for this customer by the `creditprocessorservice` may be interesting (we want to ensure we don't capture any PII data though). It would also be helpful to capture the **credit score category**, and the **credit check result**.
Great, we've identified four tags to capture from this service that could help with our investigation. But how do we capture these?
## Capture Tags
We start by importing the trace module, adding an import statement to the top of the `creditcheckservice/main.py` file:
````
import requests
from opentelemetry import trace
````
## Redeploy Service
Once these changes are made, let's run the following script to rebuild the Docker image used for `creditcheckservice` and redeploy it to our Kubernetes cluster:
Now that we've captured several tags from our application, lets explore some of the trace data we've captured that include this additional context, and see if we can identify what's causing poor user experience in some cases.
## Use Trace Analyzer
Navigate to **APM**, then select **Traces**. This takes us to the **Trace Analyzer**, where we can add filters to search for traces of interest. For example, we can filter on traces where the credit score starts with `7`:
We can apply similar filters for the customer number, credit score category, and credit check result.
## Explore Traces With Errors
Let's remove the credit score filter and toggle **Errors only** to on, which results in a list of only those traces where an error occurred:
Click on a few of these traces, and look at the tags we captured. Do you notice any patterns?
If you found a pattern - great job! But keep in mind that this is a difficult way to troubleshoot, as it requires you to look through many traces and remember what you saw in each one to see if you can identify a pattern.
Thankfully, **Splunk Observability Cloud** provides a more efficient way to do this, which we'll explore next.
Why did we choose to index the `credit.score.category` tag and not the others?
To understand this, let's review the primary use cases for attributes:
* Filtering
* Grouping
### Filtering
With the filtering use case, we can use the **Trace Analyzer** capability of **Splunk Observability Cloud** to filter on traces that match a particular attribute value.
We saw an example of this earlier, when we filtered on traces where the credit score started with `7`.
Or if a customer called in to complain about slow service, we could use **Trace Analyzer** to locate all traces with that particular customer number.
Attributes used for filtering use cases are generally high-cardinality, meaning that there could be thousands or even hundreds of thousands of unique values. In fact, **Splunk Observability Cloud** can handle an effectively infinite number of unique attribute values! Filtering using these attributes allows us to rapidly locate the traces of interest.
Note that we aren't required to index tags to use them for filtering with **Trace Analyzer**.
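To make the filtering idea concrete, here is a small pure-Python illustration of the same kind of query (the trace records and their values are invented for this sketch):

```python
# Hypothetical trace data: each trace carries the attributes we captured.
traces = [
    {"trace_id": "a1", "credit.score": 705, "duration_ms": 210},
    {"trace_id": "b2", "credit.score": 583, "duration_ms": 4900},
    {"trace_id": "c3", "credit.score": 790, "duration_ms": 180},
]

def filter_by_score_prefix(traces, prefix):
    # Mimics a Trace Analyzer filter: keep only traces whose
    # credit score starts with the given digit(s).
    return [t for t in traces if str(t["credit.score"]).startswith(prefix)]

matches = filter_by_score_prefix(traces, "7")
print([t["trace_id"] for t in matches])  # ['a1', 'c3']
```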
### Grouping
With the grouping use case, we can surface trends for attributes that we collect using the powerful **Tag Spotlight** feature in **Splunk Observability Cloud**, which we'll see in action shortly.
Attributes used for grouping use cases should be low to medium-cardinality, with hundreds of unique values.
For custom attributes to be used with **Tag Spotlight**, they first need to be indexed.
We decided to index the `credit.score.category` tag because it has a few distinct values that would be useful for grouping. In contrast, the customer number and credit score tags have hundreds or thousands of unique values, and are more valuable for filtering use cases rather than grouping.
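This cardinality check can be sketched in a few lines of Python (the span attribute values below are invented): counting distinct values per tag shows why one tag suits grouping and the other suits filtering.

```python
from collections import defaultdict

# Hypothetical attribute values observed across many spans.
spans = [
    {"customer.num": "1001", "credit.score.category": "good"},
    {"customer.num": "1002", "credit.score.category": "poor"},
    {"customer.num": "1003", "credit.score.category": "good"},
    {"customer.num": "1004", "credit.score.category": "fair"},
]

def cardinality(spans):
    # Count the number of distinct values seen for each attribute key.
    values = defaultdict(set)
    for span in spans:
        for key, value in span.items():
            values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}

print(cardinality(spans))
# {'customer.num': 4, 'credit.score.category': 3}
```

In real trace data, `customer.num` would keep growing with the customer base, while `credit.score.category` stays fixed at a handful of values, which is what makes it a good candidate for indexing.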
## Troubleshooting vs. Monitoring MetricSets
You may have noticed that, to index this tag, we created something called a **Troubleshooting MetricSet**. It's named this way because a Troubleshooting MetricSet, or TMS, allows us to troubleshoot issues with this tag using features such as **Tag Spotlight**.
You may have also noticed that there's another option, which we didn't choose, called a **Monitoring MetricSet** (or MMS). Monitoring MetricSets go beyond troubleshooting and allow us to use tags for alerting and dashboards. We'll explore this later in the workshop.