content/en/scenarios/5-understand-impact/1-build-application.md (10 additions, 7 deletions)
@@ -11,7 +11,7 @@ weight: 1
For this workshop, we'll be using a microservices-based application. This application is for an online retailer and normally includes more than a dozen services. However, to keep the workshop simple, we'll be focusing on two services used by the retailer as part of their payment processing workflow: the credit check service and the credit processor service.

## Pre-requisites

- You will start with an EC2 environment that already has some useful components, but we will perform some [initial steps](#initial-steps) in order to get to the following state:
+ You will start with a t2.medium EC2 instance with 20 GB of disk storage, and perform some [initial steps](#initial-steps) in order to get to the following state:

* Install Kubernetes (k3s) and Docker
* Deploy the **Splunk distribution of the OpenTelemetry Collector**
* Build and deploy `creditcheckservice` and `creditprocessorservice`
@@ -33,6 +33,9 @@ cd observability-workshop/workshop/tagging
# Exit and ssh back to this instance

+ # return to the same directory as before
+ cd observability-workshop/workshop/tagging
+
./2-deploy-otel-collector.sh
./3-deploy-creditcheckservice.sh
./4-deploy-creditprocessorservice.sh
@@ -41,34 +44,34 @@ cd observability-workshop/workshop/tagging
## View your application in Splunk Observability Cloud

- Now that the setup is complete, let's confirm that it's sending data to **Splunk Observability Cloud**.
+ Now that the setup is complete, let's confirm that it's sending data to **Splunk Observability Cloud**. Note that when the application is deployed for the first time, it may take a few minutes for the data to appear.

Navigate to APM, then use the Environment dropdown to select your environment (i.e. `tagging-workshop-name`).

If everything was deployed correctly, you should see `creditprocessorservice` and `creditcheckservice` displayed in the list of services:



- Click on Explore on the right-hand side to view the service map. We can see that the `creditcheckservice` makes calls to the `creditprocessorservice`, with an average response time of around 3.5 seconds:
+ Click on **Explore** on the right-hand side to view the service map. We can see that the `creditcheckservice` makes calls to the `creditprocessorservice`, with an average response time of at least 3 seconds:



- Next, click on Traces on the right-hand side to see the traces captured for this application. You'll see that some traces run relatively fast (i.e. just a few milliseconds), whereas others take a few seconds.
+ Next, click on **Traces** on the right-hand side to see the traces captured for this application. You'll see that some traces run relatively fast (i.e. just a few milliseconds), whereas others take a few seconds.



- You'll also notice that some traces have errors:
+ If you toggle **Errors only** to `on`, you'll also notice that some traces have errors:



- Sort the traces by duration then click on one of the longer running traces. In this example, the trace took five seconds, and we can see that most of the time was spent calling the `/runCreditCheck` operation, which is part of the `creditprocessorservice`.
+ Toggle **Errors only** back to `off` and sort the traces by duration, then click on one of the longer running traces. In this example, the trace took five seconds, and we can see that most of the time was spent calling the `/runCreditCheck` operation, which is part of the `creditprocessorservice`.
Currently, we don't have enough details in our traces to understand why some requests finish in a few milliseconds, and others take several seconds. To provide the best possible customer experience, this will be critical for us to understand.

We also don't have enough information to understand why some requests result in errors, and others don't. For example, if we look at one of the error traces, we can see that the error occurs when the `creditprocessorservice` attempts to call another service named `otherservice`. But why do some requests result in a call to `otherservice`, and others don't?

- 
+ 

We'll explore these questions and more in the workshop.
- To understand why some requests have errors or slow performance, we'll need to add context to our traces. We'll do this by adding tags.
+ To understand why some requests have errors or slow performance, we'll need to add context to our traces. We'll do this by adding tags. But first, let's take a moment to discuss what tags are, and why they're so important for observability.
content/en/scenarios/5-understand-impact/3-capture-tags.md (15 additions, 11 deletions)
@@ -9,19 +9,23 @@ Let's add some tags to our traces, so we can find out why some customers receive
## Identify Useful Tags

- We'll start by reviewing the code for the `credit_check` function of `creditcheckservice` (which can be found in the `main.py` file):
+ We'll start by reviewing the code for the `credit_check` function of `creditcheckservice` (which can be found in the `/home/ubuntu/observability-workshop/workshop/tagging/creditcheckservice/main.py` file):
current_span.set_attribute("credit.check.result", checkResult) # <--- ADDED BY WORKSHOP
return checkResult
````
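For orientation, here's a rough sketch of how a request handler can attach tags using the OpenTelemetry Python API. The `credit.check.result` attribute matches the snippet above, but everything else (the function signature, the stub helper, and the other attribute names) is illustrative rather than the workshop's actual `main.py`:

````python
from opentelemetry import trace

def get_credit_score(customer_num: str) -> int:
    # Stub for illustration only; the real service looks the score up elsewhere
    return 640

def credit_check(customer_num: str) -> str:
    # Auto-instrumentation has already started a span for this request,
    # so we simply attach additional context (tags) to it
    current_span = trace.get_current_span()
    current_span.set_attribute("customer.num", customer_num)         # illustrative tag name
    credit_score = get_credit_score(customer_num)
    current_span.set_attribute("credit.score", credit_score)         # illustrative tag name
    check_result = "OK" if credit_score >= 300 else "FAILED"
    current_span.set_attribute("credit.check.result", check_result)  # as in the snippet above
    return check_result
````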
@@ -91,7 +95,7 @@ Once these changes are made, let's run the following script to rebuild the Docke
## Confirm Tag is Captured Successfully

- After a few minutes, return to **Splunk Observability Cloud** and load one of the traces to confirm that the tags were captured successfully:
+ After a few minutes, return to **Splunk Observability Cloud** and load one of the latest traces to confirm that the tags were captured successfully (hint: sort by duration to find the latest traces):

****
- Now that we've captured several tags from our application, lets explore some of the trace data we've captured that include this additional context, and see if we can identify what's causing poor user experience in some cases.
+ Now that we've captured several tags from our application, let's explore some of the trace data we've captured that include this additional context, and see if we can identify what's causing poor user experience in some cases.

## Use Trace Analyzer
@@ -26,6 +26,8 @@ Let's remove the credit score filter and toggle **Errors only** to on, which res
Click on a few of these traces, and look at the tags we captured. Do you notice any patterns?

- If you found a pattern - great job! But keep in mind that this is a difficult way to troubleshoot, as it requires you to look through many traces and remember what you saw in each one to see if you can identify a pattern.
+ Next, toggle **Errors only** to off, and sort traces by duration. Look at a few of the slowest running traces, and compare them to the fastest running traces. Do you notice any patterns?
+
+ If you found a pattern that explains the slow performance and errors - great job! But keep in mind that this is a difficult way to troubleshoot, as it requires you to look through many traces and mentally keep track of what you saw, so you can identify a pattern.

Thankfully, **Splunk Observability Cloud** provides a more efficient way to do this, which we'll explore next.
content/en/scenarios/5-understand-impact/5-index-tags.md (10 additions, 10 deletions)
@@ -11,7 +11,7 @@ To use advanced features in **Splunk Observability Cloud** such as **Tag Spotlig
To do this, navigate to **Settings** -> **APM MetricSets**. Then click the **+ New MetricSet** button.

- Let's index the `credit.score.category` tag to start with by providing the following details:
+ Let's index the `credit.score.category` tag by entering the following details (**note**: since everyone in the workshop is using the same organization, the instructor will do this step on your behalf):
@@ -29,37 +29,37 @@ Once analysis is complete, click on the checkmark in the **Actions** column.
Why did we choose to index the `credit.score.category` tag and not the others?

- To understand this, let's review the primary use cases for attributes:
+ To understand this, let's review the primary use cases for tags:

* Filtering
* Grouping

### Filtering

- With the filtering use case, we can use the **Trace Analyzer** capability of **Splunk Observability Cloud** to filter on traces that match a particular attribute value.
+ With the filtering use case, we can use the **Trace Analyzer** capability of **Splunk Observability Cloud** to filter on traces that match a particular tag value.

We saw an example of this earlier, when we filtered on traces where the credit score started with "7".

- Or if a customer called in to complain about slow service, we could use **Trace Analyzer** to locate all traces with that particular customer number.
+ Or if a customer calls in to complain about slow service, we could use **Trace Analyzer** to locate all traces with that particular customer number.

- Attributes used for filtering use cases are generally high-cardinality, meaning that there could be thousands or even hundreds of thousands of unique values. In fact, **Splunk Observability Cloud** can handle an effectively infinite number of unique attribute values! Filtering using these attributes allows us to rapidly locate the traces of interest.
+ Tags used for filtering use cases are generally high-cardinality, meaning that there could be thousands or even hundreds of thousands of unique values. In fact, **Splunk Observability Cloud** can handle an effectively infinite number of unique tag values! Filtering using these tags allows us to rapidly locate the traces of interest.

Note that we aren't required to index tags to use them for filtering with **Trace Analyzer**.

### Grouping

- With the grouping use case, we can surface trends for attributes that we collect using the powerful **Tag Spotlight** feature in **Splunk Observability Cloud**, which we'll see in action shortly.
+ With the grouping use case, we can surface trends for tags that we collect using the powerful **Tag Spotlight** feature in **Splunk Observability Cloud**, which we'll see in action shortly.

- Attributes used for grouping use cases should be low to medium-cardinality, with hundreds of unique values.
+ Tags used for grouping use cases should be low to medium-cardinality, with hundreds of unique values.

- For custom attributes to be used with **Tag Spotlight**, they first need to be indexed.
+ For custom tags to be used with **Tag Spotlight**, they first need to be indexed.

We decided to index the `credit.score.category` tag because it has a few distinct values that would be useful for grouping. In contrast, the customer number and credit score tags have hundreds or thousands of unique values, and are more valuable for filtering use cases rather than grouping.
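To make the cardinality distinction concrete, here is a minimal sketch, assuming the OpenTelemetry Python API; the attribute names and values are illustrative:

````python
from opentelemetry import trace

span = trace.get_current_span()

# High-cardinality tag: a unique value per customer. Great for filtering in
# Trace Analyzer, but a poor grouping dimension, so we don't index it.
span.set_attribute("customer.num", "30134241")       # illustrative name and value

# Low-cardinality tag: only a handful of possible values. Ideal for grouping
# with Tag Spotlight, which is why this is the tag we chose to index.
span.set_attribute("credit.score.category", "poor")
````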
## Troubleshooting vs. Monitoring MetricSets

- You may have noticed that, to index this tag, we created something called a **Troubleshooting MetricSet**. It's named this was because a Troubleshooting MetricSet, or TMS, allows us to troubleshoot issues with this tag using features such as **Tag Spotlight**.
+ You may have noticed that, to index this tag, we created something called a **Troubleshooting MetricSet**. It's named this way because a Troubleshooting MetricSet, or TMS, allows us to troubleshoot issues with this tag using features such as **Tag Spotlight**.

- You may have also noticed that there's another option which we didn't choose called a **Monitoring MetricSet** (or MMS). Monitoring MetricSets go beyond troubleshooting and allow us to use tags for alerting and dashboards. We'll explore this later in the workshop.
+ You may have also noticed that there's another option which we didn't choose called a **Monitoring MetricSet** (or MMS). Monitoring MetricSets go beyond troubleshooting and allow us to use tags for alerting and dashboards. We'll explore this concept later in the workshop.
content/en/scenarios/5-understand-impact/6-use-tags.md (6 additions, 6 deletions)
@@ -20,19 +20,19 @@ With **Tag Spotlight**, we can see 100% of credit score requests that result in
This illustrates the power of **Tag Spotlight**! Finding this pattern would be time-consuming without it, as we'd have to manually look through hundreds of traces to identify the pattern (and even then, there's no guarantee we'd find it).

- We've looked at errors, but what about latency? Let's click on **Latency** near the top of the screen.
+ We've looked at errors, but what about latency? Let's click on **Latency** near the top of the screen to find out.

Here, we can see that requests with a `poor` credit score are running slowly, with P50, P90, and P99 times of around 3 seconds, which is too long for our users to wait, and much slower than other requests.

- We can also see that some requests with an `exceptional` credit score request are running slowly, with P99 times of around 5 seconds, though the P50 and P90 response times are relatively quick.
+ We can also see that some requests with an `exceptional` credit score are running slowly, with P99 times of around 5 seconds, though the P50 response time is relatively quick.

****

## Using Dynamic Service Maps

Now that we know the credit score category associated with the request can impact performance and error rates, let's explore another feature that utilizes indexed tags: **Dynamic Service Maps**.

- With Dynamic Service Maps, we can breakdown a particular service by an attribute. For example, let's click on **APM**, then click **Explore** to view the service map.
+ With Dynamic Service Maps, we can break down a particular service by a tag. For example, let's click on **APM**, then click **Explore** to view the service map.

Click on `creditcheckservice`. Then, on the right-hand menu, click on the drop-down that says **Breakdown**, and select the `credit.score.category` tag.
@@ -44,14 +44,14 @@ This view makes it clear that performance for `good` and `fair` credit scores is
## Summary

- **Tag Spotlight** has uncovered several interesting patterns that we need to explore further:
+ **Tag Spotlight** has uncovered several interesting patterns for the engineers that own this service to explore further:

* Why are all the `impossible` credit score requests resulting in error?
* Why are all the `poor` credit score requests running slowly?
* Why do some of the `exceptional` requests run slowly?

- As an SRE, passing this context to the service owner would be extremely helpful for their investigation, as it would allow them to track down the issue much more quickly than if we only told them that the service was "sometimes slow".
+ As an SRE, passing this context to the engineering team would be extremely helpful for their investigation, as it would allow them to track down the issue much more quickly than if we simply told them that the service was "sometimes slow".

If you're curious, have a look at the source code for the `creditprocessorservice`. You'll see that requests with impossible, poor, and exceptional credit scores are handled differently, thus resulting in the differences in error rates and latency that we uncovered.

- The behavior we saw with our application is typical for modern cloud-native applications, where different inputs passed to a service lead to different code paths, some of which result in slower performance or errors. For example, in a real credit check service, requests resulting in low credit scores may be sent to another downstream service to further evaluate risk, and may perform more slowly than requests resulting in higher scores.
+ The behavior we saw with our application is typical for modern cloud-native applications, where different inputs passed to a service lead to different code paths, some of which result in slower performance or errors. For example, in a real credit check service, requests resulting in low credit scores may be sent to another downstream service to further evaluate risk, and may perform more slowly than requests resulting in higher scores, or encounter higher error rates.
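To illustrate that last point, here's a highly simplified sketch of the kind of input-dependent branching described above. This is illustrative pseudologic only, not the actual `creditprocessorservice` source:

````python
import random
import time

def process_credit_check(category: str) -> str:
    # Different inputs take different code paths, which is what produces the
    # error-rate and latency differences we saw in Tag Spotlight
    if category == "impossible":
        raise ValueError("credit check failed")  # every one of these requests errors out
    if category == "poor":
        time.sleep(3)                            # every poor-score request takes a slow path
    elif category == "exceptional" and random.random() < 0.1:
        time.sleep(5)                            # only some exceptional requests hit a slow path
    return "OK"
````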