
Commit 83bc6b4

Section 2 updates
1 parent 2d28c35 commit 83bc6b4


5 files changed (+35 / -48 lines)


content/en/conf/1-advanced-collector/1-agent-gateway/1-2-send-metrics.md

Lines changed: 3 additions & 0 deletions
````diff
@@ -93,4 +93,7 @@ jq '.resourceMetrics[].scopeMetrics[].metrics[] | select(.name == "system.cpu.ti
 {{% /tab %}}
 {{% /tabs %}}
 
+> [!IMPORTANT]
+> Stop the **Agent** and the **Gateway** processes by pressing `Ctrl-C` in their respective terminals.
+
 {{% /notice %}}
````

content/en/conf/1-advanced-collector/2-building-resilience/2-1-configuration.md

Lines changed: 24 additions & 15 deletions
````diff
@@ -10,7 +10,18 @@ While these components do not process telemetry data directly, they provide valu
 
 {{% notice title="Exercise" style="green" icon="running" %}}
 
-**Update the `agent.yaml`**: In the **Agent terminal** window, add the `file_storage` extension and name it `checkpoint`:
+> [!IMPORTANT]
+> **Change _ALL_ terminal windows to the `2-building-resilience` directory and run the `clear` command.**
+
+Your directory structure will look like this:
+
+```text { title="Updated Directory Structure" }
+.
+├── agent.yaml
+└── gateway.yaml
+```
+
+**Update the `agent.yaml`**: In the **Agent terminal** window, add the `file_storage` extension, named `checkpoint`, under the existing `health_check` extension:
 
 ```yaml
 file_storage/checkpoint: # Extension Type/Name
````
````diff
@@ -24,11 +35,9 @@ While these components do not process telemetry data directly, they provide valu
 max_transaction_size: 65536 # Max. size limit before compaction occurs
 ```
 
-**Add `file_storage` to existing `otlphttp` exporter**: Modify the `otlphttp:` exporter to configure retry and queuing mechanisms, ensuring data is retained and resent if failures occur:
+**Add `file_storage` to the exporter**: Modify the `otlphttp` exporter to configure retry and queuing mechanisms so that data is retained and resent if a failure occurs. Add the following below `endpoint: "http://localhost:5318"`, indented to the same level as `endpoint`:
 
 ```yaml
-otlphttp:
-endpoint: "http://localhost:5318"
 retry_on_failure:
 enabled: true # Enable retry on failure
 sending_queue: #
````
````diff
@@ -38,7 +47,7 @@ While these components do not process telemetry data directly, they provide valu
 storage: file_storage/checkpoint # File storage extension
 ```
 
-**Update the `services` section**: Add the `file_storage/checkpoint` extension to the existing `extensions:` section. This will cause the extension to be enabled:
+**Update the `service` section**: Add the `file_storage/checkpoint` extension to the existing `extensions:` list to enable it. The configuration should look like this:
 
 ```yaml
 service:
````
````diff
@@ -47,18 +56,18 @@ service:
 - file_storage/checkpoint # Enabled extensions for this collector
 ```
 
-**Update the `metrics` pipeline**: For this exercise we are going to comment out the `hostmetrics` receiver from the Metric pipeline to reduce debug and log noise:
+**Update the `metrics` pipeline**: For this exercise, comment out the `hostmetrics` receiver in the metrics pipeline to reduce debug and log noise. Again, the configuration should look like this:
 
 ```yaml
 metrics:
 receivers:
+# - hostmetrics # Hostmetrics receiver (CPU only)
 - otlp
-# - hostmetrics # Hostmetrics Receiver
 ```
 
 {{% /notice %}}
 
-Validate the **Agent** configuration using **[otelbin.io](https://www.otelbin.io/)**. For reference, the `metrics:` section of your pipelines will look similar to this:
+<!-- Validate the **Agent** configuration using **[otelbin.io](https://www.otelbin.io/)**. For reference, the `metrics:` section of your pipelines will look similar to this:
 
 ```mermaid
 %%{init:{"fontFamily":"monospace"}}%%
@@ -76,16 +85,16 @@ graph LR
 subgraph " "
 subgraph subID1[**Metrics**]
 direction LR
-REC1 --> PRO1
-PRO1 --> PRO2
-PRO2 --> PRO3
-PRO3 --> PRO4
-PRO4 --> EXP1
-PRO4 --> EXP2
+REC1 -- > PRO1
+PRO1 -- > PRO2
+PRO2 -- > PRO3
+PRO3 -- > PRO4
+PRO4 -- > EXP1
+PRO4 -- > EXP2
 end
 end
 classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
 classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
 classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
 classDef sub-metrics stroke:#38bdf8,stroke-width:1px, color:#38bdf8,stroke-dasharray: 3 3;
-```
+``` -->
````
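
The three `agent.yaml` changes in this file must all use the same name: `file_storage/checkpoint` is defined under `extensions`, referenced by `storage` in the `otlphttp` exporter's `sending_queue`, and enabled under `service.extensions`. As a rough sketch of that rule, the Python below checks a hand-written dict that mirrors only the keys visible in this diff. It is not a YAML parser or a Collector feature, `check_storage_wiring` is a hypothetical helper, and the elided `file_storage` and `sending_queue` settings are deliberately left out.

```python
"""Check that a storage extension is defined, referenced by an exporter, and enabled.

agent_config is a hand-written stand-in for the parts of agent.yaml changed in
this commit; it is not loaded from the real file, and elided settings are omitted.
"""

agent_config = {
    "extensions": {
        "health_check": {},
        "file_storage/checkpoint": {},  # remaining settings (e.g. max_transaction_size) omitted
    },
    "exporters": {
        "otlphttp": {
            "endpoint": "http://localhost:5318",
            "retry_on_failure": {"enabled": True},
            "sending_queue": {"storage": "file_storage/checkpoint"},
        },
    },
    "service": {
        "extensions": ["health_check", "file_storage/checkpoint"],
    },
}


def check_storage_wiring(config: dict) -> list:
    """Return human-readable problems; an empty list means the names line up."""
    problems = []
    defined = set(config.get("extensions") or {})
    enabled = set((config.get("service") or {}).get("extensions") or [])
    for exporter_name, exporter in (config.get("exporters") or {}).items():
        queue = (exporter or {}).get("sending_queue") or {}
        storage = queue.get("storage")
        if storage is None:
            continue  # no persistent storage configured for this exporter
        if storage not in defined:
            problems.append(f"{exporter_name}: '{storage}' is not defined under extensions")
        if storage not in enabled:
            problems.append(f"{exporter_name}: '{storage}' is not listed in service.extensions")
    return problems


if __name__ == "__main__":
    issues = check_storage_wiring(agent_config)
    print("\n".join(issues) if issues else "storage extension wiring is consistent")
```

Forgetting the `service.extensions` entry is the easiest of the three to miss, because the extension block and the exporter reference both look complete on their own.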

content/en/conf/1-advanced-collector/2-building-resilience/2-2-test-environment.md

Lines changed: 4 additions & 5 deletions
````diff
@@ -8,26 +8,25 @@ Next, we will configure our environment to be ready for testing the **File Stora
 
 {{% notice title="Exercise" style="green" icon="running" %}}
 
-**Start the Gateway**: In the **Gateway terminal** window navigate to the `[WORKSHOP]/4-resilience` directory and run:
+**Start the Gateway**: In the **Gateway terminal** window run:
 
 ```bash { title="Start the Gateway" }
 ../otelcol --config=gateway.yaml
 ```
 
-**Start the Agent**: In the **Agent terminal** window navigate to the `[WORKSHOP]/4-resilience` directory and run:
+**Start the Agent**: In the **Agent terminal** window run:
 
 ```bash { title="Start the Agent" }
 ../otelcol --config=agent.yaml
 ```
 
-**Send five test spans**: In the **Loadgen terminal** window navigate to the `[WORKSHOP]/4-resilience` directory and run:
+**Send five test spans**: In the **Loadgen terminal** window run:
 
 ```bash { title="Start Load Generator" }
 ../loadgen -count 5
 ```
 
 Both the **Agent** and **Gateway** should display debug logs, and the **Gateway** should create a `./gateway-traces.out` file.
 
-{{% /notice %}}
-
 If everything functions correctly, we can proceed with testing system resilience.
+{{% /notice %}}
````
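
To confirm that the **Gateway** actually wrote the test spans, you can count them in `./gateway-traces.out`. The sketch below assumes the Gateway's file exporter writes one OTLP/JSON object per line with spans under `resourceSpans[].scopeSpans[].spans[]`, the trace counterpart of the `resourceMetrics[].scopeMetrics[].metrics[]` path walked by the `jq` query in section 1. `count_spans` is a hypothetical helper name.

```python
"""Count spans in the Gateway's file exporter output.

Assumes one OTLP/JSON object per line, each with resourceSpans[].scopeSpans[].spans[].
"""
import json
import sys
from pathlib import Path


def count_spans(lines) -> int:
    """Sum the spans across every JSON line; blank lines are ignored."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        payload = json.loads(line)
        for resource_spans in payload.get("resourceSpans", []):
            for scope_spans in resource_spans.get("scopeSpans", []):
                total += len(scope_spans.get("spans", []))
    return total


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "gateway-traces.out")
    if path.exists():
        with path.open(encoding="utf-8") as handle:
            print(f"{path}: {count_spans(handle)} spans")
    else:
        print(f"{path} not found; start the Gateway and run loadgen first")
```

Run it from the `2-building-resilience` directory after `loadgen` finishes; running it again after the recovery test shows whether the queued traces were delivered.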

content/en/conf/1-advanced-collector/2-building-resilience/2-3-failure.md

Lines changed: 4 additions & 13 deletions
````diff
@@ -6,18 +6,12 @@ weight: 3
 
 To assess the **Agent's** resilience, we'll simulate a temporary **Gateway** outage and observe how the **Agent** handles it:
 
-**Summary**:
-
-1. **Send Traces to the Agent** – Generate traffic by sending traces to the **Agent**.
-2. **Stop the Gateway** – This will trigger the **Agent** to enter retry mode.
-3. **Restart the Gateway** – The **Agent** will recover traces from its persistent queue and forward them successfully. Without the persistent queue, these traces would have been lost permanently.
-
 {{% notice title="Exercise" style="green" icon="running" %}}
 
-**Simulate a network failure**: In the **Gateway terminal** stop the **Gateway** with `Ctrl-C` and wait until the gateway console shows that it has stopped:
+**Simulate a network failure**: In the **Gateway terminal** stop the **Gateway** with `Ctrl-C` and wait until the gateway console shows that it has stopped. The **Agent** will continue running, but it will not be able to send data to the gateway. The output in the **Gateway terminal** should look similar to this:
 
 ```text
-2025-01-28T13:24:32.785+0100 info service@v0.120.0/service.go:309 Shutdown complete.
+2025-07-09T10:22:37.941Z info service@v0.126.0/service.go:345 Shutdown complete. {"resource": {}}
 ```
 
 **Send traces**: In the **Loadgen terminal** window send five more traces using the `loadgen`.
````
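
The outage test only works because the `otlphttp` exporter's `sending_queue` is backed by `file_storage/checkpoint` instead of memory. The toy model below illustrates that idea and is not how the Collector implements it: every pending item is written to a JSON file as soon as it is queued, a failed send leaves it in place, and a restarted process reloads the file and drains it once the simulated **Gateway** is back. `FileBackedQueue` and `queue.json` are hypothetical names.

```python
"""Toy model of a disk-backed sending queue (not the Collector's implementation)."""
import json
import tempfile
from pathlib import Path


class FileBackedQueue:
    """Keeps pending items in a JSON file so they survive a process restart."""

    def __init__(self, path: Path):
        self.path = path
        # On "startup", reload anything a previous run left behind.
        self.items = json.loads(path.read_text()) if path.exists() else []

    def _persist(self) -> None:
        self.path.write_text(json.dumps(self.items))

    def enqueue(self, item: str) -> None:
        self.items.append(item)
        self._persist()  # on disk before any send attempt

    def drain(self, send) -> int:
        """Send items in order; stop at the first failure and keep the rest queued."""
        sent = 0
        while self.items:
            if not send(self.items[0]):
                break  # Gateway unreachable: leave the item for a later retry
            self.items.pop(0)
            self._persist()
            sent += 1
        return sent


if __name__ == "__main__":
    received = []
    gateway_up = False

    def send(item: str) -> bool:
        if gateway_up:
            received.append(item)
        return gateway_up

    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "queue.json"
        agent = FileBackedQueue(state)
        for i in range(5):              # five traces arrive while the Gateway is down
            agent.enqueue(f"trace-{i}")
        agent.drain(send)               # every attempt fails, nothing is removed
        del agent                       # Ctrl-C: the Agent process stops

        gateway_up = True               # Gateway restarted
        agent = FileBackedQueue(state)  # Agent restarted, reloads the queue file
        print(agent.drain(send), received)
```

With an in-memory list instead of the file, the `del agent` step would discard all five traces, which is the loss the persistent queue is meant to prevent.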
````diff
@@ -31,16 +25,13 @@ Notice that the agent’s retry mechanism is activated as it continuously attemp
 **Stop the Agent**: In the **Agent terminal** window, use `Ctrl-C` to stop the agent. Wait until the agent’s console confirms it has stopped:
 
 ```text
-2025-01-28T14:40:28.702+0100 info extensions/extensions.go:66 Stopping extensions...
-2025-01-28T14:40:28.702+0100 info service@v0.120.0/service.go:309 Shutdown complete.
+2025-07-09T10:25:59.344Z info service@v0.126.0/service.go:345 Shutdown complete. {"resource": {}}
 ```
 
 {{% /notice %}}
 
-{{% notice title="Tip" style="primary" icon="lightbulb" %}}
-Stopping the agent will halt its retry attempts and prevent any future retry activity.
+Stopping the agent halts its retry attempts and prevents any further retry activity.
 
 If the agent runs for too long without successfully delivering data, it may begin dropping traces, depending on the retry configuration, to conserve memory. By stopping the agent, any metrics, traces, or logs waiting in its persistent queue stay in file storage instead of being dropped, so they remain available for recovery.
 
 This step is essential for clearly observing the recovery process when the agent is restarted.
-{{% /notice %}}
````
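
How long "too long" is depends on the retry settings. In this commit `retry_on_failure` only sets `enabled: true`, so the other values fall back to their defaults. The sketch below assumes those defaults are an initial interval of 5s, a multiplier of 1.5, a maximum interval of 30s and a total budget (`max_elapsed_time`) of 300s; check the documentation for the Collector version you run. It is a simplified model without jitter that also ignores the time each send attempt takes.

```python
"""Simplified, jitter-free model of exponential backoff with a total time budget.

The default arguments are assumptions meant to resemble the exporter's
retry_on_failure defaults; they are not read from any real configuration.
"""


def backoff_schedule(initial=5.0, multiplier=1.5, max_interval=30.0, max_elapsed=300.0):
    """Return the wait (in seconds) before each retry, stopping when the budget would be exceeded."""
    waits = []
    elapsed = 0.0
    interval = initial
    while elapsed + interval <= max_elapsed:
        waits.append(interval)
        elapsed += interval
        interval = min(interval * multiplier, max_interval)  # grow, but cap each wait
    return waits


if __name__ == "__main__":
    waits = backoff_schedule()
    print(f"{len(waits)} retries spread over {sum(waits):.1f}s before giving up on the batch")
    print("first waits:", waits[:6])
```

In this model the waits quickly reach the 30s cap, and the batch is abandoned once the 300s budget would be exceeded; stopping the agent before that point is what keeps the queued data around for the recovery step.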

content/en/conf/1-advanced-collector/2-building-resilience/_index.md

Lines changed: 0 additions & 15 deletions
````diff
@@ -16,18 +16,3 @@ This solution will work for metrics as long as the connection downtime is brief
 For logs, there are plans to implement a more enterprise-ready solution in one of the upcoming Splunk OpenTelemetry Collector releases.
 
 {{% /notice %}}
-
-{{% notice title="Exercise" style="green" icon="running" %}}
-
-> [!IMPORTANT]
-> **Change _ALL_ terminal windows to the `[WORKSHOP]/2-building-resilience` directory.**
-
-Your directory structure will look like this:
-
-```text { title="Updated Directory Structure" }
-.
-├── agent.yaml
-└── gateway.yaml
-```
-
-{{% /notice %}}
````
