
Commit f05a1e4

spellings and first steps of test resilience
1 parent fce8d71 commit f05a1e4


2 files changed: +26, -42 lines changed


content/en/ninja-workshops/10-advanced-otel/30-filelog/1-agent-filelog.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ linkTitle: 3.1 Agent Filelog Config
 weight: 1
 ---
 
-### Upate the agent configuration
+### Update the agent configuration
 
 Check that you are in the `[WORKSHOP]/3-filelog` folder. Open the `agent.yaml` you copied across earlier in your editor and add the `filelog` receiver to the `agent.yaml`.
 
content/en/ninja-workshops/10-advanced-otel/40-resillience/1-test-resilience.md

Lines changed: 25 additions & 41 deletions
@@ -3,16 +3,35 @@ title: Test your Resilience Setup
 linkTitle: 4.1 Testing the Setup
 weight: 1
 ---
-
-3. **Run the Collector:**
-Now, run the OpenTelemetry Collector using the configuration file you just created. You can do this by executing the following command in your terminal:
+
+### Step 1: Set up the Test Environment
+
+In this section we are going to simulate a network outage and see whether our configuration helps the Collector recover from it:
+
+1. **Run the Gateway**
+Now, run the OpenTelemetry Collector using the existing gateway configuration file. You can do this by executing the following command in your terminal:
 
 ```bash
-otelcol --config agent.yaml
+../otelcol --config gateway.yaml
 ```
 
-This will start the collector with the configurations specified in the YAML file.
+2. **Run the Agent**
+Next, run the OpenTelemetry Collector using the configuration file you just created. You can do this by executing the following command in your terminal:
+
+```bash
+../otelcol --config agent.yaml
+```
+
+This will start the collector with the resilience configurations specified in the YAML file.
+
+3. **Run the log-gen script**
+To generate traffic, we are going to start our log-generating script:
+
+```bash
+./log-gen.sh
+```
 
-### Step 4: Testing the Resilience
+### Step 2: Testing the Resilience
 
 To test the resilience built into the system:
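As a purely illustrative aside (an assumption, not the workshop's documented procedure), one way to simulate the network outage mentioned in Step 1 is to stop the gateway and later restart it while the agent keeps running:

```bash
# Hypothetical outage simulation -- not taken from this commit.
# Stop the gateway so the agent's exports start failing and retrying
# (press Ctrl-C in the gateway terminal, or from another shell):
pkill -f "otelcol --config gateway.yaml"

# After a short while, restart the gateway and check that the queued
# telemetry is delivered once the connection is restored:
../otelcol --config gateway.yaml
```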

@@ -28,43 +47,8 @@ To test the resilience built into the system:
 4. **Inspect Logs and Files:**
 Inspect the logs to see the retry attempts. The `debug` exporter will output detailed logs, which should show retry attempts and any failures.
 
-### Step 5: Fine-Tuning the Configuration for Production
-
-- **Timeouts and Interval Adjustments:**
-You may want to adjust the `retry_on_failure` parameters for different network environments. In high-latency environments, increasing the `max_interval` might reduce unnecessary retries.
-
-```yaml
-retry_on_failure:
-  enabled: true
-  initial_interval: 1s
-  max_interval: 5s
-  max_elapsed_time: 20s
-```
-
-- **Compaction and Transaction Size:**
-Depending on your use case, adjust the `max_transaction_size` for checkpoint compaction. A smaller transaction size will make checkpoint files more frequent but smaller, while a larger size might reduce disk I/O but require more memory.
-
-### Step 6: Monitoring and Maintenance
-
-- **Monitoring the Collector:**
-Use Prometheus or other monitoring tools to collect metrics from the OpenTelemetry Collector. You can monitor retries, the state of the sending queue, and other performance metrics to ensure the collector is behaving as expected.
-
-- **Log Rotation:**
-The `file` exporter has a built-in log rotation mechanism to ensure that logs do not fill up your disk.
-
-```yaml
-exporters:
-  file:
-    path: ./agent.out
-    rotation:
-      max_megabytes: 2
-      max_backups: 2
-```
-
-This configuration rotates the log file when it reaches 2 MB, and keeps up to two backups.
-
 ### Conclusion
 
 In this section, you learned how to enhance the resilience of the OpenTelemetry Collector by configuring the `file_storage/checkpoint` extension, setting up retry mechanisms for the OTLP exporter, and using a sending queue backed by file storage for storing data during temporary failures.
 
-By leveraging file storage for checkpointing and queue persistence, you can ensure that your telemetry pipeline can recover gracefully from failures, making it more reliable for production environments.
+By leveraging file storage for checkpointing and queue persistence, you can ensure that your telemetry pipeline can recover gracefully from short interruptions, making it more reliable in production environments.
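For context, a rough sketch of the kind of resilience configuration the conclusion describes (the directory, endpoint, and queue size are illustrative assumptions rather than values from this commit):

```yaml
extensions:
  file_storage/checkpoint:            # storage extension used for checkpoints and the queue
    directory: ./checkpoint-folder    # hypothetical directory on local disk
    compaction:
      on_start: true
      directory: ./checkpoint-folder

exporters:
  otlphttp:
    endpoint: http://localhost:5318   # hypothetical gateway endpoint
    retry_on_failure:
      enabled: true                   # retry failed exports instead of dropping data
    sending_queue:
      enabled: true
      queue_size: 100                 # illustrative size
      storage: file_storage/checkpoint  # persist queued data on disk

service:
  extensions: [file_storage/checkpoint]
```

The `storage` reference on the `sending_queue` is what lets queued telemetry survive an outage or a Collector restart.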
