Commit a20ff37

committed
rewritten last resilience exercise
1 parent 411fb6d commit a20ff37

1 file changed (+26 -6 lines) in `content/en/ninja-workshops/10-advanced-otel/4-building-resilience`

content/en/ninja-workshops/10-advanced-otel/4-building-resilience/4-4-recovery.md

Lines changed: 26 additions & 6 deletions
@@ -20,11 +20,33 @@ In this exercise, we’ll test how the **OpenTelemetry Collector** recovers from
 ../otelbin --config=gateway.yaml
 ```
 
-**Inspect the Agent logs**: Once the **Gateway** is up and running the **Agent** will resume sending data from the last checkpointed state, ensuring no data is lost. You should see the **Gateway** begin receiving the previously missed traces without requiring any additional action on your part.
-
 {{% /notice %}}
-{{% notice title="Tip" style="primary" icon="lightbulb" %}}
-Note that only the **Gateway** will show that the checkpointed traces have arrived. The agent will not display any indication that data new or old has been sent.
+
+After the **Agent** is up and running, the **File_Storage** extension will detect buffered data in the checkpoint folder.
+It will start to dequeue the stored spans from the last checkpoint folder, ensuring no data is lost.
+
+{{% notice title="Exercise" style="green" icon="running" %}}
+
+**Verify the Agent Debug output**
+Note that the Agent Debug Screen does **NOT** change and still shows the following line, indicating that no new data is being exported.
+
+```text
+2025-02-07T13:40:12.195+0100 info [email protected]/service.go:253 Everything is ready. Begin running and processing data.
+```
+
+**Watch the Gateway Debug output**
+On the **Gateway** debug screen, you should see that it has started receiving the previously missed traces without requiring any additional action on your part.
+
+```text
+2025-02-07T12:44:32.651+0100 info [email protected]/service.go:253 Everything is ready. Begin running and processing data.
+2025-02-07T12:47:46.721+0100 info Traces {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 4, "spans": 4}
+2025-02-07T12:47:46.721+0100 info ResourceSpans #0
+Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
+Resource attributes:
+```
+
+**Check the `gateway-traces.out` file**
+Count the number of traces in the recreated `./gateway-traces.out`. It should match the number you sent while the **Gateway** was down.
 {{% /notice %}}
 
 ### Conclusion
@@ -33,6 +55,4 @@ This exercise demonstrated how to enhance the resilience of the OpenTelemetry Co
 
 By implementing file-based checkpointing and queue persistence, you ensure the telemetry pipeline can gracefully recover from temporary interruptions, making it more robust and reliable for production environments.
 
-If you want to know more about the `FileStorage` extension, you can find it [**here**](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/storage/filestorage).
-
 Stop the **Agent** and **Gateway** using `Ctrl-C`.
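
The last step of the rewritten exercise, counting what arrived in `./gateway-traces.out`, could be scripted rather than done by eye. The following is a minimal sketch and not part of this commit. It assumes the file was written by the Collector's file exporter in its default JSON format, with one OTLP/JSON export batch per line, each nesting `resourceSpans`, then `scopeSpans`, then `spans`. The helper names `count_spans` and `count_spans_in_file` are hypothetical.

```python
import json
from pathlib import Path


def count_spans(lines):
    """Return (span_count, distinct_trace_count) for OTLP/JSON export lines.

    Assumption: each non-blank line is one JSON export batch shaped like
    {"resourceSpans": [{"scopeSpans": [{"spans": [...]}]}]}.
    """
    span_count = 0
    trace_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between batches
        batch = json.loads(line)
        for resource_spans in batch.get("resourceSpans", []):
            for scope_spans in resource_spans.get("scopeSpans", []):
                for span in scope_spans.get("spans", []):
                    span_count += 1
                    trace_id = span.get("traceId")
                    if trace_id:
                        trace_ids.add(trace_id)
    return span_count, len(trace_ids)


def count_spans_in_file(path):
    """Read an exporter output file and count its spans and distinct traces."""
    with Path(path).open(encoding="utf-8") as handle:
        return count_spans(handle)
```

Calling `count_spans_in_file("./gateway-traces.out")` after the recovery should return a trace count equal to the number of traces sent while the **Gateway** was down, provided the file has the layout assumed above.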
