content/en/ninja-workshops/10-advanced-otel/4-building-resilience/4-4-recovery.md
Lines changed: 26 additions & 6 deletions
@@ -20,11 +20,33 @@ In this exercise, we’ll test how the **OpenTelemetry Collector** recovers from
../otelbin --config=gateway.yaml
```
-**Inspect the Agent logs**: Once the **Gateway** is up and running, the **Agent** resumes sending data from the last checkpointed state, ensuring no data is lost. The **Gateway** should begin receiving the previously missed traces without any additional action on your part.
-Note that only the **Gateway** shows that the checkpointed traces have arrived. The **Agent** does not display any indication that data, new or old, has been sent.
+
+After the **Agent** is up and running, the **File_Storage** extension detects the buffered data in the checkpoint folder.
+It starts to dequeue the stored spans from the last checkpoint folder, ensuring no data is lost.
+Note that the **Agent** debug screen does **NOT** change and still shows the following line, indicating that no new data is being exported.
+
+```text
+2025-02-07T13:40:12.195+0100 info [email protected]/service.go:253 Everything is ready. Begin running and processing data.
+```
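The recovery behaviour described in this change (spans buffered in a checkpoint folder while the destination is down, then dequeued after a restart) can be illustrated with a toy file-backed queue. This is only a sketch of the general idea, not the Collector's implementation: the `FileBackedQueue` class, its one-JSON-file-per-item layout, and the `send` callback are all hypothetical names introduced for illustration.

```python
import json
import tempfile
from pathlib import Path


class FileBackedQueue:
    """Toy persistent queue: each item is stored as one JSON file (hypothetical layout)."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        # Resume numbering after any items left behind by a previous run.
        existing = [int(p.stem) for p in self.directory.glob("*.json")]
        self.next_id = max(existing, default=-1) + 1

    def enqueue(self, item):
        # Zero-padded names keep lexicographic order equal to insertion order.
        path = self.directory / f"{self.next_id:08d}.json"
        path.write_text(json.dumps(item))
        self.next_id += 1

    def drain(self, send):
        """Send items oldest first; delete each one only after send() succeeds."""
        delivered = 0
        for path in sorted(self.directory.glob("*.json")):
            if not send(json.loads(path.read_text())):
                break  # Destination still down: keep the rest on disk.
            path.unlink()
            delivered += 1
        return delivered


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        queue = FileBackedQueue(checkpoint_dir)
        for i in range(3):
            queue.enqueue({"span": i})

        # Destination is down: nothing is delivered, nothing is lost.
        print(queue.drain(lambda item: False))  # prints 0

        # Simulate a restart: a new instance finds the buffered items on disk.
        restarted = FileBackedQueue(checkpoint_dir)
        received = []
        print(restarted.drain(lambda item: received.append(item) or True))  # prints 3
```

Deleting an item only after a successful send is what lets the buffered data survive a restart; a sender with this shape may deliver an item twice if it crashes between sending and deleting, but it does not silently drop one.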
+
+**Watch the Gateway Debug output**
+You should see on the **Gateway** debug screen that it has started receiving the previously missed traces without any additional action on your part.
+
+```txt
+2025-02-07T12:44:32.651+0100 info [email protected]/service.go:253 Everything is ready. Begin running and processing data.
Count the number of traces in the recreated `./gateway-traces.out`. It should match the number you sent while the **Gateway** was down.
{{% /notice %}}
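One way to do the count suggested above is a short script. The sketch below assumes `./gateway-traces.out` is written in OTLP JSON with one export batch per line (the layout of `resourceSpans` / `scopeSpans` / `spans` with a `traceId` field on each span); if your file is in a different format, the field names will need adjusting. The function name `count_traces` is hypothetical.

```python
import json
from pathlib import Path


def count_traces(path):
    """Return (unique trace IDs, total spans) found in an OTLP JSON lines file."""
    trace_ids = set()
    span_count = 0
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # Skip blank lines between batches.
        batch = json.loads(line)
        for resource_spans in batch.get("resourceSpans", []):
            for scope_spans in resource_spans.get("scopeSpans", []):
                for span in scope_spans.get("spans", []):
                    trace_ids.add(span.get("traceId"))
                    span_count += 1
    return len(trace_ids), span_count


if __name__ == "__main__":
    output_file = Path("./gateway-traces.out")
    if output_file.exists():
        traces, spans = count_traces(output_file)
        print(f"{traces} unique traces, {spans} spans")
    else:
        print(f"{output_file} not found")
```

Counting unique trace IDs rather than lines avoids over- or under-counting when several traces are batched into a single line of the file.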
### Conclusion
@@ -33,6 +55,4 @@ This exercise demonstrated how to enhance the resilience of the OpenTelemetry Co
By implementing file-based checkpointing and queue persistence, you ensure the telemetry pipeline can gracefully recover from temporary interruptions, making it more robust and reliable for production environments.
-If you want to know more about the `FileStorage` extension, you can find it [**here**](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/storage/filestorage).
-
Stop the **Agent** and **Gateway** using `Ctrl-C`.