|
| 1 | +--- |
| 2 | +title: "Observable Framework ❤️ Downdetector" |
| 3 | +date: 2025-12-19T11:55:58+01:00 |
| 4 | +slug: observable-framework-loves-downdetector |
| 5 | +pin: true |
| 6 | + |
| 7 | +resources: |
| 8 | + - src: "**.{png,jpg,webp}" |
| 9 | + title: "Image #:counter" |
| 10 | + |
| 11 | +tags: |
| 12 | + - Observable Framework |
| 13 | + |
| 14 | + |
| 15 | + |
| 16 | +summary: |
| 17 | + Incident reports are often static and lack the detail needed for deep analysis. |
| 18 | + What if you used Observable Framework to create interactive, data-driven incident reports instead? |
| 19 | + By using customer impact data from DownDetector, you can build reproducible and interactive incident reports that clearly share your line of thinking. |
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +## Reports for incident analysis |
| 24 | + |
| 25 | +When you face an incident, it is good practice to write an incident report to find its root causes and to describe actions that prevent it from happening again. |
| 26 | +Such a report has roughly the following structure: |
| 27 | +- What happened? |
| 28 | +- What has been done to mitigate the issue? |
| 29 | +- What can be done to prevent it from happening again? |
| 30 | + |
| 31 | +I have seen quite a few reports in which the first part was only a short description without details. |
| 32 | +You then have to trust the authors that the described improvement actions are sufficient to prevent a recurrence. |
| 33 | + |
| 34 | +## Describe 'what happened' with a 'scientific paper' approach |
| 35 | + |
| 36 | +In order to judge whether the actions taken are sufficient to prevent the incident from happening again, you have to analyze the incident data. |
| 37 | +Observability data is an excellent source for that, but it is often scattered across different tools and systems, not everyone has access to it, and it still has to be interpreted. |
| 38 | + |
| 39 | +During my analysis sessions, I captured the findings on a Confluence page, where I stored screenshots and links to dashboards. |
| 40 | +Next to that, I wrote down my observations and how the captured screenshots relate to each other, as I did in [this blog post](../aocc-challenge-01-step-by-step). |
| 41 | + |
| 42 | +I take this approach because it makes it **easy to reproduce the analysis** and lets me **share my line of thinking** with others. |
| 43 | +If others disagree, have a different view, or have more data, the document is a basis for discussion. |
| 44 | +When you understand all the details of an incident together, you can reach better conclusions and define better actions to prevent it next time. |
| 45 | + |
| 46 | +Capturing data on a page has some drawbacks: |
| 47 | +- Screenshots are not interactive |
| 48 | +- Dashboard links are fine, but they sometimes point to a relative range such as 'last 24 hours', which no longer shows the incident when you read the report later |
| 49 | +- Observability data might already be deleted due to retention policies |
| 50 | + |
| 51 | +## Interactive reports |
| 52 | + |
| 53 | +To share the analysis, you can create a report that contains the captured observability data and is interactive as well. |
| 54 | +[Observable Framework](https://observablehq.com/framework) provides a simple way to build interactive data reports. |
| 55 | + |
| 56 | +> **_What is Observable Framework?_** |
| 57 | +> |
| 58 | +> Create fast, beautiful data apps, dashboards, and reports from the command line. Write Markdown, JavaScript, SQL, Python, R… and any language you like. Free and open-source. |
| 59 | +> |
| 60 | +> _source:_ _[Observable Framework](https://observablehq.com/framework)_ |
| 61 | +
|
| 62 | + |
| 63 | +The engineers behind Observable Framework also provide an example of such a report: |
| 64 | + |
| 65 | +See [Analyzing web logs](https://observablehq.observablehq.cloud/framework-example-api/) |
| 66 | + |
| 67 | +Getting started is easy too; see the [getting started guide](https://observablehq.com/framework/getting-started). |
| 68 | +With Observable Framework, you can use the [Observable Plot library](https://observablehq.com/plot/) to easily create graphs. |
| 69 | + |
| 70 | +## Observable Framework + DownDetector |
| 71 | + |
| 72 | +When an incident occurs, customers report problems pretty quickly. [DownDetector](https://downdetector.com) and [Allestoringen](https://allestoringen.nl) provide a nice overview of the problems. |
| 73 | +That data can be a starting point for your own incident report to indicate the customer impact. |
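
As a rough illustration of how such data could enter a report: Observable Framework data loaders are programs that write a file to standard output, and they can be written in Python. The sketch below assumes a hypothetical loader at `src/data/reports.csv.py` and uses made-up sample values; the `SAMPLES` list, the function names, and the threshold are all assumptions for illustration, not the format DownDetector or Allestoringen actually provide.

```python
import csv
import io
import sys

# Hypothetical, made-up samples: (ISO timestamp, number of user reports).
# In a real report these would be captured from the outage overview by hand.
SAMPLES = [
    ("2025-12-01T09:00:00", 4),
    ("2025-12-01T09:15:00", 6),
    ("2025-12-01T09:30:00", 58),
    ("2025-12-01T09:45:00", 212),
    ("2025-12-01T10:00:00", 175),
    ("2025-12-01T10:15:00", 40),
    ("2025-12-01T10:30:00", 7),
]


def to_csv(samples):
    """Serialize (timestamp, reports) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp", "reports"])
    for timestamp, reports in samples:
        writer.writerow([timestamp, reports])
    return buf.getvalue()


def impact_window(samples, threshold):
    """Return (first, last) timestamps with reports >= threshold, or None."""
    hits = [timestamp for timestamp, reports in samples if reports >= threshold]
    return (hits[0], hits[-1]) if hits else None


if __name__ == "__main__":
    # A data loader emits its result on stdout; Framework stores it as a file.
    sys.stdout.write(to_csv(SAMPLES))
```

In a page, the generated file could then be read with `FileAttachment("data/reports.csv").csv({typed: true})` and plotted with Observable Plot. Capturing the numbers in the repository also keeps the customer impact data available after the original source no longer shows it.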
| 74 | + |
| 75 | +I created a [sample repository](https://github.com/cbos/observable-framework-and-allestoringen) with an example interactive report which is available at: |
| 76 | +[https://ceesbos.nl/observable-framework-and-allestoringen/](https://ceesbos.nl/observable-framework-and-allestoringen/) |
| 77 | + |
| 78 | +The example looks like this, including the instructions on how to use it: |
| 79 | + |
| 80 | + |
| 81 | +And a part of an actual report can look like this: |
| 82 | + |
| 83 | + |
| 84 | + |
| 85 | +## What is next? |
| 86 | + |
| 87 | +I have quite a few ideas on how this can be taken further. I use Grafana a lot, and I see a lot of potential in combining Grafana with Observable Framework or Observable Notebook to create interactive incident reports. |