CogStack
diff --git a/docs/observability/customization/custom-dashboards.md b/docs/observability/customization/custom-dashboards.md
Lines changed: 3 additions & 2 deletions
diff --git a/docs/observability/customization/custom-prometheus-configs.md b/docs/observability/customization/custom-prometheus-configs.md
Lines changed: 2 additions & 4 deletions
diff --git a/docs/observability/reference/_index.md b/docs/observability/reference/_index.md
Lines changed: 2 additions & 1 deletion
diff --git a/docs/observability/reference/concept-materials.md b/docs/observability/reference/concept-materials.md
Lines changed: 7 additions & 0 deletions
diff --git a/docs/observability/reference/project-details.md b/docs/observability/reference/project-details.md
Lines changed: 9 additions & 0 deletions
diff --git a/docs/observability/setup/_index.md b/docs/observability/setup/_index.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/observability/setup/alerting.md b/docs/observability/setup/alerting.md
Lines changed: 84 additions & 22 deletions
diff --git a/docs/observability/setup/full-installation.md b/docs/observability/setup/full-installation.md
Lines changed: 0 additions & 43 deletions
diff --git a/docs/observability/setup/probing.md b/docs/observability/setup/probing.md
Lines changed: 122 additions & 21 deletions
@@ -1,8 +1,9 @@
 # Custom Dashboards
-//TODO
+You can set up custom dashboards as JSON files and include them alongside the defaults in this project.
+
 Grafana is set up with preconfigured dashboards, a datasource, and alerting. These work when Prometheus runs in this stack, and they depend on all the metrics following the defined rules.
 
-it is advised that any edits or new configs get committed back into your git repository, and stick with grafana provisioning instead of allowing manual edits
+It is advised that you commit any edits or new configs back into your git repository, and stick with Grafana provisioning instead of allowing manual edits.
 
 
 ## How to add a new dashboard with provisioning 
 
@@ -1,11 +1,9 @@
 # Custom Prometheus Configuration
-//TODO
-
 You can add completely custom Prometheus scrape configs and recording rules by mounting them in Docker.
 
+- `site/prometheus/scrape-configs/*.yml`. This is for advanced configuration. 
 
-
-- `site/prometheus/scrape-configs/*.yml`. This is for advanced configuration. Any yml file put in this directory will be used as standard promethues scrape configs. This will give full flexibility over what metrics are collected and all features in prometheus. Add any further configs that you want prometheus to use.
+Any yml file put in this directory will be used as a standard Prometheus scrape config. This gives full flexibility over which metrics are collected and access to all Prometheus features. Add any further configs that you want Prometheus to use.
 
 ```yaml
 # Custom scrape config definition
 
@@ -3,6 +3,7 @@
 ```{toctree}
 :maxdepth: 2
 
-quickstart-manual.md
+project-details.md
+concept-materials.md
 
 ```
@@ -0,0 +1,7 @@
+# Concepts
+```{toctree}
+:maxdepth: 2
+understanding-metrics.md
+
+```
+
@@ -0,0 +1,9 @@
+# Further Project Details 
+
+
+```{toctree}
+:maxdepth: 2
+quickstart-manual.md
+
+```
+
@@ -3,7 +3,7 @@
 ```{toctree}
 :maxdepth: 2
 
-full-installation.md
+production-setup.md
 probing.md
 telemetry.md
 alerting.md
 
@@ -1,42 +1,104 @@
 # Alerting
-//TODO
-By default, alerts are paused. The project is configured to easily send alerts to any Slack Webhook out of the box, but can be customized further. 
- 
-There are two sets of rules :
 
-- Basic alerts using uptime. If over 5m or 6h, if it drops below a certain percentage uptime, send an alert
-- Alerting on SLOs by using burn rates, for multi-window multi-rate alerts Google SRE - Prometheus Alerting: Turn SLOs into Alerts. 
+This guide explains how to enable and customize alerting in the CogStack observability stack using Grafana and Prometheus.
 
+By default, alerts are **paused**. The system is preconfigured to send alerts to a **Slack Webhook**, but this can be customized.
 
+There are two categories of alerting:
+
+* **Basic availability alerts**: Triggered when uptime falls below a threshold over short windows (5m or 6h).
+* **Burn rate alerts**: Multi-window, multi-burn-rate alerts that follow the best practices in [Google SRE principles](https://sre.google/workbook/alerting-on-slos/), used to track compliance with SLOs.
+
+---
 
 ## How to Enable Alerting
 
-### Define a SLO
-To enable the burn rate alerting feature, create prometheus recording rule file with the following contents.
+### 1. Define Your SLO
+
+To configure burn rate alerting, create a Prometheus recording rule to define your target SLO:
 
-```yaml
+```
 groups:
   - name:  slo-target-rules
     rules:
-      - record: slo_target_over_30_days # (Dont change)
-        expr: 0.95 # Mandatory - Specify the SLO you want to target, for example 0.95 for 95% uptime over 30 days
+      - record: slo_target_over_30_days
+        expr: 0.95
         labels:
-          job: "probe-cogstack-availability" #Mandatory - name the job, which must match the job in the probe targets defined
+          job: "probe-services"
+```
+
+* `expr`: Target SLO (e.g., `0.95` for 95% over 30 days)
+* `job`: Must match the probe job name defined in your configuration. This allows you to have different SLOs for different endpoints. 
+
+Place this file at:
+
+```
+prometheus/recording-rules/slo.yml
 ```
 
-In docker, mount the file in `site/prometheus/recording-rules/slo.yml`.
+This should be mounted in the Docker container under `/etc/prometheus/cogstack/site/prometheus/recording-rules/slo.yml`, which should already be set up if you followed the setup instructions.
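+
+As a sketch of per-job SLOs, the rule file could hold one rule per probe job. The second job name, `probe-critical-services`, and its `0.99` target are hypothetical and only illustrate the pattern; each `job` value would need to match a job used in your prober files.
+
+```yaml
+groups:
+  - name: slo-target-rules
+    rules:
+      # 95% target for the job used in the example above
+      - record: slo_target_over_30_days
+        expr: 0.95
+        labels:
+          job: "probe-services"
+      # Hypothetical second job with a stricter 99% target
+      - record: slo_target_over_30_days
+        expr: 0.99
+        labels:
+          job: "probe-critical-services"
+```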
+
+---
+
+### 2. Configure Alerting Environment
+
+Set these environment variables to control alerting behavior:
+
+```
+ALERTING_PAUSE_AVAILABILITY_5M=true
+ALERTING_PAUSE_AVAILABILITY_6H=true
+ALERTING_PAUSE_BURN_RATE=true
+SLACK_WEBHOOK_URL=https://hooks.slack.com/services/your-webhook
+```
+
+* Set any of the `ALERTING_PAUSE_*` variables to `false` to enable that alert type.
+* `SLACK_WEBHOOK_URL` should be set to a Slack webhook URL; any alerts are sent to that webhook.
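+
+For illustration, the variables might be passed through Docker Compose as sketched below. This assumes the variables are read by the Grafana container and that the service is named `grafana`; both are assumptions, so adapt them to your compose file. The sketch enables only burn rate alerts and leaves the availability alerts paused.
+
+```yaml
+services:
+  grafana:   # assumed service name
+    environment:
+      ALERTING_PAUSE_AVAILABILITY_5M: "true"
+      ALERTING_PAUSE_AVAILABILITY_6H: "true"
+      ALERTING_PAUSE_BURN_RATE: "false"   # "false" enables this alert type
+      SLACK_WEBHOOK_URL: "https://hooks.slack.com/services/your-webhook"
+```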
+
+---
+
+## Advanced Customization
+### Customize Alert Contact Points
 
-### Turn on alerting
-- Enable/Disable alerts using environment variables 
-- By default alerts will send to slack. Provide the env variable `SLACK_WEBHOOK_URL` to go there
+You can customize where alerts are sent by defining a new contact point in Grafana:
+
+```
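+apiVersion: 1
+contactPoints:
+  - orgId: 1
+    name: "custom-contact"
+    receivers:
+      - uid: "custom-contact"   # example uid; any unique identifier
+        type: "slack"
+        settings:
+          url: "https://hooks.slack.com/services/..."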
+```
+
+Mount this file into:
+
+```
+/etc/grafana/provisioning/alerting/custom-contact.yml
+```
+
+Then update the environment variable:
+
+```
+ALERTING_DEFAULT_CONTACT=custom-contact
+```
+
+**Note:** Mount only this exact file rather than overriding the whole provisioning folder in the image, because that folder already contains the defaults.
+
+---
+
+### Add Custom Alerts
+To define additional alert rules, create files in:
+
+```
+/etc/grafana/provisioning/alerting/
+```
 
+Grafana will automatically load these at startup.
 
-## Configuration
+---
 
-Alerting is setup using Grafana Alerts. 
-- To change where the alerts are sent: create and mount custom a custom contact point in `/etc/grafana/provisioning/alerting/custom-contact.yml`. Then change the environment variable `ALERTING_DEFAULT_CONTACT` to use that name
-- Add custom alerts by mounting alert files in `/etc/grafana/provisioning/alerting/`.
+## Further Reading
 
-For more info see [Grafana Provisioning](https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/)
+* [Grafana Alerting Provisioning](https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/)
+* [Google SRE – Burn Rate Alerting](https://sre.google/workbook/alerting-on-slos/#4-alert-on-burn-rate)
 
-See [Google SRE Guide](https://sre.google/workbook/alerting-on-slos/#4-alert-on-burn-rate) which explains burn rate alerting. The alerting setup here follows the recommendations in the SRE handbook for Multiwindow, Multiburn rate alerting.
@@ -1,29 +1,130 @@
-# Availability Probing
-//TODO
+# Availability
 
-HTTP Probers are setup to scrape the real endpoints exposed by our services, and we can calculate a percentage uptime and latency based on those.
+This guide explains how to configure HTTP probers using Blackbox Exporter to monitor the availability of your services. These probers generate uptime and latency metrics, which can then be visualized in Grafana.
 
-See the [Reference](../reference/understanding-metrics.md) for more details. 
+See the [Reference](../reference/understanding-metrics.md) for an explanation of the metrics this generates.
 
+---
 
-## Adding Probers
-- `site/prometheus/scrape-configs/probers/*.yml`. 
-Add yaml files to this folder as probe targets. Any yml files put into this directory, for example "probe.example.yml", will be used as targets to probe for availability using blackbox exporter. Add any URLs that you want to measure the availability of. 
+## How to Add New Probers
 
-```yaml 
-# Prober yml
-- targets:
-    - https://google.com/something
-  labels:
-    name: google-homepage # Mandatory - the name of the service being probed
-    job: override_job # (Optional. Default is "probe-cogstack-availability") Customise a job to enable grouping in the dashboard
-    ip_address: "123.0.0.1" # (Optional) The IP address
-    host: a_hostname # (Optional) A readable hostname
-    custom_label: a_custom_label # (Optional)  Any other label
-    
+To add a new prober target:
+
+1. Navigate to the folder:
+
+   ```
+    prometheus/scrape-configs/probers/
+   ```
+
+2. Create a new YAML file (e.g., `probe.my-services.yml`) with the following structure:
+
+   ```
+    # probe.my-services.yml
+    - targets:
+        - https://myservice.example.com/health
+      labels:
+        name: my-service             # Mandatory - the name of the service being probed
+        job: my-services             # Mandatory - used to group probes in dashboards
+        ip_address: "10.0.0.12"      # Optional - IP of the host being probed
+        host: service-hostname       # Optional - Human-readable hostname
+        region: eu-west              # Optional - Any additional metadata label
+   ```
+
+3. Ensure the folder is mounted in Docker under `/etc/prometheus/cogstack/site/prometheus/scrape-configs/probers`, which it should be by default if you've followed the setup guides. Any valid `.yml` files in this folder will be automatically picked up and used as Blackbox targets.
+
+---
+
+## Advanced Setup
+
+### How to add Auth to the prober or further configurations
+
+To define how a probe behaves (e.g., add basic auth, headers, timeout, method), we will configure a module in the Blackbox Exporter config.
+
+#### Create a Blackbox Exporter Config file
+You will need to create a new file and then mount it over the existing provided config.
+
+
+1. Create a new file:
+
+   ```
+    prometheus/blackbox-exporter/custom-blackbox-config.yml
+   ```
+
+2. Add the existing defaults
+
+```  
+modules:
+  http_get_200:
+    prober: http
+    timeout: 5s
+    http:
+      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
+      valid_status_codes: [200]  # Defaults to 2xx
+      method: GET
+      preferred_ip_protocol: "ip4" # defaults to "ip6"
+      tls_config:
+        insecure_skip_verify: true
+```
+
+3. Add your own module to the modules in that file
 ```
+  http_2xx_custom:
+    prober: http
+    timeout: 5s
+    http:
+      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
+      valid_status_codes: [200]  # Defaults to 2xx
+      method: GET
+      preferred_ip_protocol: "ip4" # defaults to "ip6"
+      tls_config:
+        insecure_skip_verify: true
+      basic_auth:
+        username: my-user
+        password: example-pass
+```
+
+This example adds a module named `http_2xx_custom` that sends basic auth credentials with each probe request.
+
+---
+
+#### Reference the new module in your prober config
+
+In your probe YAML file, reference the module in the `module` field of the `labels` section:
+
+```
+    - targets:
+        - https://myservice.example.com/health
+      labels:
+        name: my-service
+        module: http_2xx_custom      # Optional - overrides the default Blackbox module
+```
+
+#### Mount the config file
+Finally, mount the new config file and reference it in Docker Compose:
+
+```
+  blackbox-exporter:
+    image: cogstacksystems/cogstack-observability-blackbox-exporter:latest
+    restart: unless-stopped
+    networks:
+      - observability
+    volumes:
+      - ./prometheus/blackbox-exporter:/config
+    command:
+      - "--config.file=/config/custom-blackbox-config.yml" 
+```
+
+---
+
+## Notes
+
+* Changes will take effect on the next Prometheus reload or container restart.
+* Jobs with the same `job` label are grouped in dashboards to simplify analysis.
+* The `job` label must match the `job` label of a defined SLO to enable alerting.
+* Probers can target both external URLs and local Docker containers directly. For example, we probe Grafana on "cogstack-observability-grafana-1:3000/". To probe local Docker containers, the prober and the target must share a Docker network.
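+
+As a rough sketch, a prober file for the local Grafana container mentioned above might look like this. The `http://` scheme, the file name, and the `local-containers` job name are assumptions for illustration only.
+
+```yaml
+# probe.local-containers.yml (hypothetical file name)
+- targets:
+    - http://cogstack-observability-grafana-1:3000/   # scheme assumed
+  labels:
+    name: grafana              # Mandatory - the name of the service being probed
+    job: local-containers      # hypothetical job name; match it to an SLO rule to enable alerting
+```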
+
 
-## Configuring Probers
-- How to setup custom exporter module
-- How to use the module in my yml
+## External links
+For full Blackbox Exporter documentation, see:
 
+- [Prometheus Blackbox Exporter](https://github.com/prometheus/blackbox_exporter)