buixor · buixor · commit 2f91b592d5b1 · 2025-11-05T14:28:11.000+01:00
diff --git a/crowdsec-docs/unversioned/troubleshooting/log_processor_offline.md b/crowdsec-docs/unversioned/troubleshooting/log_processor_offline.md
@@ -0,0 +1,142 @@
+---
+title: Log Processor Offline
+id: log_processor_offline
+---
+
+When the Console or a notification rule reports **Log Processor Offline**, the local agent has not checked in with the Local API (LAPI) for more than 24 hours. The alert is different from **Log Processor No Alert**, which only means logs were parsed but no scenarios fired. Use the sections below to identify why the heartbeat stopped and how to bring the agent back online.
+
+## Common Root Causes & Diagnostics
+
+### Service stopped or stuck
+
+- Confirm the service state on the host:
+
+```bash
+sudo systemctl status crowdsec
+sudo journalctl -u crowdsec -n 50
+```
+
+- For containerised deployments, verify the workload is still running:
+
+```bash
+docker ps --filter name=crowdsec
+kubectl get pods -n crowdsec
+```
+
+- On the LAPI node, run `sudo cscli machines list` and check whether the `Last Update` column is older than 24 hours for the affected machine.
+
+### Machine not validated or credentials revoked
+
+- On the LAPI, run `sudo cscli machines list` and check whether the machine is shown in `PENDING` state or is missing entirely.
+- On the agent host, ensure `/etc/crowdsec/local_api_credentials.yaml` exists and contains the expected login and password.
+- If you recently reinstalled or renamed the machine, it must be re-validated. See [Machines management](/u/user_guides/machines_mgmt) for details.
+
+### Local API unreachable
+
+- From the agent, run:
+
+```bash
+sudo cscli lapi status
+```
+
+Errors such as `401 Unauthorized`, TLS failures, or connection timeouts indicate an authentication or network issue.
+
+- Verify that the LAPI client settings match your LAPI setup: `api.client.credentials_path` in `/etc/crowdsec/config.yaml` must point to the credentials file that holds the LAPI `url`, and the TLS options (`ca_cert`, `insecure_skip_verify`) must reflect your current certificates. Refer to [Local API configuration](/docs/local_api/configuration) and [TLS authentication](/docs/local_api/tls_auth) if certificates changed.
+- Confirm the network path between the agent and the LAPI host is open (default port `8080/TCP`). Firewalls or reverse proxies introduced after installation commonly block the heartbeat.
+
+### Local API unavailable
+
+- If several agents show as offline simultaneously, the LAPI service might be down. Check its status on the LAPI machine:
+
+```bash
+sudo systemctl status crowdsec
+sudo journalctl -u crowdsec -n 50
+```
+
+- Inspect `/var/log/crowdsec/` (or container logs) for database or authentication errors that prevent the LAPI from responding.
+- Use `sudo cscli metrics show engine` on the LAPI to confirm it is still ingesting events from other agents. See the [Health Check guide](/u/getting_started/health_check) for additional diagnostics.
+
+## Recovery Actions
+
+### Restart the Log Processor service
+
+- Systemd:
+
+```bash
+sudo systemctl restart crowdsec
+```
+
+- Docker:
+
+```bash
+docker restart crowdsec
+```
+
+- Kubernetes:
+
+```bash
+kubectl rollout restart deployment/crowdsec -n crowdsec
+```
+
+After the restart, re-run `sudo cscli machines list` on the LAPI to confirm the `Last Update` timestamp is refreshed.
+
+### Validate or re-register the machine
+
+#### Using credentials
+
+:::info
+Credentials are more suitable for single-machine setups.
+:::
+
+- To regenerate credentials directly on the LAPI host when the agent runs locally, run:
+
+```bash
+sudo cscli machines add -a
+```
+
+#### Using the registration system
+
+:::info
+The registration system is more suitable for distributed setups.
+:::
+
+
+
+- Approve pending machines on the LAPI:
+
+```bash
+sudo cscli machines validate <machine_name>
+```
+
+- If credentials were removed or the agent was rebuilt, re-register it from the agent host, then approve it on the LAPI with `sudo cscli machines validate <machine_name>`:
+
+```bash
+sudo cscli lapi register --url http://<lapi_host>:8080 --machine <machine_name>
+sudo systemctl restart crowdsec
+```
+
+Update the `--url` to match your deployment. Auto-registration tokens are covered in [Machines management](/u/user_guides/machines_mgmt#machine-auto-validation).
+
+### Restore connectivity to the Local API
+
+- Open the required port on firewalls or security groups and verify with:
+
+```bash
+nc -zv <lapi_host> 8080
+```
+
+- If TLS certificates were renewed, update the agent trust store (`ca_cert`) or temporarily enable `insecure_skip_verify: true` for testing. Follow the hardening recommendations in [TLS authentication](/docs/local_api/tls_auth).
+- When using proxies or load balancers, ensure they forward the HTTP headers and TLS material expected by the LAPI.
+
+### Stabilise the Local API
+
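+- If the LAPI was unresponsive, restart it with the command that matches your deployment (the systemd service on a host, or the LAPI deployment on Kubernetes):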
+
+```bash
+sudo systemctl restart crowdsec
+kubectl rollout restart deployment/crowdsec-lapi -n crowdsec
+```
+
+- Run `sudo cscli support dump` to collect diagnostics if the LAPI repeatedly crashes or loses database access. Review the resulting archive for database connectivity errors and consult the [Security Engine troubleshooting guide](/u/troubleshooting/security_engine) when escalation is required.
+
+Once the heartbeat is restored, the Console alert clears automatically during the next polling cycle. Consider adding a [notification rule](/u/console/notification_integrations/rule) for **Log Processor Offline** so you are alerted promptly if it happens again.
diff --git a/crowdsec-docs/unversioned/troubleshooting/security_engine_offline.md b/crowdsec-docs/unversioned/troubleshooting/security_engine_offline.md
@@ -0,0 +1,99 @@
+---
+title: Security Engine Offline
+id: security_engine_offline
+---
+
+The **Security Engine Offline** alert appears in the Console and notification integrations when an enrolled engine has not reported or logged in to CrowdSec for more than 48 hours. This usually means the core `crowdsec` service (Log Processor + Local API) has stopped working or communicating with our infrastructure.
+
+## Common Root Causes & Diagnostics
+
+### Host or service down
+
+- Check that the `crowdsec` service is running:
+
+```bash
+sudo systemctl status crowdsec
+sudo journalctl -u crowdsec -n 50
+```
+
+- For container or Kubernetes deployments, confirm the workload is still healthy:
+
+```bash
+docker ps --filter name=crowdsec
+kubectl get pods -n crowdsec
+```
+
+- If the host itself is unreachable (hypervisor, VM, or cloud instance down), the Console cannot receive a heartbeat and marks the engine offline.
+
+### Enrollment revoked or pending
+
+- On the engine, run `sudo cscli console status` to verify it is still enrolled and accepted.
+- In the Console, visit **Security Engines** and confirm the engine is not archived or removed. Follow [Pending Security Engines](/u/console/security_engines/pending_security_engines) if it shows as waiting for approval.
+- Review `/etc/crowdsec/console.yaml` for disabled options (`console_management`, `custom`, `tainted`, `context`) that may prevent expected data from being sent.
+
+### Console connectivity issues
+
+- `sudo cscli console status` may show errors such as `permission denied`, `unable to reach console`, or TLS failures. Inspect `/var/log/crowdsec/crowdsec.log` (or container stdout) for more details.
+- Ensure outbound access to the CrowdSec Console endpoints listed in [Network management](/docs/configuration/network_management). Firewalls or proxy changes often block the HTTPS calls required for heartbeats.
+- Verify that the system time is synchronized (for example via NTP). Large clock drift can invalidate console tokens.
+
+### Local API unavailable
+
+- If the Local API is stopped, the Security Engine cannot gather or forward alerts. Check its status on the same host:
+
+  ```bash
+  sudo cscli machines list
+  sudo cscli metrics show engine
+  ```
+
+- Errors about database connectivity or TLS in the Local API log (`crowdsec_api.log`, written to the configured log directory alongside `crowdsec.log`) indicate the Local API is not processing alerts, which in turn stops Console updates. Refer to [Security Engine troubleshooting](/u/troubleshooting/security_engine) and [Log Processor Offline](/u/troubleshooting/log_processor_offline) if needed.
+
+## Recovery Actions
+
+### Restart the Security Engine service
+
+- Systemd:
+
+  ```bash
+  sudo systemctl restart crowdsec
+  ```
+
+- Docker:
+
+  ```bash
+  docker restart crowdsec
+  ```
+
+- Kubernetes:
+
+  ```bash
+  kubectl rollout restart deployment/crowdsec -n crowdsec
+  ```
+
+After restarting, re-run `sudo cscli console status` to ensure the heartbeat is restored.
+
+### Re-enroll the engine in the Console
+
+- If the engine was removed or enrollment expired, obtain a fresh key from **Settings > Enrollment** in the Console and run:
+
+  ```bash
+  sudo cscli console enroll <ENROLLMENT_KEY>
+  sudo systemctl restart crowdsec
+  ```
+
+- When replacing an existing enrollment, append `--overwrite` so the Console updates the existing record.
+- Confirm the engine appears as **Healthy** in the Console after the restart.
+
+### Restore connectivity to the Console
+
+- Check that the host can reach the CrowdSec services and APIs listed in [Network management](/docs/configuration/network_management).
+- If a proxy is required, set the `HTTP_PROXY` and `HTTPS_PROXY` environment variables for the `crowdsec` service (for example through a systemd override) and restart the service.
+- Renew TLS trust stores if the host cannot validate the Console certificate chain.
+
+### Stabilise the Local API
+
+- Restart the Local API component (same `crowdsec` service or the dedicated LAPI pod) and confirm it responds to local commands such as `sudo cscli machines list`.
+
+- Investigate persistent database or authentication errors using `sudo cscli support dump`, then consult the [Security Engine troubleshooting guide](/u/troubleshooting/security_engine) if issues remain.
+
+Once the engine resumes contact, the Console clears the **Security Engine Offline** alert during the next poll. Consider enabling the **Security Engine Offline** notification in your preferred integration so future outages are caught quickly.