Discord-first watchdog skill for OpenClaw Gateway health monitoring, alerting, and safe recovery.
Tell OpenClaw: "Install the gateway-watchdog skill." The agent will handle the installation and configuration automatically.
If you prefer the terminal, run:

```bash
clawhub install gateway-watchdog
```

Features:

- Monitors Gateway health with a state machine (`healthy`, `degraded`, `critical`)
- Sends Discord incident messages (`ALERT`, `RECOVERED`)
- Deduplicates noisy failures using a consecutive-failure threshold plus a cooldown
- Supports optional, bounded auto-restart
- Runs via two paths:
  - Internal OpenClaw cron (normal operations)
  - External macOS LaunchAgent fallback (when the Gateway is unhealthy)

- The cron path uses the native OpenClaw scheduler and is easy to manage with `openclaw cron`.
- The LaunchAgent path is a fallback that can still run when OpenClaw Gateway scheduling is impaired.

Using both gives better resilience than cron-only monitoring, because the cron path depends on the same Gateway it is watching.

Repository contents:

- `SKILL.md` - Agent-facing skill instructions
- `scripts/gateway-watchdog.sh` - Core watchdog logic
- `scripts/install-launchd.sh` - LaunchAgent installer helper
- `references/com.openclaw.gateway-watchdog.plist.template` - launchd template
- `references/cron-agent-turn.md` - Isolated cron prompt template

All runtime artifacts stay in `~/.openclaw/watchdogs/gateway-discord/`.
Files:

- `state.json` - current watchdog state
- `events.jsonl` - append-only event history
- `backups/state-*.json` - rolling state backups
- `config.env` - local deployment config (tokens/channel IDs)

This avoids touching OpenClaw core files for normal watchdog operation.
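
To make the layout concrete, here is a minimal sketch of how `events.jsonl` and `backups/` could be maintained. It is not taken from `scripts/gateway-watchdog.sh`: the function names `append_event`, `backup_state`, and `rotate_backups`, the JSON field names, and the timestamped backup filenames are all hypothetical. Rotation keeps the newest N files, where N would come from `GW_WATCHDOG_KEEP_BACKUPS`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the runtime layout; not taken from gateway-watchdog.sh.

append_event() {
  # Append one JSON object per line (JSON Lines). Arguments are assumed to be
  # simple tokens such as "critical" or "rpc_probe_failed" that need no escaping.
  local file="$1" state="$2" reason="$3"
  mkdir -p "$(dirname "$file")"
  printf '{"ts":%s,"state":"%s","reason":"%s"}\n' "$(date +%s)" "$state" "$reason" >>"$file"
}

backup_state() {
  # Copy state.json into backups/ under a timestamped, lexically sortable name.
  local base="$1"
  mkdir -p "$base/backups"
  cp "$base/state.json" "$base/backups/state-$(date +%Y%m%d%H%M%S).json"
}

rotate_backups() {
  # Keep the newest $2 backups. Globs expand in sorted order (oldest first for
  # timestamped names), so reverse the list and delete everything past $2.
  local dir="$1" keep="$2"
  printf '%s\n' "$dir"/state-*.json | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do rm -f -- "$f"; done
}
```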
The watchdog checks:

- `openclaw gateway status --json`
- `openclaw health --json --timeout <ms>`

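
A simplified sketch of these two checks, assuming only that each command exits non-zero on failure. Telling apart classes such as `runtime_stopped` or `auth_mismatch` would require inspecting the JSON output, which this sketch does not do; mapping a failed status call to `gateway_check_failed` and a failed health call to `health_unreachable` is an assumption. `OPENCLAW_BIN` and `GW_WATCHDOG_HEALTH_TIMEOUT_MS` are the documented overrides; using the timeout variable as the `<ms>` value is also an assumption.

```bash
#!/usr/bin/env bash
# Simplified sketch: classifies by exit status only; it does not parse the JSON output.

probe_gateway() {
  local bin="${OPENCLAW_BIN:-openclaw}"
  local timeout_ms="${GW_WATCHDOG_HEALTH_TIMEOUT_MS:-10000}"

  # Check 1: Gateway runtime status (assumed mapping: failure -> gateway_check_failed).
  if ! "$bin" gateway status --json >/dev/null 2>&1; then
    echo "gateway_check_failed"
    return 1
  fi

  # Check 2: health probe with a bounded timeout (assumed mapping: failure -> health_unreachable).
  if ! "$bin" health --json --timeout "$timeout_ms" >/dev/null 2>&1; then
    echo "health_unreachable"
    return 1
  fi

  echo "ok"
}
```

Pointing `OPENCLAW_BIN` at a stub script (or at `true`/`false`) exercises each branch without a running Gateway.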
Failure classes:

- `runtime_stopped`
- `rpc_probe_failed`
- `health_unreachable`
- `auth_mismatch`
- `config_rewritten` (baseline drift detected: `openclaw.json` != `openclaw.json.good`)
- `config_invalid`
- `gateway_check_failed`

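
The `config_rewritten` class can be approximated as a byte-for-byte comparison of the live config against its known-good baseline. Below is a minimal sketch using `cmp -s`, with paths taken from `GW_WATCHDOG_CONFIG_FILE` and `GW_WATCHDOG_CONFIG_BASELINE_FILE`; the `no_baseline` result and skipping the check when no baseline exists are assumptions of this sketch, not documented behavior.

```bash
#!/usr/bin/env bash
# Sketch of baseline drift detection; result labels other than config_rewritten are hypothetical.

check_config_drift() {
  local config="${GW_WATCHDOG_CONFIG_FILE:-$HOME/.openclaw/openclaw.json}"
  local baseline="${GW_WATCHDOG_CONFIG_BASELINE_FILE:-$HOME/.openclaw/openclaw.json.good}"

  # Without a known-good baseline there is nothing to compare against.
  if [ ! -f "$baseline" ]; then
    echo "no_baseline"
    return 0
  fi

  # cmp -s prints nothing and exits 0 only when both files are byte-identical
  # (a missing live config also counts as drift here).
  if cmp -s "$config" "$baseline"; then
    echo "ok"
  else
    echo "config_rewritten"
    return 1
  fi
}
```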
Alerting rules:

- Alert only after `GW_WATCHDOG_FAIL_THRESHOLD` consecutive failures
- Suppress repeated alerts for `GW_WATCHDOG_COOLDOWN_SECONDS` seconds
- Send `RECOVERED` once when transitioning back to healthy
- Keep alert bodies human-readable (reason label, observed symptom, suggested action)

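
The threshold and cooldown rules can be expressed as a few pure functions. This is a hypothetical sketch: the function names are illustrative, and both the mapping from consecutive failures to `healthy`/`degraded`/`critical` and sending `RECOVERED` only after a `critical` incident are assumptions rather than the script's confirmed behavior.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of threshold + cooldown gating; not the script's actual code.

classify_state() {
  # $1 = consecutive failures, $2 = threshold.
  # Assumed mapping: 0 -> healthy, below threshold -> degraded, at/above -> critical.
  local fails="$1" threshold="$2"
  if [ "$fails" -eq 0 ]; then
    echo "healthy"
  elif [ "$fails" -lt "$threshold" ]; then
    echo "degraded"
  else
    echo "critical"
  fi
}

should_alert() {
  # $1 = consecutive failures, $2 = threshold, $3 = epoch of last ALERT (0 = never),
  # $4 = current epoch, $5 = cooldown in seconds. Exit 0 means "send ALERT now".
  local fails="$1" threshold="$2" last="$3" now="$4" cooldown="$5"
  [ "$fails" -ge "$threshold" ] || return 1
  [ "$last" -eq 0 ] || [ $((now - last)) -ge "$cooldown" ]
}

should_send_recovered() {
  # $1 = previous state, $2 = new state. Assumption: RECOVERED only follows a critical incident.
  [ "$1" = "critical" ] && [ "$2" = "healthy" ]
}
```

With the defaults (`GW_WATCHDOG_FAIL_THRESHOLD=2`, `GW_WATCHDOG_COOLDOWN_SECONDS=300`), the first failure only moves the state to `degraded`, the second sends `ALERT`, and further failures within the next 300 seconds are suppressed.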
Requirements:

- OpenClaw CLI installed and available (`openclaw`)
- Python 3 available (`python3`)
- macOS (for LaunchAgent mode)
- Discord delivery config (a webhook URL, or a bot token plus channel ID)

Discord delivery priority order:

1. `DISCORD_WEBHOOK_URL`
2. `DISCORD_BOT_TOKEN` + `DISCORD_CHANNEL_ID`

If `DISCORD_WEBHOOK_URL` is set, the webhook takes priority; otherwise, the bot API is used.
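
The priority order can be sketched as below. `pick_delivery_mode` follows the documented order; `send_discord` posts either to the webhook URL or to Discord's REST create-message endpoint (`POST /channels/{channel_id}/messages`, authenticated with a `Bot` token). The function names, and using `PYTHON_BIN` to pick the interpreter for JSON encoding, are assumptions of this sketch.

```bash
#!/usr/bin/env bash
# Sketch of the documented priority order: webhook first, then bot token + channel ID.

pick_delivery_mode() {
  if [ -n "${DISCORD_WEBHOOK_URL:-}" ]; then
    echo "webhook"
  elif [ -n "${DISCORD_BOT_TOKEN:-}" ] && [ -n "${DISCORD_CHANNEL_ID:-}" ]; then
    echo "bot"
  else
    echo "none"
  fi
}

discord_payload() {
  # JSON-encode the message with Python 3 (already a prerequisite) so quotes and newlines are escaped.
  "${PYTHON_BIN:-python3}" -c 'import json, sys; print(json.dumps({"content": sys.argv[1]}))' "$1"
}

send_discord() {
  local payload
  payload="$(discord_payload "$1")" || return 1
  case "$(pick_delivery_mode)" in
    webhook)
      curl -fsS -H "Content-Type: application/json" -d "$payload" "$DISCORD_WEBHOOK_URL" ;;
    bot)
      # Discord REST API: create a message in a channel, authenticated as a bot.
      curl -fsS -H "Content-Type: application/json" \
        -H "Authorization: Bot $DISCORD_BOT_TOKEN" \
        -d "$payload" "https://discord.com/api/v10/channels/$DISCORD_CHANNEL_ID/messages" ;;
    *)
      echo "no Discord delivery configured" >&2
      return 1 ;;
  esac
}
```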
Run once manually:

```bash
bash "./scripts/gateway-watchdog.sh"
```

Optional env:

```bash
export DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/..."
export DISCORD_BOT_TOKEN="<your_discord_bot_token>"
export DISCORD_CHANNEL_ID="<your_discord_channel_id>"
export GW_WATCHDOG_SOURCE="manual"
export GW_WATCHDOG_FAIL_THRESHOLD=2
export GW_WATCHDOG_COOLDOWN_SECONDS=300
```

Install and load the LaunchAgent with a 30-second interval:

```bash
bash "./scripts/install-launchd.sh" --interval 30 --load
```

Check status:

```bash
launchctl list | grep "com.openclaw.gateway-watchdog"
```

Add the internal OpenClaw cron job (replace the script path and channel ID with your own):

```bash
openclaw cron add \
  --name "gateway-watchdog-internal" \
  --cron "*/1 * * * *" \
  --session isolated \
  --message "Run bash /absolute/path/to/scripts/gateway-watchdog.sh. Announce only state changes." \
  --announce \
  --channel discord \
  --to "channel:<your_channel_id>" \
  --best-effort-deliver
```

Auto-restart is disabled by default:

```bash
export GW_WATCHDOG_ENABLE_RESTART=0
```

Enable bounded restart:

```bash
export GW_WATCHDOG_ENABLE_RESTART=1
export GW_WATCHDOG_MAX_RESTART_ATTEMPTS=2
```

Safety behavior:

- Restart only after `GW_WATCHDOG_FAIL_THRESHOLD` consecutive failures
- At most `GW_WATCHDOG_MAX_RESTART_ATTEMPTS` restart attempts per incident window
- No reinstall behavior in this skill
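
The safety rules above reduce to a single gate. A hypothetical sketch: `should_restart` is an illustrative name, the restart command itself is deliberately omitted, and resetting the attempt counter when the Gateway returns to healthy (closing the incident window) is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of bounded auto-restart gating; the restart command itself is omitted.

should_restart() {
  # $1 = consecutive failures, $2 = restart attempts already made in this incident.
  local fails="$1" attempts="$2"
  local enabled="${GW_WATCHDOG_ENABLE_RESTART:-0}"
  local threshold="${GW_WATCHDOG_FAIL_THRESHOLD:-2}"
  local max="${GW_WATCHDOG_MAX_RESTART_ATTEMPTS:-2}"

  [ "$enabled" = "1" ] || return 1          # opt-in only
  [ "$fails" -ge "$threshold" ] || return 1 # same threshold as alerting
  [ "$attempts" -lt "$max" ]                # cap per incident window
}
```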

Common variables:

- `GW_WATCHDOG_BASE_DIR` (default: `~/.openclaw/watchdogs/gateway-discord`)
- `GW_WATCHDOG_FAIL_THRESHOLD` (default: `2`)
- `GW_WATCHDOG_COOLDOWN_SECONDS` (default: `300`)
- `GW_WATCHDOG_HEALTH_TIMEOUT_MS` (default: `10000`)
- `GW_WATCHDOG_ENABLE_RESTART` (default: `0`)
- `GW_WATCHDOG_MAX_RESTART_ATTEMPTS` (default: `2`)
- `GW_WATCHDOG_KEEP_BACKUPS` (default: `50`)
- `GW_WATCHDOG_SOURCE` (default: `unknown`)
- `GW_WATCHDOG_CONFIG_FILE` (default: `~/.openclaw/openclaw.json`)
- `GW_WATCHDOG_CONFIG_BASELINE_FILE` (default: `~/.openclaw/openclaw.json.good`)

Binary overrides:

- `OPENCLAW_BIN`
- `PYTHON_BIN`
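
One way to apply these defaults is shell parameter expansion after sourcing `config.env`. The precedence here is an assumption, not documented behavior: in this sketch, values in `config.env` override same-named variables already exported, and `${VAR:=default}` fills in anything still unset or empty. The `openclaw` and `python3` defaults for the binary overrides, and the function name `load_watchdog_config`, are also assumed.

```bash
#!/usr/bin/env bash
# Sketch: load config.env, then fill the documented defaults for anything still unset.
# Assumption: values in config.env override same-named variables already exported.

load_watchdog_config() {
  : "${GW_WATCHDOG_BASE_DIR:=$HOME/.openclaw/watchdogs/gateway-discord}"

  local cfg="$GW_WATCHDOG_BASE_DIR/config.env"
  if [ -f "$cfg" ]; then
    set -a # export every variable assigned while sourcing
    . "$cfg"
    set +a
  fi

  : "${GW_WATCHDOG_FAIL_THRESHOLD:=2}"
  : "${GW_WATCHDOG_COOLDOWN_SECONDS:=300}"
  : "${GW_WATCHDOG_HEALTH_TIMEOUT_MS:=10000}"
  : "${GW_WATCHDOG_ENABLE_RESTART:=0}"
  : "${GW_WATCHDOG_MAX_RESTART_ATTEMPTS:=2}"
  : "${GW_WATCHDOG_KEEP_BACKUPS:=50}"
  : "${GW_WATCHDOG_SOURCE:=unknown}"
  : "${GW_WATCHDOG_CONFIG_FILE:=$HOME/.openclaw/openclaw.json}"
  : "${GW_WATCHDOG_CONFIG_BASELINE_FILE:=$HOME/.openclaw/openclaw.json.good}"
  : "${OPENCLAW_BIN:=openclaw}" # assumed default
  : "${PYTHON_BIN:=python3}"    # assumed default
}
```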

Use this checklist before production rollout:

- Syntax checks

  ```bash
  bash -n scripts/gateway-watchdog.sh
  bash -n scripts/install-launchd.sh
  ```

- Manual smoke run

  ```bash
  GW_WATCHDOG_SOURCE=test bash scripts/gateway-watchdog.sh
  ```

- Discord delivery test: verify that one test message arrives in your target channel
- Failure test: stop or impair the Gateway and verify that `ALERT` is posted
- Recovery test: restore the Gateway and verify that `RECOVERED` is posted

Troubleshooting:

- No Discord messages
  - Check `config.env` values and token/channel correctness
  - Validate that the bot has permission to post in the target channel
- `watchdog lock exists, exiting`
  - Another run is active; this is expected overlap protection
- Repeated suppressed events
  - Cooldown/threshold is working; lower the values for more aggressive alerting
- Gateway healthy but still alerting
  - Re-run `openclaw gateway status --json` and `openclaw health --json`
  - Ensure `OPENCLAW_BIN` resolves to the expected OpenClaw install
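
The `watchdog lock exists, exiting` message implies a single-run lock. The sketch below assumes an `mkdir`-based lock directory, a common portable choice on macOS, which does not ship `flock(1)`. The lock path, the `WATCHDOG_LOCK_DIR` variable, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of overlap protection with an atomic mkdir lock (assumed mechanism).
# mkdir either creates the directory or fails, so only one run can hold the lock.

acquire_lock() {
  local lock_dir="$1"
  mkdir -p "$(dirname "$lock_dir")"
  if mkdir "$lock_dir" 2>/dev/null; then
    # Global so the EXIT trap can still see it after this function returns.
    WATCHDOG_LOCK_DIR="$lock_dir"
    trap 'rmdir "$WATCHDOG_LOCK_DIR" 2>/dev/null || true' EXIT
    return 0
  fi
  echo "watchdog lock exists, exiting" >&2
  return 1
}

release_lock() {
  rmdir "$1" 2>/dev/null || true
  trap - EXIT
}
```

A run that dies without executing its EXIT trap (for example, on `SIGKILL`) leaves a stale lock directory behind in this sketch, which then has to be removed by hand.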

Security notes:

- Do not commit `config.env` (in real deployments it contains credentials and channel IDs)
- Grant the bot only the minimum Discord permissions it needs
- Prefer webhook mode for simple one-channel alerting
- Keep `GW_WATCHDOG_ENABLE_RESTART=0` until you are confident in detection quality

This repository is structured for ClawHub publishing:

```bash
clawhub publish . \
  --slug openclaw-gateway-watchdog-skill \
  --name "Gateway Watchdog Discord" \
  --version <x.y.z> \
  --changelog "..."
```