Description
Problem
ClawHub's moderation pipeline (static scanner + LLM evaluator) treats security tools identically to potentially malicious skills. Security tools -- scanners, audit tools, IOC databases, threat detectors -- legitimately contain patterns that the static scanner flags as suspicious:
- Shell execution (`child_process`) -- security scanners run commands to audit configurations, check file integrity, and inspect network state
- Credential access (`process.env` + network) -- credential-exposure scanners check environment variables specifically to detect leaked secrets
- Encoded payloads (base64, hex sequences) -- IOC databases contain malware signatures and encoded samples as detection references
- File read + network send -- reporting scan results to a dashboard or output file is core scanner functionality
- Non-standard ports -- network scanning tools probe ports outside 80/443
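To make the false-positive concrete, here is a minimal sketch of a legitimate credential-exposure check (identifier names are ours for illustration, not from any actual ClawHub skill). It exhibits exactly the `process.env` + reporting pattern the static scanner flags:

```typescript
// Hypothetical sketch of a legitimate credential-exposure scanner.
// Reading environment variables and reporting findings is precisely
// the behaviour the static scanner currently treats as suspicious.

const SECRET_KEY_PATTERN = /(API_KEY|SECRET|TOKEN|PASSWORD)/i;

// Return the names of environment variables that look like leaked secrets.
function findExposedSecrets(env: Record<string, string | undefined>): string[] {
  return Object.keys(env).filter(
    (name) => SECRET_KEY_PATTERN.test(name) && Boolean(env[name]),
  );
}

// A real tool would send this report to a dashboard or output file --
// the very "file read + network send" pattern that also gets flagged.
const findings = findExposedSecrets({ AWS_SECRET_KEY: "abc", HOME: "/root" });
// findings === ["AWS_SECRET_KEY"]
```

The tool's purpose (detecting leaked secrets) is only expressible through the flagged patterns, which is why pattern matching alone cannot separate it from a credential harvester.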
The result: any genuinely useful security tool will trigger multiple `suspicious.*` reason codes and get flagged or hidden, regardless of whether its behaviour is coherent with its stated purpose. There is currently no mechanism for the moderation pipeline to distinguish between a weather app that harvests credentials (incoherent, actually suspicious) and a security scanner that checks for exposed credentials (coherent, expected behaviour).
The LLM evaluator has no concept of a "security tool" category, so it evaluates these patterns without the context that would make them coherent.
This is not a theoretical concern -- we encountered this directly when publishing agent security scanning tools.
Impact
- Blocks an entire category of useful skills from the ecosystem
- Forces security tool authors to remove the very patterns that make their tools effective, or to never publish at all
- As the OpenClaw ecosystem grows, the demand for security tooling will increase (the ClawHavoc incidents demonstrated this clearly)
Proposed solution
A skill category system where publishers declare their skill's purpose via frontmatter metadata:
```yaml
metadata:
  openclaw:
    category: security
```

When declared:
- The static scanner contextualises expected patterns (re-prefixes `suspicious.*` to `info.security_context.*`) so they don't trigger false-positive verdicts
- The LLM evaluator receives the category and assesses coherence through the appropriate lens
- Malicious codes (`crypto_mining`, `install_terminal_payload`, `known_blocked_signature`) are never suppressed, regardless of category
- Declaring a false category is treated as misdirection evidence by the LLM -- it makes the skill more suspicious, not less
The category system is extensible for future use cases beyond security.
We have a PR ready: Caelguard/clawhub#feat/security-tool-category -- 26 new tests, all 846 existing tests pass, zero lint issues.