Security tools are systematically false-positived by the moderation pipeline #1131

@Justincredible-tech

Description

Problem

ClawHub's moderation pipeline (static scanner + LLM evaluator) treats security tools identically to potentially malicious skills. Security tools -- scanners, audit tools, IOC databases, threat detectors -- legitimately contain patterns that the static scanner flags as suspicious:

  • Shell execution (child_process) -- security scanners run commands to audit configurations, check file integrity, inspect network state
  • Credential access (process.env + network) -- credential exposure scanners check environment variables specifically to detect leaked secrets
  • Encoded payloads (base64, hex sequences) -- IOC databases contain malware signatures and encoded samples as detection references
  • File read + network send -- reporting scan results to a dashboard or output file is core scanner functionality
  • Non-standard ports -- network scanning tools probe ports outside 80/443
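For illustration, a minimal sketch (hypothetical, not taken from any real tool) of the kind of credential-exposure check described above — it inspects environment variables for secret-like names, which is exactly the process.env access pattern the static scanner flags:

```typescript
// Hypothetical credential-exposure scanner core. In a real skill the env
// argument would be process.env -- the flagged pattern; a parameter is used
// here so the logic is testable in isolation.
const SECRET_KEY_PATTERN = /(KEY|TOKEN|SECRET|PASSWORD)/i;

interface Finding {
  name: string;
  reason: string;
}

function scanEnv(env: Record<string, string | undefined>): Finding[] {
  const findings: Finding[] = [];
  for (const [name, value] of Object.entries(env)) {
    // Flag any set variable whose name looks like a credential.
    if (value && SECRET_KEY_PATTERN.test(name)) {
      findings.push({ name, reason: "secret-like variable name" });
    }
  }
  return findings;
}
```

The point is that this code is benign and its env access is intrinsic to its purpose, yet a pattern-only scanner cannot tell it apart from an exfiltrator.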

The result: any genuinely useful security tool will trigger multiple suspicious.* reason codes and get flagged or hidden, regardless of whether its behaviour is coherent with its stated purpose. There is currently no mechanism for the moderation pipeline to distinguish between a weather app that harvests credentials (incoherent, actually suspicious) and a security scanner that checks for exposed credentials (coherent, expected behaviour).

The LLM evaluator has no concept of a "security tool" category, so it evaluates these patterns without the context that would make them coherent.

This is not a theoretical concern -- we encountered this directly when publishing agent security scanning tools.

Impact

  • Blocks an entire category of useful skills from the ecosystem
  • Forces security tool authors to remove the very patterns that make their tools effective, or to never publish at all
  • As the OpenClaw ecosystem grows, the demand for security tooling will increase (the ClawHavoc incidents demonstrated this clearly)

Proposed solution

A skill category system where publishers declare their skill's purpose via frontmatter metadata:

metadata:
  openclaw:
    category: security

When declared:

  1. The static scanner contextualises expected patterns (re-prefixes suspicious.* to info.security_context.*) so they don't trigger false-positive verdicts
  2. The LLM evaluator receives the category and assesses coherence through the appropriate lens
  3. Malicious reason codes (crypto_mining, install_terminal_payload, known_blocked_signature) are never suppressed, regardless of category
  4. Declaring a false category is treated by the LLM evaluator as misdirection evidence, which makes the skill more suspicious, not less
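A minimal sketch of how steps 1 and 3 could compose, assuming hypothetical reason-code names (the specific suspicious.* and malicious.* strings below are placeholders, not ClawHub's actual codes):

```typescript
// Reason codes that must survive contextualisation no matter what category
// the publisher declares (step 3). Names are illustrative placeholders.
const NEVER_SUPPRESSED = new Set([
  "malicious.crypto_mining",
  "malicious.install_terminal_payload",
  "malicious.known_blocked_signature",
]);

function contextualize(codes: string[], category?: string): string[] {
  // Without a declared security category, codes pass through unchanged.
  if (category !== "security") return codes;
  return codes.map((code) => {
    // Step 3: malicious codes are never downgraded.
    if (NEVER_SUPPRESSED.has(code)) return code;
    // Step 1: re-prefix expected patterns so they inform the verdict
    // instead of triggering a false positive.
    if (code.startsWith("suspicious.")) {
      return "info.security_context." + code.slice("suspicious.".length);
    }
    return code;
  });
}
```

A falsely declared category would still leave the malicious.* codes intact, and per step 4 the declaration itself becomes evidence against the skill.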

The category system is extensible for future use cases beyond security.

We have a PR ready: Caelguard/clawhub#feat/security-tool-category -- 26 new tests, all 846 existing tests pass, zero lint issues.
