The current executor is single-shot: one Claude API call produces a Python script, which is blindly written and executed. The agent cannot react to failures, install missing tools, or learn from runtime feedback.
The goal is to upgrade the executor into a closed-loop agentic system that follows an OODA cycle: Attempt → Fail → Diagnose → Provision → Learn → Retry.
| Area | Current | Target |
|---|---|---|
| Claude interaction | Single messages.create() call, text-only response |
Multi-turn agentic loop with tool_use stop reason |
| MCP tools | write_file, execute_script (Python files only) |
+ execute_shell_cmd, manage_packages |
| Error recovery | None — script output is returned as-is | Agent observes errors, diagnoses, provisions, retries |
| Skill memory | None | Persistent .md skill files, indexed at startup, injected into system prompt |
| Tool installation | Static (baked into Dockerfile) | Dynamic apt-get install at runtime via MCP |
| Docker persistence | Scripts and logs only | + apt cache and installed binaries via named volumes |
┌─────────────────────────────────────────┐
│ Agentic Loop (executor.ts) │
│ │
User task ──► Claude API call ◄──────────────────────┐ │
│ │ │
▼ │ │
stop_reason? │ │
┌─────────┐ │ │
│end_turn │──► return final text │ │
├─────────┤ │ │
│tool_use │──► dispatch tool call ──► result ─────────┘
└─────────┘ │ │
▼ │
┌──── Tool Router ────┐ │
│ │ │
MCP Tools Host-Local Tools │
(Kali container) (Node.js process) │
├─ execute_shell_cmd ├─ save_new_skill │
├─ write_file ├─ read_skill_file │
├─ execute_script └─ list_skills │
└─ manage_packages │
│
Skill Library (./skills/*.md) ◄──────────┘
indexed at startup, summary
injected into system prompt
Add two new tools to the Kali container's MCP server.
Runs an arbitrary shell command inside the Kali container. This is the foundation for the OODA loop — the agent needs to run ad-hoc commands (not just Python scripts).
- Runs via subprocess with shell=True
- Same timeout (120s) and output truncation (4000 chars) as execute_script
- Returns exit_code + stdout + stderr
Why this is needed: currently the agent can only execute pre-written Python files. The OODA loop requires running tool --help, which <tool>, and other ad-hoc commands.
Package management tool with two actions:
check: runsshutil.which(package_name)— returns INSTALLED or MISSINGinstall: runsapt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y <package>with a 300s timeout
Security considerations:
- Whitelist validation: only allow
[a-z0-9\-]in package names to prevent injection - Log all install actions to
/app/logs/installs.log
This is the largest change. Replace single-shot messages.create() with a multi-turn tool-use loop.
Claude needs JSON tool definitions to know what it can call. Two categories:
MCP-proxied tools (forwarded to Kali container):
execute_shell_cmd { command: string }
write_file { filename: string, content: string }
execute_script { filename: string, args?: string }
manage_packages { action: "check"|"install", package_name: string }
Host-local tools (executed in Node.js process):
save_new_skill { tool_name: string, content: string }
read_skill_file { tool_name: string }
list_skills { } (returns the full index)
A new dispatchToolCall(name, input) method in Executor that:
- If the tool is an MCP tool → call
this.mcp.callTool(name, input)(make MCPClient expose a genericcallTool) - If the tool is a host-local tool → call the corresponding local function
- Returns the tool result as a string
New core method replacing generateScript() for autonomous mode:
async runAgentLoop(task: string, maxTurns: number = 15): Promise<AgentResult>
1. Build initial messages array: [{ role: "user", content: wrappedTask }]
2. Loop:
a. Call anthropic.messages.create({ tools, messages, system: SYSTEM_PROMPT })
b. Append assistant response to messages
c. If stop_reason === "end_turn" → extract final text, break
d. If stop_reason === "tool_use":
- For each tool_use block in response.content:
- Call dispatchToolCall(block.name, block.input)
- Collect { type: "tool_result", tool_use_id, content: result }
- Append tool_results as a user message
- Continue loop
e. If maxTurns exceeded → break with warning
3. Return { finalText, toolCallHistory, turnsUsed }
generateScript() and the existing CLI menu options (1-5) continue to work as-is. The agentic loop is a new execution mode (option 7 or a flag on auto-run), not a replacement.
Suggested new CLI option:
7. Autonomous Run — Full agentic loop: Claude plans, executes tools, handles errors, retries
skills/
├── _index.json ← auto-generated index (name, category, tags, description)
├── nmap.md
├── wpscan.md
└── sqlmap.md
Each skill file uses YAML frontmatter + markdown body:
---
tool_name: "wpscan"
category: "web_scanner"
tags: ["wordpress", "cms", "enumeration"]
description: "WordPress security scanner for users, plugins, and themes."
---
# Best Practice Commands
## User Enumeration
`wpscan --url {{TARGET}} --enumerate u --force`
# Anti-Patterns
- Do not use on non-WordPress sites.Three functions in a new src/skills.ts module:
listSkills(): Scans ./skills/*.md, reads only the YAML frontmatter from each file, returns a JSON array of { tool_name, category, tags, description }. This is called at startup and its output is appended to the system prompt.
readSkillFile(tool_name): Returns the full markdown content of ./skills/<tool_name>.md. Called by the agent when it decides to use a known skill.
saveNewSkill(tool_name, content): Writes ./skills/<tool_name>.md. Input validation: sanitize tool_name to [a-z0-9\-] only. Called by the agent after running --help and summarizing a new tool.
At Executor initialization, call listSkills() and append to the system prompt:
[AVAILABLE SKILLS]
You have pre-learned skills for the following tools. Use read_skill_file(tool_name)
to load the full usage guide before using any of these:
- wpscan: WordPress security scanner (tags: wordpress, cms, enumeration)
- nmap: Network port scanner (tags: network, discovery, recon)
The system prompt in executor.ts needs a new section for the OODA protocol. Append after the existing Output Rules:
[TOOL MANAGEMENT PROTOCOL]
You have direct access to a Kali Linux shell via execute_shell_cmd.
1. If a required tool is missing (command not found):
a. Verify: manage_packages(action="check", package_name="<tool>")
b. Install: manage_packages(action="install", package_name="<tool>")
c. Learn: execute_shell_cmd("<tool> --help | head -80")
d. Save: save_new_skill(tool_name, content) — write a concise skill file
e. Execute: run your original objective with the new knowledge
[SKILL LIBRARY]
Before using a tool, check if a skill file exists (listed below). If so,
call read_skill_file(tool_name) to load best practices before proceeding.
{SKILLS_INDEX}
Update docker-compose.yml:
services:
kali:
volumes:
- ./logs:/app/logs
- ./scripts:/app/scripts
- kali_apt_cache:/var/cache/apt # persist downloaded .deb files
- kali_apt_lib:/var/lib/apt # persist package index
- kali_installed:/usr/local # persist pip-installed packages
volumes:
kali_apt_cache:
kali_apt_lib:
kali_installed:Note: persisting /usr/bin via a named volume is fragile (it includes base system binaries). A better approach is to persist only the apt cache so re-installs are fast (no re-download), and accept that apt-get install must re-run after container recreation. The agent handles this automatically via the OODA loop.
The manage_packages tool should append to /app/logs/installs.log:
[2026-02-10 14:30:00] INSTALL wpscan → SUCCESS
[2026-02-10 14:31:00] INSTALL gobuster → ALREADY_INSTALLED
This log is volume-mounted and survives container restarts. It can also be read by the agent to check what was previously installed.
| File | Change Type | Description |
|---|---|---|
kali/server.py |
Modify | Add execute_shell_cmd and manage_packages tools |
src/executor.ts |
Major modify | Add agentic loop, tool dispatcher, tool schemas, updated system prompt |
src/mcp-client.ts |
Modify | Add generic callTool(name, args) method |
src/skills.ts |
New file | listSkills(), readSkillFile(), saveNewSkill() |
src/types.ts |
Modify | Add AgentResult, ToolCallRecord, SkillIndex interfaces |
src/index.ts |
Modify | Add menu option 7 (Autonomous Run) |
docker-compose.yml |
Modify | Add named volumes for apt persistence |
skills/ |
New dir | Persistent skill library (.md files) |
Phase 1 ──► Phase 2a/2b ──► Phase 2c ──► Phase 3 ──► Phase 4 ──► Phase 5
MCP tools Tool schemas Agentic loop Skill system Prompt update Docker volumes
(Python) + dispatcher (core change) (host-side) (integration) (persistence)
(TypeScript)
Phase 1 and Phase 3 are independent and can be developed in parallel. Phase 2c (agentic loop) is the critical path — everything else feeds into it. Phase 4 (prompt update) should be done last since it references all other components.
| Risk | Impact | Mitigation |
|---|---|---|
execute_shell_cmd is a wide-open shell |
Command injection inside container | Acceptable — the container is already a disposable pentest sandbox. Log all commands. |
apt-get install can install anything |
Malicious package names | Validate [a-z0-9\-] only. The container is ephemeral. |
| Agentic loop runs forever | Token burn, stuck agent | Hard cap via maxTurns (default 15). Add token budget tracking. |
| Skill files grow unbounded | Disk usage, prompt bloat | Only the index (name + description) goes into the prompt. Full content is loaded on-demand via read_skill_file. Cap index at 50 entries. |
Persisting /usr/bin breaks container |
System binaries overwritten | Don't persist /usr/bin. Only persist apt cache for fast re-downloads. |
| Claude generates bad skill files | Garbage in skill library | Validate frontmatter schema on save. Allow manual cleanup of skills/ dir. |
- MCP tools: Shell into Kali container, call tools manually via
curlto the/mcpendpoint - Agentic loop: Create a mock MCP server that returns canned responses; unit test the loop logic
- Skill system: Unit test
listSkills,readSkillFile,saveNewSkillwith a temp directory - Integration: Run the full CLI option 7 against a deliberately misconfigured container (tool missing) and verify the agent installs it, learns it, and completes the task
This test case simulates the complete agentic lifecycle — from user prompt through tool discovery, installation, skill learning, and task execution.
User prompt:
Scan the WordPress site at http://192.168.1.50 for vulnerabilities, enumerate users and plugins.
Expected agent behavior (turn-by-turn):
Turn 1 — Plan & Skill Check
├─ Claude receives the task
├─ Checks [AVAILABLE SKILLS] in system prompt → no wpscan skill found
├─ Decides wpscan is the right tool
└─ Tool call: execute_shell_cmd("which wpscan")
Turn 2 — Discover tool is missing
├─ Result: exit_code=1, stdout="", stderr="" (not found)
├─ Agent enters OODA: diagnose → tool not installed
└─ Tool call: manage_packages(action="check", package_name="wpscan")
Turn 3 — Confirm missing & install
├─ Result: "MISSING"
├─ Agent decides to install
└─ Tool call: manage_packages(action="install", package_name="wpscan")
Turn 4 — Learn the tool
├─ Result: "SUCCESS — wpscan installed"
├─ Agent wants to learn usage before running
└─ Tool call: execute_shell_cmd("wpscan --help | head -80")
Turn 5 — Save skill & execute scan
├─ Result: wpscan help text (flags, examples, etc.)
├─ Agent extracts key flags and best practices
├─ Tool call: save_new_skill("wpscan", <skill markdown content>)
├─ Tool call: execute_shell_cmd("wpscan --url http://192.168.1.50 --enumerate u,p --force --no-banner")
(parallel tool calls — saving skill and running scan are independent)
Turn 6 — Process results
├─ Skill save result: "Saved skills/wpscan.md"
├─ Scan result: wpscan output with users, plugins, vulnerabilities found
├─ Agent analyzes the output
└─ If vulnerabilities found with known exploits, agent may run deeper checks:
Tool call: execute_shell_cmd("wpscan --url http://192.168.1.50 --enumerate vp --api-token ... --force")
(or proceeds to summarize if no API token is available)
Turn 7 — Final summary
├─ stop_reason: "end_turn"
└─ Returns structured report:
- Target: http://192.168.1.50
- WordPress version detected
- Users enumerated (admin, editor, etc.)
- Plugins found + known CVEs
- Recommended next steps
Mock MCP responses for automated testing:
To run this as an automated test without a real target, create a mock MCP server that returns canned responses keyed on the command string:
MOCK_RESPONSES = {
# Turn 1: tool discovery
"which wpscan": {
"exit_code": 1,
"stdout": "",
"stderr": ""
},
# Turn 4: help text after install
"wpscan --help | head -80": {
"exit_code": 0,
"stdout": """_______________________________________________________________
__ _______ _____
\\ \\ / / __ \\ / ____|
\\ \\ /\\ / /| |__) | (___ ___ __ _ _ __
\\ \\/ \\/ / | ___/ \\___ \\ / __|/ _` | '_ \\
\\ /\\ / | | ____) | (__| (_| | | | |
\\/ \\/ |_| |_____/ \\___|\\__,_|_| |_|
WordPress Security Scanner by the WPScan Team
Usage: wpscan [options]
--url URL The URL of the blog to scan
-e, --enumerate [OPTS] Enumeration (u=users, p=plugins, t=themes, vp=vuln plugins)
--force Do not check if target is WordPress
--no-banner Suppress banner output
--api-token TOKEN WPScan API token for vulnerability data
--detection-mode MODE passive, mixed (default), aggressive
--plugins-detection MODE passive, mixed, aggressive
-o, --output FILE Output to file
-f, --format FORMAT Output format (cli, json, cli-no-color)
--stealthy Alias for --random-user-agent --detection-mode passive
--help Show help
--version Show version""",
"stderr": ""
},
# Turn 5: actual scan
"wpscan --url http://192.168.1.50 --enumerate u,p --force --no-banner": {
"exit_code": 0,
"stdout": """[+] URL: http://192.168.1.50/ [192.168.1.50]
[+] Started: Mon Feb 10 14:30:00 2026
Interesting Finding(s):
[+] Headers: Server: Apache/2.4.41 (Ubuntu)
[+] XML-RPC seems to be enabled: http://192.168.1.50/xmlrpc.php
[+] WordPress version 5.8.1 identified (Insecure, released on 2021-09-09)
| Found By: Meta Generator (passive)
| Confirmed By: Atom Generator (aggressive)
[i] User(s) Identified:
[+] admin
| Found By: Author Posts - Author Pattern (passive)
| Confirmed By: Login Error Messages (aggressive)
[+] editor
| Found By: Author Id Brute Forcing (aggressive)
[+] Enumerating Most Popular Plugins (via Passive Methods)
[+] Checking Known Locations
[i] Plugin(s) Identified:
[+] contact-form-7
| Location: http://192.168.1.50/wp-content/plugins/contact-form-7/
| Latest Version: 5.5.3
| Last Updated: 2021-11-26
[+] akismet
| Location: http://192.168.1.50/wp-content/plugins/akismet/
| Latest Version: 4.2.1
[+] Finished: Mon Feb 10 14:30:45 2026
[+] Requests Done: 137
[+] Cached Requests: 5
[+] Data Sent: 34.2 KB
[+] Data Received: 1.2 MB
[+] Memory used: 128.5 MB
[+] Elapsed time: 00:00:45""",
"stderr": ""
}
}
MOCK_PACKAGE_RESPONSES = {
("check", "wpscan"): "MISSING",
("install", "wpscan"): "SUCCESS — wpscan installed (apt-get install -y wpscan)"
}Test assertions:
def test_wordpress_scan_ooda_loop():
"""Verify the agent completes the full OODA cycle for a WordPress scan."""
result = run_agent_loop(
task="Scan the WordPress site at http://192.168.1.50 for vulnerabilities, enumerate users and plugins.",
mock_mcp=MockMCPServer(MOCK_RESPONSES, MOCK_PACKAGE_RESPONSES),
max_turns=10
)
# 1. Agent discovered wpscan was missing
assert any(
call.name == "execute_shell_cmd" and "which wpscan" in call.input["command"]
for call in result.tool_calls
), "Agent should check if wpscan exists"
# 2. Agent installed the missing tool
assert any(
call.name == "manage_packages"
and call.input["action"] == "install"
and call.input["package_name"] == "wpscan"
for call in result.tool_calls
), "Agent should install wpscan when missing"
# 3. Agent learned the tool (read --help)
assert any(
call.name == "execute_shell_cmd" and "wpscan --help" in call.input["command"]
for call in result.tool_calls
), "Agent should read wpscan help to learn usage"
# 4. Agent saved a skill file
assert any(
call.name == "save_new_skill" and call.input["tool_name"] == "wpscan"
for call in result.tool_calls
), "Agent should save a wpscan skill file for future use"
# 5. Agent actually ran the scan
assert any(
call.name == "execute_shell_cmd"
and "wpscan" in call.input["command"]
and "--url" in call.input["command"]
and "192.168.1.50" in call.input["command"]
for call in result.tool_calls
), "Agent should execute wpscan against the target"
# 6. Final output contains meaningful results
assert "admin" in result.final_text, "Final report should mention discovered users"
assert "5.8.1" in result.final_text, "Final report should mention WordPress version"
assert result.turns_used <= 10, "Should complete within turn budget"
# 7. Skill file persists for next run
skills = list_skills()
assert any(s["tool_name"] == "wpscan" for s in skills), "wpscan skill should be in the index"What this test validates end-to-end:
| OODA Phase | Action | Validated By |
|---|---|---|
| Observe | Agent receives task, checks available tools | Assertion 1 — which wpscan called |
| Orient | Agent diagnoses "tool missing" from exit code 1 | Assertion 2 — manage_packages(check) then install |
| Decide | Agent installs tool, reads help, saves skill | Assertions 3 + 4 — --help read, skill saved |
| Act | Agent runs scan with correct flags, summarizes | Assertions 5 + 6 — scan executed, results reported |
| Learn | Skill persists for future sessions | Assertion 7 — skill in index |