Autonomy & Dynamic Skill Acquisition — Implementation Design

Problem Statement

The current executor is single-shot: one Claude API call produces a Python script, which is blindly written and executed. The agent cannot react to failures, install missing tools, or learn from runtime feedback.

The goal is to upgrade the executor into a closed-loop agentic system that follows an OODA cycle: Attempt → Fail → Diagnose → Provision → Learn → Retry.

Gap Analysis (Current State → Target State)

Area	Current	Target
Claude interaction	Single `messages.create()` call, text-only response	Multi-turn agentic loop with `tool_use` stop reason
MCP tools	`write_file`, `execute_script` (Python files only)	+ `execute_shell_cmd`, `manage_packages`
Error recovery	None — script output is returned as-is	Agent observes errors, diagnoses, provisions, retries
Skill memory	None	Persistent `.md` skill files, indexed at startup, injected into system prompt
Tool installation	Static (baked into Dockerfile)	Dynamic `apt-get install` at runtime via MCP
Docker persistence	Scripts and logs only	+ apt cache and installed binaries via named volumes

Architecture Overview

                          ┌─────────────────────────────────────────┐
                          │           Agentic Loop (executor.ts)    │
                          │                                         │
 User task ──►  Claude API call  ◄──────────────────────┐          │
                    │                                    │          │
                    ▼                                    │          │
              stop_reason?                               │          │
              ┌─────────┐                                │          │
              │end_turn │──► return final text           │          │
              ├─────────┤                                │          │
              │tool_use │──► dispatch tool call ──► result ─────────┘
              └─────────┘        │                                  │
                                 ▼                                  │
                     ┌──── Tool Router ────┐                        │
                     │                     │                        │
              MCP Tools            Host-Local Tools                 │
              (Kali container)     (Node.js process)                │
              ├─ execute_shell_cmd ├─ save_new_skill                │
              ├─ write_file        ├─ read_skill_file               │
              ├─ execute_script    └─ list_skills                   │
              └─ manage_packages                                    │
                                                                    │
                          Skill Library (./skills/*.md)  ◄──────────┘
                          indexed at startup, summary
                          injected into system prompt

Implementation Plan

Phase 1: New MCP Tools (kali/server.py)

Add two new tools to the Kali container's MCP server.

1a. `execute_shell_cmd(command: str) → str`

Runs an arbitrary shell command inside the Kali container. This is the foundation for the OODA loop — the agent needs to run ad-hoc commands (not just Python scripts).

- Runs via subprocess with shell=True
- Same timeout (120s) and output truncation (4000 chars) as execute_script
- Returns exit_code + stdout + stderr

Why this is needed: currently the agent can only execute pre-written Python files. The OODA loop requires running tool --help, which <tool>, and other ad-hoc commands.

1b. `manage_packages(action: str, package_name: str) → str`

Package management tool with two actions:

check: runs shutil.which(package_name) — returns INSTALLED or MISSING
install: runs apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y <package> with a 300s timeout

Security considerations:

Whitelist validation: only allow [a-z0-9\-] in package names to prevent injection
Log all install actions to /app/logs/installs.log

Phase 2: Agentic Loop (executor.ts) — Critical Change

This is the largest change. Replace single-shot messages.create() with a multi-turn tool-use loop.

2a. Define tool schemas for Claude

Claude needs JSON tool definitions to know what it can call. Two categories:

MCP-proxied tools (forwarded to Kali container):

execute_shell_cmd  { command: string }
write_file         { filename: string, content: string }
execute_script     { filename: string, args?: string }
manage_packages    { action: "check"|"install", package_name: string }

Host-local tools (executed in Node.js process):

save_new_skill     { tool_name: string, content: string }
read_skill_file    { tool_name: string }
list_skills        { }  (returns the full index)

2b. Tool dispatcher

A new dispatchToolCall(name, input) method in Executor that:

If the tool is an MCP tool → call this.mcp.callTool(name, input) (make MCPClient expose a generic callTool)
If the tool is a host-local tool → call the corresponding local function
Returns the tool result as a string

2c. Agentic loop (`runAgentLoop`)

New core method replacing generateScript() for autonomous mode:

async runAgentLoop(task: string, maxTurns: number = 15): Promise<AgentResult>

1. Build initial messages array: [{ role: "user", content: wrappedTask }]
2. Loop:
   a. Call anthropic.messages.create({ tools, messages, system: SYSTEM_PROMPT })
   b. Append assistant response to messages
   c. If stop_reason === "end_turn" → extract final text, break
   d. If stop_reason === "tool_use":
      - For each tool_use block in response.content:
        - Call dispatchToolCall(block.name, block.input)
        - Collect { type: "tool_result", tool_use_id, content: result }
      - Append tool_results as a user message
      - Continue loop
   e. If maxTurns exceeded → break with warning
3. Return { finalText, toolCallHistory, turnsUsed }

2d. Preserve existing single-shot paths

generateScript() and the existing CLI menu options (1-5) continue to work as-is. The agentic loop is a new execution mode (option 7 or a flag on auto-run), not a replacement.

Phase 3: Skill System (Host-Side)

3a. Directory structure

skills/
├── _index.json          ← auto-generated index (name, category, tags, description)
├── nmap.md
├── wpscan.md
└── sqlmap.md

3b. Skill file format

Each skill file uses YAML frontmatter + markdown body:

---
tool_name: "wpscan"
category: "web_scanner"
tags: ["wordpress", "cms", "enumeration"]
description: "WordPress security scanner for users, plugins, and themes."
---

# Best Practice Commands
## User Enumeration
`wpscan --url {{TARGET}} --enumerate u --force`

# Anti-Patterns
- Do not use on non-WordPress sites.

3c. Host-local tool implementations

Three functions in a new src/skills.ts module:

listSkills(): Scans ./skills/*.md, reads only the YAML frontmatter from each file, returns a JSON array of { tool_name, category, tags, description }. This is called at startup and its output is appended to the system prompt.

readSkillFile(tool_name): Returns the full markdown content of ./skills/<tool_name>.md. Called by the agent when it decides to use a known skill.

saveNewSkill(tool_name, content): Writes ./skills/<tool_name>.md. Input validation: sanitize tool_name to [a-z0-9\-] only. Called by the agent after running --help and summarizing a new tool.

3d. System prompt injection

At Executor initialization, call listSkills() and append to the system prompt:

[AVAILABLE SKILLS]
You have pre-learned skills for the following tools. Use read_skill_file(tool_name)
to load the full usage guide before using any of these:
- wpscan: WordPress security scanner (tags: wordpress, cms, enumeration)
- nmap: Network port scanner (tags: network, discovery, recon)

Phase 4: System Prompt Update

The system prompt in executor.ts needs a new section for the OODA protocol. Append after the existing Output Rules:

[TOOL MANAGEMENT PROTOCOL]
You have direct access to a Kali Linux shell via execute_shell_cmd.
1. If a required tool is missing (command not found):
   a. Verify: manage_packages(action="check", package_name="<tool>")
   b. Install: manage_packages(action="install", package_name="<tool>")
   c. Learn: execute_shell_cmd("<tool> --help | head -80")
   d. Save: save_new_skill(tool_name, content) — write a concise skill file
   e. Execute: run your original objective with the new knowledge

[SKILL LIBRARY]
Before using a tool, check if a skill file exists (listed below). If so,
call read_skill_file(tool_name) to load best practices before proceeding.

{SKILLS_INDEX}

Phase 5: Docker Persistence

5a. Named volumes for apt cache

Update docker-compose.yml:

services:
  kali:
    volumes:
      - ./logs:/app/logs
      - ./scripts:/app/scripts
      - kali_apt_cache:/var/cache/apt       # persist downloaded .deb files
      - kali_apt_lib:/var/lib/apt           # persist package index
      - kali_installed:/usr/local           # persist pip-installed packages

volumes:
  kali_apt_cache:
  kali_apt_lib:
  kali_installed:

Note: persisting /usr/bin via a named volume is fragile (it includes base system binaries). A better approach is to persist only the apt cache so re-installs are fast (no re-download), and accept that apt-get install must re-run after container recreation. The agent handles this automatically via the OODA loop.

5b. Install log

The manage_packages tool should append to /app/logs/installs.log:

[2026-02-10 14:30:00] INSTALL wpscan → SUCCESS
[2026-02-10 14:31:00] INSTALL gobuster → ALREADY_INSTALLED

This log is volume-mounted and survives container restarts. It can also be read by the agent to check what was previously installed.

File Change Summary

File	Change Type	Description
`kali/server.py`	Modify	Add `execute_shell_cmd` and `manage_packages` tools
`src/executor.ts`	Major modify	Add agentic loop, tool dispatcher, tool schemas, updated system prompt
`src/mcp-client.ts`	Modify	Add generic `callTool(name, args)` method
`src/skills.ts`	New file	`listSkills()`, `readSkillFile()`, `saveNewSkill()`
`src/types.ts`	Modify	Add `AgentResult`, `ToolCallRecord`, `SkillIndex` interfaces
`src/index.ts`	Modify	Add menu option 7 (Autonomous Run)
`docker-compose.yml`	Modify	Add named volumes for apt persistence
`skills/`	New dir	Persistent skill library (`.md` files)

Implementation Order

Phase 1  ──►  Phase 2a/2b  ──►  Phase 2c  ──►  Phase 3  ──►  Phase 4  ──►  Phase 5
MCP tools     Tool schemas      Agentic loop    Skill system   Prompt update  Docker volumes
(Python)      + dispatcher      (core change)   (host-side)    (integration)  (persistence)
              (TypeScript)

Phase 1 and Phase 3 are independent and can be developed in parallel. Phase 2c (agentic loop) is the critical path — everything else feeds into it. Phase 4 (prompt update) should be done last since it references all other components.

Risk & Mitigation

Risk	Impact	Mitigation
`execute_shell_cmd` is a wide-open shell	Command injection inside container	Acceptable — the container is already a disposable pentest sandbox. Log all commands.
`apt-get install` can install anything	Malicious package names	Validate `[a-z0-9\-]` only. The container is ephemeral.
Agentic loop runs forever	Token burn, stuck agent	Hard cap via `maxTurns` (default 15). Add token budget tracking.
Skill files grow unbounded	Disk usage, prompt bloat	Only the index (name + description) goes into the prompt. Full content is loaded on-demand via `read_skill_file`. Cap index at 50 entries.
Persisting `/usr/bin` breaks container	System binaries overwritten	Don't persist `/usr/bin`. Only persist apt cache for fast re-downloads.
Claude generates bad skill files	Garbage in skill library	Validate frontmatter schema on save. Allow manual cleanup of `skills/` dir.

Testing Strategy

MCP tools: Shell into Kali container, call tools manually via curl to the /mcp endpoint
Agentic loop: Create a mock MCP server that returns canned responses; unit test the loop logic
Skill system: Unit test listSkills, readSkillFile, saveNewSkill with a temp directory
Integration: Run the full CLI option 7 against a deliberately misconfigured container (tool missing) and verify the agent installs it, learns it, and completes the task

Example Functional Test: WordPress Scan (Full OODA Loop)

This test case simulates the complete agentic lifecycle — from user prompt through tool discovery, installation, skill learning, and task execution.

User prompt:

Scan the WordPress site at http://192.168.1.50 for vulnerabilities, enumerate users and plugins.

Expected agent behavior (turn-by-turn):

Turn 1 — Plan & Skill Check
├─ Claude receives the task
├─ Checks [AVAILABLE SKILLS] in system prompt → no wpscan skill found
├─ Decides wpscan is the right tool
└─ Tool call: execute_shell_cmd("which wpscan")

Turn 2 — Discover tool is missing
├─ Result: exit_code=1, stdout="", stderr="" (not found)
├─ Agent enters OODA: diagnose → tool not installed
└─ Tool call: manage_packages(action="check", package_name="wpscan")

Turn 3 — Confirm missing & install
├─ Result: "MISSING"
├─ Agent decides to install
└─ Tool call: manage_packages(action="install", package_name="wpscan")

Turn 4 — Learn the tool
├─ Result: "SUCCESS — wpscan installed"
├─ Agent wants to learn usage before running
└─ Tool call: execute_shell_cmd("wpscan --help | head -80")

Turn 5 — Save skill & execute scan
├─ Result: wpscan help text (flags, examples, etc.)
├─ Agent extracts key flags and best practices
├─ Tool call: save_new_skill("wpscan", <skill markdown content>)
├─ Tool call: execute_shell_cmd("wpscan --url http://192.168.1.50 --enumerate u,p --force --no-banner")
   (parallel tool calls — saving skill and running scan are independent)

Turn 6 — Process results
├─ Skill save result: "Saved skills/wpscan.md"
├─ Scan result: wpscan output with users, plugins, vulnerabilities found
├─ Agent analyzes the output
└─ If vulnerabilities found with known exploits, agent may run deeper checks:
   Tool call: execute_shell_cmd("wpscan --url http://192.168.1.50 --enumerate vp --api-token ... --force")
   (or proceeds to summarize if no API token is available)

Turn 7 — Final summary
├─ stop_reason: "end_turn"
└─ Returns structured report:
   - Target: http://192.168.1.50
   - WordPress version detected
   - Users enumerated (admin, editor, etc.)
   - Plugins found + known CVEs
   - Recommended next steps

Mock MCP responses for automated testing:

To run this as an automated test without a real target, create a mock MCP server that returns canned responses keyed on the command string:

MOCK_RESPONSES = {
    # Turn 1: tool discovery
    "which wpscan": {
        "exit_code": 1,
        "stdout": "",
        "stderr": ""
    },

    # Turn 4: help text after install
    "wpscan --help | head -80": {
        "exit_code": 0,
        "stdout": """_______________________________________________________________
        __          _______   _____
        \\ \\        / /  __ \\ / ____|
         \\ \\  /\\  / /| |__) | (___   ___  __ _ _ __
          \\ \\/  \\/ / |  ___/ \\___ \\ / __|/ _` | '_ \\
           \\  /\\  /  | |     ____) | (__| (_| | | | |
            \\/  \\/   |_|    |_____/ \\___|\\__,_|_| |_|

        WordPress Security Scanner by the WPScan Team

Usage: wpscan [options]
        --url URL                    The URL of the blog to scan
    -e, --enumerate [OPTS]           Enumeration (u=users, p=plugins, t=themes, vp=vuln plugins)
        --force                      Do not check if target is WordPress
        --no-banner                  Suppress banner output
        --api-token TOKEN            WPScan API token for vulnerability data
        --detection-mode MODE        passive, mixed (default), aggressive
        --plugins-detection MODE     passive, mixed, aggressive
    -o, --output FILE                Output to file
    -f, --format FORMAT              Output format (cli, json, cli-no-color)
        --stealthy                   Alias for --random-user-agent --detection-mode passive
        --help                       Show help
        --version                    Show version""",
        "stderr": ""
    },

    # Turn 5: actual scan
    "wpscan --url http://192.168.1.50 --enumerate u,p --force --no-banner": {
        "exit_code": 0,
        "stdout": """[+] URL: http://192.168.1.50/ [192.168.1.50]
[+] Started: Mon Feb 10 14:30:00 2026

Interesting Finding(s):
[+] Headers: Server: Apache/2.4.41 (Ubuntu)
[+] XML-RPC seems to be enabled: http://192.168.1.50/xmlrpc.php
[+] WordPress version 5.8.1 identified (Insecure, released on 2021-09-09)
 | Found By: Meta Generator (passive)
 | Confirmed By: Atom Generator (aggressive)

[i] User(s) Identified:
[+] admin
 | Found By: Author Posts - Author Pattern (passive)
 | Confirmed By: Login Error Messages (aggressive)
[+] editor
 | Found By: Author Id Brute Forcing (aggressive)

[+] Enumerating Most Popular Plugins (via Passive Methods)
[+] Checking Known Locations
[i] Plugin(s) Identified:
[+] contact-form-7
 | Location: http://192.168.1.50/wp-content/plugins/contact-form-7/
 | Latest Version: 5.5.3
 | Last Updated: 2021-11-26
[+] akismet
 | Location: http://192.168.1.50/wp-content/plugins/akismet/
 | Latest Version: 4.2.1

[+] Finished: Mon Feb 10 14:30:45 2026
[+] Requests Done: 137
[+] Cached Requests: 5
[+] Data Sent: 34.2 KB
[+] Data Received: 1.2 MB
[+] Memory used: 128.5 MB
[+] Elapsed time: 00:00:45""",
        "stderr": ""
    }
}

MOCK_PACKAGE_RESPONSES = {
    ("check", "wpscan"):  "MISSING",
    ("install", "wpscan"): "SUCCESS — wpscan installed (apt-get install -y wpscan)"
}

Test assertions:

def test_wordpress_scan_ooda_loop():
    """Verify the agent completes the full OODA cycle for a WordPress scan."""
    result = run_agent_loop(
        task="Scan the WordPress site at http://192.168.1.50 for vulnerabilities, enumerate users and plugins.",
        mock_mcp=MockMCPServer(MOCK_RESPONSES, MOCK_PACKAGE_RESPONSES),
        max_turns=10
    )

    # 1. Agent discovered wpscan was missing
    assert any(
        call.name == "execute_shell_cmd" and "which wpscan" in call.input["command"]
        for call in result.tool_calls
    ), "Agent should check if wpscan exists"

    # 2. Agent installed the missing tool
    assert any(
        call.name == "manage_packages"
        and call.input["action"] == "install"
        and call.input["package_name"] == "wpscan"
        for call in result.tool_calls
    ), "Agent should install wpscan when missing"

    # 3. Agent learned the tool (read --help)
    assert any(
        call.name == "execute_shell_cmd" and "wpscan --help" in call.input["command"]
        for call in result.tool_calls
    ), "Agent should read wpscan help to learn usage"

    # 4. Agent saved a skill file
    assert any(
        call.name == "save_new_skill" and call.input["tool_name"] == "wpscan"
        for call in result.tool_calls
    ), "Agent should save a wpscan skill file for future use"

    # 5. Agent actually ran the scan
    assert any(
        call.name == "execute_shell_cmd"
        and "wpscan" in call.input["command"]
        and "--url" in call.input["command"]
        and "192.168.1.50" in call.input["command"]
        for call in result.tool_calls
    ), "Agent should execute wpscan against the target"

    # 6. Final output contains meaningful results
    assert "admin" in result.final_text, "Final report should mention discovered users"
    assert "5.8.1" in result.final_text, "Final report should mention WordPress version"
    assert result.turns_used <= 10, "Should complete within turn budget"

    # 7. Skill file persists for next run
    skills = list_skills()
    assert any(s["tool_name"] == "wpscan" for s in skills), "wpscan skill should be in the index"

What this test validates end-to-end:

OODA Phase	Action	Validated By
Observe	Agent receives task, checks available tools	Assertion 1 — `which wpscan` called
Orient	Agent diagnoses "tool missing" from exit code 1	Assertion 2 — `manage_packages(check)` then `install`
Decide	Agent installs tool, reads help, saves skill	Assertions 3 + 4 — `--help` read, skill saved
Act	Agent runs scan with correct flags, summarizes	Assertions 5 + 6 — scan executed, results reported
Learn	Skill persists for future sessions	Assertion 7 — skill in index

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Autonomy & Dynamic Skill Acquisition — Implementation Design

Problem Statement

Gap Analysis (Current State → Target State)

Architecture Overview

Implementation Plan

Phase 1: New MCP Tools (kali/server.py)

1a. `execute_shell_cmd(command: str) → str`

1b. `manage_packages(action: str, package_name: str) → str`

Phase 2: Agentic Loop (executor.ts) — Critical Change

2a. Define tool schemas for Claude

2b. Tool dispatcher

2c. Agentic loop (`runAgentLoop`)

2d. Preserve existing single-shot paths

Phase 3: Skill System (Host-Side)

3a. Directory structure

3b. Skill file format

3c. Host-local tool implementations

3d. System prompt injection

Phase 4: System Prompt Update

Phase 5: Docker Persistence

5a. Named volumes for apt cache

5b. Install log

File Change Summary

Implementation Order

Risk & Mitigation

Testing Strategy

Example Functional Test: WordPress Scan (Full OODA Loop)

FilesExpand file tree

autonomy-design.md

Latest commit

History

autonomy-design.md

File metadata and controls

Autonomy & Dynamic Skill Acquisition — Implementation Design

Problem Statement

Gap Analysis (Current State → Target State)

Architecture Overview

Implementation Plan

Phase 1: New MCP Tools (kali/server.py)

1a. execute_shell_cmd(command: str) → str

1b. manage_packages(action: str, package_name: str) → str

Phase 2: Agentic Loop (executor.ts) — Critical Change

2a. Define tool schemas for Claude

2b. Tool dispatcher

2c. Agentic loop (runAgentLoop)

2d. Preserve existing single-shot paths

Phase 3: Skill System (Host-Side)

3a. Directory structure

3b. Skill file format

3c. Host-local tool implementations

3d. System prompt injection

Phase 4: System Prompt Update

Phase 5: Docker Persistence

5a. Named volumes for apt cache

5b. Install log

File Change Summary

Implementation Order

Risk & Mitigation

Testing Strategy

Example Functional Test: WordPress Scan (Full OODA Loop)

1a. `execute_shell_cmd(command: str) → str`

1b. `manage_packages(action: str, package_name: str) → str`

2c. Agentic loop (`runAgentLoop`)