Skip to content

8. AI Agent Guide

“samuele edited this page Feb 28, 2026 · 6 revisions

AI Agent Guide

The AI Agent is RedAmon's autonomous pentesting engine — a LangGraph-based system that reasons about your attack surface, selects security tools, executes exploits, and reports findings, all through a real-time chat interface. This guide walks you through every aspect of using the agent.


Opening the AI Agent

  1. On the Graph Dashboard, click the "AI Agent" button on the right side of the toolbar
  2. The AI Agent Drawer slides in from the right side of the screen

AI Agent Drawer


Drawer Layout

The AI Agent drawer contains several sections:

Area Description
Header Connection status (WiFi icon), phase badge, attack type, iteration counter, stealth toggle
Conversation History Button to open past conversations panel
Chat Area Scrollable area showing messages, thinking timeline, and tool executions
Input Area Message input with Send/Stop buttons

Header Elements

Element Description
Connection Status Green WiFi icon = connected, red = disconnected. The agent uses a WebSocket connection
Phase Badge Current operational phase: Informational (blue), Exploitation (red), Post-Exploitation (purple)
Attack Type Shows "CVE", "BRUTE", or "PHISH" badge when the agent is executing an attack path
Iteration Counter Current step number in the agent's reasoning loop
Stealth Toggle Enable/disable stealth mode during agent operation

Sending Messages

Type your message in the input area at the bottom of the drawer.

  • Enter — send the message
  • Shift + Enter — new line (multiline input)
  • The textarea auto-expands as you type

What to Ask

The agent can handle a wide range of queries:

Informational queries (no exploitation):

  • "What vulnerabilities exist on 192.168.1.100?"
  • "Which technologies have critical CVEs?"
  • "Show me all open ports on the subdomains"
  • "Find all endpoints with injectable parameters"
  • "Summarize the attack surface for this project"

Exploitation requests:

  • "Exploit CVE-2021-41773 on the Apache server"
  • "Try brute forcing SSH on 10.0.0.5"
  • "Generate a phishing payload for Windows"
  • "Create a malicious Word document with a macro"
  • "Find and exploit the most critical vulnerability"
  • "Test the Node.js deserialization vulnerability"

The agent automatically translates natural language into Neo4j graph queries, tool commands, and exploitation workflows.


Understanding the Timeline

As the agent works, you'll see a timeline of its reasoning and actions:

Agent Timeline

Thinking Cards

Show the agent's internal reasoning — what it's considering, planning, and deciding. These are expandable to see full reasoning details.

Tool Execution Cards

Show when the agent runs a tool. Each card displays:

Element Description
Tool name Which tool was executed (e.g., query_graph, execute_nmap, metasploit_console)
Arguments The input sent to the tool
Streaming output Real-time output as the tool runs (updated every 5 seconds for long operations)
Analysis The agent's interpretation of the tool's output
Actionable Findings Key findings extracted from the output
Recommended Next Steps What the agent suggests doing next

Todo List Widget

The agent maintains a todo list that updates as it works. Items are marked as:

  • Pending — not yet started
  • In Progress — currently being worked on
  • Completed — finished
  • Blocked — unable to proceed

The Three Phases

The agent operates in three distinct phases, each with different tool access:

Phase 1: Informational (Default)

Color: Blue

The agent gathers intelligence without any offensive actions:

  • Queries the Neo4j graph for attack surface data
  • Runs web searches for CVE details and exploit PoCs
  • Makes HTTP requests with curl to test endpoints
  • Scans ports with Naabu
  • Runs Nmap for service detection
  • Uses Nuclei for vulnerability verification

Available tools: query_graph, web_search, execute_curl, execute_naabu, execute_nmap, execute_nuclei, kali_shell

Phase 2: Exploitation

Color: Red

When the agent identifies a viable attack path, it requests a phase transition to exploitation. This requires your approval (if approval gates are enabled).

Additional tools unlocked: execute_code, execute_hydra, metasploit_console, msf_restart

Three classified attack paths + unclassified fallback:

Attack Path Badge Description
CVE Exploitation CVE (orange) The agent finds a matching Metasploit module, configures payload (reverse/bind shell), and fires the exploit
Hydra Brute Force BRUTE (purple) Uses THC Hydra to brute force credentials on 50+ protocols (SSH, FTP, RDP, SMB, MySQL, HTTP forms, etc.)
Phishing / Social Engineering PHISH (pink) Generates malicious payloads, documents, or delivery links for human targets. Supports msfvenom, Office macros, PDF, web delivery, HTA, and email sending
Unclassified Fallback grey For techniques that don't match the above (e.g., SQL injection, XSS, SSRF). Uses available tools generically

When an exploit succeeds, the agent records a ChainFinding(exploit_success) in the EvoGraph — recording the attack type, target IP, CVE IDs, module used, payload, and credentials discovered. This finding is linked to the attack chain step and bridged to the recon graph, making it queryable across sessions.

Phishing / Social Engineering Attack Path

The phishing attack path targets human factors rather than software vulnerabilities. Instead of firing an exploit directly, the agent generates a weaponized artifact and delivers it to the target — a person must execute it for the attack to succeed.

6-Step Workflow:

  1. Determine target platform & delivery method — Windows/Linux/macOS/Android + standalone payload, malicious document, web delivery, or HTA delivery
  2. Set up handlerexploit/multi/handler with matching payload, runs in background
  3. Generate payload/document — msfvenom (exe/elf/apk/ps1/war/vba), Metasploit fileformat modules (Word/Excel/PDF/RTF/LNK), web_delivery (one-liner), or HTA server (URL)
  4. Verify generation — confirm file exists, job is running
  5. Deliver — chat download (docker cp), email via Python smtplib, or web link
  6. Wait for callback — check sessions -l, transition to post-exploitation

Four generation methods:

Method Tool Output Delivery
A) Standalone Payload msfvenom via kali_shell Binary/script file (exe, elf, apk, ps1, etc.) File download or email attachment
B) Malicious Document Metasploit fileformat modules Weaponized Word/Excel/PDF/RTF/LNK File download or email attachment
C) Web Delivery exploit/multi/script/web_delivery One-liner command (Python/PHP/PSH/Regsvr32) Paste command in target's terminal
D) HTA Delivery exploit/windows/misc/hta_server URL serving an HTA payload Target visits URL in browser

Email delivery uses execute_code with Python smtplib to send payloads as email attachments. SMTP settings (host, port, credentials) are configured in the project's Attack Paths tab. If no SMTP is configured, the agent asks the user at runtime.

The phishing path shares the same post-exploitation framework as CVE exploits — once a session opens, the agent transitions to post_exploitation with full Meterpreter interactive commands.

Deep dive: For the full payload matrix, all Metasploit fileformat modules, AV evasion techniques, SMTP configuration, troubleshooting, and example scenarios, see the Attack Paths > Phishing / Social Engineering page.

ngrok TCP Tunnel (Reverse Shells over NAT)

If your attacker machine is behind NAT or in a cloud environment, you can route reverse shell traffic through an ngrok TCP tunnel instead of manually configuring LHOST/LPORT:

  1. Create a free account at ngrok.com and complete identity verification (required for TCP tunnels)
  2. Add your authtoken to .env:
    NGROK_AUTHTOKEN=your-token-here
  3. Restart kali-sandbox: docker compose up -d kali-sandbox
  4. Enable "Enable ngrok TCP Tunnel" in the project's Agent Behaviour settings

When enabled, ngrok starts automatically inside the kali-sandbox container and exposes a public TCP endpoint (e.g., tcp://7.tcp.eu.ngrok.io:12345). The agent auto-detects the public host and port from the ngrok API — LHOST and LPORT fields are hidden in the UI since they're no longer needed. All Metasploit reverse shell payloads will use the ngrok tunnel endpoint automatically.

Phase 3: Post-Exploitation

Color: Purple

After a successful exploit, the agent can transition to post-exploitation (if enabled in project settings):

  • Statefull mode — interactive Meterpreter commands: enumeration, lateral movement, data exfiltration
  • Stateless mode — re-runs exploits with different command payloads

Agent Tools Reference

The agent has access to 11 tools, each designed for a specific purpose. Tools are gated by the current operational phase (see Tool Phase Restrictions).

query_graph

Purpose: Query the Neo4j graph database using natural language.

This is the agent's primary source of truth for all reconnaissance data. The graph contains assets (domains, subdomains, IPs, ports, services), web data (endpoints, parameters, certificates, headers), intelligence (technologies, vulnerabilities, CVEs, MITRE CWE/CAPEC), GitHub secrets, and exploit results.

The agent should always check the graph first before reaching for other tools.

Phases: Informational, Exploitation, Post-Exploitation


web_search

Purpose: Search the internet for security research information via Tavily.

Use after query_graph when the agent needs external context not in the graph — CVE details, exploit PoCs, version-specific vulnerabilities, Metasploit module documentation, security advisories, or attack techniques.

Phases: Informational, Exploitation, Post-Exploitation


execute_curl

Purpose: Make HTTP requests to targets.

Primary use is reachability checks (status codes, headers). Fallback use is vulnerability probing (path traversal, LFI/RFI, header injection, SSRF) when the graph has no relevant vulnerability findings for the target.

Phases: Informational, Exploitation, Post-Exploitation


execute_naabu

Purpose: Fast port scanning.

Use only to verify that specific ports are actually open or to scan new targets not yet in the graph. For most cases, port data is already available via query_graph.

Phases: Informational, Exploitation, Post-Exploitation


execute_nmap

Purpose: Deep network scanning with service detection, OS fingerprinting, and NSE scripts.

Use when detailed service analysis is needed (-sV for version detection, -O for OS fingerprinting, -sC for default scripts, --script vuln for vulnerability scripts). Slower than Naabu but much more detailed.

Phases: Informational, Exploitation, Post-Exploitation


execute_nuclei

Purpose: Template-based CVE verification and exploitation.

YAML-based vulnerability scanner with 9,000+ community templates. Primary use is verifying if a target is vulnerable to a specific CVE. Secondary use is detecting vulnerabilities by category (rce, sqli, xss, lfi, etc.). Can verify and exploit many CVEs in a single step.

Phases: Informational, Exploitation, Post-Exploitation


kali_shell

Purpose: General shell execution in the Kali Linux sandbox.

Full bash shell access with all standard Kali tools. Use for downloading PoCs (git clone), payload generation (msfvenom), password cracking (john), SQL injection automation (sqlmap), exploit research (searchsploit), reverse/bind shells (nc, socat, rlwrap), SMB enumeration (smbclient), encoding, DNS lookups, SSH, and any Kali tool not exposed as a dedicated MCP tool.

Do not use for tasks that have a dedicated tool (curlexecute_curl, nmapexecute_nmap, etc.) or for writing multi-line scripts (use execute_code instead).

Timeout: 120 seconds.

Phases: Informational, Exploitation, Post-Exploitation


execute_code

Purpose: Write and execute multi-line code without shell escaping issues.

Code is passed as a clean string parameter, written to a file, and executed with the appropriate interpreter. This eliminates all shell escaping problems that arise when trying to run complex scripts via kali_shell.

Supported languages: Python (default), Bash, Ruby, Perl, C, C++

Timeout: 120 seconds for execution. Compiled languages (C/C++): 60 seconds compile + 120 seconds run.

Files persist at /tmp/{filename}.{ext} and can be re-run via kali_shell if needed.

Pre-installed Python Libraries

The following libraries are available inside the Kali sandbox — import them directly, no pip install needed:

Library Import Use Case
requests import requests HTTP requests for web exploitation, API interaction, form submission, file upload, session management
BeautifulSoup from bs4 import BeautifulSoup Parse HTML responses to extract CSRF tokens, hidden form fields, session nonces, page data, and links. Combine with requests to interact with web apps that require parsing before submission
PyCryptodome from Crypto.Cipher import AES Encrypt/decrypt payloads, hash manipulation, custom crypto attacks, padding oracle, key derivation
PyJWT import jwt Forge, tamper, and decode JWT tokens. Algorithm confusion attacks (none, HS256, RS256), claim manipulation
Paramiko import paramiko Programmatic SSH sessions, SFTP file transfer, SSH tunneling, remote command execution for post-exploitation
Impacket from impacket.smbconnection import SMBConnection Windows/AD attacks: SMB relay, NTLM authentication, Kerberos, secretsdump, psexec, wmiexec, dcomexec
pwntools from pwn import * Binary exploitation, remote TCP/UDP connections, shellcode generation, struct packing, ROP chain building

When to Use execute_code

  • Multi-line exploit scripts — custom PoC code, deserialization payloads, payload generators
  • Web app interaction requiring HTML parsing — fetch a login page, extract a CSRF token with BeautifulSoup, then submit credentials
  • JWT manipulation — decode a token, modify claims (e.g., escalate role to admin), re-sign with a known or guessed secret
  • Crypto attacks — decrypt intercepted traffic, craft encrypted payloads, exploit weak crypto implementations
  • SSH-based post-exploitation — open a Paramiko session to an already-compromised host, enumerate files, exfiltrate data
  • Windows/AD exploitation — use Impacket to dump secrets, enumerate shares, or execute commands via psexec/wmiexec
  • Binary exploitation — connect to a vulnerable service with pwntools, send crafted payloads, receive shells

Examples

Extract CSRF token and submit login form:

import requests
from bs4 import BeautifulSoup

s = requests.Session()
r = s.get('http://target/login', verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
token = soup.find('input', {'name': 'csrf_token'})['value']
r = s.post('http://target/login', data={
    'csrf_token': token,
    'username': 'admin',
    'password': 'admin'
}, verify=False)
print(r.status_code, r.url)

Forge a JWT token with algorithm confusion:

import jwt

# Decode without verification to inspect claims
token = "eyJhbGciOi..."
claims = jwt.decode(token, options={"verify_signature": False})
print("Original claims:", claims)

# Forge with 'none' algorithm (CVE-2015-9235)
forged = jwt.encode({"user": "admin", "role": "admin"}, "", algorithm="HS256")
print("Forged token:", forged)

Enumerate SMB shares with Impacket:

from impacket.smbconnection import SMBConnection

conn = SMBConnection('10.0.0.5', '10.0.0.5')
conn.login('guest', '')
for share in conn.listShares():
    name = share['shi1_netname'][:-1]
    print(f"Share: {name}")

Connect to a vulnerable service with pwntools:

from pwn import *

r = remote('10.0.0.5', 1337)
r.recvuntil(b'> ')
r.sendline(b'payload')
print(r.recvall(timeout=5).decode())

Phases: Informational, Exploitation, Post-Exploitation


execute_hydra

Purpose: Brute force password cracking with THC Hydra.

Fast, parallelized network login cracker supporting 50+ protocols (SSH, FTP, RDP, SMB, VNC, MySQL, MSSQL, PostgreSQL, Redis, MongoDB, HTTP forms, and more). See Hydra Brute Force for configuration options.

Phases: Exploitation, Post-Exploitation


metasploit_console

Purpose: Execute Metasploit Framework commands.

Full access to the Metasploit console — module context and sessions persist between calls. Use for exploit execution, session management, post-exploitation modules, and payload generation. Chain commands with semicolons (;), not &&.

Phases: Exploitation, Post-Exploitation


msf_restart

Purpose: Restart the Metasploit console.

Resets module context and clears stale state. Use when the console becomes unresponsive or when switching between unrelated exploit workflows.

Phases: Exploitation, Post-Exploitation


Agent Container Runtimes

The agent container ships with a full set of language runtimes and development tools. These are available for any agent workload that needs to build, test, or interact with code repositories.

Runtime Version Commands
Node.js 20 LTS node, npm, npx, yarn, pnpm
Python 3.11 python3, pip
Go 1.22 go build, go test, go mod
Ruby 3.3 ruby, gem, bundler
Java OpenJDK 21 java, javac, mvn
PHP 8.4 php, composer
.NET SDK 8.0 dotnet build, dotnet test
Build tools make, gcc, g++
Utilities git, ripgrep (rg), jq, curl, wget, unzip, file, ssh

Approval Workflows

When the agent wants to transition to a more aggressive phase, it pauses and sends an Approval Request.

The approval request includes:

  • Reason — why the agent wants to transition
  • Planned actions — what it intends to do
  • Risks — potential impact

You have three options:

Action Description
Approve Allow the phase transition — agent continues with offensive tools
Modify Approve with modifications — add constraints or redirect the approach
Abort Deny the transition — agent stays in the current phase

Approval gates are configurable per project. You can disable them in the Agent Behaviour tab of project settings to let the agent operate fully autonomously.


Question Requests

Sometimes the agent needs additional information from you. It sends a Question Request with:

  • The question text
  • Optional predefined answer choices

You can select a predefined answer or type a custom response.


Guidance Messages

You can steer the agent while it's working by sending a guidance message:

  • Type your guidance in the input area while the agent is actively processing
  • The guidance is injected into the agent's context before its next reasoning step
  • Examples: "Focus on SSH vulnerabilities", "Skip the web application, look at network services", "Try a different exploit module"

The agent acknowledges guidance with a confirmation message.


Stop and Resume

Stopping the Agent

Click the Stop button (replaces the Send button while the agent is working) to pause execution. The agent's state is checkpointed.

Resuming

After stopping, a Resume button appears. Click it to continue from the last checkpoint with full context preserved.


Conversation History

The agent supports multiple conversations per project. Each conversation is an independent session with its own context.

Viewing Past Conversations

  1. Click the history button (clock icon) in the drawer header
  2. A Conversation History panel slides in showing all past conversations

Each conversation shows:

  • Title (auto-generated from the first message)
  • Status (active, completed)
  • Agent running indicator
  • Current phase
  • Iteration count
  • Timestamp

Switching Conversations

Click on any conversation to load it. The chat area updates with the full message history.

Deleting Conversations

Click the delete icon on any conversation to remove it permanently.

Starting a New Conversation

Click the "New Conversation" button at the top of the history panel.


Downloading Session Reports

You can export any conversation as a Markdown report:

  1. Click the download button (download icon) in the drawer header
  2. The report is saved as a .md file containing:
    • All user messages and agent responses
    • Thinking/reasoning steps
    • Tool executions with output
    • Findings and recommendations
    • Todo list states

Connection Status

The AI Agent uses a WebSocket connection for real-time communication.

Icon Status Meaning
Green WiFi Connected WebSocket is active, agent is reachable
Red WiFi (crossed) Disconnected Connection lost — messages won't send

If disconnected, the agent will attempt to reconnect. You can also try refreshing the page.


Tips for Effective Use

  1. Start with informational queries — ask the agent to summarize the attack surface before requesting exploits
  2. Be specific"Exploit CVE-2021-41773 on 10.0.0.5:8080" works better than "hack the server"
  3. Use guidance — steer the agent if it's going in the wrong direction
  4. Check the todo list — it shows what the agent is planning and what's done
  5. Review tool output — expand tool execution cards to see raw output
  6. Use approval gates — keep them enabled until you're comfortable with the agent's behavior

Agent Configuration

Key settings that control agent behavior (configured in project settings > Agent Behaviour tab):

Setting Default Description
LLM Model claude-opus-4-6 The AI model powering the agent
Max Iterations 100 Maximum reasoning-action loops
Approval for Exploitation true Require your approval before exploitation
Approval for Post-Exploitation true Require your approval before post-exploitation
Post-Exploitation Type statefull Meterpreter sessions vs. one-shot commands
Tool Output Max Chars 20000 Truncation limit for tool output

Full configuration reference: Project Settings Reference > Agent Behavior


Next Steps

Clone this wiki locally