Changes from 13 commits
Commits
43 commits
9f0c625
refactor: align prompts and scan modes with owasp wstg methodology
0xhis Feb 25, 2026
a54ba27
Merge branch 'main' into prompt-optimization
0xhis Feb 25, 2026
4b72fc0
feat(ui): add live status updates during agent initialization
0xhis Feb 25, 2026
8c5d946
fix(ui): show live status messages during all agent phases, not just …
0xhis Feb 25, 2026
c56631e
fix(ui): stabilize live agent status updates
0xhis Feb 25, 2026
0439d70
style: wrap update_agent_status signature to fix line length lint
0xhis Feb 25, 2026
8f02d52
feat: enforce WSTG ID prefixes and deep agent chaining
0xhis Feb 25, 2026
6c02017
feat: enforce testing of newly exposed surfaces after a bypass
0xhis Feb 25, 2026
8859f2b
feat: enforce spawning specialized subagents for heavy exploitation l…
0xhis Feb 25, 2026
8abbb58
feat: add WAF & rate limit adaptation rule to execution guidelines
0xhis Feb 25, 2026
e5b0464
fix(tui): persist thinking blocks & apply copilot review feedback
0xhis Feb 25, 2026
bf6ea9c
style: address copilot review styling suggestions
0xhis Feb 25, 2026
4a3cc13
feat(prompt): add attacker perspective verification to deep/standard …
0xhis Feb 25, 2026
64aa3b5
style: address PR #328 review suggestions
0xhis Feb 25, 2026
24b5147
refactor: drop thinking_blocks from AgentState.messages and dedup tui.py
Feb 25, 2026
76fcf75
fix: address Copilot review suggestions
Feb 25, 2026
650ec46
chore: simplify PR by removing thinking blocks and redundant code
Mar 9, 2026
e7e03e0
Merge remote-tracking branch 'origin/main' into pr-328
Mar 9, 2026
5be1025
Fix agent telemetry update events
Mar 9, 2026
82bbc11
fix: address Copilot review suggestions
0xhis Feb 25, 2026
ff30eee
fix: revert get_conversation_history copy (memory leak) and remove re…
0xhis Feb 25, 2026
7c7940b
refactor(prompt): enforce subagent delegation for Phase 1 context gat…
0xhis Feb 26, 2026
dc23c1f
fix: address prompt-optimization branch review bugs
0xhis Feb 26, 2026
a567677
refactor(prompt): mitigate exploitation phase refusals and simplify a…
0xhis Feb 26, 2026
19631e2
chore: ignore test_run.sh
0xhis Feb 26, 2026
877af2b
refactor(prompt): update deep scan mode with authorization framing
0xhis Feb 26, 2026
6592a6f
feat(prompt): add mandatory skill assignment triggers for subagent cr…
0xhis Feb 26, 2026
4785d4b
fix(agent): mitigate LLM refusals via explicit authorization and atta…
0xhis Feb 26, 2026
88ffb3c
fix(agent): add todo list instruction and remove WSTG prefixes from a…
0xhis Feb 26, 2026
62bdf09
fix(prompt): tighter legal mandate & target infra bypass framing
0xhis Feb 26, 2026
25f8bd7
Enhance prompt structure with XML bounding and refusal suppression
0xhis Feb 27, 2026
1fc997d
fix(tool): strictly constrain todo priority values to prevent halluci…
0xhis Mar 2, 2026
2f6c1ed
fix(agent): fix XML tag nesting and UI rendering issues from PR review
0xhis Mar 3, 2026
e9f43c3
fix(agent): stabilize sender attribution and align scan/TUI prompt up…
0xhis Mar 7, 2026
a913f76
refactor(prompt): condense quick scan mode to baseline-style flow
0xhis Mar 7, 2026
95e2f88
fix(tui): sanitize merged text spans to prevent render crash
0xhis Mar 7, 2026
9dcb302
fix(agent): address review comments for thinking blocks, empty conten…
0xhis Mar 7, 2026
2bc2522
fix(tui): sanitize text spans on all single-renderable bypass paths
0xhis Mar 7, 2026
1236065
fix(llm): reduce conversation token budget to 80k to prevent exceedin…
0xhis Mar 7, 2026
ce2353a
fix(llm): include system prompt tokens in memory compressor budget
0xhis Mar 7, 2026
b15d3d6
fix(llm): handle malformed function/parameter open tags from GLM-5
0xhis Mar 10, 2026
9573242
Fix GLM-5 regex lookahead and tracer payload None regression
0xhis Mar 12, 2026
cfb8b35
Refactor verification workflow to mirror upstream 3-step process usin…
0xhis Mar 12, 2026
244 changes: 151 additions & 93 deletions strix/agents/StrixAgent/system_prompt.jinja

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions strix/agents/base_agent.py
@@ -321,6 +321,11 @@ async def _initialize_sandbox_and_state(self, task: str) -> None:
sandbox_mode = os.getenv("STRIX_SANDBOX_MODE", "false").lower() == "true"
if not sandbox_mode and self.state.sandbox_id is None:
from strix.runtime import get_runtime
from strix.telemetry.tracer import get_global_tracer

tracer = get_global_tracer()
if tracer:
tracer.update_agent_system_message(self.state.agent_id, "Setting up sandbox environment...")

try:
runtime = get_runtime()
@@ -355,6 +360,10 @@ async def _initialize_sandbox_and_state(self, task: str) -> None:
async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool:
final_response = None

if tracer:
tracer.update_agent_system_message(self.state.agent_id, "Thinking...")
await asyncio.sleep(0)
Contributor


Unnecessary await asyncio.sleep(0) yield point

This await asyncio.sleep(0) is added to yield to the event loop so the TUI can render the "Thinking..." status message. However, this is fragile — it relies on the event loop scheduler running the TUI's timer callback in this narrow window. Since the status message is already being set on the tracer (which the TUI polls via its animation timer), the sleep(0) is unnecessary and will have no visible effect in practice. If the intent is to ensure the TUI picks up the status change, the TUI's polling timer already handles this asynchronously.
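For readers unfamiliar with the pattern the reviewer describes, the following sketch shows why a polling timer makes the explicit yield redundant. All names here are illustrative stand-ins, not the actual Strix classes:

```python
import asyncio


class Tracer:
    """Minimal stand-in for the global tracer (illustrative only)."""

    def __init__(self) -> None:
        self.status: dict[str, str] = {}

    def update_agent_system_message(self, agent_id: str, msg: str) -> None:
        # A plain synchronous write: the TUI's polling timer will see it
        # on its next tick without any explicit event-loop yield.
        self.status[agent_id] = msg


async def polling_timer(tracer: Tracer, agent_id: str, seen: list[str]) -> None:
    # Stands in for the TUI animation timer: it wakes on its own schedule
    # and reads whatever status is current at that moment.
    for _ in range(3):
        await asyncio.sleep(0.01)
        seen.append(tracer.status.get(agent_id, ""))


async def main() -> list[str]:
    tracer = Tracer()
    seen: list[str] = []
    timer = asyncio.create_task(polling_timer(tracer, "agent-1", seen))
    tracer.update_agent_system_message("agent-1", "Thinking...")
    # No `await asyncio.sleep(0)` here: the timer picks the update up anyway.
    await timer
    return seen


statuses = asyncio.run(main())
```

Because the timer reads the shared status on its own schedule, the writer never needs to yield explicitly; dropping the `sleep(0)` changes nothing observable.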



async for response in self.llm.generate(self.state.get_conversation_history()):
final_response = response
if tracer and response.content:
@@ -383,10 +392,15 @@ async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool:
self.state.add_message("assistant", final_response.content, thinking_blocks=thinking_blocks)
if tracer:
Comment on lines 392 to 394

Copilot AI Feb 25, 2026


thinking_blocks are now stored directly on AgentState.messages (via add_message(..., thinking_blocks=...)). Those message dicts are later forwarded to the LLM provider as-is in LLM._prepare_messages()/_build_completion_args(), which risks breaking provider requests because chat message objects typically only support keys like role and content (unknown keys may be rejected). Consider keeping thinking_blocks out of AgentState.messages (store separately), or sanitize/strip non-provider fields (e.g., drop thinking_blocks) before calling acompletion() and before passing messages into MemoryCompressor.
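One way to implement the sanitization this comment suggests is a small helper applied just before `acompletion()` and before handing messages to `MemoryCompressor`. This is a sketch; the allow-list of provider keys is an assumption and should be checked against the provider actually in use:

```python
# Keys that chat-completion providers typically accept on a message dict.
# This set is an assumption, not Strix's real list -- verify per provider.
PROVIDER_MESSAGE_KEYS = {"role", "content", "name", "tool_calls", "tool_call_id"}


def strip_internal_fields(messages: list[dict]) -> list[dict]:
    """Drop internal-only fields (e.g. thinking_blocks) before the provider call."""
    return [
        {k: v for k, v in msg.items() if k in PROVIDER_MESSAGE_KEYS}
        for msg in messages
    ]
```

This keeps `thinking_blocks` available on `AgentState.messages` for the TUI while guaranteeing the provider never sees unknown keys.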

tracer.clear_streaming_content(self.state.agent_id)
metadata = {}
if thinking_blocks:
metadata["thinking_blocks"] = thinking_blocks

tracer.log_chat_message(
content=clean_content(final_response.content),
role="assistant",
agent_id=self.state.agent_id,
metadata=metadata if metadata else None,
)

actions = (
@@ -396,8 +410,13 @@ async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool:
)

if actions:
if tracer:
tool_names = [a.get("toolName") or a.get("tool_name") or "tool" for a in actions]
tracer.update_agent_system_message(self.state.agent_id, f"Executing {', '.join(tool_names[:2])}...")
return await self._execute_actions(actions, tracer)

if tracer:
tracer.update_agent_system_message(self.state.agent_id, "Processing response...")
return False
Comment on lines +422 to 433
Contributor


Corrective message injection has no retry cap

Every time the LLM produces a plain-text response with no tool calls, corrective_message is injected as a user turn into self.state.messages and the iteration returns False (loop continues). There is no guard limiting how many times this can happen per run. If a model consistently produces plain-text (e.g., due to a prompt formatting mismatch or a model that ignores tool-call instructions), every failed iteration appends another ~150-token user message to the conversation history. Over the lifetime of an agent with a high max-iteration budget this can consume a significant portion of the context window with repetitive corrective content, crowding out actual task history and compounding the existing memory growth concern.

Consider tracking a per-agent retry counter and triggering a harder recovery (e.g., agent_finish with an error, or raising LLMRequestFailedError) after N consecutive plain-text responses:

self._no_tool_call_streak = getattr(self, "_no_tool_call_streak", 0) + 1
if self._no_tool_call_streak > MAX_NO_TOOL_CALL_RETRIES:
    raise LLMRequestFailedError("Agent produced too many plain-text responses")
self.state.add_message("user", corrective_message)
return False

Reset _no_tool_call_streak to 0 at the top of _process_iteration whenever actions is non-empty.
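The suggestion above can be exercised end to end with a toy loop. `AgentLoop`, `NoToolCallError`, and the cap value are illustrative stand-ins, not Strix's real names:

```python
MAX_NO_TOOL_CALL_RETRIES = 3  # illustrative cap; tune per deployment


class NoToolCallError(RuntimeError):
    """Raised after too many consecutive plain-text responses."""


class AgentLoop:
    """Toy loop demonstrating the capped corrective-retry pattern."""

    def __init__(self) -> None:
        self.messages: list[dict[str, str]] = []
        self._no_tool_call_streak = 0

    def process_iteration(self, actions: list, corrective: str = "Use a tool call.") -> bool:
        if actions:
            self._no_tool_call_streak = 0  # reset on any real tool call
            return True
        self._no_tool_call_streak += 1
        if self._no_tool_call_streak > MAX_NO_TOOL_CALL_RETRIES:
            raise NoToolCallError("Agent produced too many plain-text responses")
        # Inject the corrective nudge, but only while under the cap.
        self.messages.append({"role": "user", "content": corrective})
        return False
```

Under this scheme the conversation accrues at most `MAX_NO_TOOL_CALL_RETRIES` corrective turns before the run fails hard instead of silently burning context.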


async def _execute_actions(self, actions: list[Any], tracer: Optional["Tracer"]) -> bool:
43 changes: 36 additions & 7 deletions strix/interface/tui.py
@@ -1215,14 +1215,19 @@ def keymap_styled(keys: list[tuple[str, str]]) -> Text:
return (Text(" "), keymap, False)

if status == "running":
sys_msg = agent_data.get("system_message", "")
if self._agent_has_real_activity(agent_id):
animated_text = Text()
animated_text.append_text(self._get_sweep_animation(self._sweep_colors))
if sys_msg:
animated_text.append(sys_msg, style="dim italic")
animated_text.append(" ", style="dim")
animated_text.append("esc", style="white")
animated_text.append(" ", style="dim")
animated_text.append("stop", style="dim")
return (animated_text, keymap_styled([("ctrl-q", "quit")]), True)
animated_text = self._get_animated_verb_text(agent_id, "Initializing")
msg = sys_msg or "Initializing..."
animated_text = self._get_animated_verb_text(agent_id, msg)
return (animated_text, keymap_styled([("ctrl-q", "quit")]), True)

return (None, Text(), False)
@@ -1394,7 +1399,7 @@ def _animate_dots(self) -> None:
if not has_active_agents:
has_active_agents = any(
agent_data.get("status", "running") in ["running", "waiting"]
for agent_data in self.tracer.agents.values()
for agent_data in list(self.tracer.agents.values())
)

if not has_active_agents:
@@ -1655,21 +1660,45 @@ def _render_chat_content(self, msg_data: dict[str, Any]) -> Any:
content = msg_data.get("content", "")
metadata = msg_data.get("metadata", {})

if not content:
return None

if role == "user":
return UserMessageRenderer.render_simple(content)
Comment on lines 1689 to 1692
Contributor


Empty user content bypasses None guard

Before this change the function started with:

if not content:
    return None

That check ran before the role branch, so user messages with empty content returned None safely.

Now the user branch fires first and immediately calls UserMessageRenderer.render_simple(content) without verifying that content is non-empty. If a user-role message arrives with content == "" (e.g. a synthetic message injected by process_tool_invocations before its content is set, or any future code path that appends an empty user turn), render_simple is called with an empty string and likely returns a blank widget entry in the chat log instead of None.

The assistant branch keeps the guard (if not content and not renderables: return None), so the asymmetry is inconsistent. A minimal fix:

Suggested change (add an empty-content guard to the user branch):

        if role == "user":
            if not content:
                return None
            return UserMessageRenderer.render_simple(content)


renderables = []

if "thinking_blocks" in metadata and metadata["thinking_blocks"]:
for block in metadata["thinking_blocks"]:
thought = block.get("thinking", "")
if thought:
text = Text()
text.append("🧠 ")
text.append("Thinking", style="bold #a855f7")
text.append("\n ")
indented_thought = "\n ".join(thought.split("\n"))
text.append(indented_thought, style="italic dim")
renderables.append(Static(text, classes="tool-call thinking-tool completed"))

Comment on lines +1696 to +1705

Copilot AI Feb 25, 2026


The thinking-block UI rendering here duplicates the existing ThinkRenderer implementation (strix/interface/tool_components/thinking_renderer.py) and hard-codes the CSS class string. To avoid divergence (styling/formatting changes in one place but not the other), consider reusing the renderer/helper that already formats "🧠 Thinking" blocks, or centralizing this formatting in a shared function.

if not content and not renderables:
return None

if metadata.get("interrupted"):
streaming_result = self._render_streaming_content(content)
interrupted_text = Text()
interrupted_text.append("\n")
interrupted_text.append("⚠ ", style="yellow")
interrupted_text.append("Interrupted by user", style="yellow dim")
return self._merge_renderables([streaming_result, interrupted_text])
renderables.append(self._merge_renderables([streaming_result, interrupted_text]))
elif content:
msg_renderable = AgentMessageRenderer.render_simple(content)
if getattr(msg_renderable, "plain", True):
renderables.append(msg_renderable)
Contributor


The getattr(msg_renderable, "plain", True) check appears unnecessary since AgentMessageRenderer.render_simple() always returns a Text object (which doesn't have a plain attribute). This will always default to True, making the check redundant.

Suggested change (drop the redundant guard):

            msg_renderable = AgentMessageRenderer.render_simple(content)
            renderables.append(msg_renderable)


if not renderables:
return None

return AgentMessageRenderer.render_simple(content)
if len(renderables) == 1:
return renderables[0]

return self._merge_renderables(renderables)

def _render_tool_content_simple(self, tool_data: dict[str, Any]) -> Any:
tool_name = tool_data.get("tool_name", "Unknown Tool")
60 changes: 31 additions & 29 deletions strix/llm/dedupe.py
@@ -11,45 +11,47 @@

logger = logging.getLogger(__name__)

DEDUPE_SYSTEM_PROMPT = """You are an expert vulnerability report deduplication judge.
Your task is to determine if a candidate vulnerability report describes the SAME vulnerability
as any existing report.

CRITICAL DEDUPLICATION RULES:

1. SAME VULNERABILITY means:
- Same root cause (e.g., "missing input validation" not just "SQL injection")
- Same affected component/endpoint/file (exact match or clear overlap)
- Same exploitation method or attack vector
- Would be fixed by the same code change/patch

2. NOT DUPLICATES if:
- Different endpoints even with same vulnerability type (e.g., SQLi in /login vs /search)
- Different parameters in same endpoint (e.g., XSS in 'name' vs 'comment' field)
- Different root causes (e.g., stored XSS vs reflected XSS in same field)
- Different severity levels due to different impact
- One is authenticated, other is unauthenticated

3. ARE DUPLICATES even if:
- Titles are worded differently
- Descriptions have different level of detail
- PoC uses different payloads but exploits same issue
- One report is more thorough than another
- Minor variations in technical analysis

COMPARISON GUIDELINES:
DEDUPE_SYSTEM_PROMPT = """# Role
You are an expert vulnerability report deduplication judge.
Your task is to determine if a candidate vulnerability report describes
the SAME vulnerability as any existing report.

# Deduplication Rules

## SAME VULNERABILITY means:
- Same root cause (e.g., "missing input validation" not just "SQL injection")
- Same affected component/endpoint/file (exact match or clear overlap)
- Same exploitation method or attack vector
- Would be fixed by the same code change/patch

## NOT DUPLICATES if:
- Different endpoints even with same vulnerability type (e.g., SQLi in /login vs /search)
- Different parameters in same endpoint (e.g., XSS in 'name' vs 'comment' field)
- Different root causes (e.g., stored XSS vs reflected XSS in same field)
- Different severity levels due to different impact
- One is authenticated, other is unauthenticated

## ARE DUPLICATES even if:
- Titles are worded differently
- Descriptions have different level of detail
- PoC uses different payloads but exploits same issue
- One report is more thorough than another
- Minor variations in technical analysis

# Comparison Guidelines
- Focus on the technical root cause, not surface-level similarities
- Same vulnerability type (SQLi, XSS) doesn't mean duplicate - location matters
- Consider the fix: would fixing one also fix the other?
- When uncertain, lean towards NOT duplicate

FIELDS TO ANALYZE:
# Fields to Analyze
- title, description: General vulnerability info
- target, endpoint, method: Exact location of vulnerability
- technical_analysis: Root cause details
- poc_description: How it's exploited
- impact: What damage it can cause

# Output Format
YOU MUST RESPOND WITH EXACTLY THIS XML FORMAT AND NOTHING ELSE:

<dedupe_result>
@@ -68,7 +70,7 @@
<reason>Different endpoints: candidate is /api/search, existing is /api/login</reason>
</dedupe_result>

RULES:
# Output Rules
- is_duplicate MUST be exactly "true" or "false" (lowercase)
- duplicate_id MUST be the exact ID from existing reports or empty if not duplicate
- confidence MUST be a decimal (your confidence level in the decision)
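Since the rewritten prompt mandates an exact XML shape, the consuming code presumably extracts those four fields. A minimal regex-based parser sketch (illustrative; the real consumer in `strix/llm/dedupe.py` may differ):

```python
import re


def parse_dedupe_result(text: str) -> dict:
    """Extract the fields from the <dedupe_result> block the prompt mandates."""

    def field(tag: str) -> str:
        # Non-greedy match so multiple blocks or stray text don't over-capture.
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else ""

    return {
        "is_duplicate": field("is_duplicate") == "true",
        "duplicate_id": field("duplicate_id"),
        "confidence": float(field("confidence") or 0.0),
        "reason": field("reason"),
    }
```

The strict lowercase `true`/`false` rule in the prompt is what makes the `== "true"` comparison safe here.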
18 changes: 18 additions & 0 deletions strix/llm/llm.py
@@ -112,11 +112,20 @@ def set_agent_identity(self, agent_name: str | None, agent_id: str | None) -> No
async def generate(
self, conversation_history: list[dict[str, Any]]
) -> AsyncIterator[LLMResponse]:
from strix.telemetry.tracer import get_global_tracer

tracer = get_global_tracer()
if tracer and self.agent_id:
tracer.update_agent_system_message(self.agent_id, "Compressing memory...")

messages = self._prepare_messages(conversation_history)
max_retries = int(Config.get("strix_llm_max_retries") or "5")

for attempt in range(max_retries + 1):
try:
if tracer and self.agent_id:
tracer.update_agent_system_message(self.agent_id, "Waiting for LLM provider...")

async for response in self._stream(messages):
yield response
return # noqa: TRY300
@@ -130,11 +139,20 @@ async def _stream(self, messages: list[dict[str, Any]]) -> AsyncIterator[LLMResp
accumulated = ""
chunks: list[Any] = []
done_streaming = 0
first_chunk_received = False

self._total_stats.requests += 1
response = await acompletion(**self._build_completion_args(messages), stream=True)

async for chunk in response:
if not first_chunk_received:
first_chunk_received = True
from strix.telemetry.tracer import get_global_tracer

tracer = get_global_tracer()
if tracer and self.agent_id:
tracer.update_agent_system_message(self.agent_id, "Generating response...")

chunks.append(chunk)
if done_streaming:
done_streaming += 1
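The `first_chunk_received` flag added in this hunk reduces to a small self-contained pattern; `fake_provider_stream` and the status list below are stand-ins for the real provider stream and tracer:

```python
import asyncio


async def fake_provider_stream():
    """Stands in for the provider's streamed completion chunks."""
    for chunk in ("Hel", "lo"):
        await asyncio.sleep(0)
        yield chunk


async def stream_with_status(status_log: list[str]) -> str:
    # Flip the status exactly once, on the first chunk that actually arrives,
    # so "Generating response..." only appears once the provider starts talking.
    first_chunk_received = False
    accumulated = ""
    async for chunk in fake_provider_stream():
        if not first_chunk_received:
            first_chunk_received = True
            status_log.append("Generating response...")
        accumulated += chunk
    return accumulated


log: list[str] = []
result = asyncio.run(stream_with_status(log))
```

Gating on the first chunk (rather than on request start) means the status reflects real provider activity, not just an in-flight request.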
51 changes: 35 additions & 16 deletions strix/skills/coordination/root_agent.md
@@ -2,7 +2,7 @@
name: root-agent
description: Orchestration layer that coordinates specialized subagents for security assessments
---

<instructions>
# Root Agent

Orchestration layer for security assessments. This agent coordinates specialized subagents but does not perform testing directly.
@@ -11,8 +11,9 @@ You can create agents throughout the testing process—not just at the beginning

## Role

- Decompose targets into discrete, parallelizable tasks
- Spawn and monitor specialized subagents
- Decompose targets into discrete, parallelizable tasks mapped to OWASP WSTG categories
- Spawn and monitor specialized subagents per WSTG domain
- You MUST name your subagents with the appropriate WSTG ID prefix (e.g., `[WSTG-INFO] Discovery Agent`, `[WSTG-INPV] Injection Testing`)
- Aggregate findings into a cohesive final report
- Manage dependencies and handoffs between agents

@@ -25,21 +26,36 @@ Before spawning agents, analyze the target:
3. **Determine approach** - blackbox, greybox, or whitebox assessment
4. **Prioritize by risk** - critical assets and high-value targets first

## Agent Architecture
## Agent Architecture (WSTG-Aligned)

Structure agents by function:
Structure agents by WSTG testing category:

**Reconnaissance**
**Information Gathering (WSTG-INFO)**
- Asset discovery and enumeration
- Technology fingerprinting
- Attack surface mapping

**Vulnerability Assessment**
- Injection testing (SQLi, XSS, command injection)
- Authentication and session analysis
**Configuration & Deployment (WSTG-CONF)**
- Server misconfiguration testing
- Default credentials and exposed panels
- HTTP header and TLS analysis

**Authentication & Session (WSTG-ATHN, WSTG-SESS)**
- Authentication mechanism analysis
- Session token testing
- JWT/OAuth flow validation

**Authorization (WSTG-ATHZ)**
- Access control testing (IDOR, privilege escalation)
- Business logic flaws
- Infrastructure vulnerabilities
- Role-based access control validation

**Input Validation (WSTG-INPV)**
- Injection testing (SQLi, XSS, command injection, SSRF, XXE)
- File upload and path traversal testing

**Business Logic (WSTG-BUSL)**
- Workflow and process flow testing
- Race condition and state manipulation

**Exploitation and Validation**
- Proof-of-concept development
@@ -58,14 +74,14 @@ Create agents with minimal dependencies. Parallel execution is faster than seque

**Clear Objectives**

Each agent should have a specific, measurable goal. Vague objectives lead to scope creep and redundant work.
Each agent should have a specific, measurable goal scoped to a WSTG category. Vague objectives lead to scope creep and redundant work.

**Avoid Duplication**

Before creating agents:
1. Analyze the target scope and break into independent tasks
1. Analyze the target scope and break into independent WSTG-aligned tasks
2. Check existing agents to avoid overlap
3. Create agents with clear, specific objectives
3. Create agents with clear, specific objectives mapped to WSTG domains and name them strictly with the prefix (e.g., `[WSTG-ATHN] API Auth Tester`)

**Hierarchical Delegation**

@@ -88,5 +104,8 @@ When all agents report completion:

1. Collect and deduplicate findings across agents
2. Assess overall security posture
3. Compile executive summary with prioritized recommendations
4. Invoke finish tool with final report
3. **Attacker Perspective Verification**: Pause and explicitly consider: "If I were a real-world attacker, where else would I look? What edge cases, forgotten endpoints, or chained exploits have been overlooked?"

Copilot AI Feb 25, 2026


Line has trailing whitespace at the end, which will be caught by the trailing-whitespace pre-commit hook and fail CI. Please remove the extra space after the closing quote.

Suggested change: remove the trailing space after the closing quote; the line content is otherwise identical, so the before/after renders indistinguishably here.

4. If this verification reveals new potential attack vectors, spawn new agents to investigate them before concluding.
5. Once fully satisfied no stones are left unturned, compile the executive summary with prioritized recommendations.
6. Invoke finish tool with the final report.
</instructions>