feat: OWASP WSTG methodology alignment & TUI live status #328
strix/agents/base_agent.py

```diff
@@ -321,6 +321,11 @@ async def _initialize_sandbox_and_state(self, task: str) -> None:
         sandbox_mode = os.getenv("STRIX_SANDBOX_MODE", "false").lower() == "true"
         if not sandbox_mode and self.state.sandbox_id is None:
             from strix.runtime import get_runtime
+            from strix.telemetry.tracer import get_global_tracer
+
+            tracer = get_global_tracer()
+            if tracer:
+                tracer.update_agent_system_message(self.state.agent_id, "Setting up sandbox environment...")

             try:
                 runtime = get_runtime()
@@ -355,6 +360,10 @@ async def _initialize_sandbox_and_state(self, task: str) -> None:
     async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool:
         final_response = None

+        if tracer:
+            tracer.update_agent_system_message(self.state.agent_id, "Thinking...")
+            await asyncio.sleep(0)
+
         async for response in self.llm.generate(self.state.get_conversation_history()):
             final_response = response
             if tracer and response.content:
@@ -383,10 +392,15 @@ async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool:
         self.state.add_message("assistant", final_response.content, thinking_blocks=thinking_blocks)
         if tracer:
             tracer.clear_streaming_content(self.state.agent_id)
+            metadata = {}
+            if thinking_blocks:
+                metadata["thinking_blocks"] = thinking_blocks
+
             tracer.log_chat_message(
                 content=clean_content(final_response.content),
                 role="assistant",
                 agent_id=self.state.agent_id,
+                metadata=metadata if metadata else None,
             )

         actions = (
@@ -396,8 +410,13 @@ async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool:
         )

         if actions:
+            if tracer:
+                tool_names = [a.get("toolName") or a.get("tool_name") or "tool" for a in actions]
+                tracer.update_agent_system_message(self.state.agent_id, f"Executing {', '.join(tool_names[:2])}...")
             return await self._execute_actions(actions, tracer)

+        if tracer:
+            tracer.update_agent_system_message(self.state.agent_id, "Processing response...")
         return False
```
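The status updates in this diff follow a write-then-poll pattern: the agent coroutine writes a short status string into shared tracer state, and the TUI's animation timer reads it on its own schedule. A minimal sketch of that pattern (this `MiniTracer` is a hypothetical stand-in, not Strix's actual tracer class):

```python
class MiniTracer:
    """Hypothetical stand-in for the global tracer: agent coroutines write
    short status strings here; the TUI's timer polls them on its own schedule."""

    def __init__(self):
        self.agents = {}  # agent_id -> {"system_message": ..., "status": ...}

    def update_agent_system_message(self, agent_id, message):
        # Writer side: cheap dict update, safe to call from agent coroutines.
        self.agents.setdefault(agent_id, {})["system_message"] = message

    def poll_status_line(self, agent_id):
        # Reader side: what a TUI refresh tick would render for this agent.
        return self.agents.get(agent_id, {}).get("system_message", "")


tracer = MiniTracer()
tracer.update_agent_system_message("agent-1", "Setting up sandbox environment...")
tracer.update_agent_system_message("agent-1", "Thinking...")
print(tracer.poll_status_line("agent-1"))  # only the latest status survives
```

Because the reader only ever sees the latest string, the writer never needs to wait for the TUI; this is what makes the status updates in the diff essentially free for the agent loop.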
Contributor comment on `strix/agents/base_agent.py`, lines 422-433:

**Corrective message injection has no retry cap**

Every time the LLM produces a plain-text response with no tool calls, `corrective_message` is injected as a `user` turn into `self.state.messages` and the iteration returns `False` (loop continues). There is no guard limiting how many times this can happen per run. If a model consistently produces plain-text (e.g., due to a prompt formatting mismatch or a model that ignores tool-call instructions), every failed iteration appends another ~150-token user message to the conversation history. Over the lifetime of an agent with a high max-iteration budget this can consume a significant portion of the context window with repetitive corrective content, crowding out actual task history and compounding the existing memory growth concern.

Consider tracking a per-agent retry counter and triggering a harder recovery (e.g., `agent_finish` with an error, or raising `LLMRequestFailedError`) after `N` consecutive plain-text responses:

```python
self._no_tool_call_streak = getattr(self, "_no_tool_call_streak", 0) + 1
if self._no_tool_call_streak > MAX_NO_TOOL_CALL_RETRIES:
    raise LLMRequestFailedError("Agent produced too many plain-text responses")
self.state.add_message("user", corrective_message)
return False
```

Reset `_no_tool_call_streak` to `0` at the top of `_process_iteration` whenever `actions` is non-empty.
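The reviewer's fix can be rounded out into a self-contained sketch. Names such as `MAX_NO_TOOL_CALL_RETRIES` and the toy `AgentLoop` are illustrative, not from the Strix codebase; the point is the pattern: increment the streak on every plain-text iteration, reset it whenever actions are produced, and fail hard past the cap.

```python
MAX_NO_TOOL_CALL_RETRIES = 3  # illustrative cap, not a Strix constant


class LLMRequestFailedError(RuntimeError):
    pass


class AgentLoop:
    """Toy iteration loop demonstrating the capped corrective-retry pattern."""

    def __init__(self):
        self._no_tool_call_streak = 0
        self.messages = []  # stand-in for self.state.messages

    def process_iteration(self, actions, corrective_message="Please call a tool."):
        if actions:
            self._no_tool_call_streak = 0  # real progress: reset the streak
            return True
        self._no_tool_call_streak += 1
        if self._no_tool_call_streak > MAX_NO_TOOL_CALL_RETRIES:
            raise LLMRequestFailedError("Agent produced too many plain-text responses")
        # Below the cap we still inject the corrective turn and keep looping.
        self.messages.append(("user", corrective_message))
        return False


loop = AgentLoop()
for _ in range(MAX_NO_TOOL_CALL_RETRIES):
    loop.process_iteration(actions=[])  # injects one corrective turn each time
try:
    loop.process_iteration(actions=[])  # exceeding the cap raises instead
except LLMRequestFailedError as exc:
    print(exc)
```

The cap bounds the wasted context at `MAX_NO_TOOL_CALL_RETRIES` corrective turns per productive stretch, rather than letting it grow with the max-iteration budget.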
strix/interface/tui.py

```diff
@@ -1215,14 +1215,19 @@ def keymap_styled(keys: list[tuple[str, str]]) -> Text:
             return (Text(" "), keymap, False)

         if status == "running":
+            sys_msg = agent_data.get("system_message", "")
             if self._agent_has_real_activity(agent_id):
                 animated_text = Text()
                 animated_text.append_text(self._get_sweep_animation(self._sweep_colors))
+                if sys_msg:
+                    animated_text.append(sys_msg, style="dim italic")
+                    animated_text.append(" ", style="dim")
                 animated_text.append("esc", style="white")
                 animated_text.append(" ", style="dim")
                 animated_text.append("stop", style="dim")
                 return (animated_text, keymap_styled([("ctrl-q", "quit")]), True)
-            animated_text = self._get_animated_verb_text(agent_id, "Initializing")
+            msg = sys_msg or "Initializing..."
+            animated_text = self._get_animated_verb_text(agent_id, msg)
             return (animated_text, keymap_styled([("ctrl-q", "quit")]), True)

         return (None, Text(), False)
@@ -1394,7 +1399,7 @@ def _animate_dots(self) -> None:
         if not has_active_agents:
             has_active_agents = any(
                 agent_data.get("status", "running") in ["running", "waiting"]
-                for agent_data in self.tracer.agents.values()
+                for agent_data in list(self.tracer.agents.values())
             )

         if not has_active_agents:
```
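The `list(self.tracer.agents.values())` change above snapshots the dict's values before iterating, so another coroutine mutating `tracer.agents` mid-iteration cannot raise `RuntimeError: dictionary changed size during iteration`. A minimal reproduction of the hazard and the fix, using plain dicts rather than Strix types:

```python
def count_active(agents):
    # Snapshot the values with list(...) so concurrent insertion into `agents`
    # cannot raise "dictionary changed size during iteration" mid-loop.
    return sum(
        1
        for data in list(agents.values())
        if data.get("status", "running") in ("running", "waiting")
    )


agents = {"a1": {"status": "running"}, "a2": {"status": "done"}}

# The hazard: iterating the live view while something inserts a key.
try:
    for _ in agents.values():
        agents["a3"] = {"status": "waiting"}  # e.g. another coroutine spawns an agent
except RuntimeError as exc:
    print("live view failed:", exc)

# The fix: iterate a snapshot; the same insertion is now harmless.
for _ in list(agents.values()):
    agents["a4"] = {"status": "waiting"}

print("active agents:", count_active(agents))
```

The snapshot costs one small list allocation per timer tick, which is a reasonable trade for never crashing the animation callback.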
```diff
@@ -1655,21 +1660,45 @@ def _render_chat_content(self, msg_data: dict[str, Any]) -> Any:
         content = msg_data.get("content", "")
         metadata = msg_data.get("metadata", {})

-        if not content:
-            return None
-
         if role == "user":
             return UserMessageRenderer.render_simple(content)
```
Contributor comment on `strix/interface/tui.py`, lines 1689-1692:

**Empty user `content` bypasses `None` guard**

Before this change the function started with:

```python
if not content:
    return None
```

That check ran before the `role` branch, so user messages with empty content returned `None` safely.

Now the user branch fires *first* and immediately calls `UserMessageRenderer.render_simple(content)` without verifying that `content` is non-empty. If a user-role message arrives with `content == ""` (e.g. a synthetic message injected by `process_tool_invocations` before its content is set, or any future code path that appends an empty user turn), `render_simple` is called with an empty string and likely returns a blank widget entry in the chat log instead of `None`.

The assistant branch keeps the guard (`if not content and not renderables: return None`), so the asymmetry is inconsistent. A minimal fix:

```suggestion
        if role == "user":
            if not content:
                return None
            return UserMessageRenderer.render_simple(content)
```
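The ordering issue the reviewer describes is easy to check in isolation. The toy dispatcher below (string "widgets" stand in for the real renderer classes) applies the suggested fix, so the empty-content guard runs inside the user branch before anything is rendered:

```python
def render_chat_content_fixed(msg_data):
    """Sketch of the guarded dispatch order the review suggests."""
    role = msg_data.get("role", "")
    content = msg_data.get("content", "")

    if role == "user":
        if not content:  # guard restored inside the user branch
            return None
        return f"user-widget:{content}"  # stand-in for UserMessageRenderer

    # Assistant path keeps its own guard (simplified: no thinking blocks here).
    if not content:
        return None
    return f"agent-widget:{content}"  # stand-in for AgentMessageRenderer


print(render_chat_content_fixed({"role": "user", "content": ""}))    # None
print(render_chat_content_fixed({"role": "user", "content": "hi"}))  # user-widget:hi
```

With the guard inside the branch, both roles degrade the same way on empty content: no blank widget ever reaches the chat log.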
```diff
         renderables = []

+        if "thinking_blocks" in metadata and metadata["thinking_blocks"]:
+            for block in metadata["thinking_blocks"]:
+                thought = block.get("thinking", "")
+                if thought:
+                    text = Text()
+                    text.append("🧠 ")
+                    text.append("Thinking", style="bold #a855f7")
+                    text.append("\n ")
+                    indented_thought = "\n ".join(thought.split("\n"))
+                    text.append(indented_thought, style="italic dim")
+                    renderables.append(Static(text, classes="tool-call thinking-tool completed"))
+
+        if not content and not renderables:
+            return None
+
         if metadata.get("interrupted"):
             streaming_result = self._render_streaming_content(content)
             interrupted_text = Text()
             interrupted_text.append("\n")
             interrupted_text.append("⚠ ", style="yellow")
             interrupted_text.append("Interrupted by user", style="yellow dim")
-            return self._merge_renderables([streaming_result, interrupted_text])
+            renderables.append(self._merge_renderables([streaming_result, interrupted_text]))
         elif content:
             msg_renderable = AgentMessageRenderer.render_simple(content)
             if getattr(msg_renderable, "plain", True):
                 renderables.append(msg_renderable)
```
Contributor comment on `strix/interface/tui.py`, lines 1692-1693:

The `getattr(msg_renderable, "plain", True)` check appears unnecessary since `AgentMessageRenderer.render_simple()` always returns a `Text` object (which doesn't have a `plain` attribute). This will always default to `True`, making the check redundant.

```suggestion
            msg_renderable = AgentMessageRenderer.render_simple(content)
            renderables.append(msg_renderable)
```
root-agent prompt file

```diff
@@ -2,7 +2,7 @@
 name: root-agent
 description: Orchestration layer that coordinates specialized subagents for security assessments
 ---

 <instructions>
 # Root Agent

 Orchestration layer for security assessments. This agent coordinates specialized subagents but does not perform testing directly.
```
```diff
@@ -11,8 +11,9 @@ You can create agents throughout the testing process—not just at the beginning

 ## Role

-- Decompose targets into discrete, parallelizable tasks
-- Spawn and monitor specialized subagents
+- Decompose targets into discrete, parallelizable tasks mapped to OWASP WSTG categories
+- Spawn and monitor specialized subagents per WSTG domain
+- You MUST name your subagents with the appropriate WSTG ID prefix (e.g., `[WSTG-INFO] Discovery Agent`, `[WSTG-INPV] Injection Testing`)
 - Aggregate findings into a cohesive final report
 - Manage dependencies and handoffs between agents
```
```diff
@@ -25,21 +26,36 @@ Before spawning agents, analyze the target:
 3. **Determine approach** - blackbox, greybox, or whitebox assessment
 4. **Prioritize by risk** - critical assets and high-value targets first

-## Agent Architecture
+## Agent Architecture (WSTG-Aligned)

-Structure agents by function:
+Structure agents by WSTG testing category:

-**Reconnaissance**
+**Information Gathering (WSTG-INFO)**
 - Asset discovery and enumeration
 - Technology fingerprinting
 - Attack surface mapping

-**Vulnerability Assessment**
-- Injection testing (SQLi, XSS, command injection)
-- Authentication and session analysis
-- Business logic flaws
-- Infrastructure vulnerabilities
+**Configuration & Deployment (WSTG-CONF)**
+- Server misconfiguration testing
+- Default credentials and exposed panels
+- HTTP header and TLS analysis
+
+**Authentication & Session (WSTG-ATHN, WSTG-SESS)**
+- Authentication mechanism analysis
+- Session token testing
+- JWT/OAuth flow validation
+
+**Authorization (WSTG-ATHZ)**
+- Access control testing (IDOR, privilege escalation)
+- Role-based access control validation
+
+**Input Validation (WSTG-INPV)**
+- Injection testing (SQLi, XSS, command injection, SSRF, XXE)
+- File upload and path traversal testing
+
+**Business Logic (WSTG-BUSL)**
+- Workflow and process flow testing
+- Race condition and state manipulation

 **Exploitation and Validation**
 - Proof-of-concept development
```
```diff
@@ -58,14 +74,14 @@ Create agents with minimal dependencies. Parallel execution is faster than sequential.

 **Clear Objectives**

-Each agent should have a specific, measurable goal. Vague objectives lead to scope creep and redundant work.
+Each agent should have a specific, measurable goal scoped to a WSTG category. Vague objectives lead to scope creep and redundant work.

 **Avoid Duplication**

 Before creating agents:
-1. Analyze the target scope and break into independent tasks
+1. Analyze the target scope and break into independent WSTG-aligned tasks
 2. Check existing agents to avoid overlap
-3. Create agents with clear, specific objectives
+3. Create agents with clear, specific objectives mapped to WSTG domains and name them strictly with the prefix (e.g., `[WSTG-ATHN] API Auth Tester`)

 **Hierarchical Delegation**
```
```diff
@@ -88,5 +104,8 @@ When all agents report completion:

 1. Collect and deduplicate findings across agents
 2. Assess overall security posture
 3. Compile executive summary with prioritized recommendations
 4. Invoke finish tool with final report
+3. **Attacker Perspective Verification**: Pause and explicitly consider: "If I were a real-world attacker, where else would I look? What edge cases, forgotten endpoints, or chained exploits have been overlooked?"
```
Contributor comment on the `await asyncio.sleep(0)` added in `_process_iteration` (`strix/agents/base_agent.py`):

**Unnecessary `await asyncio.sleep(0)` yield point**

This `await asyncio.sleep(0)` is added to yield to the event loop so the TUI can render the "Thinking..." status message. However, this is fragile — it relies on the event loop scheduler running the TUI's timer callback in this narrow window. Since the status message is already being set on the tracer (which the TUI polls via its animation timer), the `sleep(0)` is unnecessary and will have no visible effect in practice. If the intent is to ensure the TUI picks up the status change, the TUI's polling timer already handles this asynchronously.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
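The scheduling behaviour the reviewer describes can be demonstrated directly: `await asyncio.sleep(0)` suspends the current coroutine for one pass of the event loop, which lets an already-queued callback (here a stand-in for the TUI's timer tick) run before the agent continues. It makes no guarantee beyond that single yield, which is why the reviewer calls it fragile.

```python
import asyncio


async def main():
    order = []

    async def worker():
        order.append("status set")  # analogous to tracer.update_agent_system_message(...)
        await asyncio.sleep(0)      # yield exactly one scheduler pass
        order.append("worker resumed")

    async def ui_tick():
        order.append("ui tick")     # analogous to the TUI's polling timer callback

    tick = asyncio.ensure_future(ui_tick())  # queue the "timer" before the worker runs
    await worker()
    await tick
    return order


print(asyncio.run(main()))
```

Here the tick happens to land inside the yield window only because it was queued first; a polling timer that reads tracer state on its own schedule needs no such yield from the agent coroutine at all.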