-
Notifications
You must be signed in to change notification settings - Fork 23
Network spy #115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Network spy #115
Conversation
DOM Data Monitor
Code ReviewI've completed a thorough review of PR #115 and found 6 issues that need to be addressed: CLAUDE.md Violations1. Lazy imports in
|
| def _tool_execute_python(self, tool_arguments: dict[str, Any]) -> dict[str, Any]: | |
| """Execute Python code with har_dict pre-loaded from NetworkDataStore.""" | |
| import io | |
| import sys | |
| code = tool_arguments.get("code", "") | |
| if not code: | |
| return {"error": "No code provided"} |
import io
import sysAccording to CLAUDE.md L49:
IMPORTANT: NO lazy imports! All imports must be at the top of the file
Fix: Move these imports to the top of the file.
2. Lazy import in bluebox/sdk/discovery.py
Line 87:
bluebox-sdk/bluebox/sdk/discovery.py
Lines 85 to 91 in 0e5f82d
| def _default_message_handler(self, message: RoutineDiscoveryMessage) -> None: | |
| """Default message handler that logs to console.""" | |
| from bluebox.data_models.routine_discovery.message import RoutineDiscoveryMessageType | |
| if message.type == RoutineDiscoveryMessageType.INITIATED: | |
| logger.info(f"🚀 {message.content}") | |
| elif message.type == RoutineDiscoveryMessageType.PROGRESS_THINKING: |
from bluebox.data_models.routine_discovery.message import RoutineDiscoveryMessageTypeAccording to CLAUDE.md L49:
IMPORTANT: NO lazy imports! All imports must be at the top of the file
Fix: Move this import to the top of the file alongside the existing import from the same module.
Bugs
3. 🔴 CRITICAL: Arbitrary code execution via execute_python tool
File: bluebox/agents/network_spy.py, line 753
bluebox-sdk/bluebox/agents/network_spy.py
Lines 750 to 758 in 0e5f82d
| "har_dict": har_dict, | |
| "json": json, | |
| } | |
| exec(code, exec_globals) # noqa: S102 | |
| output = captured_output.getvalue() | |
| return { | |
| "output": output if output else "(no output)", | |
| } |
The _tool_execute_python method uses exec() to execute arbitrary Python code without sandboxing:
exec_globals = {
"har_dict": har_dict,
"json": json,
}
exec(code, exec_globals) # noqa: S102Security Risk: The LLM analyzes network traffic that could contain malicious content. An attacker could embed prompt injection payloads in HTTP responses, headers, or URLs to trick the LLM into generating malicious code. The exec() has no restrictions on imports or builtins, allowing full system access.
Attack chain: Malicious network traffic → LLM context → prompt injection → LLM calls execute_python → arbitrary code execution
Recommendation: Either:
- Remove this tool entirely if not essential
- Implement proper sandboxing (restricted builtins, no imports, timeout)
- Use a safer alternative like AST-based expression evaluation for limited operations
4. unique_terms_found double-counts terms across search scopes
File: bluebox/agents/network_spy.py, lines 804-820
bluebox-sdk/bluebox/agents/network_spy.py
Lines 800 to 832 in 0e5f82d
| if "headers" in search_in: | |
| headers_str = json.dumps(entry.request_headers).lower() | |
| for term in terms_lower: | |
| count = headers_str.count(term) | |
| if count > 0: | |
| unique_terms_found += 1 | |
| total_hits += count | |
| if "headers" not in matched_in: | |
| matched_in.append("headers") | |
| # Search body | |
| if "body" in search_in and entry.post_data: | |
| post_data_str = json.dumps(entry.post_data) if isinstance(entry.post_data, (dict, list)) else str(entry.post_data) | |
| body_lower = post_data_str.lower() | |
| for term in terms_lower: | |
| count = body_lower.count(term) | |
| if count > 0: | |
| unique_terms_found += 1 | |
| total_hits += count | |
| if "body" not in matched_in: | |
| matched_in.append("body") | |
| if unique_terms_found > 0: | |
| score = (total_hits / len(terms_lower)) * unique_terms_found | |
| results.append({ | |
| "id": entry.request_id, | |
| "method": entry.method, | |
| "url": entry.url, | |
| "matched_in": matched_in, | |
| "unique_terms_found": unique_terms_found, | |
| "total_hits": total_hits, | |
| "score": round(score, 2), | |
| }) |
The unique_terms_found counter is incremented for each term found in each scope (url, headers, body) independently:
if count > 0:
unique_terms_found += 1 # Incremented per scope
total_hits += countBug: If term "api" appears in both URL and headers, unique_terms_found becomes 2 instead of 1, inflating the relevance score.
Impact: With 3 terms each appearing in all 3 scopes, unique_terms_found would be 9 instead of 3, inflating scores by 3x.
Fix: Track found terms in a set and count unique terms only once:
found_terms = set()
# ... in each scope loop:
if count > 0:
found_terms.add(term)
total_hits += count
# ... after all scopes:
unique_terms_found = len(found_terms)5. Hardcoded max iterations in autonomous mode
File: bluebox/agents/network_spy.py, line 1348
bluebox-sdk/bluebox/agents/network_spy.py
Lines 1345 to 1359 in 0e5f82d
| # Add finalize tool availability notice | |
| if self._finalize_tool_registered: | |
| # Get urgency based on iteration count | |
| remaining_iterations = 10 - self._autonomous_iteration | |
| if remaining_iterations <= 2: | |
| finalize_notice = ( | |
| f"\n\n## CRITICAL: YOU MUST CALL finalize_result NOW!\n" | |
| f"Only {remaining_iterations} iterations remaining. " | |
| f"You MUST call `finalize_result` with your best findings immediately. " | |
| f"Do NOT call any other tool - call finalize_result right now!" | |
| ) | |
| elif remaining_iterations <= 4: | |
| finalize_notice = ( | |
| f"\n\n## URGENT: Call finalize_result soon!\n" | |
| f"Only {remaining_iterations} iterations remaining. " |
remaining_iterations = 10 - self._autonomous_iterationBug: The run_autonomous method accepts a configurable max_iterations parameter (default 10), but _get_autonomous_system_prompt hardcodes 10 when computing remaining iterations.
Impact: If caller passes max_iterations=20, urgency notices trigger prematurely (at iteration 8 instead of 18), causing incorrect LLM behavior.
Fix: Pass max_iterations to _get_autonomous_system_prompt and use it:
remaining_iterations = max_iterations - self._autonomous_iteration6. sessionId attached to all CDP commands including browser-level commands
File: bluebox/cdp/async_cdp_session.py, lines 259-260
bluebox-sdk/bluebox/cdp/async_cdp_session.py
Lines 255 to 264 in 0e5f82d
| "id": cmd_id, | |
| "method": method, | |
| "params": params or {}, | |
| } | |
| if self.page_session_id: | |
| msg["sessionId"] = self.page_session_id | |
| msg_json = json.dumps(msg) | |
| await self.ws.send(msg_json) | |
| return cmd_id |
if self.page_session_id:
msg["sessionId"] = self.page_session_idBug: The code unconditionally attaches page_session_id to ALL CDP commands. However, browser-level commands (e.g., Target.getTargets, Target.attachToTarget) should NOT include a sessionId.
Impact: Sending sessionId with browser-level commands routes them to the wrong target, causing CDP protocol errors.
Fix: Only attach sessionId to page-level commands:
if self.page_session_id and is_page_level:
msg["sessionId"] = self.page_session_id(The is_page_level check already exists at line 241 for warning purposes.)
Summary
- 2 CLAUDE.md violations (lazy imports)
- 4 bugs including 1 critical security vulnerability
Please address these issues before merging.
|
Encountered a 400 error |
|
Claude finished @rayruizhiliao's task in 1m 29s —— View job Fixed 400 Error ✓
Root CauseThe 400 error occurred in Issue: The code used msg["tool_calls"] = [
{
"call_id": tc.call_id, # ❌ Wrong - API expects "id"
"name": tc.tool_name,
"arguments": tc.tool_arguments,
}
]This caused the API to see Fix AppliedChanged msg["tool_calls"] = [
{
"id": tc.call_id if tc.call_id else f"call_{idx}_{chat_id[:8]}", # ✓ Correct
"name": tc.tool_name,
"arguments": tc.tool_arguments,
}
]The fix has been committed and pushed to the |
The OpenAI API expects tool_calls to have an 'id' field, not 'call_id'. This was causing a 400 error: 'Invalid type for messages[2].tool_calls[0].id: expected a string, but got null instead.' Co-authored-by: Ruizhi (Ray) Liao <[email protected]>
[Alex] Async CDP monitors + unit tests
[Ray] DOM monitor
light refactor of network monitoring module:
Adapted the (soon to be deprecated) data store to use the data from async monitors:
NEW data store for network monitoring:
Network SPY (beta): [THIS AGENT IS NOT FINALIZED - WILL BE FINALIZED WHEN WE REFACTOR DISCOVERY]
To run network spy:
python scripts/run_network_spy.py --jsonl-path ./cdp_captures/network/events.jsonlTo test how this will fit into discovery, use
/autonomous <TASK>