feat: add MCP server for AI agent integration #349
ms6rb wants to merge 89 commits into usestrix:main
Security testing playbook for NestJS applications covering guard bypass, validation pipe exploits, module boundary leaks, cross-transport auth inconsistencies, passport/JWT misuse, serialization leaks, ORM injection, CRUD generator gaps, and rate limiting bypass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FastMCP server exposing Strix security sandbox tools to Claude Code, compatible with the skills-based module system. Includes:
- Web target HTTP fingerprinting in start_scan
- Finding deduplication with title normalization and merge-on-insert
- list_vulnerability_reports, list_modules, get_scan_status tools
- Richer end_scan summary with OWASP grouping and dedup stats
- Web-only methodology branch with adjusted subagent template
- 49 unit tests covering all new functionality
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
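As an illustration of what "title normalization and merge-on-insert" could look like, here is a minimal sketch. It is not the code from this PR: `normalize_title`, `insert_or_merge`, `SEVERITY_ORDER`, and the report field names are assumptions chosen for the example.

```python
import re

# Hypothetical severity ranking, lowest to highest (an assumption for this sketch).
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", title.lower())
    return " ".join(cleaned.split())


def insert_or_merge(reports: list[dict], new: dict) -> dict:
    """Append `new`, or merge it into a report with the same normalized title."""
    key = normalize_title(new["title"])
    for report in reports:
        if normalize_title(report["title"]) == key:
            # Keep the higher of the two severities.
            if SEVERITY_ORDER.index(new["severity"]) > SEVERITY_ORDER.index(report["severity"]):
                report["severity"] = new["severity"]
            # Append the new evidence instead of creating a second finding.
            report["description"] += "\n\n" + new["description"]
            return report
    reports.append(new)
    return new
```

Merging on insert keeps the stored list free of near-duplicates, at the cost of a linear scan per new finding, which is likely acceptable for the number of findings a single scan produces.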
- Fix guard ordering claim: NestJS uses AND logic, bypass is metadata-driven via @public()/@SetMetadata, not order-driven
- Add missing validation requirements for ORM injection and cache poisoning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add nestjs module to nestjs trigger, add domain/subdomain_takeover
- Add info_disclosure, open_redirect, path_traversal to web_app rules
- Add 4 agent templates: NestJS, info disclosure, path traversal, subdomain
- Expand HTTP probe paths from 5 to 18 (actuator, .env, swagger, etc.)
- Detect Spring Actuator, exposed .env, Swagger from probe results
- Add affected_endpoint and cvss_score to vulnerability reports
- Update methodology subagent templates with new report fields
- 8 new tests (57 total)
- Add MODULE_RULES and agent templates for Django, WordPress, Laravel, Rails, Express, and Flask; detected frameworks now get dedicated testing agents instead of only generic web testing
- Auto-fetch OpenAPI/Swagger spec when swagger is detected during fingerprinting; extracts endpoint list and passes to coordinator for better subagent targeting
- Add missing OWASP keywords: open_redirect, subdomain_takeover, information_disclosure, prototype_pollution, exposed_env, actuator
- Update methodology with OpenAPI auto-discovery guidance
- 10 new tests (67 total)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gather Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lan output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Individual markdown files per finding, CSV index sorted by severity, get_finding tool for selective recall, minimal tool responses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d finding recall Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes: keep end_scan name (avoids collision with native finish_scan), remove wrong test_integration change, add strix-agent dependency, add server.py resource descriptions, add pyproject.toml metadata task. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move create_vulnerability_report to MCP Orchestration (not proxied)
- Note str_replace_editor as partial parity (no create/view/insert)
- Add native create_vulnerability_report to Not Yet Supported
- Update design doc with final decisions, mark as superseded by plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix proxied tool count (14 -> 13)
- Add agent_id parameter documentation requirement for all proxied tools
- Add workflow section to README template
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove register_agent as public tool (dispatch_agent handles it)
- Update all 23 tool descriptions with parameter docs and enum values
- Add agent_id documentation to all 13 proxied tools
- Consistent formatting across MCP-only and proxied tools
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all references to the removed vulnerability_reports and scan_dir closure variables in get_scan_status, create_vulnerability_report, list_vulnerability_reports, get_finding, and suggest_chains. These now read from tracer.get_existing_vulnerabilities() and tracer.get_run_dir(). Field name mapping: content->description, affected_endpoints->endpoint, cvss_score->cvss. New findings delegate to tracer.add_vulnerability_report(). Merges mutate shared dict references in-place and call save_run_data(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…down
- Use get_existing_vulnerabilities() instead of direct .vulnerability_reports access, consistent with all other callers
- Move tracer finalization after sandbox.end_scan() so a destroy_sandbox failure doesn't leave the session in a split state (tracer gone but scan still active)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spec for adding Phase 0 reconnaissance to Strix scan flow:
- 6 recon modules (upstream-compatible)
- 2 MCP tools: nuclei_scan, download_sourcemaps
- Methodology + plan integration for recon-to-vuln handoff
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address 3 critical + 5 important issues:
- Add "recon" to valid note categories requirement
- Commit to python_action approach for download_sourcemaps
- Specify generate_plan() integration (bypass MODULE_RULES)
- Clarify domain vs web_app trigger semantics
- Specify nuclei timeout behavior and tracer-direct filing
- Drop recon_context parameter (use task string instead)
- Expand local_code recon to full web treatment
- Mark mobile APK module as manual/on-demand
- Add negative test cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- nuclei_scan: use background execution + polling to avoid terminal_execute timeout cap and proxy_tool 300s HTTP timeout
- Add "domain" to recon_surface_discovery triggers
- Note that "id" field on recon templates is for logging only
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
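The "background execution + polling" approach can be pictured roughly as follows. This is a generic sketch, not the PR's implementation: `poll_until_done` and its parameters are hypothetical, and the completion check (for example, testing for a marker file in the sandbox) is left to the caller.

```python
import time
from typing import Callable


def poll_until_done(
    is_done: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `is_done` repeatedly until it returns True or the deadline passes.

    Each check is a short call, so no single request has to stay open for the
    whole duration of the background job.
    """
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False  # caller decides how to report the timeout
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the helper testable without real delays.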
- Move _VALID_NOTE_CATEGORIES to module scope (was local to register_tools)
- Fix existing test regression: filter confidence test to phase-1 only
- Guard nuclei_scan against missing tracer (return error, not silent drop)
- Fix module verification command to match get_available_skills() API
- Fix embedded Python script regex escaping via variable injection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Task 1: split into proper TDD steps (move to module scope, then add recon)
- Task 2: fix duplicate step numbers (5,6,6 → 5,6,7)
- Task 3/4: add notes explaining why tool functions lack unit tests
- Task 4: add debugging note for script construction complexity
- Task 5: add note about creative work required for modules
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Place nuclei_scan and download_sourcemaps after suggest_chains (line ~630) instead of after get_finding (line ~530). Groups recon tools with non-proxied scan coordination tools. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Promotes _VALID_NOTE_CATEGORIES from a private local variable inside register_tools() to a public module-level constant VALID_NOTE_CATEGORIES, and adds "recon" as a valid category so reconnaissance agents can write structured notes during scans. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduce _RECON_TEMPLATES (phase 0) for surface discovery, infrastructure recon, and subdomain enumeration. All generate_plan() entries now carry a 'phase' field (0 = recon, 1 = vuln testing). Recon modules bypass MODULE_RULES filtering and are included as-is. Fix test_generic_triggers_are_medium_confidence to scope to phase-1 agents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds parse_nuclei_jsonl and build_nuclei_command helpers at module scope, plus a nuclei_scan MCP tool that runs nuclei in the sandbox background, polls for completion, parses JSONL output, and auto-files findings via the active tracer with deduplication. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
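For readers unfamiliar with JSONL parsing, the sketch below shows one way such a helper might work. It is not the PR's `parse_nuclei_jsonl`; the function name `parse_jsonl_findings` is hypothetical, and the field names (`template-id`, `info.name`, `info.severity`, `matched-at`) are assumptions about nuclei's JSON output that should be checked against the nuclei version in use.

```python
import json


def parse_jsonl_findings(text: str) -> list[dict]:
    """Parse one JSON object per line, skipping blank and malformed lines."""
    findings = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # banner text or a partially written line
        if not isinstance(record, dict):
            continue
        info = record.get("info") or {}
        findings.append(
            {
                "template_id": record.get("template-id", ""),
                "title": info.get("name", ""),
                # Lowercase so severities compare consistently downstream.
                "severity": str(info.get("severity", "info")).lower(),
                "endpoint": record.get("matched-at", ""),
            }
        )
    return findings
```

Tolerating malformed lines matters here because the output file may be read while the background process is still writing to it.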
Implements download_sourcemaps MCP tool that fetches a web target's HTML,
extracts JS bundle URLs, probes each bundle for source maps (via inline
comment, HTTP header, or .map fallback), downloads and parses map JSON,
saves recovered sources to /workspace/sourcemaps/{domain}/, and scans
for notable patterns like API keys and secrets. Adds extract_script_urls,
extract_sourcemap_url, and scan_for_notable module-level helpers with
full test coverage (6 new unit tests, 144 total passing).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
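The three discovery routes named above (inline comment, HTTP header, `.map` fallback) can be sketched as a single lookup. This is an illustrative sketch rather than the PR's `extract_sourcemap_url`; the function name `find_sourcemap_url`, its signature, and the header names checked are assumptions.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches "//# sourceMappingURL=..." and the older "//@ sourceMappingURL=..." form.
_SOURCEMAP_COMMENT = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)")


def find_sourcemap_url(bundle_url: str, js_text: str, headers: dict[str, str]) -> Optional[str]:
    """Return a candidate source map URL for a JS bundle.

    Order of preference: inline comment, response header, then a guessed
    "<bundle>.map" URL that the caller still has to probe.
    """
    matches = _SOURCEMAP_COMMENT.findall(js_text)
    if matches:
        # The last comment in the file is the one that applies.
        return urljoin(bundle_url, matches[-1])

    lowered = {name.lower(): value for name, value in headers.items()}
    ref = lowered.get("sourcemap") or lowered.get("x-sourcemap")
    if ref:
        return urljoin(bundle_url, ref)

    return bundle_url + ".map"
```

`urljoin` resolves relative references against the bundle URL, so `app.js.map` next to `/static/app.js` becomes `/static/app.js.map`.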
When the tracer fails to initialize, create_vulnerability_report silently returns phantom report IDs that are never persisted. list_vulnerability_reports then returns empty results.
- Log the actual exception on tracer init failure (was silently swallowed)
- Warn when create_vulnerability_report files without a tracer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    if dup_idx is not None:
        # existing[dup_idx] is a shared reference to the dict in
        # tracer.vulnerability_reports, so mutations apply in-place.
        report = existing[dup_idx]
        if _SEVERITY_ORDER.index(severity) > _SEVERITY_ORDER.index(
            _normalize_severity(report.get("severity", "info"))
        ):
            report["severity"] = severity
        # Tracer stores body text as "description", not "content"
        desc = report.get("description", "")
        report["description"] = desc + f"\n\n---\n\n**Additional evidence:**\n{content}"
        # Tracer stores "endpoint" as a string; accumulate comma-separated
        if affected_endpoint:
            existing_endpoint = report.get("endpoint", "")
            if existing_endpoint and existing_endpoint != affected_endpoint:
                if affected_endpoint not in existing_endpoint:
                    report["endpoint"] = f"{existing_endpoint}, {affected_endpoint}"
            elif not existing_endpoint:
                report["endpoint"] = affected_endpoint
        if cvss_score is not None and (report.get("cvss") is None or cvss_score > report["cvss"]):
            report["cvss"] = cvss_score

        # Write updated finding to disk (Tracer only auto-writes on add, not on merge)
        if tracer:
            try:
                tracer.save_run_data()
            except Exception:
                pass
Silent merge failure if tracer.get_existing_vulnerabilities() returns copies
The deduplication path at lines 502–526 mutates existing[dup_idx] directly, relying on the comment that it is "a shared reference to the dict in tracer.vulnerability_reports". If Tracer.get_existing_vulnerabilities() ever returns a copy of its internal list (or a list of shallow-copied dicts), all mutations — severity upgrade, description append, endpoint merge, CVSS update — are silently discarded. The subsequent tracer.save_run_data() call then persists the pre-merge version of the finding, making deduplication a no-op without any error or warning.
The safest fix is to avoid relying on the shared-reference assumption and instead call an explicit tracer.update_vulnerability(report_id, ...) method (if one exists), or re-add the merged report via tracer.add_vulnerability_report (with the updated fields) after removing the old one. If the shared-reference behavior is guaranteed by the Tracer contract, it should be enforced with an assertion or test.
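The failure mode described in this comment can be reproduced in a few lines. The two tracer classes below are hypothetical stand-ins written for this sketch, not Strix's `Tracer`; only the method and attribute names mirror those mentioned in the comment.

```python
import copy


class SharedTracer:
    """Hypothetical tracer whose getter returns the internal list itself."""

    def __init__(self) -> None:
        self.vulnerability_reports = [{"id": "vuln-1", "severity": "low"}]

    def get_existing_vulnerabilities(self) -> list[dict]:
        return self.vulnerability_reports


class CopyingTracer(SharedTracer):
    """Hypothetical tracer whose getter returns a defensive copy."""

    def get_existing_vulnerabilities(self) -> list[dict]:
        return copy.deepcopy(self.vulnerability_reports)


def merge_is_visible(tracer: SharedTracer) -> bool:
    """Mutate a report obtained from the getter, then check the tracer's own state."""
    existing = tracer.get_existing_vulnerabilities()
    existing[0]["severity"] = "high"
    return tracer.vulnerability_reports[0]["severity"] == "high"
```

With `SharedTracer` the mutation is visible in the tracer's state; with `CopyingTracer` it is lost without any error. A check along these lines could serve as the regression test the comment asks for if the shared-reference behavior is meant to be part of the contract.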
The dedup merge path mutated dicts from get_existing_vulnerabilities(), relying on them being shared references to the tracer's internal list. If the tracer ever returns copies, merges would be silently discarded. Access tracer.vulnerability_reports[idx] directly instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
start_scan now returns a "tracer" field ("active", "failed", or
"unavailable") and a warning if findings won't be persisted. This
makes tracer init failures visible to the agent instead of silently
succeeding and then failing on nuclei_scan/create_vulnerability_report.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
list_requests passed end_page=None to the sandbox, which crashes with 'NoneType - int' when the sandbox does pagination arithmetic. Only include optional params in the proxy call when they have values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
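The fix described here, forwarding optional parameters only when they carry a value, might look like the sketch below. `build_proxy_args` is a hypothetical helper, and every parameter name other than `end_page` is invented for the example.

```python
from typing import Any, Optional


def build_proxy_args(**params: Optional[Any]) -> dict[str, Any]:
    """Drop parameters whose value is None so the receiver applies its own defaults.

    Falsy but meaningful values such as 0 or "" are kept.
    """
    return {name: value for name, value in params.items() if value is not None}
```

For example, `build_proxy_args(start_page=1, end_page=None)` yields a dict without an `end_page` key, so the sandbox never receives an explicit `None` to do arithmetic on.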
download_sourcemaps:
- Handle both sandbox response formats ({"response": {"body": ...}} and {"body": ...})
- Return html_length for debugging empty-result cases
nuclei_scan:
- Capture stderr instead of discarding to /dev/null
- Return nuclei_stderr in response when present (template errors, binary issues)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
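Handling both response shapes listed for download_sourcemaps can be reduced to one small accessor. The sketch below is an assumption about how that might be done; `extract_body` is a hypothetical name, and only the two dict shapes come from the commit message.

```python
from typing import Any


def extract_body(result: dict[str, Any]) -> str:
    """Return the body from {"response": {"body": ...}} or {"body": ...}, else ""."""
    response = result.get("response")
    if isinstance(response, dict) and "body" in response:
        body = response["body"]
    else:
        body = result.get("body", "")
    # Anything that is not text is treated as an empty body.
    return body if isinstance(body, str) else ""
```

Returning an empty string for unexpected shapes pairs naturally with reporting a length field such as `html_length`, since a length of 0 then signals that nothing usable came back.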
Summary
strix-mcp, an MCP (Model Context Protocol) server that exposes Strix's Docker sandbox tools to AI coding agents (Claude Code, Cursor, Windsurf, and any MCP-compatible client)
strix_runs/format
Test plan
🤖 Generated with Claude Code