feat: add MCP server for AI agent integration #349

Open
ms6rb wants to merge 89 commits into usestrix:main from ms6rb:feat/mcp-orchestration
Conversation

Contributor

ms6rb commented Mar 8, 2026

Summary

  • Adds strix-mcp, an MCP (Model Context Protocol) server that exposes Strix's Docker sandbox tools to AI coding agents (Claude Code, Cursor, Windsurf, and any MCP-compatible client)
  • 13 proxied tools with full parity to native Strix (terminal, browser, proxy, file editing, sitemap)
  • 10 MCP orchestration tools: scan lifecycle, subagent dispatch, vulnerability dedup/chaining, knowledge modules
  • Tech stack auto-detection (code-based + HTTP fingerprinting) with recommended scan plans
  • Automatic vulnerability chaining across findings (10 chain rules)
  • OWASP Top 10 categorization and finding persistence in strix_runs/ format
  • Coverage map documenting parity with native Strix and roadmap for unsupported tools
  • Multi-client installation docs (Claude Code, Cursor, Windsurf)
  • Small MCP section added to root README

Test plan

  • 112 unit tests passing (chaining, stack detection, resources, tools, persistence)
  • Integration test with live Docker sandbox
  • Manual end-to-end scan with Claude Code
  • Manual verification with Cursor/Windsurf MCP clients

🤖 Generated with Claude Code

ms6rb and others added 30 commits March 8, 2026 17:12
Security testing playbook for NestJS applications covering guard bypass,
validation pipe exploits, module boundary leaks, cross-transport auth
inconsistencies, passport/JWT misuse, serialization leaks, ORM injection,
CRUD generator gaps, and rate limiting bypass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FastMCP server exposing Strix security sandbox tools to Claude Code,
compatible with the skills-based module system. Includes:

- Web target HTTP fingerprinting in start_scan
- Finding deduplication with title normalization and merge-on-insert
- list_vulnerability_reports, list_modules, get_scan_status tools
- Richer end_scan summary with OWASP grouping and dedup stats
- Web-only methodology branch with adjusted subagent template
- 49 unit tests covering all new functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
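
The commit above mentions finding deduplication with title normalization and merge-on-insert. A minimal sketch of that idea follows. It is not the PR's code: the function names `normalize_title` and `insert_or_merge` and the `title`/`evidence` keys are assumptions made for illustration.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def insert_or_merge(reports: list, new: dict) -> tuple:
    """Insert `new`, or merge it into a report whose normalized title matches.

    Returns (stored_report, merged_flag).
    """
    key = normalize_title(new["title"])
    for report in reports:
        if normalize_title(report["title"]) == key:
            # Merge-on-insert: keep one report and accumulate the evidence.
            report["evidence"].extend(new["evidence"])
            return report, True
    reports.append(new)
    return new, False
```

Normalizing before comparing means cosmetic differences in casing or punctuation between two agents' titles do not produce duplicate findings.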
- Fix guard ordering claim: NestJS uses AND logic, bypass is
  metadata-driven via @public()/@SetMetadata, not order-driven
- Add missing validation requirements for ORM injection and
  cache poisoning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add nestjs module to nestjs trigger, add domain/subdomain_takeover
- Add info_disclosure, open_redirect, path_traversal to web_app rules
- Add 4 agent templates: NestJS, info disclosure, path traversal, subdomain
- Expand HTTP probe paths from 5 to 18 (actuator, .env, swagger, etc.)
- Detect Spring Actuator, exposed .env, Swagger from probe results
- Add affected_endpoint and cvss_score to vulnerability reports
- Update methodology subagent templates with new report fields
- 8 new tests (57 total)
- Add MODULE_RULES and agent templates for Django, WordPress, Laravel,
  Rails, Express, and Flask — detected frameworks now get dedicated
  testing agents instead of only generic web testing
- Auto-fetch OpenAPI/Swagger spec when swagger is detected during
  fingerprinting — extracts endpoint list and passes to coordinator
  for better subagent targeting
- Add missing OWASP keywords: open_redirect, subdomain_takeover,
  information_disclosure, prototype_pollution, exposed_env, actuator
- Update methodology with OpenAPI auto-discovery guidance
- 10 new tests (67 total)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
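
The commit above describes extracting an endpoint list from a fetched OpenAPI/Swagger spec. As a rough sketch of that step, assuming the spec has already been parsed into a dict, the helper below walks the standard `paths` object. The name `extract_endpoints` is hypothetical and the PR's actual extraction may differ.

```python
# HTTP methods that can appear as operation keys under an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def extract_endpoints(spec: dict) -> list:
    """Return 'METHOD /path' strings for every operation in the spec."""
    endpoints = []
    for path, item in sorted(spec.get("paths", {}).items()):
        if not isinstance(item, dict):
            continue
        for key in sorted(item):
            # Path items also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                endpoints.append(f"{key.upper()} {path}")
    return endpoints
```

Filtering on known method names keeps shared keys like `parameters` or `summary` out of the endpoint list that is handed to the coordinator.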
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gather

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lan output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t tool

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Individual markdown files per finding, CSV index sorted by severity,
get_finding tool for selective recall, minimal tool responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
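
The commit above introduces a CSV index of findings sorted by severity. The sketch below shows one way such an index could be built. The column names, the `build_index_csv` helper, and the exact severity ordering are assumptions and are not taken from the PR.

```python
import csv
import io

# Assumed ordering, lowest to highest.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def build_index_csv(findings: list) -> str:
    """Render findings as CSV text, most severe first."""
    rows = sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.index(f["severity"]),
        reverse=True,
    )
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "severity", "title"])
    for finding in rows:
        writer.writerow([finding["id"], finding["severity"], finding["title"]])
    return buf.getvalue()
```

A compact index like this lets an agent scan all findings cheaply and then load a single finding's markdown file only when it needs the detail.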
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d finding recall

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes: keep end_scan name (avoids collision with native finish_scan),
remove wrong test_integration change, add strix-agent dependency,
add server.py resource descriptions, add pyproject.toml metadata task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move create_vulnerability_report to MCP Orchestration (not proxied)
- Note str_replace_editor as partial parity (no create/view/insert)
- Add native create_vulnerability_report to Not Yet Supported
- Update design doc with final decisions, mark as superseded by plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix proxied tool count (14 -> 13)
- Add agent_id parameter documentation requirement for all proxied tools
- Add workflow section to README template

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove register_agent as public tool (dispatch_agent handles it)
- Update all 23 tool descriptions with parameter docs and enum values
- Add agent_id documentation to all 13 proxied tools
- Consistent formatting across MCP-only and proxied tools

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ms6rb and others added 4 commits March 15, 2026 00:19
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all references to the removed vulnerability_reports and scan_dir
closure variables in get_scan_status, create_vulnerability_report,
list_vulnerability_reports, get_finding, and suggest_chains. These now
read from tracer.get_existing_vulnerabilities() and tracer.get_run_dir().

Field name mapping: content->description, affected_endpoints->endpoint,
cvss_score->cvss. New findings delegate to tracer.add_vulnerability_report().
Merges mutate shared dict references in-place and call save_run_data().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor Author

ms6rb commented Mar 15, 2026

@greptileai

ms6rb and others added 2 commits March 15, 2026 02:35
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…down

- Use get_existing_vulnerabilities() instead of direct .vulnerability_reports
  access, consistent with all other callers
- Move tracer finalization after sandbox.end_scan() so a destroy_sandbox
  failure doesn't leave the session in a split state (tracer gone but
  scan still active)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor Author

ms6rb commented Mar 15, 2026

@greptileai

ms6rb and others added 14 commits March 15, 2026 03:29
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spec for adding Phase 0 reconnaissance to Strix scan flow:
- 6 recon modules (upstream-compatible)
- 2 MCP tools: nuclei_scan, download_sourcemaps
- Methodology + plan integration for recon-to-vuln handoff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address 3 critical + 5 important issues:
- Add "recon" to valid note categories requirement
- Commit to python_action approach for download_sourcemaps
- Specify generate_plan() integration (bypass MODULE_RULES)
- Clarify domain vs web_app trigger semantics
- Specify nuclei timeout behavior and tracer-direct filing
- Drop recon_context parameter (use task string instead)
- Expand local_code recon to full web treatment
- Mark mobile APK module as manual/on-demand
- Add negative test cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- nuclei_scan: use background execution + polling to avoid
  terminal_execute timeout cap and proxy_tool 300s HTTP timeout
- Add "domain" to recon_surface_discovery triggers
- Note that "id" field on recon templates is for logging only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
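
The commit above switches nuclei_scan to background execution plus polling so that no single call has to outlive the terminal or HTTP timeout. A generic sketch of the polling half is shown here. `poll_until_done` and its parameters are hypothetical, and the completion check is left as a callable because the PR's actual check is not shown.

```python
import time
from typing import Callable


def poll_until_done(
    is_done: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `is_done` repeatedly until it returns True or the deadline passes.

    Returns True on completion, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Each poll is a short request, so the overall scan can run longer than any per-request cap while the caller still enforces its own deadline.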
- Move _VALID_NOTE_CATEGORIES to module scope (was local to register_tools)
- Fix existing test regression: filter confidence test to phase-1 only
- Guard nuclei_scan against missing tracer (return error, not silent drop)
- Fix module verification command to match get_available_skills() API
- Fix embedded Python script regex escaping via variable injection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Task 1: split into proper TDD steps (move to module scope, then add recon)
- Task 2: fix duplicate step numbers (5,6,6 → 5,6,7)
- Task 3/4: add notes explaining why tool functions lack unit tests
- Task 4: add debugging note for script construction complexity
- Task 5: add note about creative work required for modules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Place nuclei_scan and download_sourcemaps after suggest_chains
(line ~630) instead of after get_finding (line ~530). Groups
recon tools with non-proxied scan coordination tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Promotes _VALID_NOTE_CATEGORIES from a private local variable inside
register_tools() to a public module-level constant VALID_NOTE_CATEGORIES,
and adds "recon" as a valid category so reconnaissance agents can write
structured notes during scans.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduce _RECON_TEMPLATES (phase 0) for surface discovery,
infrastructure recon, and subdomain enumeration. All generate_plan()
entries now carry a 'phase' field (0 = recon, 1 = vuln testing).
Recon modules bypass MODULE_RULES filtering and are included as-is.
Fix test_generic_triggers_are_medium_confidence to scope to phase-1 agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds parse_nuclei_jsonl and build_nuclei_command helpers at module scope,
plus a nuclei_scan MCP tool that runs nuclei in the sandbox background,
polls for completion, parses JSONL output, and auto-files findings via the
active tracer with deduplication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
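
The commit above parses nuclei's JSONL output before filing findings. The sketch below illustrates the parsing step only. It is named `parse_nuclei_jsonl_sketch` to make clear it is not the PR's `parse_nuclei_jsonl`, and it assumes the commonly seen nuclei result keys `template-id`, `info.name`, `info.severity`, and `matched-at`.

```python
import json


def parse_nuclei_jsonl_sketch(text: str) -> list:
    """Parse one JSON object per line, skipping blank or malformed lines."""
    findings = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate banners or partially written lines
        if not isinstance(record, dict):
            continue
        info = record.get("info")
        if not isinstance(info, dict):
            info = {}
        findings.append(
            {
                "template_id": record.get("template-id", ""),
                "title": info.get("name", ""),
                "severity": str(info.get("severity", "info")).lower(),
                "matched_at": record.get("matched-at", ""),
            }
        )
    return findings
```

Skipping unparseable lines rather than raising matters when the output file is read while the background process may still be writing to it.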
Implements download_sourcemaps MCP tool that fetches a web target's HTML,
extracts JS bundle URLs, probes each bundle for source maps (via inline
comment, HTTP header, or .map fallback), downloads and parses map JSON,
saves recovered sources to /workspace/sourcemaps/{domain}/, and scans
for notable patterns like API keys and secrets. Adds extract_script_urls,
extract_sourcemap_url, and scan_for_notable module-level helpers with
full test coverage (6 new unit tests, 144 total passing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
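
The commit above probes each bundle for a source map via an inline comment, an HTTP header, or a `.map` fallback. A sketch of that lookup order follows. `find_sourcemap_url` is a hypothetical stand-in and not the PR's `extract_sourcemap_url`. The header names `SourceMap` and `X-SourceMap` are the ones browsers recognize.

```python
import re
from urllib.parse import urljoin

# Matches both the current "//#" and the legacy "//@" comment forms.
_SOURCE_MAP_COMMENT = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)")


def find_sourcemap_url(bundle_url: str, body: str, headers: dict) -> str:
    """Resolve a bundle's source map URL: comment, then header, then '.map'."""
    matches = _SOURCE_MAP_COMMENT.findall(body)
    if matches:
        # The last comment in the file is the one that takes effect.
        return urljoin(bundle_url, matches[-1])
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("sourcemap", "x-sourcemap"):
        if lowered.get(name):
            return urljoin(bundle_url, lowered[name])
    # Fallback guess; this ignores query strings on the bundle URL.
    return bundle_url + ".map"
```

`urljoin` handles relative map paths, absolute paths, and full URLs uniformly, so the caller always receives something it can fetch directly.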
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the tracer fails to initialize, create_vulnerability_report
silently returns phantom report IDs that are never persisted.
list_vulnerability_reports then returns empty results.

- Log the actual exception on tracer init failure (was silently swallowed)
- Warn when create_vulnerability_report files without a tracer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor Author

ms6rb commented Mar 17, 2026

@greptileai

Comment on lines +499 to +527
if dup_idx is not None:
    # existing[dup_idx] is a shared reference to the dict in
    # tracer.vulnerability_reports, so mutations apply in-place.
    report = existing[dup_idx]
    if _SEVERITY_ORDER.index(severity) > _SEVERITY_ORDER.index(
        _normalize_severity(report.get("severity", "info"))
    ):
        report["severity"] = severity
    # Tracer stores body text as "description", not "content"
    desc = report.get("description", "")
    report["description"] = desc + f"\n\n---\n\n**Additional evidence:**\n{content}"
    # Tracer stores "endpoint" as a string; accumulate comma-separated
    if affected_endpoint:
        existing_endpoint = report.get("endpoint", "")
        if existing_endpoint and existing_endpoint != affected_endpoint:
            if affected_endpoint not in existing_endpoint:
                report["endpoint"] = f"{existing_endpoint}, {affected_endpoint}"
        elif not existing_endpoint:
            report["endpoint"] = affected_endpoint
    if cvss_score is not None and (report.get("cvss") is None or cvss_score > report["cvss"]):
        report["cvss"] = cvss_score

    # Write updated finding to disk (Tracer only auto-writes on add, not on merge)
    if tracer:
        try:
            tracer.save_run_data()
        except Exception:
            pass

P1 Silent merge failure if tracer.get_existing_vulnerabilities() returns copies

The deduplication path at lines 502–526 mutates existing[dup_idx] directly, relying on the comment that it is "a shared reference to the dict in tracer.vulnerability_reports". If Tracer.get_existing_vulnerabilities() ever returns a copy of its internal list (or a list of shallow-copied dicts), all mutations — severity upgrade, description append, endpoint merge, CVSS update — are silently discarded. The subsequent tracer.save_run_data() call then persists the pre-merge version of the finding, making deduplication a no-op without any error or warning.

The safest fix is to avoid relying on the shared-reference assumption and instead call an explicit tracer.update_vulnerability(report_id, ...) method (if one exists), or re-add the merged report via tracer.add_vulnerability_report (with the updated fields) after removing the old one. If the shared-reference behavior is guaranteed by the Tracer contract, it should be enforced with an assertion or test.
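
To make the failure mode in this comment concrete, the sketch below uses two toy tracers. Both classes are stand-ins invented for illustration, not Strix's real `Tracer`. One returns its live list and the other returns copies. `returns_shared_references` is the kind of check a test could assert if the shared-reference contract is kept.

```python
import copy


class LiveTracer:
    """Toy tracer whose getter hands back the internal list itself."""

    def __init__(self):
        self.vulnerability_reports = [{"id": "vuln-0001", "severity": "low"}]

    def get_existing_vulnerabilities(self):
        return self.vulnerability_reports


class CopyingTracer(LiveTracer):
    """Toy tracer whose getter returns deep copies."""

    def get_existing_vulnerabilities(self):
        return copy.deepcopy(self.vulnerability_reports)


def upgrade_severity(tracer, idx: int, severity: str) -> None:
    """Mutate the dict obtained from the getter, as the merge path does."""
    tracer.get_existing_vulnerabilities()[idx]["severity"] = severity


def returns_shared_references(tracer) -> bool:
    """True only if every returned dict is the same object the tracer stores."""
    existing = tracer.get_existing_vulnerabilities()
    stored = tracer.vulnerability_reports
    return len(existing) == len(stored) and all(a is b for a, b in zip(existing, stored))
```

With `LiveTracer` the upgrade sticks. With `CopyingTracer` the stored report is unchanged and nothing raises, which is exactly the silent no-op the comment warns about.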

ms6rb and others added 4 commits March 17, 2026 10:34
The dedup merge path mutated dicts from get_existing_vulnerabilities(),
relying on them being shared references to the tracer's internal list.
If the tracer ever returns copies, merges would be silently discarded.

Access tracer.vulnerability_reports[idx] directly instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
start_scan now returns a "tracer" field ("active", "failed", or
"unavailable") and a warning if findings won't be persisted. This
makes tracer init failures visible to the agent instead of silently
succeeding and then failing on nuclei_scan/create_vulnerability_report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
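
The commit above adds a "tracer" field with the values "active", "failed", or "unavailable" to the start_scan response. One possible mapping is sketched here. The helper name, the warning text, and the interpretation of "failed" (initialization raised) versus "unavailable" (no tracer could be created at all) are assumptions, since the PR text does not define them.

```python
from typing import Optional


def describe_tracer_state(tracer: Optional[object], init_error: Optional[Exception]) -> dict:
    """Build the tracer-related fields of a start_scan style response."""
    if tracer is not None:
        return {"tracer": "active"}
    if init_error is not None:
        return {
            "tracer": "failed",
            "warning": f"Tracer failed to initialize ({init_error}); findings will not be persisted.",
        }
    return {
        "tracer": "unavailable",
        "warning": "No tracer is available; findings will not be persisted.",
    }
```

Surfacing the state at scan start lets the calling agent decide up front whether to proceed, instead of discovering lost findings later.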
list_requests passed end_page=None to the sandbox, which crashes
with 'NoneType - int' when the sandbox does pagination arithmetic.
Only include optional params in the proxy call when they have values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
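
The fix described above, forwarding optional parameters only when they have values, can be sketched as a small filter. `build_proxy_kwargs` is a hypothetical helper name. The parameter names in the usage note come from the commit message.

```python
def build_proxy_kwargs(**params) -> dict:
    """Drop parameters whose value is None so the sandbox applies its own defaults."""
    return {name: value for name, value in params.items() if value is not None}
```

For example, `build_proxy_kwargs(start_page=1, end_page=None)` yields `{"start_page": 1}`, so the sandbox never sees `end_page=None` in its pagination arithmetic. The filter tests `is not None` rather than truthiness so that legitimate values such as `0` are still forwarded.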
download_sourcemaps:
- Handle both sandbox response formats ({"response": {"body": ...}} and {"body": ...})
- Return html_length for debugging empty-result cases

nuclei_scan:
- Capture stderr instead of discarding to /dev/null
- Return nuclei_stderr in response when present (template errors, binary issues)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
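
The download_sourcemaps change above accepts both `{"response": {"body": ...}}` and `{"body": ...}` from the sandbox. A sketch of such tolerant extraction is shown below. `extract_body` is a hypothetical name, and returning an empty string for unrecognized shapes is an assumption about the desired fallback.

```python
def extract_body(result: object) -> str:
    """Return the response body from either supported sandbox response shape."""
    if not isinstance(result, dict):
        return ""
    response = result.get("response")
    if isinstance(response, dict) and "body" in response:
        body = response["body"]
    else:
        body = result.get("body")
    return body if isinstance(body, str) else ""
```

Reporting `len(extract_body(result))` alongside the results, in the spirit of the `html_length` field mentioned above, makes an empty fetch distinguishable from a page that simply has no scripts.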

1 participant