
Review and findings from deployment of Zscaler-MCP on AWS AgentCore, and integration into QuickSuite #18

@mrpackethead

Description

Critical Issues in Zscaler MCP AgentCore Official Image

Summary

We have identified critical bugs and missing production features in the official Zscaler MCP AgentCore Docker image (zscaler/zscaler-mcp-server:0.4.0-bedrock). These issues prevent the server from working correctly with standard MCP clients, and the image fails to meet basic AWS security best practices.

Image Reference: 709825985650.dkr.ecr.us-east-1.amazonaws.com/zscaler/zscaler-mcp-server:0.4.0-bedrock


πŸ› Critical Bug: tools/list Response Format

Issue

The handle_tools_list() function has multiple critical bugs that break MCP protocol compliance and prevent standard MCP clients from discovering available tools.

Current Buggy Implementation

async def handle_tools_list() -> Dict[str, Any]:
    tools = mcp_server.server.list_tools()  # ❌ Missing await
    
    return {
        "status": "success",
        "tool": "tools/list",
        "result": [json.dumps(tools, indent=2)]  # ❌ Double serialization
    }

Problems

  1. Missing await keyword - The async call is not awaited, returning a coroutine object instead of tools
  2. Double JSON serialization - Tools are serialized to a JSON string, then wrapped in an array
  3. Wrong response format - Returns {"status": "success", "result": [...]} instead of MCP-compliant {"tools": [...]}
  4. Object serialization failure - Attempts to serialize Python Tool objects without converting to dictionaries

Actual Output (Broken)

{
  "status": "success",
  "tool": "tools/list",
  "result": [
    "[{\"name\": \"zpa_list_app_segments\", ...}]"  // ❌ String, not object
  ]
}

Expected Output (MCP Protocol)

{
  "tools": [
    {
      "name": "zpa_list_app_segments",
      "description": "List all application segments in ZPA",
      "inputSchema": {
        "type": "object",
        "properties": {...}
      }
    }
  ]
}

Impact

  • ❌ Breaks all standard MCP clients (Claude Desktop, QuickSuite, etc.)
  • ❌ Violates MCP protocol specification
  • ❌ Tools cannot be discovered or invoked
  • ⚠️ May work with Genesis (which wraps everything), masking the bug

Proposed Fix

async def handle_tools_list() -> Dict[str, Any]:
    # Get the list of tools from the MCP server
    tools = await mcp_server.server.list_tools()  # ✅ Added await
    
    # Convert Tool objects to dictionaries for JSON serialization
    tools_list = []
    for tool in tools:
        tool_dict = {
            "name": tool.name,
            "description": tool.description,
        }
        # MCP spec uses inputSchema (camelCase)
        if hasattr(tool, 'inputSchema'):
            tool_dict["inputSchema"] = tool.inputSchema
        tools_list.append(tool_dict)
    
    # Return MCP protocol format: {"tools": [...]}
    return {"tools": tools_list}  # ✅ Correct format
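The conversion step in the fix can be exercised in isolation. The sketch below uses a minimal stand-in for the SDK's Tool type (an assumption for illustration; the real class comes from the MCP Python SDK) and confirms the result is plain, JSON-serializable dictionaries in the {"tools": [...]} shape, with no double serialization:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Tool:
    # Stand-in for the MCP SDK's Tool type, for illustration only
    name: str
    description: str
    inputSchema: Dict[str, Any] = field(default_factory=dict)

def tools_to_mcp_response(tools: List[Tool]) -> Dict[str, Any]:
    """Convert Tool objects to the MCP-compliant {"tools": [...]} shape."""
    return {
        "tools": [
            {"name": t.name, "description": t.description, "inputSchema": t.inputSchema}
            for t in tools
        ]
    }

response = tools_to_mcp_response([
    Tool("zpa_list_app_segments", "List all application segments in ZPA",
         {"type": "object", "properties": {}}),
])

# One round of json.dumps is enough; the payload contains no pre-serialized strings
assert json.loads(json.dumps(response)) == response
assert response["tools"][0]["name"] == "zpa_list_app_segments"
```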

🔒 Critical Security Issue: No AWS Secrets Manager Support

Issue

The official image requires Zscaler API credentials to be passed as plain-text environment variables, which violates AWS security best practices and fails compliance requirements.

Current Implementation

# Credentials must be passed as plain-text environment variables
ENV ZSCALER_CLIENT_ID=iq7u4xxxxxk6
ENV ZSCALER_CLIENT_SECRET=supersecretvalue123  # ❌ Plain text!
ENV ZSCALER_CUSTOMER_ID=2xxxxxxxxxxxx8

Security Risks

| Risk | Impact |
| --- | --- |
| ECS Task Definition Exposure | Anyone with ecs:DescribeTaskDefinition can read secrets |
| CloudFormation Exposure | Secrets visible in stack parameters and outputs |
| Container Inspection | docker inspect reveals all environment variables |
| No Encryption at Rest | Credentials stored in plain text in AWS APIs |
| No Audit Trail | No CloudTrail logs for credential access |
| No Rotation Support | Requires redeployment to update credentials |
| Compliance Failures | Fails SOC2, PCI-DSS, HIPAA, and ISO 27001 audits |

Example Exposure

# Anyone with ECS read permissions can extract secrets
aws ecs describe-task-definition --task-definition zscaler-mcp

# Output exposes credentials in plain text:
{
  "environment": [
    {"name": "ZSCALER_CLIENT_SECRET", "value": "supersecretvalue123"}
  ]
}

Proposed Solution

Add AWS Secrets Manager integration:

import json
import logging
import os

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

# Fetch credentials from Secrets Manager if configured
secret_arn = os.environ.get('ZSCALER_SECRET_ARN')
if secret_arn:
    try:
        # ARN format: arn:aws:secretsmanager:<region>:<account>:secret:<name>
        region = secret_arn.split(':')[3]
        client = boto3.client('secretsmanager', region_name=region)
        response = client.get_secret_value(SecretId=secret_arn)
        secret = json.loads(response['SecretString'])
        
        # Export every key in the secret as an environment variable
        for key, value in secret.items():
            os.environ[key] = str(value)
        
        logger.info("Loaded credentials from Secrets Manager")
    except ClientError as e:
        logger.error(f"Failed to fetch credentials: {e}")
        raise

Benefits:

  • ✅ Credentials encrypted at rest with AWS KMS
  • ✅ IAM-based access control
  • ✅ CloudTrail audit logging
  • ✅ Automatic rotation support
  • ✅ Compliance with SOC2, PCI-DSS, HIPAA
  • ✅ Zero plain-text credential exposure

⚠️ Missing Feature: MCP Protocol Negotiation

Issue

The official image does not handle MCP initialize and ping methods, preventing proper protocol negotiation with MCP clients.

Missing Implementation

# No handling for these required MCP methods:
# - initialize (protocol version negotiation)
# - ping (health check)

Impact

  • ❌ Cannot negotiate protocol versions with clients
  • ❌ No support for MCP 2024-11-05 or 2025-03-26 protocols
  • ❌ Breaks standard MCP client handshake
  • ❌ No health check mechanism for MCP clients

Proposed Solution

if method == "ping":
    logger.info("Handling MCP ping request")
    result = {}  # MCP spec: ping returns empty object
    
elif method == "initialize":
    logger.info("Handling MCP initialize request")
    # Support both 2024-11-05 and 2025-03-26 protocol versions
    client_protocol = payload.get("params", {}).get("protocolVersion", "2024-11-05")
    logger.info(f"Client requested protocol version: {client_protocol}")
    result = {
        "protocolVersion": client_protocol,  # Echo back client's version
        "capabilities": {"tools": {}},
        "serverInfo": {"name": "zscaler-mcp", "version": "1.0.0"}
    }
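For reference, a standard client's initialize request and the server's JSON-RPC reply under the proposed handler look roughly like this (client name and versions are illustrative, not taken from any real client):

```python
# A typical initialize request from a standard MCP client (values illustrative)
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "1.0"},
    },
}

# Server-side handling, mirroring the proposed fix
client_protocol = request.get("params", {}).get("protocolVersion", "2024-11-05")
result = {
    "protocolVersion": client_protocol,  # echo back the client's version
    "capabilities": {"tools": {}},
    "serverInfo": {"name": "zscaler-mcp", "version": "1.0.0"},
}

# The JSON-RPC envelope the client ultimately receives
response = {"jsonrpc": "2.0", "id": request["id"], "result": result}
assert response["result"]["protocolVersion"] == "2025-03-26"
```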

⚠️ Missing Feature: Standard MCP Client Support

Issue

The official image only supports AWS Genesis NDJSON format and does not support standard MCP clients that use JSON-RPC or SSE (Server-Sent Events).

Current Limitation

# Only returns Genesis NDJSON format
return StreamingResponse(
    generate_streaming_response(response_data, session_id),
    media_type="application/x-ndjson",  # Genesis only
)

Impact

  • ❌ Cannot be used with Claude Desktop
  • ❌ Cannot be used with QuickSuite
  • ❌ Cannot be used with standard MCP testing tools
  • ❌ Limited to AWS Genesis runtime only

Proposed Solution

Add content negotiation based on request format:

# Check if this is a standard MCP client or Genesis
is_jsonrpc = payload.get("jsonrpc") == "2.0"
accept_header = request.headers.get("accept", "")
prefers_sse = "text/event-stream" in accept_header

if is_jsonrpc:
    # Standard JSON-RPC response for MCP clients
    response_content = {
        "jsonrpc": "2.0",
        "id": payload.get("id"),
        "result": result
    }
    
    if prefers_sse:
        # SSE format for streaming clients
        async def sse_generator():
            yield f"data: {json.dumps(response_content)}\n\n"
        
        return StreamingResponse(
            sse_generator(),
            media_type="text/event-stream",
        )
    else:
        # Standard JSON response
        return JSONResponse(content=response_content)
else:
    # Genesis streaming NDJSON response
    return StreamingResponse(
        generate_streaming_response(response_data, session_id),
        media_type="application/x-ndjson",
    )
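The branching above reduces to a small pure function, which makes the negotiation easy to unit-test. A sketch (the function name is ours, not from the image):

```python
from typing import Any, Dict

def select_transport(payload: Dict[str, Any], accept_header: str) -> str:
    """Pick a response transport: JSON-RPC clients get JSON or SSE,
    anything else falls back to Genesis NDJSON streaming."""
    if payload.get("jsonrpc") == "2.0":
        return "sse" if "text/event-stream" in accept_header else "json"
    return "ndjson"

# Standard MCP client that accepts SSE
assert select_transport({"jsonrpc": "2.0", "method": "tools/list"},
                        "text/event-stream") == "sse"
# Standard MCP client over plain JSON
assert select_transport({"jsonrpc": "2.0", "method": "tools/list"},
                        "application/json") == "json"
# Genesis-style payload without a jsonrpc field
assert select_transport({"inputText": "list app segments"}, "") == "ndjson"
```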

⚠️ Missing Feature: Service Filtering

Issue

The official image loads all Zscaler services (ZPA, ZIA, ZDX, ZCC, ZIdentity) with no way to filter them, which can exceed MCP client tool limits.

Current Implementation

# Always loads ALL services
mcp_server = ZscalerMCPServer()

Tool Count Problem

The Zscaler MCP server exposes 100+ tools across all services:

  • ZPA: ~30 tools
  • ZIA: ~40 tools
  • ZDX: ~15 tools
  • ZCC: ~10 tools
  • ZIdentity: ~10 tools

Many MCP clients have hard limits on the number of tools they can handle:

| MCP Client | Tool Limit | Result with All Services |
| --- | --- | --- |
| Claude Desktop | ~50 tools | ❌ Fails to load or truncates |
| Some Genesis Agents | ~100 tools | ⚠️ Performance degradation |
| QuickSuite | 100 tools | ❌ Silently fails |
| Custom Clients | Varies | ❌ May fail silently |

Real-World Impact

When testing with Claude Desktop:

# Without filtering (100+ tools)
❌ Error: "Too many tools provided. Maximum 50 tools supported."

# With filtering to only ZPA (30 tools)
✅ Success: All tools loaded and functional

Impact

  • 🚫 Client Compatibility - Exceeds tool limits in Claude Desktop and other clients
  • 💰 Higher AWS costs - Bedrock charges per tool invocation
  • ⏱️ Slower startup - Initializes all services even if unused
  • 🔧 No flexibility - Cannot disable unused services
  • 📊 Harder debugging - More tools to troubleshoot
  • ⚡ Performance degradation - Large tool lists slow down client UX

Proposed Solution

# Read ZSCALER_MCP_SERVICES environment variable to filter services
services_env = os.environ.get('ZSCALER_MCP_SERVICES', '')

if services_env:
    enabled_services = set(s.strip() for s in services_env.split(',') if s.strip())
    logger.info(f"Filtering to services: {enabled_services}")
    mcp_server = ZscalerMCPServer(enabled_services=enabled_services)
else:
    logger.info("Loading all services")
    mcp_server = ZscalerMCPServer()

Usage:

# Only enable ZPA and ZIA
ZSCALER_MCP_SERVICES="zpa,zia"
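The environment-variable parsing in the proposal is worth a quick sanity check. A sketch (the helper name and case normalization are our additions) showing that whitespace and empty entries are handled:

```python
def parse_enabled_services(raw: str) -> set:
    """Parse a ZSCALER_MCP_SERVICES-style comma-separated list into a set
    of service names, ignoring blanks and surrounding whitespace."""
    return {s.strip().lower() for s in raw.split(',') if s.strip()}

assert parse_enabled_services("zpa,zia") == {"zpa", "zia"}
assert parse_enabled_services(" zpa , zia , ") == {"zpa", "zia"}
assert parse_enabled_services("") == set()  # empty => caller loads all services
```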

⚠️ Missing Feature: Configurable Logging

Issue

The official image has fixed INFO level logging with no ability to increase verbosity for debugging or decrease for production.

Current Implementation

logging.basicConfig(
    level=logging.INFO,  # ❌ Fixed, cannot change
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

Impact

  • πŸ› Harder debugging - Cannot enable DEBUG logs
  • πŸ“Š No traffic inspection - Cannot log HTTP headers/bodies
  • πŸ” Limited troubleshooting - Missing critical diagnostic information

Proposed Solution

import logging
import os

from fastapi import Request

# Configure logging with environment variable
log_level = os.environ.get('LOG_LEVEL', 'INFO').upper()
logging.basicConfig(
    level=getattr(logging, log_level, logging.INFO),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
logger.info(f"Logging level set to: {log_level}")

# Optional HTTP traffic logging middleware (app is the existing FastAPI instance)
@app.middleware("http")
async def log_request_response(request: Request, call_next):
    if os.environ.get('LOG_HEADERS', 'false').lower() == 'true':
        logger.info(f"Request: {request.method} {request.url.path}")
        logger.info(f"Headers: {dict(request.headers)}")
    
    response = await call_next(request)
    return response

Usage:

# Enable debug logging
LOG_LEVEL=DEBUG

# Enable HTTP traffic logging
LOG_HEADERS=true
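Note that the getattr fallback in the proposal quietly maps unknown LOG_LEVEL values to INFO rather than crashing at startup. A quick sketch of that behavior (the helper name is ours):

```python
import logging

def resolve_log_level(raw: str) -> int:
    """Map a LOG_LEVEL string to a logging constant, defaulting to INFO."""
    return getattr(logging, raw.upper(), logging.INFO)

assert resolve_log_level("debug") == logging.DEBUG
assert resolve_log_level("WARNING") == logging.WARNING
assert resolve_log_level("bogus") == logging.INFO  # unknown values fall back
```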

📊 Summary of Issues

| Issue | Severity | Impact | Status |
| --- | --- | --- | --- |
| tools/list bug | 🔴 Critical | Breaks MCP clients | Not fixed |
| No Secrets Manager | 🔴 Critical | Security vulnerability | Not implemented |
| No protocol negotiation | 🟡 High | Breaks handshake | Not implemented |
| Genesis-only support | 🟡 High | Limited compatibility | Not implemented |
| No service filtering | 🟡 Medium | Higher costs | Not implemented |
| Fixed logging | 🟡 Medium | Harder debugging | Not implemented |

🔧 Recommended Actions

  1. Immediate (Critical):

    • Fix tools/list async/await and response format bug
    • Add AWS Secrets Manager support for credential management
  2. High Priority:

    • Implement MCP initialize and ping methods
    • Add JSON-RPC and SSE support for standard MCP clients
  3. Medium Priority:

    • Add service filtering via environment variable
    • Implement configurable logging levels

πŸ“ Testing

We have validated these issues by:

  1. Extracting the official Docker image filesystem from the AWS Marketplace
  2. Comparing it with a working production implementation
  3. Testing with multiple MCP clients (Genesis, QuickSuite, Claude Desktop, mcp)
  4. Reviewing MCP protocol specification compliance

Test Environment:

  • Image: zscaler/zscaler-mcp-server:0.4.0-bedrock
  • Platform: linux/arm64
  • Extracted: /tmp/zscaler-official/app/web_server.py

🤝 Contributing

We have working implementations of all these fixes and are happy to contribute them back to the project. Please let us know the preferred contribution process.


💡 Recommendation: Make AgentCore Build Public

Current Situation

The AgentCore/Bedrock-specific build is currently only available as a pre-built Docker image in AWS Marketplace ECR:

  • Image: 709825985650.dkr.ecr.us-east-1.amazonaws.com/zscaler/zscaler-mcp-server:0.4.0-bedrock
  • Source code: Not available in the public repository
  • Build process: Undocumented and opaque

The Inconsistency

This approach is particularly puzzling given that the rest of the Zscaler MCP project is fully open source:

| Component | Status | Repository |
| --- | --- | --- |
| Core MCP Server | ✅ Open Source | zscaler/zscaler-sdk-python-mcp |
| All Tool Implementations | ✅ Open Source | Public GitHub |
| ZPA Tools | ✅ Open Source | Public GitHub |
| ZIA Tools | ✅ Open Source | Public GitHub |
| ZDX Tools | ✅ Open Source | Public GitHub |
| ZCC Tools | ✅ Open Source | Public GitHub |
| ZIdentity Tools | ✅ Open Source | Public GitHub |
| AgentCore Wrapper | ❌ Closed | Only pre-built image |

Why hide only the AgentCore wrapper when everything else is public? The wrapper is just a thin HTTP adapter (~300 lines) that translates Genesis NDJSON to MCP protocol calls. It contains no proprietary logic, algorithms, or competitive advantages; it's purely infrastructure glue code.

This selective opacity creates an inconsistent and confusing developer experience where users can see and modify 95% of the codebase but are blocked from understanding or improving the final 5% needed for AWS deployment.

Why This Is Problematic

  1. Security Concerns

    • Users cannot audit the build process
    • No way to verify what's actually in the container
    • Cannot validate security practices
    • Difficult to assess supply chain risks
  2. Lack of Transparency

    • Build process is hidden from users
    • Cannot understand how Genesis integration works
    • No visibility into dependencies or configurations
    • Makes troubleshooting nearly impossible
  3. Easy to Reverse Engineer Anyway

    • Container images can be easily extracted (as we demonstrated)
    • docker export reveals all files and code
    • Obscurity provides no real protection
    • Only creates friction for legitimate users
  4. Hinders Adoption

    • Enterprise customers require source code review
    • Security teams cannot approve "black box" containers
    • Developers cannot learn from or improve the implementation
    • Community contributions are blocked
  5. Prevents Bug Fixes

    • Users discover bugs but cannot submit fixes
    • No way to validate proposed solutions
    • Slows down issue resolution
    • Forces users to maintain private forks

Recommended Approach

Make the AgentCore/Genesis wrapper code publicly available in the repository:

zscaler-mcp/
├── src/
│   └── zscaler_mcp/
│       ├── server.py          # Core MCP server (already public)
│       ├── tools/             # Tool implementations (already public)
│       └── web_server.py      # Genesis wrapper (currently hidden)
├── docker/
│   ├── Dockerfile             # Build instructions (currently hidden)
│   └── requirements.txt       # Dependencies (currently hidden)
└── docs/
    └── agentcore-deployment.md  # Deployment guide (currently missing)

Benefits of Making It Public

  1. ✅ Increased Trust - Users can audit the code and build process
  2. ✅ Better Security - Community can identify and report vulnerabilities
  3. ✅ Faster Bug Fixes - Users can submit PRs for issues they discover
  4. ✅ Improved Quality - More eyes on the code leads to better implementations
  5. ✅ Easier Adoption - Enterprise security teams can approve the solution
  6. ✅ Community Growth - Developers can learn from and contribute to the project
  7. ✅ Better Documentation - Build process becomes self-documenting
  8. ✅ Reduced Support Burden - Users can troubleshoot and fix issues themselves

Critical for Enterprise Adoption

The current closed-source approach creates significant barriers for enterprise customers:

Enterprise Security Requirements

Most enterprise organizations have mandatory security policies that require:

  1. Source Code Review

    • Security teams must review all code before deployment
    • Cannot approve "black box" containers from unknown sources
    • Need to verify no malicious code or backdoors exist
    • Must validate compliance with internal security standards
  2. Custom Container Builds

    • Enterprises build containers in their own CI/CD pipelines
    • Use internal base images with approved security patches
    • Apply company-specific hardening and configurations
    • Sign containers with internal certificate authorities
  3. Vulnerability Scanning

    • Must scan all dependencies for known CVEs
    • Cannot use pre-built images without scanning source
    • Need to rebuild with patched dependencies when vulnerabilities are discovered
    • Require SBOM (Software Bill of Materials) generation
  4. Supply Chain Security

    • Must verify provenance of all code and dependencies
    • Cannot trust external container registries
    • Need reproducible builds from source
    • Require signed commits and verified contributors

Real-World Enterprise Blockers

Without access to source code and Dockerfile, enterprises cannot:

# ❌ Cannot build from source with internal base images
docker build -t internal-registry/zscaler-mcp:1.0.0 \
  --build-arg BASE_IMAGE=internal-registry/python:3.12-hardened \
  .

# ❌ Cannot scan dependencies before deployment
trivy image zscaler-mcp:latest
snyk container test zscaler-mcp:latest

# ❌ Cannot generate SBOM for compliance
syft zscaler-mcp:latest -o spdx-json > sbom.json

# ❌ Cannot rebuild with patched dependencies
pip install --upgrade cryptography==46.0.2  # CVE fix
docker build -t zscaler-mcp:patched .

# ❌ Cannot apply internal security policies
# - Remove unnecessary packages
# - Add internal CA certificates  
# - Configure internal logging/monitoring
# - Apply network security policies

Enterprise Approval Process

Typical enterprise security approval workflow:

1. Developer requests to use Zscaler MCP
   ↓
2. Security team reviews source code ❌ BLOCKED - No source available
   ↓
3. Security team scans for vulnerabilities ❌ BLOCKED - Cannot scan pre-built image
   ↓
4. Security team builds in internal pipeline ❌ BLOCKED - No Dockerfile
   ↓
5. Security team signs and approves ❌ BLOCKED - Cannot proceed
   ↓
6. Deployment to production ❌ REJECTED

Result: Enterprise customers cannot adopt the solution, regardless of technical merit.

Competitive Disadvantage

By keeping the AgentCore build closed:

  • ❌ Losing enterprise customers to competitors with open-source solutions
  • ❌ Limiting market reach to only small companies without strict security policies
  • ❌ Creating support burden from enterprises trying to reverse-engineer the container
  • ❌ Reducing trust in Zscaler's commitment to transparency and security
  • ❌ Blocking partnerships with security-conscious organizations

The Solution is Simple

Making the source code and Dockerfile public enables enterprises to:

# ✅ Clone the repository
git clone https://github.com/zscaler/zscaler-mcp.git
cd zscaler-mcp

# ✅ Review the code
security-team-review src/

# ✅ Build with internal base image
docker build -t internal-registry/zscaler-mcp:1.0.0 \
  --build-arg BASE_IMAGE=internal-registry/python:3.12-hardened \
  -f docker/Dockerfile .

# ✅ Scan for vulnerabilities
trivy image internal-registry/zscaler-mcp:1.0.0

# ✅ Generate SBOM
syft internal-registry/zscaler-mcp:1.0.0 -o spdx-json > sbom.json

# ✅ Sign and deploy
docker trust sign internal-registry/zscaler-mcp:1.0.0
kubectl apply -f deployment.yaml

This is standard practice for enterprise software and a requirement for serious adoption.

Precedent

Most successful MCP server implementations are fully open source:

  • Anthropic's official MCP servers - Fully open source (GitHub, Filesystem, etc.)
  • AWS's own MCP implementations - Fully open source
  • Community MCP servers - Fully open source
  • Zscaler's own MCP core - Fully open source (except AgentCore wrapper)

There is no competitive advantage to hiding the Genesis wrapper code; it's a thin adapter layer that follows standard MCP protocol patterns. If Zscaler is comfortable open-sourcing the entire MCP server implementation, including all the Zscaler API integrations, why hide the trivial HTTP wrapper?

Conclusion

We strongly urge Zscaler to make the AgentCore build publicly available in the repository. The current model of distributing only pre-built images creates unnecessary friction, reduces trust, and hinders adoption. Since the container can be easily reverse-engineered anyway (as we've demonstrated), obscurity provides no real security; it only makes it harder for legitimate users to understand, validate, and improve the implementation.

Making the code public would align with industry best practices, increase community trust, and accelerate adoption of the Zscaler MCP server in AWS environments.

