feat: Comprehensive Permission Scheme with Tool-Based Access Control #196
Open
QuiltSimon wants to merge 70 commits into main from impl/mcp-server-authentication
- Add CORSMiddleware to FastMCP server for cross-origin requests
- Configure CORS to allow all origins, methods, and headers
- Use proper ASGI app approach following FastMCP documentation
- Enable credentials support for authenticated requests
- Add fallback handling for missing CORS dependencies
- Support both http and streamable-http transport modes

This enables web frontend integration with the remote MCP server deployed at https://demo.quiltdata.com/mcp/
- Add SSE transport to CORS middleware configuration - Expose mcp-session-id header for SSE session management - Support SSE alongside HTTP and streamable-HTTP transports - Enable real-time bidirectional communication for frontend integration - Add SSE-specific ECS task definition for deployment
- Add ECS task definition for SSE transport mode - Configure SSE service with proper environment variables - Set up target group and ALB listener rule for /sse/* endpoint - Enable real-time streaming for frontend integration
- Implement custom SessionIDExposeMiddleware to explicitly set Access-Control-Expose-Headers - Add BaseHTTPMiddleware to ensure mcp-session-id is exposed for CORS - Address frontend requirement for session ID access in browser - Maintain compatibility with existing CORS configuration
- Remove custom middleware and use proper Starlette CORSMiddleware - Configure expose_headers=['mcp-session-id'] following FastAPI best practices - Clean up unused imports and simplify implementation - Follow official documentation for CORS header exposure
- Version bump from 0.6.12 to 0.6.13 - Added CHANGELOG entry for Docker container support features - Documented HTTP transport implementation and tooling
- Extract Docker operations to reusable scripts/docker.sh - Move Docker push from push.yml to create-release action - Add manual Docker targets to make.deploy (docker-build, docker-push) - Update env.example with Docker configuration variables - Optimize CI/CD by skipping Docker for dev releases - Add PR validation for Docker builds without pushing - Document required GitHub secrets for ECR operations
- Merged docker.sh and docker_image.py into single docker.py script - Unified tag generation, build, and push operations - Updated all references in workflows and Makefiles - Added comprehensive tests for the new unified script - Removed legacy bash and separate Python scripts - Maintains backward compatibility with same functionality
…Error
- Removed _EnsureExposeHeadersMiddleware that was interfering with ASGI protocol
- Simplified CORS configuration to use only standard CORSMiddleware
- This resolves persistent health check failures and target group draining
- CORS headers still properly exposed via expose_headers parameter

Fixes: Persistent 404 errors due to unhealthy containers
Root cause: Custom middleware breaking ASGI message flow
- Implement AuthenticationService with multiple auth methods:
  - Quilt3 login (OAuth2 with refresh tokens)
  - Quilt registry credentials (stored from quilt3 login)
  - IAM role authentication (ECS task role)
  - Environment variable authentication
  - Graceful fallback between methods
- Based on Quilt's authentication architecture:
  - Follows quilt3's credential storage patterns
  - Supports both local development and ECS deployment
  - Provides Quilt-compatible boto3 sessions
- Add comprehensive test suite with mocking
- Handle authentication priority and fallback logic
- Support for catalog URL detection and AWS identity resolution
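The priority-and-fallback behaviour described in this commit can be pictured roughly as follows. This is a minimal sketch, not the PR's AuthenticationService: the `AuthResult` type, the provider functions, and the order they are tried in are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class AuthResult:
    """Outcome of one successful authentication attempt (hypothetical type)."""
    method: str
    identity: str


# Each provider returns an AuthResult on success, or None when it cannot authenticate.
Provider = Callable[[], Optional[AuthResult]]


def authenticate(providers: Sequence[Provider]) -> Optional[AuthResult]:
    """Try providers in priority order and return the first success."""
    for provider in providers:
        try:
            result = provider()
        except Exception:
            # A provider that raises must not block the providers after it.
            continue
        if result is not None:
            return result
    return None


# Hypothetical providers standing in for the methods listed in the commit.
def quilt3_login() -> Optional[AuthResult]:
    return None  # pretend no quilt3 session is stored


def iam_role() -> Optional[AuthResult]:
    return AuthResult(method="iam_role", identity="ecs-task-role")


def env_vars() -> Optional[AuthResult]:
    return AuthResult(method="environment", identity="env-user")


chosen = authenticate([quilt3_login, iam_role, env_vars])
```

Here the first provider yields nothing, so the chain falls through to the IAM role provider and never consults the environment variables.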
…updates - Fix CORS middleware configuration with proper expose_headers - Enhance health check endpoint with comprehensive metrics - Update task definition with Python-based health check - Add deployment artifacts and diagnostics - Prepare for authentication service integration
…tibility
- Updated auth_status() to prioritize AuthenticationService results over QuiltService
- Added support for IAM role authentication in ECS environment
- Maintained backward compatibility with local QuiltService
- Fixed variable naming conflicts to resolve linting issues

This should resolve the authentication issues seen in the production logs where the MCP server was not properly authenticating via IAM roles.
- Add OAuth service with JWT token generation and validation
- Implement OAuth 2.1 authorization endpoints:
  - /.well-known/oauth-protected-resource (RFC9728)
  - /.well-known/oauth-authorization-server (RFC8414)
  - /oauth/authorize, /oauth/token, /oauth/jwks
- Update CORS middleware to allow Authorization headers
- Add PyJWT dependency for JWT handling
- Deploy to ECS with streamable-http transport
- MCP protocol working correctly with proper headers

Note: OAuth endpoints need routing configuration to bypass MCP protocol handler
…port
- Implement complete OAuth 2.1 authorization server with PKCE support
- Add all required OAuth endpoints:
  - /.well-known/oauth-authorization-server (RFC8414 discovery)
  - /.well-known/oauth-protected-resource (RFC9728 metadata)
  - /oauth/authorize (authorization endpoint with PKCE)
  - /oauth/token (token endpoint with PKCE validation)
  - /oauth/refresh (refresh token endpoint)
  - /oauth/jwks (JWKS endpoint)
  - /oauth/userinfo (user info endpoint)
- Add authorization code storage and validation
- Implement PKCE code challenge verification
- Add proper error handling and HTTP status codes
- Deploy to ECS with task definition revision 17
- Server running and healthy in production

Frontend can now complete OAuth 2.1 flow with proper token management.
Note: OAuth endpoints need ALB routing configuration to bypass MCP protocol handler.
- Add ALB routing rules for OAuth endpoints:
  - Priority 20: /oauth/* -> MCP server
  - Priority 21: /.well-known/* -> MCP server
- Update OAuth endpoint registration order in build_http_app
- Deploy updated server with OAuth endpoint priority fixes
- OAuth requests now reach MCP server but return 404

Note: OAuth endpoints still need server-side routing fix to bypass MCP router.
- Enhanced jwt_decoder.py with comprehensive documentation from Quilt's JWT guide
- Added validation helpers for bucket count and permission integrity
- Improved error messages and debugging with lazy logging
- Created comprehensive JWT_AUTHENTICATION.md deployment guide
- Added validation warnings for empty permissions and wildcard access
- Documented all three bucket compression formats (groups, patterns, compressed)
- Aligned implementation with quiltdata/quilt JWT compression patterns

Based on (from the quiltdata/quilt repository):
- catalog/app/services/JWTCompressionFormat.md
- catalog/app/services/MCP_Server_JWT_Decompression_Guide.md
- Replace broad Exception catches with specific exception types - Add detailed comments explaining each exception type - Fix linter warnings about catching too general exceptions - Improve error handling specificity for base64/JSON operations
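As an illustration of what narrowing the exception handling for base64/JSON operations might look like, here is a small sketch that decodes a single JWT segment. The function name and the choice to return None on failure are assumptions; the PR's jwt_decoder.py may structure this differently.

```python
import base64
import binascii
import json
from typing import Any, Dict, Optional


def decode_segment(segment: str) -> Optional[Dict[str, Any]]:
    """Decode one base64url JWT segment into a dict, or return None if malformed."""
    # JWT segments drop base64 padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
    except (binascii.Error, ValueError):
        # binascii.Error: bad padding or length; ValueError: non-ASCII input.
        return None
    try:
        data = json.loads(raw.decode("utf-8"))
    except UnicodeDecodeError:
        # The decoded bytes are not valid UTF-8 text.
        return None
    except json.JSONDecodeError:
        # The text is not valid JSON.
        return None
    # A JWT header or payload must be a JSON object.
    return data if isinstance(data, dict) else None


good_segment = base64.urlsafe_b64encode(b'{"sub":"u1"}').rstrip(b"=").decode("ascii")
decoded = decode_segment(good_segment)
```

Each `except` clause names the failure it expects, so an unrelated bug (for example a TypeError from passing the wrong argument) still surfaces instead of being swallowed.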
- Created comprehensive debugging guide for JWT integration issues - Added test_mcp_endpoints.py script for MCP server diagnostics - Documented common bucket format and token enhancement issues - Provided step-by-step debugging workflow - Added environment checklist for frontend and MCP server
- Created actionable fix guide for frontend team - Documented three main issues with specific code fixes - Added verification steps and testing procedures - Provided diagnostic commands for browser console - Included configuration checklist for both teams
- Confirmed JWT secrets match between frontend and MCP server - Identified root cause as frontend code bugs, not configuration - Provided specific grep commands to find problematic code - Added debug logging snippets for troubleshooting - Created actionable test commands for verification
- Identified routing issue as root cause of 405 errors - Frontend and MCP server both working correctly - Routing layer not configured for /mcp path - Provided ALB, Terraform, CloudFormation, and nginx configs - Added diagnostic commands and verification steps - This is the only blocker to production deployment
BEFORE:
- Middleware had duplicate/unreachable code
- Line 474 allowed MCP requests through without JWT
- Tools failed because no auth context was set
- Frontend sent JWT but middleware didn't validate it

AFTER:
- Removed duplicate authentication code
- All non-public paths now REQUIRE JWT
- Added logging for successful/failed authentication
- Runtime context properly set with JWT claims

This fixes the issue where frontend sends JWT but server doesn't validate it.
- Allow MCP requests without Authorization header (protocol initialization)
- Still validate JWT when present on subsequent requests
- Add comprehensive logging for all MCP requests
- Log whether auth header is present for debugging
- Set unauthenticated context for init requests
- Log successful JWT auth with user/bucket/permission counts

This fixes the 'Session initialization failed: 401' error while maintaining security for actual tool calls.
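The decision this commit describes (let header-less initialization through, but validate any token that is present) could be modelled as a pure function like the one below. It is a sketch under stated assumptions: header names are taken to be lower-cased already, `validate_jwt` is a hypothetical callable returning claims or None, and the real middleware works on ASGI requests rather than plain dicts.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

Claims = Dict[str, Any]


def authorize_request(
    headers: Mapping[str, str],
    validate_jwt: Callable[[str], Optional[Claims]],
) -> Tuple[int, Optional[Claims]]:
    """Return (status_code, claims) for an incoming MCP request."""
    auth = headers.get("authorization")
    if auth is None:
        # No header: allow protocol initialization with an unauthenticated context.
        return 200, None
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 401, None
    claims = validate_jwt(token)
    if claims is None:
        # A token was sent but did not validate: reject instead of downgrading.
        return 401, None
    return 200, claims


def fake_validate(token: str) -> Optional[Claims]:
    """Stand-in validator used only for this demonstration."""
    return {"sub": "user-1"} if token == "valid-token" else None


init_status, init_claims = authorize_request({}, fake_validate)
good_status, good_claims = authorize_request(
    {"authorization": "Bearer valid-token"}, fake_validate
)
bad_status, bad_claims = authorize_request(
    {"authorization": "Bearer tampered"}, fake_validate
)
```

Note that a missing header and an invalid token are treated differently: only the former is allowed through, which keeps a bad token from silently becoming an anonymous session.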
- Log JWT secret (first/last 8 chars only for security) - Log token length and kid during validation - Log payload keys on successful decode - Log detailed error info on validation failure - This will help diagnose secret mismatch issues
- Script to decode and validate JWT tokens locally - Shows header, payload, and expiration info - Tests signature validation with configured secret - Provides clear diagnosis for signature mismatch - Useful for debugging frontend/backend JWT secret sync
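A local check of this kind can be written with the standard library alone. The sketch below assumes the tokens are signed with HS256 and a shared secret, which the commit messages suggest but do not state; the helper names and the sample secrets are made up for the example and are not the PR's script.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def b64url_decode(segment: str) -> bytes:
    """Decode base64url text, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(signing_input: str, secret: str) -> bytes:
    return hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()


def inspect_token(token: str, secret: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Show header and payload, and report signature and expiry status."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    expected = sign_hs256(f"{header_b64}.{payload_b64}", secret)
    signature_ok = hmac.compare_digest(expected, b64url_decode(signature_b64))
    current = time.time() if now is None else now
    exp = payload.get("exp")
    return {
        "header": header,
        "payload": payload,
        "signature_ok": signature_ok,
        "expired": exp is not None and current >= exp,
    }


def make_token(payload: Dict[str, Any], secret: str) -> str:
    """Build an HS256 token for the demonstration below."""
    header_b64 = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload_b64 = b64url_encode(json.dumps(payload).encode("utf-8"))
    signature = sign_hs256(f"{header_b64}.{payload_b64}", secret)
    return f"{header_b64}.{payload_b64}.{b64url_encode(signature)}"


token = make_token({"sub": "user-1", "exp": 2000}, "frontend-secret")
matching = inspect_token(token, "frontend-secret", now=1000)
mismatched = inspect_token(token, "some-other-secret", now=1000)
```

When `signature_ok` is False while the header and payload still decode cleanly, the likeliest explanation is that the two sides are configured with different secrets.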
- Identified root cause: task definition had wrong secret - Manually updated to revision 77 with correct secret - Validated token locally - confirms it works with new secret - Deployment complete and healthy - Ready for frontend testing
PROBLEM:
- JWT validated on every request but auth context not cached
- Tools fell back to IAM role instead of using JWT credentials
- Inefficient to re-validate JWT on every single request

SOLUTION:
- Created SessionAuthManager to cache auth by MCP session ID
- Validate JWT once per session, reuse cached auth
- Cache includes boto3 session built from JWT credentials
- Sessions expire after 1 hour or can be invalidated
- Comprehensive logging shows cache hits/misses

BENEFITS:
- More efficient (validate once vs every request)
- Tools now use JWT-derived boto3 session
- Proper session lifecycle management
- Clear logging for debugging
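The caching idea in this commit can be sketched as a small TTL cache keyed by MCP session ID. This is not the PR's SessionAuthManager: the class name, method names, and the injected clock are assumptions, and the cached value here is just the claims dict rather than a boto3 session.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Claims = Dict[str, Any]


@dataclass
class CachedAuth:
    claims: Claims
    expires_at: float


class SessionAuthCache:
    """Validate a JWT once per session and reuse the result until it expires."""

    def __init__(
        self,
        validate: Callable[[str], Optional[Claims]],
        ttl_seconds: float = 3600.0,  # one hour, matching the commit message
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._validate = validate
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, CachedAuth] = {}
        self.hits = 0
        self.misses = 0

    def get_or_validate(self, session_id: str, token: str) -> Optional[Claims]:
        now = self._clock()
        entry = self._entries.get(session_id)
        if entry is not None and entry.expires_at > now:
            self.hits += 1
            return entry.claims
        self.misses += 1
        claims = self._validate(token)
        if claims is None:
            # Drop any stale entry so a bad token cannot reuse old auth.
            self._entries.pop(session_id, None)
            return None
        self._entries[session_id] = CachedAuth(claims, now + self._ttl)
        return claims

    def invalidate(self, session_id: str) -> None:
        self._entries.pop(session_id, None)


validated_tokens: List[str] = []


def fake_validate(token: str) -> Optional[Claims]:
    """Stand-in validator that records how often it is called."""
    validated_tokens.append(token)
    return {"sub": "user-1"} if token == "valid-token" else None


fake_now = [0.0]
cache = SessionAuthCache(fake_validate, ttl_seconds=3600.0, clock=lambda: fake_now[0])
first = cache.get_or_validate("session-a", "valid-token")   # miss, validates
second = cache.get_or_validate("session-a", "valid-token")  # hit, no validation
fake_now[0] = 3601.0
third = cache.get_or_validate("session-a", "valid-token")   # expired, validates again
```

Injecting the clock keeps the expiry behaviour testable without sleeping. A production version would also need to decide what happens when a session presents a different token than the one it was cached with.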
- Added jwt_diagnostics tool module with 3 diagnostic functions - jwt_diagnostics(): comprehensive JWT auth state inspection - validate_jwt_token(): validate and decode JWT tokens - session_diagnostics(): inspect session cache state - Created 12 unit tests for SessionAuthManager (all passing) - Registered diagnostic tools in tool modules - Tools available in frontend DevTools for troubleshooting
- Documented all 3 JWT diagnostic tools - Provided testing workflow for each tool - Added common issues and solutions guide - Included browser console testing commands - CloudWatch log patterns for success/failure - Pre-deployment checklist - Post-deployment validation steps - Success criteria for JWT authentication
- Complete implementation summary with test results - Identified critical issue: deployment script doesn't preserve JWT secret - Provided two solutions: manual verification or SSM Parameter Store - Deployment checklist with pre/post validation steps - Expected behavior and CloudWatch log patterns - Known limitations and future enhancements
PROBLEM:
- Deployment script didn't preserve JWT secret
- Each deployment reverted to development-enhanced-jwt-secret
- Had to manually fix task definition after every deploy

SOLUTION:
- Store JWT secret in SSM Parameter Store
- Deployment script now uses SSM reference instead of inline value
- Removes MCP_ENHANCED_JWT_SECRET from environment variables
- Adds it to secrets section with SSM ARN
- Added ecsTaskExecutionRole permission to read SSM parameters

BENEFITS:
- JWT secret persists across deployments
- Secret rotation possible without code changes
- Better security (not in plain text in task definition)
- Centralized secret management
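The task definition change described here amounts to moving one entry from the container's `environment` list to its `secrets` list. A rough sketch of that transformation on a plain dict follows; the helper name, the container name, the extra setting, and the parameter ARN are placeholders rather than values from the PR, and the real deployment script may work differently.

```python
import copy
from typing import Any, Dict


def move_env_to_secret(
    container_def: Dict[str, Any], name: str, ssm_parameter_arn: str
) -> Dict[str, Any]:
    """Return a copy of an ECS container definition with `name` sourced from SSM."""
    updated = copy.deepcopy(container_def)
    # Remove the inline plain-text value.
    updated["environment"] = [
        entry for entry in updated.get("environment", []) if entry.get("name") != name
    ]
    # Reference the parameter instead; ECS resolves `valueFrom` at task start.
    secrets = [s for s in updated.get("secrets", []) if s.get("name") != name]
    secrets.append({"name": name, "valueFrom": ssm_parameter_arn})
    updated["secrets"] = secrets
    return updated


before = {
    "name": "example-container",  # placeholder
    "environment": [
        {"name": "MCP_ENHANCED_JWT_SECRET", "value": "development-enhanced-jwt-secret"},
        {"name": "EXAMPLE_SETTING", "value": "unchanged"},  # placeholder
    ],
}
# Placeholder ARN; account, region, and parameter path are invented.
example_arn = "arn:aws:ssm:us-east-1:123456789012:parameter/example/jwt-secret"
after = move_env_to_secret(before, "MCP_ENHANCED_JWT_SECRET", example_arn)
```

Because the secret is now a reference, redeploying the same task definition no longer resets its value, and the execution role only needs read access to that one parameter.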
- Session-based JWT authentication deployed (rev 79) - JWT secret in SSM Parameter Store (persists across deploys) - Diagnostic tools available in DevTools - 29 tests passing (session auth + JWT decompression) - Comprehensive documentation complete - All success criteria met - Ready for production use
CRITICAL FINDING:
- CloudWatch shows ALL MCP requests have no Authorization header
- Not just initialization - even tool calls have no auth
- Frontend has token but Client.ts isn't sending it

ROOT CAUSE:
- MCP Client.getRequestHeaders() might not be awaited
- Headers might not be passed to fetch calls
- Token getter might be returning null

PROVIDED:
- Verification commands for browser console
- Fetch interceptor to trace requests
- Checklist of common issues in Client.ts
- Expected before/after examples

This is the final blocker - once frontend sends the header, all JWT authentication will work end-to-end.
CRITICAL FINDING from browser console:
- MCP Server URL: undefined
- MCP Client not properly initialized
- This explains why requests fail

NEED FROM FRONTEND:
- Complete configuration check output
- Verify mcp.serverUrl is set in config
- Verify MCP Client is initialized with endpoint
- Verify token getter is configured

Provided comprehensive verification command to diagnose the exact configuration issue.
- Added detailed logging in auth_helpers._runtime_jwt_result - Shows whether JWT is found in runtime context - Logs user, bucket count, permission count when JWT found - Warns when falling back to unauthenticated context - Logs traditional auth fallback with reason - Will help trace exactly where JWT is lost
CRITICAL CHANGE:
- Bucket tools now REFUSE to fallback to IAM when JWT fails
- If JWT auth succeeds but doesn't provide S3 client, return error
- If JWT auth fails, return error (don't silently use IAM role)
- This forces proper JWT usage instead of hiding issues

BENEFITS:
- Clear error messages when JWT isn't working
- No more silent IAM role fallback
- Forces frontend to send Authorization header
- Makes JWT issues visible immediately

Updated tests to match new behavior.
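The fail-closed rule in this commit can be reduced to a small decision function. The sketch below uses hypothetical names (`JwtAuthResult`, `resolve_s3_client`) and a plain dict as the tool response; it only illustrates the two error branches and is not the code in the bucket tools.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class JwtAuthResult:
    """Result of JWT authentication for one tool call (hypothetical type)."""
    ok: bool
    s3_client: Optional[Any] = None
    error: Optional[str] = None


def resolve_s3_client(jwt_result: JwtAuthResult) -> Dict[str, Any]:
    """Return the JWT-derived client or an explicit error, never an IAM fallback."""
    if not jwt_result.ok:
        return {
            "success": False,
            "error": f"JWT authentication failed: {jwt_result.error}",
        }
    if jwt_result.s3_client is None:
        return {
            "success": False,
            "error": "JWT authentication succeeded but provided no S3 client",
        }
    return {"success": True, "client": jwt_result.s3_client}


failed = resolve_s3_client(JwtAuthResult(ok=False, error="missing Authorization header"))
no_client = resolve_s3_client(JwtAuthResult(ok=True))
usable = resolve_s3_client(JwtAuthResult(ok=True, s3_client=object()))
```

There is deliberately no third branch that builds a client from the task role, so a missing or broken token shows up as an error in the tool response instead of as unexpectedly broad access.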
            buckets.append(suffix)
        else:
            buckets.append(f"{prefix}-{suffix}")
    return buckets
Bug: JWT Decoder Bucket Name Decompression Bug
The _decompress_groups function in jwt_decoder.py incorrectly omits the prefix when decompressing bucket names if the suffix contains a dash. This can lead to malformed bucket names and inconsistent decompression results compared to other parts of the codebase.
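One way to address the reported behaviour is to re-attach the prefix unconditionally. The sketch below assumes the "groups" format maps each prefix to a list of suffixes joined with a dash, which is inferred from the visible diff fragment and not confirmed by the source; the function name and bucket names are invented for the example.

```python
from typing import Dict, List


def decompress_groups(groups: Dict[str, List[str]]) -> List[str]:
    """Expand {prefix: [suffix, ...]} into full bucket names."""
    buckets: List[str] = []
    for prefix, suffixes in groups.items():
        for suffix in suffixes:
            # Always join prefix and suffix, even when the suffix contains a dash.
            buckets.append(f"{prefix}-{suffix}")
    return buckets


expanded = decompress_groups({"quilt": ["sandbox", "sales-prod"]})
```

With the conditional from the diff, a suffix such as `sales-prod` would be emitted without its prefix; here both entries expand the same way. If some compressed tokens intentionally carry full bucket names in the suffix list, that case would need an explicit marker instead of a dash heuristic.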
Overview
This PR implements a comprehensive permission scheme for the Quilt MCP Server that provides fine-grained, tool-based access control for AWS operations. The system ensures users receive exactly the permissions they need, nothing more, following the principle of least privilege.
Problem Solved
Users with valid IAM roles (like ReadWriteQuiltV2-sales-prod) were receiving permission errors when trying to access S3 buckets through MCP tools, specifically missing s3:ListBucket permissions despite having the role.
Solution
Key Features
New Components
validate_tool_access and list_available_tools
Permission Categories
Implementation Details
Tool-Based Permission Mapping
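As a rough illustration of the idea, a tool-to-permission mapping and an access check might look like the sketch below. Everything in it is assumed for the example: the tool names, the actions assigned to each tool, the shape of the granted-permissions input, and the `check_tool_access` helper, which is not the PR's `validate_tool_access`.

```python
from typing import Any, Dict, FrozenSet, Set

# Hypothetical mapping from tool name to the S3 actions it needs.
TOOL_REQUIRED_ACTIONS: Dict[str, FrozenSet[str]] = {
    "list_objects": frozenset({"s3:ListBucket"}),
    "fetch_object": frozenset({"s3:GetObject"}),
    "upload_object": frozenset({"s3:ListBucket", "s3:PutObject"}),
}


def check_tool_access(
    tool_name: str, bucket_name: str, granted: Dict[str, Set[str]]
) -> Dict[str, Any]:
    """Report whether the granted actions on a bucket cover what a tool needs."""
    required = TOOL_REQUIRED_ACTIONS.get(tool_name)
    if required is None:
        return {"allowed": False, "reason": f"unknown tool: {tool_name}"}
    missing = sorted(required - granted.get(bucket_name, set()))
    if missing:
        return {"allowed": False, "missing": missing}
    return {"allowed": True, "missing": []}


# Hypothetical grant: read-only access to one bucket.
granted_actions = {"example-bucket": {"s3:ListBucket", "s3:GetObject"}}
read_check = check_tool_access("list_objects", "example-bucket", granted_actions)
write_check = check_tool_access("upload_object", "example-bucket", granted_actions)
```

Reporting the missing actions by name, rather than a bare denial, is what lets a caller see that it is specifically `s3:ListBucket` or `s3:PutObject` that is absent.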
Role Definitions
New MCP Tools
- validate_tool_access(tool_name, bucket_name) - Check specific tool permissions
- list_available_tools() - Discover available tools and permissions
- get_user_permissions() - Get current authorization level
Security Benefits
Documentation
Testing
Deployment
permission-scheme tag
Breaking Changes
None - this is purely additive functionality that enhances the existing permission system.
Future Enhancements
Ready for review and testing! 🚀
Note
Add JWT-only auth with session caching and HTTP/SSE server, Dockerized build/publish and Terraform ECS deployment, GraphQL via bearer, CORS/middleware, tool refactors to JWT, health endpoint, and comprehensive docs/tests.
- …services/session_auth.py, bearer_auth_service.py, jwt_decoder.py), new runtime context (runtime_context.py), and Quilt auth middleware with role auto-assumption + CORS.
- …/healthz, proper MCP init handling, and SSE/CORS headers (utils.py).
- …services/graphql_bearer_service.py, tools/graphql.py, buckets.py, packages.py).
- …scripts/docker.py, ecs_deploy.py) and GH workflows; expose FASTMCP_* env; Make targets.
- …deploy/terraform/modules/mcp_server/*).
- …tools/jwt_auth.py, jwt_diagnostics.py); refactor bucket/package tools to use JWT helpers (tools/auth_helpers.py, updates in buckets/package_ops).
- …0.6.13; add requests/pyjwt deps; README/CHANGELOG updates.

Written by Cursor Bugbot for commit 502080c. This will update automatically on new commits.