crivetimihai
Member

🏢 EPIC: Complete Multi-Tenancy System Implementation

🚀 Summary

This massive PR transforms MCP Gateway from a single-tenant system into a production-ready enterprise multi-tenant platform with team-based resource scoping, comprehensive authentication, and enterprise SSO integration.

Impact: Complete architectural transformation enabling secure team collaboration, enterprise SSO integration, and scalable multi-tenant deployments.


🎯 Issues Closed

Primary Epic:

Core Security & Authentication:

SSO Integration:

Future Work:


🔥 Major Features Implemented

🔐 Authentication & Authorization System

  • Email-based Authentication with Argon2id password hashing
  • Complete RBAC System with Platform Admin, Team Owner, Team Member roles
  • Enhanced JWT Tokens with team context and scoped permissions
  • Password Policy Engine with configurable security requirements
  • Multi-Provider SSO Framework (GitHub, Google, IBM Security Verify)

👥 Team Management System

  • Personal Teams Auto-Creation - Every user gets a personal team
  • Multi-Team Membership - Users can belong to multiple teams with roles
  • Team Invitation System - Email-based invitations with secure tokens
  • Team Visibility Controls - Private/Public team discovery
  • Team Administration - Complete team lifecycle management

🔒 Resource Scoping & Visibility

  • Three-Tier Resource Visibility System:
    • Private: Owner-only access
    • Team: Team member access
    • Public: Cross-team access
  • Applied to All Resource Types: Tools, Servers, Resources, Prompts, A2A Agents
  • Team-Scoped API Endpoints with proper access validation
  • Cross-Team Resource Discovery for public resources
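As a rough illustration of the three-tier model above, an access check could be sketched as follows. The `Resource` dataclass and the `can_access` function are hypothetical names used only for this example, not the gateway's actual code; only the `team_id`, `owner_email`, and `visibility` fields come from this PR.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Resource:
    """Hypothetical resource record carrying the three scoping fields."""

    name: str
    owner_email: str
    team_id: Optional[str]
    visibility: str = "private"  # "private", "team", or "public"


def can_access(resource: Resource, user_email: str, user_team_ids: Set[str]) -> bool:
    """Return True if the user may see the resource under the three-tier rules."""
    if resource.visibility == "public":
        return True  # Public: cross-team access
    if resource.visibility == "team":
        return resource.team_id in user_team_ids  # Team: team member access
    # Anything else is treated as private: owner-only access
    return resource.owner_email == user_email


tool = Resource("get-time", "alice@example.com", "team-a", "team")
print(can_access(tool, "bob@example.com", {"team-a"}))
print(can_access(tool, "carol@example.com", {"team-b"}))
```

Treating any unknown visibility value as private is a deliberate choice in this sketch, so that a malformed value fails closed rather than open.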

🏗️ Platform Administration

  • Platform Admin Role separate from team roles
  • Domain-Based Auto-Assignment via SSO (SSO_AUTO_ADMIN_DOMAINS)
  • Enterprise Domain Trust (SSO_TRUSTED_DOMAINS)
  • System-Wide Team Management for administrators

🗄️ Database & Infrastructure

  • Complete Multi-Tenant Database Schema with proper indexing
  • Team-Based Query Filtering for performance optimization
  • Automated Migration Strategy from single-tenant to multi-tenant
  • All APIs Redesigned to be team-aware

📐 System Architecture

This implementation introduces a comprehensive multi-tenant architecture:

graph TB
    subgraph "Authentication Layer"
        Email[Email Authentication<br/>Argon2id Hashing]
        SSO[Multi-Provider SSO<br/>GitHub, Google, IBM Verify]
        JWT[Enhanced JWT Tokens<br/>Team Context + Scoped Access]
    end

    subgraph "Team Management"
        PersonalTeams[Personal Teams<br/>Auto-Creation]
        MultiTeam[Multi-Team Membership<br/>Owner/Member Roles]
        Invitations[Email Invitations<br/>Token-Based Workflow]
    end

    subgraph "Resource Scoping"
        ResourceVis[Three-Tier Visibility<br/>Private/Team/Public]
        TeamAPI[Team-Scoped APIs<br/>All Resource Types]
        CrossTeam[Cross-Team Access<br/>Public Resources]
    end

    subgraph "Platform Administration"
        PlatformAdmin[Platform Admin Role<br/>System-Wide Access]
        DomainMap[Domain-Based Assignment<br/>Auto-Admin via SSO]
    end

    Email --> JWT
    SSO --> JWT
    JWT --> MultiTeam
    PersonalTeams --> MultiTeam
    MultiTeam --> ResourceVis
    ResourceVis --> TeamAPI
    PlatformAdmin --> DomainMap

🗄️ Database Schema Changes

New Multi-Tenant Tables:

-- User management
CREATE TABLE email_users (
    email VARCHAR(255) PRIMARY KEY,
    password_hash VARCHAR(255) NOT NULL,
    full_name VARCHAR(255),
    is_admin BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Team management
CREATE TABLE email_teams (
    id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    visibility VARCHAR(20) DEFAULT 'private',
    owner_email VARCHAR(255) REFERENCES email_users(email),
    is_personal BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Team membership
CREATE TABLE email_team_members (
    id UUID PRIMARY KEY,
    team_id UUID REFERENCES email_teams(id),
    user_email VARCHAR(255) REFERENCES email_users(email),
    role VARCHAR(50) DEFAULT 'member',
    joined_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(team_id, user_email)
);

-- Team invitations
CREATE TABLE email_team_invitations (
    id UUID PRIMARY KEY,
    team_id UUID REFERENCES email_teams(id),
    invited_email VARCHAR(255),
    invited_by_email VARCHAR(255) REFERENCES email_users(email),
    token VARCHAR(255) UNIQUE NOT NULL,
    role VARCHAR(50) DEFAULT 'member',
    expires_at TIMESTAMP WITH TIME ZONE,
    status VARCHAR(20) DEFAULT 'pending',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Extended Resource Tables:

All existing resource tables (tool, server, resource, prompt, a2a_agent) extended with:

ALTER TABLE [resource_table] ADD COLUMN team_id UUID REFERENCES email_teams(id);
ALTER TABLE [resource_table] ADD COLUMN owner_email VARCHAR(255) REFERENCES email_users(email);
ALTER TABLE [resource_table] ADD COLUMN visibility VARCHAR(20) DEFAULT 'private';

🔧 Configuration Changes

New Environment Variables:

Core Multi-Tenancy:

# Team Management
AUTO_CREATE_PERSONAL_TEAMS=true
PERSONAL_TEAM_PREFIX=personal
MAX_TEAMS_PER_USER=50
MAX_MEMBERS_PER_TEAM=100

# Invitations
INVITATION_EXPIRY_DAYS=7
REQUIRE_EMAIL_VERIFICATION_FOR_INVITES=true

# Platform Administration
[email protected]
PLATFORM_ADMIN_PASSWORD=changeme
PLATFORM_ADMIN_FULL_NAME="Platform Administrator"

Authentication:

# Email Authentication
EMAIL_AUTH_ENABLED=true
ARGON2ID_TIME_COST=3
ARGON2ID_MEMORY_COST=65536
ARGON2ID_PARALLELISM=1

# Password Policies
PASSWORD_MIN_LENGTH=8
PASSWORD_REQUIRE_UPPERCASE=false
PASSWORD_REQUIRE_NUMBERS=false
PASSWORD_REQUIRE_SPECIAL=false
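
To show how the password policy variables above might be interpreted, here is a minimal sketch of a validator. The `check_password` function and its messages are hypothetical and not taken from the gateway code; only the variable names and their defaults come from the listing above.

```python
import os
import string
from typing import Dict, List, Optional


def _flag(env: Dict[str, str], name: str) -> bool:
    """Read a true/false setting, defaulting to false as in the listing above."""
    return env.get(name, "false").strip().lower() == "true"


def check_password(password: str, env: Optional[Dict[str, str]] = None) -> List[str]:
    """Return a list of policy violations; an empty list means the password passes."""
    env = dict(os.environ) if env is None else env
    problems: List[str] = []
    min_length = int(env.get("PASSWORD_MIN_LENGTH", "8"))
    if len(password) < min_length:
        problems.append(f"must be at least {min_length} characters")
    if _flag(env, "PASSWORD_REQUIRE_UPPERCASE") and not any(c.isupper() for c in password):
        problems.append("must contain an uppercase letter")
    if _flag(env, "PASSWORD_REQUIRE_NUMBERS") and not any(c.isdigit() for c in password):
        problems.append("must contain a digit")
    if _flag(env, "PASSWORD_REQUIRE_SPECIAL") and not any(c in string.punctuation for c in password):
        problems.append("must contain a special character")
    return problems


strict = {"PASSWORD_MIN_LENGTH": "12", "PASSWORD_REQUIRE_NUMBERS": "true"}
print(check_password("changeme", {}))
print(check_password("changeme", strict))
```

With the defaults shown above, the sample password `changeme` passes, which is one reason a production deployment would likely tighten these settings.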

SSO Integration:

# Multi-Provider SSO
SSO_ENABLED=true
SSO_TRUSTED_DOMAINS=["company.com","trusted-partner.com"]
SSO_AUTO_ADMIN_DOMAINS=["company.com"]
SSO_AUTO_CREATE_USERS=true
SSO_PRESERVE_ADMIN_AUTH=true

# GitHub SSO
SSO_GITHUB_ENABLED=true
SSO_GITHUB_CLIENT_ID=your-github-client-id
SSO_GITHUB_CLIENT_SECRET=your-github-client-secret

# Google SSO  
SSO_GOOGLE_ENABLED=true
SSO_GOOGLE_CLIENT_ID=your-google-client-id
SSO_GOOGLE_CLIENT_SECRET=your-google-client-secret

# IBM Security Verify SSO
SSO_IBM_VERIFY_ENABLED=true
SSO_IBM_VERIFY_CLIENT_ID=your-ibm-client-id
SSO_IBM_VERIFY_CLIENT_SECRET=your-ibm-client-secret
SSO_IBM_VERIFY_ISSUER=https://tenant.verify.ibm.com/oidc/endpoint/default
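
The domain lists above are JSON arrays, so a domain-based assignment step could plausibly be sketched as below. `classify_sso_user` is a hypothetical helper, and the exact semantics (case-insensitive match on the part after the last `@`) are an assumption rather than the gateway's documented behavior.

```python
import json
from typing import Dict


def classify_sso_user(email: str, env: Dict[str, str]) -> Dict[str, bool]:
    """Decide trust and admin status from the email domain (assumed semantics)."""
    trusted = {d.lower() for d in json.loads(env.get("SSO_TRUSTED_DOMAINS", "[]"))}
    auto_admin = {d.lower() for d in json.loads(env.get("SSO_AUTO_ADMIN_DOMAINS", "[]"))}
    domain = email.rsplit("@", 1)[-1].lower()
    return {"trusted": domain in trusted, "is_admin": domain in auto_admin}


env = {
    "SSO_TRUSTED_DOMAINS": '["company.com","trusted-partner.com"]',
    "SSO_AUTO_ADMIN_DOMAINS": '["company.com"]',
}
print(classify_sso_user("alice@company.com", env))
print(classify_sso_user("bob@trusted-partner.com", env))
print(classify_sso_user("eve@elsewhere.org", env))
```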

🔐 Security Enhancements

Multi-Tenant Security Model:

  • Data Isolation: Team-scoped queries prevent cross-tenant data access
  • Resource Ownership: Every resource has owner_email and team_id validation
  • Visibility Enforcement: Private/Team/Public visibility strictly enforced
  • Access Logging: Authentication and authorization events tracked
  • Secure Tokens: Invitation tokens with expiration and single-use validation
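
The expiring, single-use invitation tokens mentioned above could be sketched roughly as follows. This uses an in-memory dictionary in place of the `email_team_invitations` table, and the `accepted` and `expired` status values are assumptions; only `pending` appears in the schema in this PR.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

INVITATION_EXPIRY_DAYS = 7  # mirrors the environment variable of the same name


@dataclass
class Invitation:
    team_id: str
    invited_email: str
    token: str
    expires_at: datetime
    status: str = "pending"


_invitations: Dict[str, Invitation] = {}  # in-memory stand-in for the database table


def create_invitation(team_id: str, invited_email: str, now: Optional[datetime] = None) -> Invitation:
    """Issue an unguessable token that expires after INVITATION_EXPIRY_DAYS."""
    now = now or datetime.now(timezone.utc)
    invitation = Invitation(
        team_id=team_id,
        invited_email=invited_email,
        token=secrets.token_urlsafe(32),
        expires_at=now + timedelta(days=INVITATION_EXPIRY_DAYS),
    )
    _invitations[invitation.token] = invitation
    return invitation


def accept_invitation(token: str, now: Optional[datetime] = None) -> bool:
    """Accept a pending, unexpired invitation exactly once."""
    now = now or datetime.now(timezone.utc)
    invitation = _invitations.get(token)
    if invitation is None or invitation.status != "pending":
        return False  # unknown token, or already used or expired
    if now >= invitation.expires_at:
        invitation.status = "expired"
        return False
    invitation.status = "accepted"
    return True
```

Flipping the status away from `pending` on first use is what makes the token single-use in this sketch; a real implementation would need to do that update atomically in the database.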

Enterprise Security Controls:

  • Domain Restrictions: Corporate domain enforcement via SSO_TRUSTED_DOMAINS
  • MFA Support: Automatic enforcement of SSO provider MFA policies
  • Conditional Access: Location/device-based access controls via SSO providers
  • Password Security: Argon2id hashing with configurable parameters
  • JWT Enhancements: Team context and scoped permissions in tokens
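
To make "team context and scoped permissions in tokens" more concrete, the sketch below signs and verifies an HS256 token using only the standard library. The claim names (`teams`, `scopes`) and the helper functions are assumptions for illustration; the PR text does not specify the actual claim layout, and a real deployment would normally use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_token(claims: dict, secret: str) -> str:
    """Build a compact HS256 JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        _b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def decode_token(token: str, secret: str) -> dict:
    """Verify the signature and expiry, then return the claims."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if claims.get("exp", float("inf")) < time.time():
        raise ValueError("token expired")
    return claims


# Hypothetical claim layout carrying team context and scopes
claims = {
    "sub": "alice@example.com",
    "teams": [{"id": "team-a", "role": "owner"}],
    "scopes": ["tools.read"],
    "exp": int(time.time()) + 3600,
}
token = encode_token(claims, "demo-secret")
print(decode_token(token, "demo-secret")["teams"])
```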

🚀 API Changes

New Authentication Endpoints:

  • POST /auth/email/register - Email user registration
  • POST /auth/email/login - Email user login
  • GET /auth/sso/providers - List available SSO providers
  • GET /auth/sso/login/{provider} - Initiate SSO login
  • POST /auth/sso/callback/{provider} - Handle SSO callback

New Team Management Endpoints:

  • GET /teams - List user's teams
  • POST /teams - Create new team
  • GET /teams/{team_id} - Get team details
  • PUT /teams/{team_id} - Update team
  • DELETE /teams/{team_id} - Delete team (non-personal only)
  • POST /teams/{team_id}/invitations - Invite user to team
  • GET /teams/{team_id}/members - List team members
  • DELETE /teams/{team_id}/members/{user_email} - Remove team member

Enhanced Resource Endpoints:

All resource endpoints (tools, servers, resources, prompts, a2a agents) now support:

  • ?team_id=uuid - Filter by team
  • ?visibility=private|team|public - Filter by visibility
  • team_id, owner_email, visibility fields in request/response bodies
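
A client-side sketch of the filter parameters above: the helper builds a request URL for a resource listing. `build_resource_url` and the base URL are hypothetical; only the `team_id` and `visibility` parameter names and the three visibility values come from this PR.

```python
from typing import Optional
from urllib.parse import urlencode

ALLOWED_VISIBILITY = {"private", "team", "public"}


def build_resource_url(
    base_url: str,
    resource: str,
    team_id: Optional[str] = None,
    visibility: Optional[str] = None,
) -> str:
    """Build a listing URL such as /tools?team_id=...&visibility=team."""
    if visibility is not None and visibility not in ALLOWED_VISIBILITY:
        raise ValueError(f"unsupported visibility: {visibility!r}")
    params = {}
    if team_id is not None:
        params["team_id"] = team_id
    if visibility is not None:
        params["visibility"] = visibility
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/{resource}"
    return f"{url}?{query}" if query else url


# The host and port here are placeholders, not a documented default
print(build_resource_url("http://localhost:8000", "tools", team_id="team-a", visibility="team"))
```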

📚 Documentation Added

Complete Documentation Suite:

  • Architecture Documentation: docs/docs/architecture/multitenancy.md (934 lines)
  • SSO Integration Tutorials:
    • docs/docs/manage/sso-ibm-tutorial.md - IBM Security Verify setup
    • docs/docs/manage/sso-github-tutorial.md - GitHub SSO setup
    • docs/docs/manage/sso-google-tutorial.md - Google SSO setup
  • Configuration Reference: Complete environment variable documentation
  • Migration Guide: Single-tenant to multi-tenant upgrade path
  • API Reference: Team-scoped endpoint documentation

Enterprise Deployment Guides:

  • Production Checklist: Security requirements and best practices
  • Troubleshooting Guide: Common multi-tenant scenarios and solutions
  • Performance Tuning: Team-based indexing and query optimization

🧪 Testing

Test Coverage:

  • Unit Tests: All new authentication and team management code
  • Integration Tests: Team-scoped API endpoint validation
  • Security Tests: RBAC permission enforcement
  • Migration Tests: Database schema migration validation
  • SSO Tests: Multi-provider authentication flows

Test Categories:

  • Authentication system tests (email + SSO)
  • Team management workflow tests
  • Resource visibility and access control tests
  • Database migration and rollback tests
  • API endpoint authorization tests
  • JWT token validation tests

⚡ Performance Optimizations

Database Optimizations:

  • Team-Based Indexing: Optimized queries for team-scoped resources
  • Query Performance: Efficient team membership lookups
  • Connection Pooling: Optimized for multi-tenant workloads
  • Index Strategy: Strategic indexing on team_id, owner_email, visibility
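
The indexing idea above can be tried out with an in-memory SQLite database as a stand-in for the real backend. The table is a cut-down version of a resource table, and the index names are hypothetical; the PR does not list the actual index definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tool (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        team_id TEXT,
        owner_email TEXT,
        visibility TEXT DEFAULT 'private'
    );
    -- Hypothetical index names covering the three scoping columns
    CREATE INDEX idx_tool_team_visibility ON tool (team_id, visibility);
    CREATE INDEX idx_tool_owner ON tool (owner_email);
    """
)
conn.executemany(
    "INSERT INTO tool (name, team_id, owner_email, visibility) VALUES (?, ?, ?, ?)",
    [
        ("a", "team-a", "alice@example.com", "team"),
        ("b", "team-a", "alice@example.com", "private"),
        ("c", "team-b", "bob@example.com", "public"),
    ],
)

# Team-scoped lookup that the composite index is meant to serve
query = "SELECT name FROM tool WHERE team_id = ? AND visibility = ? ORDER BY name"
team_tools = [row[0] for row in conn.execute(query, ("team-a", "team"))]
print(team_tools)

# Print the query plan to check whether SQLite chooses the index
for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("team-a", "team")):
    print(row)
```

Query plans differ between database engines, so the same check would need to be repeated with the engine used in production.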

API Performance:

  • Team Context Caching: JWT team memberships cached for performance
  • Resource Filtering: Efficient team-scoped resource queries
  • Bulk Operations: Optimized multi-resource operations

🔄 Migration Strategy

Backward Compatibility:

  • Feature Flags: Multi-tenancy can be enabled/disabled via configuration
  • API Compatibility: Existing API endpoints remain functional
  • Data Migration: Automated migration of existing data to multi-tenant schema
  • Rollback Support: Database migrations can be rolled back if needed

Upgrade Path:

  1. Database Migration: Automated Alembic migrations add multi-tenant schema
  2. Configuration Update: Add multi-tenancy environment variables
  3. Feature Enablement: Enable multi-tenancy features via configuration
  4. User Migration: Existing users automatically get personal teams
  5. SSO Integration: Configure SSO providers as needed
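
Step 4 of the upgrade path could look something like the sketch below, which gives every existing user a personal team if they do not have one yet. `ensure_personal_teams` is a hypothetical helper operating on plain dictionaries, and the `personal-<email>` naming scheme is an assumption based on `PERSONAL_TEAM_PREFIX`, not the gateway's confirmed format.

```python
import uuid
from typing import Dict, List

PERSONAL_TEAM_PREFIX = "personal"  # mirrors the environment variable of the same name


def ensure_personal_teams(user_emails: List[str], teams: List[Dict]) -> List[Dict]:
    """Create a personal team for each user lacking one; return the new teams."""
    owners_with_team = {t["owner_email"] for t in teams if t.get("is_personal")}
    created: List[Dict] = []
    for email in user_emails:
        if email in owners_with_team:
            continue  # already migrated, keeps the step idempotent
        team = {
            "id": str(uuid.uuid4()),
            "name": f"{PERSONAL_TEAM_PREFIX}-{email}",  # assumed naming scheme
            "owner_email": email,
            "is_personal": True,
            "visibility": "private",
        }
        teams.append(team)
        created.append(team)
        owners_with_team.add(email)
    return created


teams: List[Dict] = []
users = ["alice@example.com", "bob@example.com"]
print(len(ensure_personal_teams(users, teams)))
print(len(ensure_personal_teams(users, teams)))
```

Running the helper twice creates teams only on the first call, which is the property a re-runnable migration needs.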

🏆 Business Impact

Enterprise Readiness:

  • Secure Multi-Tenancy: Enterprise customers can deploy with confidence
  • SSO Integration: Seamless integration with existing identity infrastructure
  • Team Collaboration: Enable collaborative workflows within organizations
  • Resource Governance: Proper resource scoping and access controls
  • Compliance Ready: Audit logging and security controls for enterprise requirements

Scalability Improvements:

  • Performance Optimization: Team-based indexing and query filtering
  • Resource Isolation: Proper data separation for compliance requirements
  • Admin Efficiency: Platform-level management for enterprise deployments
  • Multi-Tenant Architecture: Foundation for supporting thousands of teams/users

🎯 Breaking Changes

Database Schema:

  • New tables: email_users, email_teams, email_team_members, email_team_invitations
  • Extended tables: All resource tables with team_id, owner_email, visibility columns

API Changes:

  • New authentication endpoints for email and SSO
  • New team management endpoints
  • Enhanced resource endpoints with team-scoping parameters

Configuration:

  • New required environment variables for multi-tenancy features
  • SSO provider configuration variables

Note: All changes are backward compatible when multi-tenancy features are disabled.


🚦 Deployment Checklist

Pre-Deployment:

  • Database backup completed
  • Environment variables configured
  • SSO provider applications configured (if using SSO)
  • Admin user credentials prepared

Deployment:

  • Run database migrations: make alembic-upgrade
  • Update environment configuration
  • Restart gateway services
  • Verify multi-tenancy features enabled
  • Test authentication flows

Post-Deployment:

  • Create platform admin user
  • Test team creation and management
  • Verify resource scoping works correctly
  • Test SSO integration (if enabled)
  • Monitor performance and logs

🎉 Summary

This PR represents a complete architectural transformation of MCP Gateway into a production-ready enterprise multi-tenant platform. The implementation includes:

  • 9 major GitHub issues closed (plus foundation for 1 future issue)
  • Complete authentication system with email + multi-provider SSO
  • Comprehensive team management with roles and permissions
  • Three-tier resource scoping for all resource types
  • Enterprise security controls with domain restrictions and MFA support
  • Production-grade documentation with deployment guides
  • Performance-optimized database schema with proper indexing
  • Backward-compatible migration path for existing deployments

Result: MCP Gateway now supports multi-tenancy, team collaboration, and SSO integration.

Signed-off-by: Mihai Criveti <[email protected]>
@crivetimihai crivetimihai self-assigned this Sep 1, 2025
@crivetimihai crivetimihai added security Improves security enhancement New feature or request labels Sep 1, 2025
@crivetimihai crivetimihai added this to the Release 0.7.0 milestone Sep 1, 2025
Collaborator

@MohanLaksh MohanLaksh left a comment

PR Review Summary:

make serve

  1. Login with [email protected] and changeme
  2. Successfully able to add streamable http transport server
  3. Successfully tested the get pr details tool
  4. Successfully able to add mcp-container-runtime on sse transport
  5. Successfully able to test a tool get current time
  6. Able to see metrics for the executed tools
  7. Able to export metrics as a csv file
  8. Able to create a team
  9. Not able to exit add members when clicking on cancel below.


PR Test Summary:

1. make test - 1 test failing

FAILED tests/unit/mcpgateway/test_main_extended.py::TestApplicationStartupPaths::test_startup_without_plugin_manager - sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: global_config
=== 1 failed, 2652 passed, 24 skipped, 1069 warnings in 233.88s (0:03:53) ===

  2. make autoflake isort black flake8 - PASS - no errors

3. make pylint - FAIL - Your code has been rated at 9.71/10

🐛 pylint mcpgateway mcp-servers/python...
************* Module mcpgateway.db
mcpgateway/db.py:1698:13: E1136: Value 'Mapped' is unsubscriptable (unsubscriptable-object)

Your code has been rated at 9.71/10 (previous run: 10.00/10, -0.29)

make: *** [Makefile:723: pylint] Error 2

  4. make smoketest - PASS
    ✅ Smoketest passed!

  5. make doctest - all pass
    616 passed, 7 skipped, 69 warnings in 22.10s

@crivetimihai
Member Author

mcpgateway/db.py:2358:16: E1136: Value 'Mapped' is unsubscriptable (unsubscriptable-object)

The error is occurring because pylint doesn't recognize Mapped[Type] syntax as valid. This is a common issue with SQLAlchemy 2.0's new typing system. The code is actually correct - Mapped is designed to be subscriptable with type parameters. This requires pylint-pydantic.

Did you run make venv install install-dev to refresh the venv beforehand, and then activate the env with . ~/.venv/mcpgateway/bin/activate?

Member

@araujof araujof left a comment


This is a massive and very comprehensive PR! The design looks good.
Because the PR is huge, I decided to look for specific things to check in the code.

From code inspection, the token services catalog handles JTI revocation for invalid or expired tokens; on token revocation, it immediately invalidates the JWT token.

In .env.example, SSO_AUTO_CREATE_USERS=true while SSO_ENABLED=false could be confusing. Should we throw an error or warning during startup if SSO is disabled but auto-create is enabled?
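
A startup check along these lines could be sketched as follows. `sso_config_warnings` is a hypothetical helper, not existing gateway code, and it only covers the one combination raised here; other SSO settings could be added to the tuple if they turn out to depend on SSO_ENABLED as well.

```python
from typing import Dict, List

# Settings assumed to have no effect unless SSO_ENABLED=true
SSO_DEPENDENT_FLAGS = ("SSO_AUTO_CREATE_USERS",)


def sso_config_warnings(env: Dict[str, str]) -> List[str]:
    """Return warnings for SSO settings that are enabled while SSO itself is off."""

    def is_true(name: str) -> bool:
        return env.get(name, "false").strip().lower() == "true"

    warnings: List[str] = []
    if not is_true("SSO_ENABLED"):
        for name in SSO_DEPENDENT_FLAGS:
            if is_true(name):
                warnings.append(f"{name}=true has no effect while SSO_ENABLED=false")
    return warnings


for message in sso_config_warnings({"SSO_ENABLED": "false", "SSO_AUTO_CREATE_USERS": "true"}):
    print("WARNING:", message)
```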

For a feature this big, should we have a rollback plan (including the db schema) and a flag like MULTI_TENANCY_ENABLED that lets users disable it, or ship it disabled by default?

Overall this is an impressive addition!

  • make test

@imolloy
Collaborator

imolloy commented Sep 3, 2025

Similar comments as above. I also experimented with the system to probe some corner cases I was interested in. On review, the design looks pretty comprehensive, but after experimenting a little I noticed a few small things.

  • Metrics: on first log in the Metrics panel provides statistics of tools that other users have run, which is a possible side channel
  • System Logs: The logs are for the server, which gives information about actions other users have performed, including some failures
  • User Management: I tried to create another user (using my personal email) to test multi-tenancy, but it failed. The message was cryptic ("Failed to create user. Please check your input."), but the inspector provided more detail: {"detail":"Insufficient permissions. Required: admin.user_management"}. It would be useful to know what permissions I have.
  • Teams: I tried to add users to teams but only saw this: "Private teams require invitations to add new members. Use the team invitation system instead." I am unsure where to do this or whether this feature has been implemented.

@crivetimihai crivetimihai marked this pull request as ready for review September 4, 2025 09:57
@kanapuli

kanapuli commented Sep 4, 2025

Hey @crivetimihai I had a chance to test the changes in this branch locally, and it's looking great – really solid work!

As I was going through it, a couple of questions came to mind regarding the architecture, and I was hoping to get your insights:

  1. I noticed that tools, resources, and prompts from a federated gateway/MCP server appear to be publicly accessible and not directly configurable. Could you help me understand the design considerations behind this approach? I'm keen to learn more about the decision-making process here.

  2. With the new multi-tenancy support, I found that I couldn't register MCP servers as gateways using the same URL across different teams. I'm curious if the unique URL constraint for gateways is still a strict requirement in this new context, or if there's a way we might handle this differently.

I'm keen to align my understanding with the intended design.

Containerfile Outdated
Collaborator


One thing we did in our SysFlow open source project is use a manifest file to set versions for things in the container. This helped a lot in the CI/CD as we could quickly update versions for new builds. Here's an example: https://github.com/sysflow-telemetry/sf-collector/blob/master/makefile.manifest.inc

Might be useful here cc: @araujof

Collaborator


One comment on linting. I've noticed this a bit with other PRs that a bunch of the file changes are linting related from stuff checked in in previous PRs. Are there ways that we might be able to stabilize the linting so that it has to be done during PR check in and remains consistent? I think this might help with the readability of the PRs as we wouldn't get lint changes from code checked in from previous PRs. Just a thought.

@@ -173,6 +184,66 @@
message_ttl=settings.message_ttl,
)


# Helper function for authentication compatibility
def get_user_email(user):
Collaborator


This looks to be similar to the get_user_email in the admin.py script
