A high-performance, production-ready Model Context Protocol (MCP) server that enables AI assistants to seamlessly interact with cancer genomics data from cBioPortal. Built with a modern async Python architecture, a modular enterprise-grade design, and a BaseEndpoint pattern for reliability, maintainability, and data retrieval roughly 4.5x faster than sequential requests.
- 4.5x Performance Boost: Full async implementation with concurrent API operations
- Enterprise Architecture: BaseEndpoint pattern eliminating ~60% of code duplication
- Modular Design: Professional structure with 71% code reduction (1,357 → 396 lines)
- Modern Package Management: uv-based workflow with pyproject.toml
- Concurrent Operations: Bulk fetching of studies and genes with automatic batching
- Multi-layer Configuration: CLI args > Environment variables > YAML config > Defaults
- Comprehensive Testing: 93 tests across 8 organized test suites with full coverage
- Input Validation: Robust parameter validation and error handling
- Pagination Support: Efficient data retrieval with automatic pagination
- Code Quality: Ruff linting, formatting, and comprehensive code quality checks
- Configurable Performance: Adjustable batch sizes and performance tuning
- Study Management: Browse, search, and analyze cancer studies
- Molecular Data: Access mutations, clinical data, and molecular profiles
- Bulk Operations: Concurrent fetching of multiple entities
- Advanced Search: Keyword-based discovery across studies and genes
- BaseEndpoint Architecture: Eliminated ~60% of code duplication through an inheritance-based design
- Code Quality Excellence: Comprehensive external review integration with modern linting (Ruff)
- Enhanced Configurability: Gene batch sizes, retry logic, and performance tuning are now configurable
- Robust Validation: Decorator-based parameter validation and error handling
- Testing Maturity: 93 comprehensive tests with zero regressions through major refactoring
- External Code Review: Professional code quality validation and improvements implemented
- Modern Python Practices: Type checking, linting, formatting, and best-practice adherence
- Enterprise Architecture: Modular design with clear separation of concerns
- Performance Optimized: 4.5x async improvement with configurable batch processing
This project demonstrates cutting-edge human-AI collaboration in bioinformatics software development:
- Domain Expertise: 20+ years of cancer research experience guided the architecture and feature requirements
- AI Implementation: Advanced code generation, API design, and performance optimization through systematic LLM collaboration
- Quality Assurance: Iterative refinement ensuring professional standards and production reliability
- Architectural Evolution: BaseEndpoint pattern and 60% code duplication elimination through AI-guided refactoring
- Innovation Approach: Showcases how domain experts can effectively leverage AI tools to build enterprise-grade bioinformatics platforms
Recent Achievements: External code review integration with comprehensive quality improvements including Ruff configuration, configurable performance settings, and modern Python best practices.
Methodology: This collaborative approach combines deep biological domain knowledge with AI-powered development capabilities, accelerating innovation while maintaining rigorous code quality and scientific accuracy.
- Python 3.10+
- uv (modern package manager) - recommended
- Git (optional, for cloning)
# Install uv if needed
pipx install uv
# Clone and setup
git clone https://github.com/yourusername/cbioportal-mcp.git
cd cbioportal-mcp
uv sync
# Launch server
uv run cbioportal-mcp
That's it! Your server is running and ready for AI assistant connections.
Modern, lightning-fast package management with automatic environment handling:
# Install uv
pipx install uv
# Or with Homebrew: brew install uv
# Clone repository
git clone https://github.com/yourusername/cbioportal-mcp.git
cd cbioportal-mcp
# One-command setup (creates venv + installs dependencies)
uv sync
Standard Python package management approach:
# Create virtual environment
python -m venv cbioportal-mcp-env
# Activate environment
# Windows: cbioportal-mcp-env\Scripts\activate
# macOS/Linux: source cbioportal-mcp-env/bin/activate
# Install dependencies
pip install -e .
The server supports flexible configuration with priority: CLI args > Environment variables > Config file > Defaults
Create `config.yaml` for persistent settings:
# cBioPortal MCP Server Configuration
server:
  base_url: "https://www.cbioportal.org/api"
  transport: "stdio"
  client_timeout: 480.0

logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

api:
  rate_limit:
    enabled: false
    requests_per_second: 10
  retry:
    enabled: true
    max_attempts: 3
    backoff_factor: 1.0
  cache:
    enabled: false
    ttl_seconds: 300
  batch_size:
    genes: 100  # Configurable gene batch size for concurrent operations
export CBIOPORTAL_BASE_URL="https://custom-instance.org/api"
export CBIOPORTAL_LOG_LEVEL="DEBUG"
export CBIOPORTAL_CLIENT_TIMEOUT=600
export CBIOPORTAL_GENE_BATCH_SIZE=50 # Configure gene batch size
export CBIOPORTAL_RETRY_MAX_ATTEMPTS=5
# Basic usage
uv run cbioportal-mcp
# Custom configuration
uv run cbioportal-mcp --config config.yaml --log-level DEBUG
# Custom API endpoint
uv run cbioportal-mcp --base-url https://custom-instance.org/api
# Generate example config
uv run cbioportal-mcp --create-example-config
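The precedence above (CLI args > environment variables > YAML config > defaults) can be pictured as a simple chain of lookups. Here is a minimal Python sketch of how such layered resolution might work; the function and variable names are illustrative, not the server's actual internals:

```python
import os

def resolve_setting(name, cli_value=None, env_var=None, file_config=None, default=None):
    """Return the first value found, checking CLI > environment > config file > default."""
    if cli_value is not None:                        # 1. an explicit CLI argument wins
        return cli_value
    if env_var is not None and os.getenv(env_var):   # 2. then an environment variable
        return os.getenv(env_var)
    if file_config and name in file_config:          # 3. then the YAML config file
        return file_config[name]
    return default                                   # 4. finally the built-in default

# e.g. the "logging" section of config.yaml already parsed (via PyYAML) into a dict:
logging_cfg = {"level": "INFO"}

# CBIOPORTAL_LOG_LEVEL=DEBUG in the environment would override the file value,
# and a --log-level flag would override both.
print(resolve_setting("level", env_var="CBIOPORTAL_LOG_LEVEL",
                      file_config=logging_cfg, default="WARNING"))
```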
Configure in your Claude Desktop MCP settings:
Option 1: Direct Script Path (Recommended)
{
  "mcpServers": {
    "cbioportal": {
      "command": "/path/to/your/project/cbioportal_MCP/.venv/bin/cbioportal-mcp",
      "env": {
        "CBIOPORTAL_LOG_LEVEL": "INFO"
      }
    }
  }
}
Option 2: uv run (Alternative)
{
  "mcpServers": {
    "cbioportal": {
      "command": "uv",
      "args": ["run", "cbioportal-mcp"],
      "cwd": "/path/to/your/project/cbioportal_MCP",
      "env": {
        "CBIOPORTAL_LOG_LEVEL": "INFO"
      }
    }
  }
}
Important Setup Steps:
- Replace `/path/to/your/project/cbioportal_MCP` with your actual project path
- Ensure the project is installed in editable mode: `uv pip install -e .`
- Restart Claude Desktop after updating the configuration
Add to your workspace settings:
{
  "mcp.servers": {
    "cbioportal": {
      "command": "uv",
      "args": ["run", "cbioportal-mcp"],
      "cwd": "/path/to/cbioportal-mcp"
    }
  }
}
# Development server with debug logging
uv run cbioportal-mcp --log-level DEBUG
# Production server with custom config
uv run cbioportal-mcp --config production.yaml
# Using custom cBioPortal instance
uv run cbioportal-mcp --base-url https://private-instance.org/api
cbioportal-mcp/
├── cbioportal_mcp/                # Main package directory
│   ├── server.py                  # Main MCP server implementation
│   ├── api_client.py              # Dedicated HTTP client class
│   ├── config.py                  # Multi-layer configuration system
│   ├── constants.py               # Centralized constants
│   ├── endpoints/                 # Domain-specific API modules
│   │   ├── base.py                # BaseEndpoint pattern (60% duplication reduction)
│   │   ├── studies.py             # Cancer studies & search
│   │   ├── genes.py               # Gene operations & mutations
│   │   ├── samples.py             # Sample data management
│   │   └── molecular_profiles.py  # Molecular & clinical data
│   └── utils/                     # Shared utilities
│       ├── pagination.py          # Efficient pagination logic
│       ├── validation.py          # Input validation
│       └── logging.py             # Logging configuration
├── tests/                         # Comprehensive test suite (93 tests)
├── docs/                          # Documentation
├── scripts/                       # Development utilities
└── pyproject.toml                 # Modern Python project config

- Modular: Clear separation of concerns with domain-specific modules
- Async-First: Full asynchronous implementation for maximum performance
- BaseEndpoint Pattern: Inheritance-based architecture eliminating 60% of code duplication (see the sketch below)
- Robust: Comprehensive input validation and error handling with decorators
- Testable: 93 tests ensuring reliability and preventing regressions
- Maintainable: Clean code architecture with 71% reduction in complexity
- Code Quality: Ruff linting, formatting, and modern Python practices
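As an illustration of the BaseEndpoint idea, the sketch below shows how shared request plumbing can live in a base class while each domain endpoint only defines its own routes. The class names and method signatures here are hypothetical, not the project's actual code:

```python
import httpx

class BaseEndpoint:
    """Shared plumbing: every domain endpoint reuses the same request helper."""

    def __init__(self, client: httpx.AsyncClient, base_url: str):
        self.client = client
        self.base_url = base_url

    async def _get(self, path: str, **params):
        # One place for error handling and response parsing for all endpoints.
        response = await self.client.get(f"{self.base_url}{path}", params=params)
        response.raise_for_status()
        return response.json()

class StudiesEndpoint(BaseEndpoint):
    """Domain module: only study-specific routes live here."""

    async def list_studies(self, page: int = 0, size: int = 50):
        return await self._get("/studies", pageNumber=page, pageSize=size)

class GenesEndpoint(BaseEndpoint):
    """Another domain module reusing the same base helpers."""

    async def get_gene(self, symbol: str):
        return await self._get(f"/genes/{symbol}")
```

Pulling the request logic into the base class is what removes the duplicated boilerplate from each domain module.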
The server provides 12 high-performance tools for AI assistants:

| Tool | Description | Features |
|---|---|---|
| `get_cancer_studies` | List all available cancer studies | Pagination, filtering |
| `search_studies` | Search studies by keyword | Full-text search, sorting |
| `get_study_details` | Detailed study information | Comprehensive metadata |
| `get_samples_in_study` | Samples for specific studies | Paginated results |
| `get_genes` | Gene information by ID/symbol | Flexible identifiers |
| `search_genes` | Search genes by keyword | Symbol & name search |
| `get_mutations_in_gene` | Gene mutations in studies | Mutation details |
| `get_clinical_data` | Patient clinical information | Patient-centric data |
| `get_molecular_profiles` | Study molecular profiles | Profile metadata |
| `get_multiple_studies` | Concurrent study fetching | Bulk operations |
| `get_multiple_genes` | Concurrent gene retrieval | Automatic batching |
| `get_gene_panels_for_study` | Gene panels in studies | Panel information |
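For orientation, this is roughly how an async tool can be exposed through FastMCP; the tool name, parameters, and query below are illustrative and are not taken from the server's actual tool definitions:

```python
import httpx
from fastmcp import FastMCP

mcp = FastMCP("cbioportal-demo")

@mcp.tool()
async def search_studies_demo(keyword: str, page_size: int = 25) -> list[dict]:
    """Search cBioPortal studies whose metadata matches the keyword."""
    async with httpx.AsyncClient(base_url="https://www.cbioportal.org/api") as client:
        response = await client.get(
            "/studies",
            params={"keyword": keyword, "pageSize": page_size, "pageNumber": 0},
        )
        response.raise_for_status()
        return response.json()

if __name__ == "__main__":
    mcp.run()  # stdio transport, suitable for MCP clients such as Claude Desktop
```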
- Concurrent Operations: `get_multiple_*` methods use `asyncio.gather` for parallel processing (sketched below)
- Smart Batching: Automatic batching for large gene lists
- Efficient Pagination: Async generators for memory-efficient data streaming (see the generator sketch after the performance tips)
- Performance Metrics: Execution timing and batch count reporting
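To make the concurrency model concrete, here is a minimal, self-contained sketch of parallel study fetching with `asyncio.gather` plus a simple batching helper; the function names are illustrative, not the server's internals:

```python
import asyncio
import httpx

BASE_URL = "https://www.cbioportal.org/api"

async def fetch_study(client: httpx.AsyncClient, study_id: str) -> dict:
    response = await client.get(f"{BASE_URL}/studies/{study_id}")
    response.raise_for_status()
    return response.json()

async def fetch_studies_concurrently(study_ids: list[str]) -> list[dict]:
    async with httpx.AsyncClient(timeout=30.0) as client:
        # All requests are issued at once; gather waits for every result.
        return await asyncio.gather(*(fetch_study(client, sid) for sid in study_ids))

def batched(items: list[str], size: int):
    # Yield fixed-size chunks so very long gene lists do not flood the API.
    for start in range(0, len(items), size):
        yield items[start:start + size]

if __name__ == "__main__":
    studies = asyncio.run(fetch_studies_concurrently(["acc_tcga", "brca_tcga"]))
    print(len(studies), "studies fetched concurrently")
```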
Our async implementation delivers significant performance improvements:

Sequential Study Fetching: 1.31 seconds (10 studies)
Concurrent Study Fetching: 0.29 seconds (10 studies)
Performance Improvement: 4.57x faster

- 4.5x Faster: Concurrent API requests vs. sequential operations
- Bulk Processing: Efficient batched operations for multiple entities
- Non-blocking: Asynchronous I/O prevents request blocking
- Smart Batching: Automatic optimization for large datasets
- Use `get_multiple_studies` for fetching multiple studies concurrently
- Leverage `get_multiple_genes` with automatic batching for gene lists
- Configure `concurrent_batch_size` in config for optimal performance
- Monitor execution metrics included in response metadata
# Setup development environment
uv sync
# Run tests
uv run pytest
# Run with coverage
uv run pytest --cov=.
# Run specific test file
uv run pytest tests/test_server_lifecycle.py
# Update snapshots
uv run pytest --snapshot-update
# Lint code
uv run ruff check .
# Format code
uv run ruff format .
Comprehensive test suite with 93 tests across 8 categories (a minimal async test sketch follows the list):
- `test_server_lifecycle.py` - Server startup/shutdown & tool registration
- `test_pagination.py` - Pagination logic & edge cases
- `test_multiple_entity_apis.py` - Concurrent operations & bulk fetching
- `test_input_validation.py` - Parameter validation & error handling
- `test_snapshot_responses.py` - API response consistency (syrupy)
- `test_cli.py` - Command-line interface & argument parsing
- `test_error_handling.py` - Error scenarios & network issues
- `test_configuration.py` - Configuration system validation
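As a flavor of the async testing style (assuming pytest-asyncio is installed), a minimal test of an async generator might look like this; the names are invented for illustration and are not taken from the actual suite:

```python
from typing import AsyncIterator

import pytest

async def collect(gen: AsyncIterator[int]) -> list[int]:
    # Drain an async generator into a plain list for assertions.
    return [item async for item in gen]

async def fake_pages() -> AsyncIterator[int]:
    # Stand-in for a paginated endpoint: yields three "records" lazily.
    for value in (1, 2, 3):
        yield value

@pytest.mark.asyncio
async def test_async_generator_yields_all_records():
    assert await collect(fake_pages()) == [1, 2, 3]
```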
- uv: Modern package management (10-100x faster than pip)
- pytest: Testing framework with async support and 93 comprehensive tests
- syrupy: Snapshot testing for API response consistency
- Ruff: Lightning-fast linting, formatting, and code quality enforcement
- pytest-cov: Code coverage reporting and quality metrics
- BaseEndpoint: Inheritance pattern eliminating 60% of code duplication
- Type Checking: Comprehensive type annotations for better code safety
- Validation Decorators: Automatic parameter validation and error handling (see the sketch below)
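The validation-decorator idea can be sketched roughly as follows; the decorator name and the checks it performs are hypothetical rather than the project's actual implementation:

```python
import functools
import inspect

def validate_positive(*param_names):
    """Reject calls where any named parameter is not a positive integer."""
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name in param_names:
                value = bound.arguments.get(name)
                if not isinstance(value, int) or value <= 0:
                    raise ValueError(f"{name} must be a positive integer, got {value!r}")
            return await func(*args, **kwargs)

        return wrapper
    return decorator

@validate_positive("page_size")
async def get_cancer_studies(page_size: int = 50):
    # Placeholder body; a real tool would call the cBioPortal API here.
    return {"page_size": page_size}
```

Applying the decorator keeps repetitive parameter checks out of every individual tool body.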
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Test your changes (`uv run pytest`)
- Commit with clear messages (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Create a Pull Request
# Check Python version
python --version # Should be 3.10+
# Verify dependencies
uv sync
# Check for conflicts
uv run python -c "import mcp, httpx, fastmcp; print('Dependencies OK')"
- Use the direct script path (Option 1) for the most reliable connection
- Verify that paths in the MCP configuration are absolute (no `~` or relative paths)
- Install in editable mode: run `uv pip install -e .` in the project directory
- Ensure the virtual environment script `.venv/bin/cbioportal-mcp` exists
- For Option 2: check that `uv` is on your system PATH and `cwd` points to the project directory
- Review the Claude Desktop logs for detailed errors
- Increase `concurrent_batch_size` in the config
- Adjust `max_concurrent_requests` for your system
- Use `get_multiple_*` methods for bulk operations
- Monitor network latency to the cBioPortal API
# Generate example config
uv run cbioportal-mcp --create-example-config
# Validate configuration
uv run cbioportal-mcp --config your-config.yaml --log-level DEBUG
# Check environment variables
env | grep CBIOPORTAL
# Test cBioPortal API accessibility
curl https://www.cbioportal.org/api/cancer-types
# Test with custom instance
curl https://your-instance.org/api/studies
"What cancer studies are available for breast cancer research?"
"Search for melanoma studies with genomic data"
"Get mutation data for TP53 in lung cancer studies"
"Find clinical data for patients in the TCGA-BRCA study"
"What molecular profiles are available for pediatric brain tumors?"
"Compare mutation frequencies between two cancer studies"
"Get all genes in the DNA repair pathway for ovarian cancer"
"Find studies with both RNA-seq and mutation data"
"What are the most frequently mutated genes in glioblastoma?"
"Fetch data for multiple cancer studies concurrently"
"Get information for a list of cancer genes efficiently"
"Compare clinical characteristics across multiple studies"
"Retrieve molecular profiles for several cancer types"
This project is licensed under the MIT License - see the LICENSE file for details.

- cBioPortal - Open-access cancer genomics data platform
- Model Context Protocol - Enabling seamless AI-tool interactions
- FastMCP - High-performance MCP server framework
- uv - Modern Python package management
- AI Collaboration - Demonstrating the power of human-AI partnership in scientific software development

Production-ready bioinformatics platform built through innovative human-AI collaboration!
Demonstrating the power of domain expertise + AI-assisted development for enterprise-grade scientific software.