Goal: Build a semantic layer-powered AI analyst that connects to Claude Desktop and ChatGPT Desktop via MCP (Model Context Protocol), enabling natural language data analysis with statistical rigor and incremental exploration.
Status: v1.0 Production Ready | 100% Test Pass Rate | 23 MCP Tools
```
┌─────────────────────────────────────────────────────────┐
│            Claude Desktop / ChatGPT Desktop             │
└────────────────────┬────────────────────────────────────┘
                     │ MCP Protocol (23 Tools)
┌────────────────────▼────────────────────────────────────┐
│                     FastMCP Server                      │
│  ┌─────────────────────────────────────────────────┐    │
│  │           Multi-Query Workflow Engine           │    │
│  │  • Dependency Resolution & Parallel Execution   │    │
│  │  • 3 Built-in Analytical Workflows              │    │
│  └─────────────────┬───────────────────────────────┘    │
│                    │                                    │
│  ┌─────────────────▼───────────────────────────────┐    │
│  │            Query Optimization Engine            │    │
│  │  • Intelligent Caching (95% hit rate)           │    │
│  │  • Performance Learning                         │    │
│  └─────────────────┬───────────────────────────────┘    │
│                    │                                    │
│  ┌─────────────────▼───────────────────────────────┐    │
│  │               Conversation Memory               │    │
│  │  • 24-hour Context Window                       │    │
│  │  • Pattern Recognition                          │    │
│  └─────────────────┬───────────────────────────────┘    │
│                    │                                    │
│  ┌─────────────────▼───────────────────────────────┐    │
│  │               Intelligence Layer                │    │
│  │  • Statistical Testing                          │    │
│  │  • Natural Language Generation                  │    │
│  └─────────────────┬───────────────────────────────┘    │
│                    │                                    │
│  ┌─────────────────▼───────────────────────────────┐    │
│  │         Semantic Layer (Ibis + DuckDB)          │    │
│  │  • Product Analytics Models                     │    │
│  │  • Business Logic & Metrics                     │    │
│  └─────────────────┬───────────────────────────────┘    │
└────────────────────┼────────────────────────────────────┘
                     │
               ┌─────▼──────┐
               │   DuckDB   │
               └────────────┘
```
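The Query Optimization Engine's caching layer can be thought of as a result cache keyed by a hash of the normalized query text. A minimal stdlib sketch (class name, normalization rule, and hit/miss counters are illustrative assumptions, not the project's actual implementation):

```python
import hashlib


class QueryCache:
    """Toy query-result cache keyed by a hash of the normalized SQL text."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, sql: str) -> str:
        # Normalize case and whitespace so trivially different query
        # texts resolve to the same cache entry.
        normalized = " ".join(sql.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, sql: str):
        key = self._key(sql)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def put(self, sql: str, result):
        self._store[self._key(sql)] = result
```

With repeated natural-language questions compiling to the same SQL, most lookups hit the cache, which is the effect behind the reported 95% hit rate.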
Pattern: Build → Execute → Annotate

```python
# NEVER generate observations without running queries
query = builder.generate_query(question)
result = executor.run(query)        # MUST execute first
interpretation = interpret(result)  # Based on REAL data
```

Pattern: One Question Per Turn
```text
# Start simple
"How many customers do we have?"
→ SELECT COUNT(*) FROM customers

# Build complexity based on results
"What's the breakdown by industry?"
→ SELECT industry, COUNT(*) FROM customers GROUP BY industry
```

Auto-run tests when comparing groups:
- Chi-square or t-test for group comparisons
- Confidence intervals for correlations
- Sample size validation for claims
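The group-comparison case can be sketched with scipy (which the stack already uses for statistical testing). The counts below are made-up illustrative data, not project results:

```python
from scipy import stats

# Hypothetical 2x2 contingency table: did conversion differ
# between two plan types? (Counts are illustrative only.)
converted = [120, 90]        # converted users per group
not_converted = [880, 910]   # non-converted users per group

table = [converted, not_converted]
chi2, p_value, dof, expected = stats.chi2_contingency(table)

# Only claim a difference when the sample supports it.
if p_value < 0.05:
    print(f"Significant difference (p={p_value:.3f})")
else:
    print(f"No significant difference (p={p_value:.3f})")
```

For continuous metrics (e.g. LTV per customer) the analogous call is `stats.ttest_ind` on the two samples.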
Concise observations:
- "Tech customers 2x higher LTV" ✅
- NOT: "Upon analyzing the data, Technology segment customers demonstrate..." ❌
```bash
# Install the uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up
git clone https://github.com/sbdk-dev/claude-analyst.git
cd claude-analyst/semantic-layer
uv sync

# Test the system
uv run python test_all_functionality.py
```

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "ai-analyst": {
      "command": "/opt/homebrew/bin/uv",
      "args": ["run", "python", "run_mcp_server.py"],
      "cwd": "/path/to/claude-analyst/semantic-layer"
    }
  }
}
```

Note: Use the full path to `uv` (find it with `which uv`); GUI apps don't inherit your shell's PATH.
```bash
# 1. Set your OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"

# 2. Start the OpenAI API server
cd semantic-layer
uv run python run_openai_server.py
# Server starts on http://localhost:8000

# 3. Configure ChatGPT Desktop:
#    Settings → Beta Features → Actions → Add Custom Action
#    URL: http://localhost:8000
```

```
claude-analyst/
├── CLAUDE.md                    # This file
├── README.md                    # Quick start and overview
├── QUICK_START.md               # Detailed setup guide
├── LICENSE                      # MIT License
├── semantic-layer/              # Main implementation
│   ├── models/                  # Semantic models (YAML)
│   │   ├── users.yml
│   │   ├── events.yml
│   │   └── engagement.yml
│   ├── mcp_server/              # FastMCP server
│   │   ├── server.py
│   │   ├── tools.py
│   │   └── intelligence_layer.py
│   ├── data/                    # DuckDB database
│   │   └── analytics.duckdb
│   ├── run_mcp_server.py        # Server entry point
│   ├── run_openai_server.py     # ChatGPT integration
│   └── test_all_functionality.py
└── scripts/
    └── setup.sh
```
Core analysis:

| Tool | Description |
|---|---|
| `list_models` | List available semantic models |
| `query_model` | Execute queries against the semantic layer |
| `suggest_analysis` | Get context-aware next questions |
| `test_significance` | Run statistical tests |

Workflows:

| Tool | Description |
|---|---|
| `run_workflow` | Execute multi-step analytical workflows |
| `list_workflows` | List available workflow templates |
| `get_workflow_status` | Check workflow execution status |

Conversation memory:

| Tool | Description |
|---|---|
| `get_context` | Retrieve conversation context |
| `get_suggestions` | Get analysis suggestions based on history |

Query optimization:

| Tool | Description |
|---|---|
| `get_cache_stats` | View query cache statistics |
| `optimize_query` | Get query optimization recommendations |
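Each tool is ultimately a plain Python function registered with the server and dispatched by name. A stdlib-only sketch of that pattern (the decorator, function bodies, and return shapes are hypothetical stand-ins; the real server registers tools through FastMCP):

```python
import json
from typing import Callable

TOOLS: dict = {}


def tool(fn: Callable) -> Callable:
    """Register a function under its own name, decorator-style."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def list_models():
    # Placeholder: the real server would read the YAML model files.
    return ["users", "events", "engagement"]


@tool
def query_model(model: str, measures: list):
    # Placeholder: the real implementation compiles to Ibis/DuckDB.
    return {"model": model, "measures": measures, "rows": []}


def dispatch(name: str, payload: str):
    """Route a JSON-encoded tool call to the registered function."""
    return TOOLS[name](**json.loads(payload))
```

The MCP client (Claude Desktop or ChatGPT Desktop) effectively performs the `dispatch` step over the protocol, sending the tool name and JSON arguments.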
Following Rasmus Engelbrecht's patterns:
- Start with the business question, not the schema
- Metrics should be self-contained and reusable
- Dimensions define how metrics can be sliced
- Keep models simple and composable
```yaml
model:
  name: users
  description: "User metrics and dimensions"
  dimensions:
    - name: user_id
      type: string
      primary_key: true
    - name: plan_type
      type: string
      description: "Subscription plan"
    - name: industry
      type: string
  measures:
    - name: total_users
      type: count_distinct
      field: user_id
    - name: conversion_rate
      type: ratio
      numerator: converted_users
      denominator: total_users
```

- 23 MCP Tools: All functional and tested
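A measure like `total_users` (`count_distinct` over `user_id`) compiles down to a `COUNT(DISTINCT ...)` aggregate. A stdlib-only sketch of that compilation step, using sqlite3 in place of DuckDB and a hand-rolled `compile_query` helper (both are illustrative assumptions, not the semantic layer's API):

```python
import sqlite3
from typing import Optional

# Measure definitions mirroring the YAML above (assumed shape).
MEASURES = {
    "total_users": "COUNT(DISTINCT user_id)",
}


def compile_query(model: str, measure: str,
                  group_by: Optional[str] = None) -> str:
    """Render one measure, optionally sliced by a dimension, to SQL."""
    expr = MEASURES[measure]
    if group_by:
        return (f"SELECT {group_by}, {expr} AS {measure} "
                f"FROM {model} GROUP BY {group_by}")
    return f"SELECT {expr} AS {measure} FROM {model}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, plan_type TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("u1", "free"), ("u2", "pro"), ("u2", "pro"), ("u3", "free")],
)

# Slice the measure by the plan_type dimension.
sql = compile_query("users", "total_users", group_by="plan_type")
rows = conn.execute(sql).fetchall()
```

In the real stack, Ibis builds an equivalent expression tree and DuckDB executes it; the YAML model is what keeps the metric definition reusable across queries.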
- Test Pass Rate: 100% (7/7 tests)
- Cache Hit Rate: 95% for repeated queries
- Query Response Time: <100ms for cached queries
- Workflow Templates: 3 (conversion, feature, revenue)
- MCP Server: FastMCP
- Semantic Layer: Boring Semantic Layer
- Query Engine: Ibis
- Database: DuckDB
- Statistical Testing: scipy
- Rasmus: Practical Guide to Semantic Layers
- Boring Semantic Layer Documentation
- Ibis Documentation
- FastMCP Framework
- Model Context Protocol Spec
MIT License - See LICENSE for details.
Version: 1.0 | Last Updated: November 2025 | Author: Matt Strautmann