AI Analyst System - Semantic Layer Integration

Project Overview

Goal: Build a semantic layer-powered AI analyst that connects to Claude Desktop and ChatGPT Desktop via MCP (Model Context Protocol), enabling natural language data analysis with statistical rigor and incremental exploration.

Status: v1.0 Production Ready | 100% Test Pass Rate | 23 MCP Tools

Architecture

┌─────────────────────────────────────────────────────────┐
│        Claude Desktop / ChatGPT Desktop                 │
└────────────────────┬────────────────────────────────────┘
                     │ MCP Protocol (23 Tools)
┌────────────────────▼────────────────────────────────────┐
│              FastMCP Server                             │
│  ┌─────────────────────────────────────────────────┐   │
│  │         Multi-Query Workflow Engine             │   │
│  │  • Dependency Resolution & Parallel Execution   │   │
│  │  • 3 Built-in Analytical Workflows              │   │
│  └─────────────────┬───────────────────────────────┘   │
│                    │                                    │
│  ┌─────────────────▼───────────────────────────────┐   │
│  │         Query Optimization Engine               │   │
│  │  • Intelligent Caching (95% hit rate)          │   │
│  │  • Performance Learning                        │   │
│  └─────────────────┬───────────────────────────────┘   │
│                    │                                    │
│  ┌─────────────────▼───────────────────────────────┐   │
│  │         Conversation Memory                     │   │
│  │  • 24-hour Context Window                      │   │
│  │  • Pattern Recognition                         │   │
│  └─────────────────┬───────────────────────────────┘   │
│                    │                                    │
│  ┌─────────────────▼───────────────────────────────┐   │
│  │         Intelligence Layer                      │   │
│  │  • Statistical Testing                         │   │
│  │  • Natural Language Generation                 │   │
│  └─────────────────┬───────────────────────────────┘   │
│                    │                                    │
│  ┌─────────────────▼───────────────────────────────┐   │
│  │         Semantic Layer (Ibis + DuckDB)         │   │
│  │  • Product Analytics Models                    │   │
│  │  • Business Logic & Metrics                    │   │
│  └─────────────────┬───────────────────────────────┘   │
└────────────────────┼────────────────────────────────────┘
                     │
               ┌─────▼──────┐
               │   DuckDB   │
               └────────────┘

Core Principles

1. Execution-First (Prevent Fabrication)

Pattern: Build → Execute → Annotate

# NEVER generate observations without running queries
query = builder.generate_query(question)
result = executor.run(query)  # MUST execute first
interpretation = interpret(result)  # Based on REAL data

2. Incremental Exploration

Pattern: One Question Per Turn

# Start simple
"How many customers do we have?"
→ SELECT COUNT(*) FROM customers

# Build complexity based on results
"What's the breakdown by industry?"
→ SELECT industry, COUNT(*) FROM customers GROUP BY industry

3. Statistical Rigor by Default

Auto-run tests when comparing groups:

Chi-square or t-test for group comparisons
Confidence intervals for correlations
Sample size validation for claims

4. Natural Language Communication

Concise observations:

"Tech customers 2x higher LTV" ✅
NOT: "Upon analyzing the data, Technology segment customers demonstrate..." ❌

Quick Start

Environment Setup

# Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/sbdk-dev/claude-analyst.git
cd claude-analyst/semantic-layer
uv sync

# Test the system
uv run python test_all_functionality.py

Claude Desktop Integration

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "ai-analyst": {
      "command": "/opt/homebrew/bin/uv",
      "args": ["run", "python", "run_mcp_server.py"],
      "cwd": "/path/to/claude-analyst/semantic-layer"
    }
  }
}

Note: Use full path to uv (find with which uv). GUI apps don't inherit shell PATH.

ChatGPT Desktop Integration

# 1. Set OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"

# 2. Start the OpenAI API server
cd semantic-layer
uv run python run_openai_server.py
# Server starts on http://localhost:8000

# 3. Configure ChatGPT Desktop
# Settings → Beta Features → Actions → Add Custom Action
# URL: http://localhost:8000

File Structure

claude-analyst/
├── CLAUDE.md                    # This file
├── README.md                    # Quick start and overview
├── QUICK_START.md               # Detailed setup guide
├── LICENSE                      # MIT License
├── semantic-layer/              # Main implementation
│   ├── models/                  # Semantic models (YAML)
│   │   ├── users.yml
│   │   ├── events.yml
│   │   └── engagement.yml
│   ├── mcp_server/              # FastMCP server
│   │   ├── server.py
│   │   ├── tools.py
│   │   └── intelligence_layer.py
│   ├── data/                    # DuckDB database
│   │   └── analytics.duckdb
│   ├── run_mcp_server.py        # Server entry point
│   ├── run_openai_server.py     # ChatGPT integration
│   └── test_all_functionality.py
└── scripts/
    └── setup.sh

MCP Tools

Core Tools

Tool	Description
`list_models`	List available semantic models
`query_model`	Execute queries against semantic layer
`suggest_analysis`	Get context-aware next questions
`test_significance`	Run statistical tests

Workflow Tools

Tool	Description
`run_workflow`	Execute multi-step analytical workflows
`list_workflows`	List available workflow templates
`get_workflow_status`	Check workflow execution status

Memory Tools

Tool	Description
`get_context`	Retrieve conversation context
`get_suggestions`	Get analysis suggestions based on history

Optimization Tools

Tool	Description
`get_cache_stats`	View query cache statistics
`optimize_query`	Get query optimization recommendations

Semantic Model Design

Following Rasmus Engelbrecht's patterns:

Start with the business question, not the schema
Metrics should be self-contained and reusable
Dimensions define how metrics can be sliced
Keep models simple and composable

Example Model

model:
  name: users
  description: "User metrics and dimensions"

dimensions:
  - name: user_id
    type: string
    primary_key: true
  - name: plan_type
    type: string
    description: "Subscription plan"
  - name: industry
    type: string

measures:
  - name: total_users
    type: count_distinct
    field: user_id
  - name: conversion_rate
    type: ratio
    numerator: converted_users
    denominator: total_users

Performance Metrics

23 MCP Tools: All functional and tested
Test Pass Rate: 100% (7/7 tests)
Cache Hit Rate: 95% for repeated queries
Query Response Time: <100ms for cached queries
Workflow Templates: 3 (conversion, feature, revenue)

Technology Stack

MCP Server: FastMCP
Semantic Layer: Boring Semantic Layer
Query Engine: Ibis
Database: DuckDB
Statistical Testing: scipy

References

License

MIT License - See LICENSE for details.

Version: 1.0 Last Updated: November 2025 Author: Matt Strautmann

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AI Analyst System - Semantic Layer Integration

Project Overview

Architecture

Core Principles

1. Execution-First (Prevent Fabrication)

2. Incremental Exploration

3. Statistical Rigor by Default

4. Natural Language Communication

Quick Start

Environment Setup

Claude Desktop Integration

ChatGPT Desktop Integration

File Structure

MCP Tools

Core Tools

Workflow Tools

Memory Tools

Optimization Tools

Semantic Model Design

Example Model

Performance Metrics

Technology Stack

References

License

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

AI Analyst System - Semantic Layer Integration

Project Overview

Architecture

Core Principles

1. Execution-First (Prevent Fabrication)

2. Incremental Exploration

3. Statistical Rigor by Default

4. Natural Language Communication

Quick Start

Environment Setup

Claude Desktop Integration

ChatGPT Desktop Integration

File Structure

MCP Tools

Core Tools

Workflow Tools

Memory Tools

Optimization Tools

Semantic Model Design

Example Model

Performance Metrics

Technology Stack

References

License