This tutorial covers how to use the MDL (Model Definition Language) module in KAI to create and manage semantic layers for your databases.
- Introduction
- Core Concepts
- Getting Started
- Creating MDL Manifests
- Working with Models
- Defining Relationships
- Business Metrics
- API Reference
- Best Practices
- Integration with Autonomous Agent
The MDL module provides a semantic layer abstraction over your database tables. It allows you to:
- Define business-friendly names and descriptions for tables and columns
- Establish relationships between tables
- Create calculated columns and business metrics
- Export to WrenAI-compatible format for advanced analytics
- Simplify queries: Users can query using business terms instead of technical table/column names
- Enforce consistency: Define metrics once, use everywhere
- Enable self-service: Business users can explore data without SQL knowledge
- Improve AI accuracy: LLMs understand your data model better with semantic context
MDLManifest is the top-level container that holds your entire semantic layer definition:
from app.modules.mdl import MDLManifest
manifest = MDLManifest(
    catalog="analytics",        # Database catalog
    schema="public",            # Database schema
    data_source="postgresql",   # Database type
    models=[...],               # Table definitions
    relationships=[...],        # Table relationships
    metrics=[...],              # Business metrics
    views=[...],                # Saved queries
)

Models represent database tables with semantic metadata:
from app.modules.mdl import MDLModel, MDLColumn
users_model = MDLModel(
    name="users",
    columns=[
        MDLColumn(name="id", type="INTEGER"),
        MDLColumn(name="email", type="VARCHAR"),
        MDLColumn(name="created_at", type="TIMESTAMP"),
    ],
    primary_key="id",
    properties={"description": "User accounts table"}
)

Relationships define how models connect to each other:
from app.modules.mdl import MDLRelationship, JoinType
relationship = MDLRelationship(
    name="orders_users",
    models=["orders", "users"],
    join_type=JoinType.MANY_TO_ONE,
    condition="orders.user_id = users.id"
)

| Type | Description | Example |
|---|---|---|
| ONE_TO_ONE | Each record in A maps to exactly one in B | user -> user_profile |
| ONE_TO_MANY | One record in A maps to many in B | user -> orders |
| MANY_TO_ONE | Many records in A map to one in B | orders -> user |
| MANY_TO_MANY | Many-to-many relationship | students <-> courses |
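
ONE_TO_MANY and MANY_TO_ONE describe the same join from opposite sides. The sketch below illustrates this with simplified stand-ins: the `Relationship` dataclass and the `flip` helper are hypothetical and are not part of `app.modules.mdl`, and the local `JoinType` enum only mirrors the member names from the table above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class JoinType(Enum):
    # Stand-in mirroring the join types listed in the table (assumed names).
    ONE_TO_ONE = "ONE_TO_ONE"
    ONE_TO_MANY = "ONE_TO_MANY"
    MANY_TO_ONE = "MANY_TO_ONE"
    MANY_TO_MANY = "MANY_TO_MANY"


@dataclass(frozen=True)
class Relationship:
    # Hypothetical, simplified stand-in for MDLRelationship.
    name: str
    models: Tuple[str, str]
    join_type: JoinType
    condition: str


# Only the two asymmetric types change when the perspective is reversed.
_FLIPPED = {
    JoinType.ONE_TO_MANY: JoinType.MANY_TO_ONE,
    JoinType.MANY_TO_ONE: JoinType.ONE_TO_MANY,
}


def flip(rel: Relationship, new_name: str) -> Relationship:
    """Describe the same join from the second model's perspective."""
    left, right = rel.models
    return Relationship(
        name=new_name,
        models=(right, left),
        join_type=_FLIPPED.get(rel.join_type, rel.join_type),
        condition=rel.condition,  # the join condition itself is unchanged
    )


customer_orders = Relationship(
    name="customer_orders",
    models=("customers", "orders"),
    join_type=JoinType.ONE_TO_MANY,
    condition="customers.id = orders.customer_id",
)
orders_customer = flip(customer_orders, "orders_customer")
print(orders_customer.models, orders_customer.join_type.name)
```

Flipping twice returns the original join type, which is a quick sanity check that the two definitions are interchangeable.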
- KAI server running with Typesense
- A database connection configured in KAI
- Tables scanned via TableDescription
The MDL module is included in KAI. No additional installation needed.
from app.modules.mdl import (
    MDLService,
    MDLRepository,
    MDLBuilder,
)
from app.data.db.storage import Storage
# Initialize storage and repository
storage = Storage()
repository = MDLRepository(storage=storage)
# Create the service
mdl_service = MDLService(
    repository=repository,
    table_description_repo=storage,  # For auto-generation
)

The easiest way to create an MDL manifest is to generate it automatically when scanning your database:
# Scan database and generate MDL in one command
kai-agent scan-all <connection_id> --generate-mdl
# With custom manifest name
kai-agent scan-all <connection_id> -m --mdl-name "Sales Analytics"
# Full workflow: scan with AI descriptions + generate MDL
kai-agent scan-all <connection_id> -d -m --mdl-name "E-Commerce Semantic Layer"

This approach:
- Scans all tables to extract schema metadata
- Optionally generates AI descriptions for tables/columns
- Automatically builds the MDL manifest with inferred relationships
- Saves everything to Typesense storage
If you've already scanned your database, generate MDL from existing TableDescriptions:
# Auto-generate MDL from scanned tables
manifest_id = await mdl_service.build_from_database(
    db_connection_id="your_connection_id",
    name="Sales Analytics",
    catalog="analytics",
    schema="public",
    data_source="postgresql",
    infer_relationships=True,  # Auto-detect relationships
)
print(f"Created manifest: {manifest_id}")

This will:
- Fetch all TableDescriptions for the connection
- Convert them to MDL models
- Infer relationships from foreign keys and column naming conventions
- Save the manifest to Typesense
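
To make the naming-convention part of relationship inference concrete, here is a minimal sketch. It is not the MDL Builder's actual algorithm: the function name `infer_relationships_by_name`, the plain-dict input, and the naive "append s" pluralization are all assumptions made for illustration.

```python
from typing import Dict, List


def infer_relationships_by_name(tables: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Guess MANY_TO_ONE joins from `<singular>_id` columns.

    `tables` maps a table name to its column names. A column such as
    `customer_id` is assumed to reference `customers.id` (naive pluralization).
    """
    inferred = []
    for table, columns in tables.items():
        for column in columns:
            if not column.endswith("_id"):
                continue
            target = column[: -len("_id")] + "s"
            # Skip self-references and targets that do not exist or lack an `id` column.
            if target == table or "id" not in tables.get(target, []):
                continue
            inferred.append(
                {
                    "name": f"{table}_{target}",
                    "from": table,
                    "to": target,
                    "join_type": "MANY_TO_ONE",
                    "condition": f"{table}.{column} = {target}.id",
                }
            )
    return inferred


schema = {
    "customers": ["id", "name", "email"],
    "products": ["id", "name", "price"],
    "orders": ["id", "customer_id", "product_id", "warehouse_id"],
}
for rel in infer_relationships_by_name(schema):
    print(rel["name"], "->", rel["condition"])
```

Columns such as `warehouse_id` that have no matching table are ignored in this sketch, which is why explicit foreign keys remain the more reliable signal.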
For more control, create manifests manually:
from app.modules.mdl import (
    MDLManifest,
    MDLModel,
    MDLColumn,
    MDLRelationship,
    JoinType,
)
# Define models
customers = MDLModel(
    name="customers",
    columns=[
        MDLColumn(name="id", type="INTEGER"),
        MDLColumn(name="name", type="VARCHAR"),
        MDLColumn(name="email", type="VARCHAR"),
        MDLColumn(name="tier", type="VARCHAR"),
    ],
    primary_key="id",
    properties={"description": "Customer accounts"}
)
orders = MDLModel(
    name="orders",
    columns=[
        MDLColumn(name="id", type="INTEGER"),
        MDLColumn(name="customer_id", type="INTEGER"),
        MDLColumn(name="total_amount", type="DECIMAL"),
        MDLColumn(name="status", type="VARCHAR"),
        MDLColumn(name="created_at", type="TIMESTAMP"),
    ],
    primary_key="id",
    properties={"description": "Customer orders"}
)
# Define relationship
orders_customers = MDLRelationship(
    name="orders_customers",
    models=["orders", "customers"],
    join_type=JoinType.MANY_TO_ONE,
    condition="orders.customer_id = customers.id"
)
# Create manifest
manifest_id = await mdl_service.create_manifest(
    db_connection_id="your_connection_id",
    name="E-Commerce Analytics",
    catalog="ecommerce",
    schema="public",
    data_source="postgresql",
    models=[customers, orders],
    relationships=[orders_customers],
)

# Create manifest via REST API
curl -X POST http://localhost:8015/api/v1/mdl/manifests \
  -H "Content-Type: application/json" \
  -d '{
    "db_connection_id": "your_connection_id",
    "name": "Sales Analytics",
    "catalog": "analytics",
    "schema": "public",
    "data_source": "postgresql"
  }'
# Build from database
curl -X POST http://localhost:8015/api/v1/mdl/manifests/build \
  -H "Content-Type: application/json" \
  -d '{
    "db_connection_id": "your_connection_id",
    "name": "Auto-Generated MDL",
    "catalog": "analytics",
    "schema": "public",
    "infer_relationships": true
  }'

Calculated columns are derived from expressions:
orders_model = MDLModel(
    name="orders",
    columns=[
        MDLColumn(name="id", type="INTEGER"),
        MDLColumn(name="quantity", type="INTEGER"),
        MDLColumn(name="unit_price", type="DECIMAL"),
        # Calculated column
        MDLColumn(
            name="total_price",
            type="DECIMAL",
            is_calculated=True,
            expression="quantity * unit_price",
            properties={"description": "Total line item price"}
        ),
    ],
    primary_key="id",
)

# Via service
new_model = MDLModel(
    name="products",
    columns=[
        MDLColumn(name="id", type="INTEGER"),
        MDLColumn(name="name", type="VARCHAR"),
        MDLColumn(name="price", type="DECIMAL"),
    ],
    primary_key="id",
)
await mdl_service.add_model(manifest_id, new_model)

# Via API
curl -X POST http://localhost:8015/api/v1/mdl/manifests/{manifest_id}/models \
  -H "Content-Type: application/json" \
  -d '{
    "name": "products",
    "columns": [
      {"name": "id", "type": "INTEGER"},
      {"name": "name", "type": "VARCHAR"},
      {"name": "price", "type": "DECIMAL"}
    ],
    "primary_key": "id"
  }'

await mdl_service.remove_model(manifest_id, "products")

curl -X DELETE http://localhost:8015/api/v1/mdl/manifests/{manifest_id}/models/products

# One customer has many orders
customer_orders = MDLRelationship(
    name="customer_orders",
    models=["customers", "orders"],
    join_type=JoinType.ONE_TO_MANY,
    condition="customers.id = orders.customer_id"
)

# Many orders belong to one customer
orders_customer = MDLRelationship(
    name="orders_customer",
    models=["orders", "customers"],
    join_type=JoinType.MANY_TO_ONE,
    condition="orders.customer_id = customers.id"
)

The MDL Builder can automatically infer relationships based on:
- Foreign key constraints from TableDescriptions
- Column naming conventions (e.g., `customer_id` -> `customers` table)
from app.modules.mdl import MDLBuilder
# Get existing manifest
manifest = await mdl_service.get_manifest(manifest_id)
# Infer additional relationships
updated_manifest = MDLBuilder.infer_relationships(manifest)
# Save the updated manifest
await mdl_service.update_manifest(updated_manifest)

curl -X POST http://localhost:8015/api/v1/mdl/manifests/{manifest_id}/relationships \
  -H "Content-Type: application/json" \
  -d '{
    "name": "orders_products",
    "models": ["orders", "products"],
    "join_type": "MANY_TO_ONE",
    "condition": "orders.product_id = products.id"
  }'

Metrics define reusable business calculations:
from app.modules.mdl import MDLColumn, MDLMetric, MDLTimeGrain, DatePart
revenue_metric = MDLMetric(
    name="total_revenue",
    base_object="orders",
    dimension=[
        MDLColumn(name="customer_tier", type="VARCHAR"),
        MDLColumn(name="product_category", type="VARCHAR"),
    ],
    measure=[
        MDLColumn(
            name="revenue",
            type="DECIMAL",
            is_calculated=True,
            expression="SUM(total_amount)",
        ),
        MDLColumn(
            name="order_count",
            type="INTEGER",
            is_calculated=True,
            expression="COUNT(*)",
        ),
    ],
    time_grain=[
        MDLTimeGrain(
            name="order_date",
            ref_column="created_at",
            date_parts=[DatePart.YEAR, DatePart.QUARTER, DatePart.MONTH],
        )
    ],
    properties={"description": "Total revenue by customer tier and product category"}
)

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/mdl/manifests | Create manifest |
| GET | /api/v1/mdl/manifests | List manifests |
| GET | /api/v1/mdl/manifests/{id} | Get manifest by ID |
| DELETE | /api/v1/mdl/manifests/{id} | Delete manifest |
| POST | /api/v1/mdl/manifests/build | Build from database |
| GET | /api/v1/mdl/manifests/{id}/export | Export MDL JSON |
| POST | /api/v1/mdl/manifests/{id}/models | Add model |
| DELETE | /api/v1/mdl/manifests/{id}/models/{name} | Remove model |
| POST | /api/v1/mdl/manifests/{id}/relationships | Add relationship |
| DELETE | /api/v1/mdl/manifests/{id}/relationships/{name} | Remove relationship |
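
The endpoints above can also be called from Python. The following sketch only builds the requests with the standard library and does not send them; the `build_request` helper is hypothetical, the host and port are taken from the curl examples in this tutorial, and `your_manifest_id` is a placeholder.

```python
import json
from typing import Optional
from urllib.request import Request

# Base URL assumed from the curl examples in this tutorial.
BASE_URL = "http://localhost:8015/api/v1/mdl"


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a request for one of the MDL endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


manifest_id = "your_manifest_id"  # placeholder

create = build_request(
    "POST",
    "/manifests",
    {
        "db_connection_id": "your_connection_id",
        "name": "Sales Analytics",
        "catalog": "analytics",
        "schema": "public",
        "data_source": "postgresql",
    },
)
export = build_request("GET", f"/manifests/{manifest_id}/export")
remove_model = build_request("DELETE", f"/manifests/{manifest_id}/models/products")

for req in (create, export, remove_model):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running KAI server; the response format is not covered here.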
import json

# Get WrenAI-compatible JSON
mdl_json = await mdl_service.export_mdl_json(manifest_id)
print(json.dumps(mdl_json, indent=2))

Output:
{
  "catalog": "analytics",
  "schema": "public",
  "dataSource": "postgresql",
  "models": [
    {
      "name": "customers",
      "columns": [
        {"name": "id", "type": "INTEGER"},
        {"name": "name", "type": "VARCHAR"}
      ],
      "primaryKey": "id"
    }
  ],
  "relationships": [
    {
      "name": "orders_customers",
      "models": ["orders", "customers"],
      "joinType": "MANY_TO_ONE",
      "condition": "orders.customer_id = customers.id"
    }
  ]
}

Validate manifests against the MDL JSON Schema:
from app.modules.mdl import MDLValidator
# Validate a manifest
is_valid, errors = MDLValidator.validate(mdl_json)
if not is_valid:
    print("Validation errors:")
    for error in errors:
        print(f" - {error}")

# Good
MDLModel(name="customer_orders", ...)
MDLRelationship(name="orders_to_customers", ...)
# Avoid
MDLModel(name="tbl_1", ...)
MDLRelationship(name="rel_a_b", ...)

MDLColumn(
    name="ltv",
    type="DECIMAL",
    is_calculated=True,
    expression="SUM(order_total)",
    properties={
        "description": "Customer Lifetime Value - total of all orders",
        "displayName": "Lifetime Value"
    }
)

Always specify primary keys for proper relationship handling:
MDLModel(
    name="orders",
    columns=[...],
    primary_key="id",  # Important!
)

- Use `MANY_TO_ONE` when the "many" side has the foreign key
- Use `ONE_TO_MANY` when defining from the "one" side's perspective
# Always validate before using in production
manifest = await mdl_service.get_manifest(manifest_id)
is_valid, errors = await mdl_service.validate_manifest(manifest)
if not is_valid:
    raise ValueError(f"Invalid manifest: {errors}")

Use the version field to track changes:
manifest = MDLManifest(
    catalog="analytics",
    schema="public",
    version="1.2.0",
    ...
)

from app.modules.mdl import (
    MDLManifest,
    MDLModel,
    MDLColumn,
    MDLRelationship,
    MDLMetric,
    MDLTimeGrain,
    JoinType,
    DatePart,
)
# Models
customers = MDLModel(
    name="customers",
    columns=[
        MDLColumn(name="id", type="INTEGER"),
        MDLColumn(name="name", type="VARCHAR"),
        MDLColumn(name="email", type="VARCHAR"),
        MDLColumn(name="tier", type="VARCHAR"),
        MDLColumn(name="created_at", type="TIMESTAMP"),
    ],
    primary_key="id",
    properties={"description": "Customer master data"}
)
products = MDLModel(
    name="products",
    columns=[
        MDLColumn(name="id", type="INTEGER"),
        MDLColumn(name="name", type="VARCHAR"),
        MDLColumn(name="category", type="VARCHAR"),
        MDLColumn(name="price", type="DECIMAL"),
    ],
    primary_key="id",
    properties={"description": "Product catalog"}
)
orders = MDLModel(
    name="orders",
    columns=[
        MDLColumn(name="id", type="INTEGER"),
        MDLColumn(name="customer_id", type="INTEGER"),
        MDLColumn(name="product_id", type="INTEGER"),
        MDLColumn(name="quantity", type="INTEGER"),
        MDLColumn(name="unit_price", type="DECIMAL"),
        MDLColumn(
            name="total",
            type="DECIMAL",
            is_calculated=True,
            expression="quantity * unit_price"
        ),
        MDLColumn(name="status", type="VARCHAR"),
        MDLColumn(name="created_at", type="TIMESTAMP"),
    ],
    primary_key="id",
    properties={"description": "Customer orders"}
)
# Relationships
relationships = [
    MDLRelationship(
        name="orders_customers",
        models=["orders", "customers"],
        join_type=JoinType.MANY_TO_ONE,
        condition="orders.customer_id = customers.id"
    ),
    MDLRelationship(
        name="orders_products",
        models=["orders", "products"],
        join_type=JoinType.MANY_TO_ONE,
        condition="orders.product_id = products.id"
    ),
]
# Metrics
revenue_metric = MDLMetric(
    name="revenue_by_category",
    base_object="orders",
    dimension=[
        MDLColumn(name="category", type="VARCHAR"),
        MDLColumn(name="customer_tier", type="VARCHAR"),
    ],
    measure=[
        MDLColumn(
            name="total_revenue",
            type="DECIMAL",
            is_calculated=True,
            expression="SUM(total)"
        ),
        MDLColumn(
            name="avg_order_value",
            type="DECIMAL",
            is_calculated=True,
            expression="AVG(total)"
        ),
    ],
    time_grain=[
        MDLTimeGrain(
            name="order_date",
            ref_column="created_at",
            date_parts=[DatePart.YEAR, DatePart.MONTH, DatePart.DAY]
        )
    ]
)
# Complete manifest
manifest = MDLManifest(
    name="E-Commerce Analytics",
    catalog="ecommerce",
    schema="public",
    data_source="postgresql",
    models=[customers, products, orders],
    relationships=relationships,
    metrics=[revenue_metric],
    version="1.0.0",
)
# Save to KAI
manifest_id = await mdl_service.create_manifest(
    db_connection_id="your_connection_id",
    name=manifest.name,
    catalog=manifest.catalog,
    schema=manifest.schema,
    data_source=manifest.data_source,
    models=manifest.models,
    relationships=manifest.relationships,
    metrics=manifest.metrics,
)
print(f"Created E-Commerce MDL: {manifest_id}")

The MDL semantic layer integrates with KAI's autonomous SQL generation agent to improve query accuracy by providing:
- Business term resolution: Maps user-friendly terms to actual table/column names
- Metric formulas: Pre-defined calculations (e.g., "revenue" → `SUM(total_amount)`)
- Join paths: Correct relationships between tables
- Calculated columns: Derived column expressions
When an MDL manifest is available, the agent receives an additional tool called mdl_semantic_lookup that allows it to search the semantic layer for relevant definitions.
┌──────────────────────────────────────────────────────────────┐
│ User Question │
│ "Show me total revenue by customer tier" │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ MDL Semantic Tool │
│ • Searches for "revenue" → Finds metric formula │
│ • Searches for "customer tier" → Finds column mapping │
│ • Returns join paths between orders and customers │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ SQL Generation Agent │
│ Uses semantic context to generate accurate SQL: │
│ SELECT c.tier, SUM(o.total_amount) as revenue │
│ FROM orders o JOIN customers c ON o.customer_id = c.id │
│ GROUP BY c.tier │
└──────────────────────────────────────────────────────────────┘
from app.utils.deep_agent.tools import KaiToolContext, build_tool_specs
from app.modules.mdl import MDLService, MDLRepository
from app.data.db.storage import Storage
# Get the MDL manifest
storage = Storage()
mdl_repo = MDLRepository(storage=storage)
mdl_service = MDLService(repository=mdl_repo)
manifest = await mdl_service.get_manifest(manifest_id)
# Create tool context with MDL
ctx = KaiToolContext(
    database=sql_database,
    db_scan=table_descriptions,
    embedding=embedding_model,
    mdl_manifest=manifest,  # Enable MDL semantic lookup
)
# Build tools - includes mdl_semantic_lookup when manifest is provided
tool_specs = build_tool_specs(ctx)

# First, ensure an MDL manifest exists for your database
uv run kai-agent run "Show revenue by customer tier" \
  --db mydb \
  --mdl-manifest-id your_manifest_id

The mdl_semantic_lookup tool allows the agent to search semantic definitions:
from app.utils.sql_tools.mdl_semantic_lookup import (
    create_mdl_semantic_tool,
    get_mdl_context_prompt,
)
# Create the tool
tool = create_mdl_semantic_tool(manifest)
# Example searches
print(tool._run("revenue"))
# Output:
# ## Business Metrics:
# **total_revenue** - Total revenue metric
# Base: orders
# Dimensions: customer_tier
# Measure: revenue = SUM(total_amount)
# Time Grain: order_date on created_at
print(tool._run("customers"))
# Output:
# ## Matching Tables/Models:
# **customers** - Customer master data, PK: id
# Columns:
# - id: INTEGER
# - name: VARCHAR (Customer full name)
# - tier: VARCHAR
# - created_at: TIMESTAMP
#
# ## Available Joins:
# **orders_customers**: orders N:1 customers
# JOIN: orders.customer_id = customers.id

For enhanced awareness without tool calls, inject MDL context into the agent's system prompt:
from app.utils.sql_tools.mdl_semantic_lookup import get_mdl_context_prompt
# Generate context string for system prompt
mdl_context = get_mdl_context_prompt(manifest)
print(mdl_context)

Output:
## Semantic Layer (MDL) Context
This database has a semantic layer defined. Use business-friendly names when possible.
### Available Tables:
- **customers** - Customer master data
- **orders** - Customer orders
- **products** - Product catalog
### Table Relationships (use these for JOINs):
- orders → customers: `orders.customer_id = customers.id`
- orders → products: `orders.product_id = products.id`
### Business Metrics (use these formulas):
- **total_revenue.revenue**: `SUM(total_amount)`
- **total_revenue.order_count**: `COUNT(*)`
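
For a rough idea of how a context string like the one above could be assembled, the sketch below renders a plain-dict manifest into the same sections. It is not the implementation of `get_mdl_context_prompt`; the `render_mdl_context` function and the dict layout are assumptions chosen to mirror the fields used elsewhere in this tutorial.

```python
def render_mdl_context(manifest: dict) -> str:
    """Render a plain-dict manifest into a prompt-friendly summary (illustrative only)."""
    lines = ["## Semantic Layer (MDL) Context", "### Available Tables:"]
    for model in manifest.get("models", []):
        description = model.get("properties", {}).get("description", "")
        suffix = f" - {description}" if description else ""
        lines.append(f"- **{model['name']}**{suffix}")

    lines.append("### Table Relationships (use these for JOINs):")
    for rel in manifest.get("relationships", []):
        left, right = rel["models"]
        lines.append(f"- {left} → {right}: `{rel['condition']}`")

    lines.append("### Business Metrics (use these formulas):")
    for metric in manifest.get("metrics", []):
        for measure in metric.get("measure", []):
            lines.append(f"- **{metric['name']}.{measure['name']}**: `{measure['expression']}`")
    return "\n".join(lines)


example_manifest = {
    "models": [
        {"name": "customers", "properties": {"description": "Customer master data"}},
        {"name": "orders", "properties": {"description": "Customer orders"}},
    ],
    "relationships": [
        {"models": ["orders", "customers"], "condition": "orders.customer_id = customers.id"},
    ],
    "metrics": [
        {
            "name": "total_revenue",
            "measure": [
                {"name": "revenue", "expression": "SUM(total_amount)"},
                {"name": "order_count", "expression": "COUNT(*)"},
            ],
        },
    ],
}
context_text = render_mdl_context(example_manifest)
print(context_text)
```

Keeping the summary to names, join conditions, and formulas keeps the system prompt short while still giving the agent the definitions it is most likely to need.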
When the MDL tool is available, the agent will:
- Search semantic layer first before querying raw schema
- Use metric formulas instead of guessing aggregations
- Follow defined join paths for multi-table queries
- Map business terms to actual column/table names
Example agent reasoning:
User: "What's our total revenue by product category this month?"
Agent thinking:
1. Search MDL for "revenue" → Found metric with SUM(total_amount) formula
2. Search MDL for "product category" → Found in products.category column
3. Search MDL for "orders" → Found relationship orders_products
4. Generate SQL using the semantic definitions...
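
The reasoning steps above can be sketched as a small lookup-then-assemble routine. This is a simplified illustration, not the agent's actual logic: `SEMANTIC_INDEX` and `build_sql` are hypothetical, the index contents are hand-written from the examples in this tutorial, and the "this month" date filter is left out for brevity.

```python
# Hand-written stand-in for what a semantic lookup might return.
SEMANTIC_INDEX = {
    "metrics": {"revenue": "SUM(orders.total_amount)"},
    "columns": {
        "customer tier": "customers.tier",
        "product category": "products.category",
    },
    "joins": {
        ("orders", "customers"): "orders.customer_id = customers.id",
        ("orders", "products"): "orders.product_id = products.id",
    },
}


def build_sql(metric_term: str, dimension_term: str, base_table: str = "orders") -> str:
    """Resolve business terms through the index and assemble a grouped query."""
    measure = SEMANTIC_INDEX["metrics"][metric_term]          # step 1: metric formula
    dimension = SEMANTIC_INDEX["columns"][dimension_term]     # step 2: column mapping
    dimension_table = dimension.split(".")[0]

    sql = [f"SELECT {dimension}, {measure} AS {metric_term}", f"FROM {base_table}"]
    if dimension_table != base_table:
        # step 3: follow the defined join path instead of guessing one
        condition = SEMANTIC_INDEX["joins"][(base_table, dimension_table)]
        sql.append(f"JOIN {dimension_table} ON {condition}")
    sql.append(f"GROUP BY {dimension}")                       # step 4: generate SQL
    return "\n".join(sql)


query = build_sql("revenue", "product category")
print(query)
```

An unknown term raises `KeyError` in this sketch; a real agent would more likely fall back to the raw schema at that point.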
- Define all important metrics: Pre-define common business metrics so the agent uses consistent formulas
- Add column descriptions: Help the agent understand what each column represents
  MDLColumn(name="ltv", type="DECIMAL", properties={"description": "Customer Lifetime Value"})
- Name relationships clearly: Use descriptive names like `orders_customers` instead of `rel_1`
- Include time grains: Define time dimensions for proper date handling
  MDLTimeGrain(name="order_date", ref_column="created_at", date_parts=[DatePart.YEAR, DatePart.MONTH, DatePart.DAY])
- Keep manifests updated: Update the MDL when schema changes to maintain accuracy
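
Several of these practices can be checked mechanically before a manifest is handed to the agent. The sketch below is one possible lint pass over a plain-dict manifest; `lint_manifest` is a hypothetical helper, not part of the MDL module, and the dict layout is an assumption modeled on the constructor arguments shown in this tutorial.

```python
from typing import List


def lint_manifest(manifest: dict) -> List[str]:
    """Return human-readable warnings for common best-practice gaps."""
    warnings = []
    for model in manifest.get("models", []):
        name = model["name"]
        if not model.get("primary_key"):
            warnings.append(f"{name}: missing primary key")
        for column in model.get("columns", []):
            if not column.get("properties", {}).get("description"):
                warnings.append(f"{name}.{column['name']}: missing description")
    for metric in manifest.get("metrics", []):
        if not metric.get("time_grain"):
            warnings.append(f"metric {metric['name']}: no time grain defined")
    return warnings


draft = {
    "models": [
        {
            "name": "customers",
            "primary_key": "id",
            "columns": [
                {"name": "id", "properties": {"description": "Customer identifier"}},
                {"name": "ltv"},
            ],
        },
        {"name": "orders", "columns": []},
    ],
    "metrics": [{"name": "total_revenue", "time_grain": []}],
}
for warning in lint_manifest(draft):
    print(warning)
```

A check like this could run alongside schema validation, since a manifest can be valid and still lack the descriptions and time grains the agent relies on.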
- Explore the MDL JSON Schema for the complete specification
- Check out the test examples for more usage patterns
- Review the WrenAI MDL documentation for advanced features
For questions or issues, see the main KAI documentation.