4 changes: 2 additions & 2 deletions docscontent/quickstart_content.txt
@@ -35,7 +35,7 @@ type: markdown
---
## 1. LLM Configuration

-Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For the semantic search feature, you will also need to set up Qdrant and provide an OpenAI API key. For detailed setup instructions, please refer to the [README.md](README.md) file.
+Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For the semantic search feature, you will also need to set up Qdrant and provide an OpenAI API key. For detailed setup instructions, please refer to the [README.md](https://github.com/Intugle/data-tools/blob/main/README.md) file.

You can configure the necessary services by setting the following environment variables:

@@ -196,7 +196,7 @@ type: markdown

The semantic search feature allows you to search for columns in your datasets using natural language.

-> **Note:** To use this feature, you need to have a running Qdrant instance and an OpenAI API key. Please refer to the [README.md](README.md) for detailed setup instructions.
+> **Note:** To use this feature, you need to have a running Qdrant instance and an OpenAI API key. Please refer to the [README.md](https://github.com/Intugle/data-tools/blob/main/README.md) for detailed setup instructions.
>
> **Google Colab Users:** If you are running this notebook in Google Colab, you may not be able to connect to a local Qdrant instance running in Docker. In this case, you will need to use a remotely hosted Qdrant server.
>
58 changes: 58 additions & 0 deletions docsite/docs/mcp-server.md
@@ -0,0 +1,58 @@
---
sidebar_position: 6
title: MCP Server
---

# Intugle MCP Server

The Intugle library includes a built-in MCP (Model Context Protocol) server that exposes your data environment as a set of tools that can be understood and used by AI assistants and LLM-powered clients.

By serving your project's context through this standardized protocol, you enable powerful conversational workflows, such as [Vibe Coding](./vibe-coding.md), and allow AI agents to interact with your data securely.

## 1. Setting up the MCP Server

Once you have built your semantic layer using the `SemanticModel`, you can easily expose it as a set of tools for an AI assistant by starting the built-in MCP server.

### Starting the Server

To start the server, run the following command in your terminal from your project's root directory:

```bash
intugle-mcp
```

This will start a server on `localhost:8080` by default. You should see output indicating that the server is running and that the `semantic_layer` service is mounted.

### Connecting from an MCP Client

With the server running, you can connect to it from any MCP-compatible client. The endpoint for the semantic layer is:

`http://localhost:8080/semantic_layer/mcp`

Popular clients that support MCP include AI-powered IDEs and standalone applications. Here’s how to configure a few of them:

- **Cursor**: [Configuring MCP Servers](https://docs.cursor.com/en/context/mcp#configuring-mcp-servers)
- **Claude Code**: [Using MCP with Claude Code](https://docs.claude.com/en/docs/claude-code/mcp)
- **Claude Desktop**: [User Quickstart](https://modelcontextprotocol.info/docs/quickstart/user/)
- **Gemini CLI**: [Configure MCP Servers](https://cloud.google.com/gemini/docs/codeassist/use-agentic-chat-pair-programmer#configure-mcp-servers)
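
Many MCP clients accept a simple JSON entry for remote servers. As an illustrative sketch only (the exact file location and schema vary by client; Cursor, for instance, reads a `.cursor/mcp.json` file):

```json
{
  "mcpServers": {
    "intugle": {
      "url": "http://localhost:8080/semantic_layer/mcp"
    }
  }
}
```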

## 2. Data Discovery Tools

The MCP server provides tools that allow an LLM client to discover and understand the structure of your data. These tools are essential for providing the AI with the context it needs to answer questions and generate valid queries or specifications.

These tools are only available after a `SemanticModel` has been successfully generated and loaded.

### `get_tables`

This tool returns a list of all available tables in your semantic model, along with their descriptions. It's the primary way for an AI assistant to discover what data is available.

- **Description**: Get list of tables in database along with their technical description.
- **Returns**: A list of objects, where each object contains the `table_name` and `table_description`.
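
For illustration, a response might look like the following sketch (these table names and descriptions are hypothetical):

```json
[
  { "table_name": "patients", "table_description": "Demographic details for each patient." },
  { "table_name": "claims", "table_description": "Insurance claims filed by patients." }
]
```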

### `get_schema`

This tool retrieves the schema for one or more specified tables, including column names, data types, and other metadata such as links. This allows the AI to understand the specific attributes of each table before attempting to query it.

- **Description**: Given database table names, get the schemas of the tables.
- **Parameters**: `table_names` (a list of strings).
- **Returns**: A dictionary where keys are table names and values are their detailed schemas.
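
As a minimal sketch of calling both tools programmatically, assuming the MCP Python SDK (`pip install mcp`) and that the server speaks the streamable HTTP transport at the endpoint above (the table names are hypothetical):

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main():
    # Connect to the Intugle MCP server's semantic layer endpoint.
    async with streamablehttp_client("http://localhost:8080/semantic_layer/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover which tables the semantic model exposes.
            tables = await session.call_tool("get_tables", {})
            print(tables.content)

            # Fetch detailed schemas for two hypothetical tables.
            schemas = await session.call_tool(
                "get_schema", {"table_names": ["patients", "claims"]}
            )
            print(schemas.content)


asyncio.run(main())
```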
95 changes: 44 additions & 51 deletions docsite/docs/vibe-coding.md
@@ -1,17 +1,13 @@
---
-sidebar_position: 6
+sidebar_position: 7
title: Vibe Coding
---

# Vibe Coding with the MCP Server

"Vibe Coding" is an interactive, conversational approach to development where you use natural language to generate code or specifications. Intugle embraces this by allowing you to serve your semantic layer through an MCP (Model Context Protocol) server.
"Vibe Coding" is an interactive, conversational approach to data intelligence. Intugle embraces this by allowing you to serve your project as an MCP (Model Context Protocol) server.

-This turns your data into a "self-describing" resource that an AI assistant can understand, allowing you to "vibe" with your data to create specifications without writing them by hand.
-
-:::info In Progress
-Currently, Vibe Coding is available for generating **Data Product** specifications. We are actively working on extending this capability to other modules in the Intugle ecosystem. Stay tuned for more updates!
-:::
+This turns your entire data workflow into a "self-describing" resource that an AI assistant can understand and operate. It allows you to "vibe" with the Intugle library: using natural language to build semantic models, perform searches, and create data products from scratch.

## 1. Setting up the MCP Server

@@ -29,13 +25,13 @@ To start the server, run the following command in your terminal from your projec
intugle-mcp
```

-This will start a server on `localhost:8000` by default. You should see output indicating that the server is running and that the `semantic_layer` and `adapter` services are mounted.
+This will start a server on `localhost:8080` by default. You should see output indicating that the server is running and that the `semantic_layer` service is mounted.

### Connecting from an MCP Client

With the server running, you can connect to it from any MCP-compatible client. The endpoint for the semantic layer is:

-`http://localhost:8000/semantic_layer/mcp`
+`http://localhost:8080/semantic_layer/mcp`

Popular clients that support MCP include AI-powered IDEs and standalone applications. Here’s how to configure a few of them:

@@ -44,56 +40,53 @@ Popular clients that support MCP include AI-powered IDEs and standalone applicat
- **Claude Desktop**: [User Quickstart](https://modelcontextprotocol.info/docs/quickstart/user/)
- **Gemini CLI**: [Configure MCP Servers](https://cloud.google.com/gemini/docs/codeassist/use-agentic-chat-pair-programmer#configure-mcp-servers)

-## 2. Using Vibe Coding
+## 2. Vibe Coding

-The MCP server exposes powerful prompts that are designed to take your natural language requests and convert them directly into valid specifications.
+The MCP server exposes the `intugle-vibe` prompt. This prompt equips an AI assistant with knowledge of the Intugle library and access to its core tools. You can use it to guide you through the entire data intelligence workflow using natural language.

-### Example: Generating a Data Product
+In your MCP-compatible client, you can invoke the prompt and provide your request. In most clients, this is done by typing `/` followed by the prompt name.

-Currently, you can use the `create-dp` prompt to generate a `product_spec` dictionary for a Data Product.
+### Example 1: Getting Started and Building a Semantic Model

-In your MCP-compatible client, you can invoke the prompt and provide your request. In most clients, this is done by typing `/` followed by the prompt name.
+If you are unsure how to start, you can ask for guidance. You can also ask the assistant to perform actions like creating a semantic model.

```
-/create-dp show me the top 5 patients with the most claims
+/intugle-vibe How do I create a semantic model?
```
+```
+/intugle-vibe Create a semantic model over my healthcare data.
+```

-:::tip Client-Specific Commands
-The exact command to invoke a prompt (e.g., using `/` or another prefix) can vary between clients. Be sure to check the documentation for your specific tool.
-:::
+The assistant will read the relevant documentation and guide you through the process or execute the steps if possible.

+### Example 2: Generating a Data Product Specification
+
+Once you have a semantic model, you can ask the assistant to create a specification for a reusable data product.
+
+```
+/intugle-vibe create a data product specification for the top 5 patients with the most claims
+```
+
+The AI assistant, connected to your MCP server, will understand that you are requesting a `product_spec`. It will use the `get_tables` and `get_schema` tools to find the `patients` and `claims` tables, and generate the specification.
+
+### Example 3: Performing a Semantic Search

-The AI assistant, connected to your MCP server, will understand the request, use the `get_tables` and `get_schema` tools to find the `patients` and `claims` tables, and generate the following `product_spec`:
-
-```json
-{
-  "name": "top_5_patients_by_claims",
-  "fields": [
-    {
-      "id": "patients.first",
-      "name": "first_name"
-    },
-    {
-      "id": "patients.last",
-      "name": "last_name"
-    },
-    {
-      "id": "claims.id",
-      "name": "number_of_claims",
-      "category": "measure",
-      "measure_func": "count"
-    }
-  ],
-  "filter": {
-    "sort_by": [
-      {
-        "id": "claims.id",
-        "alias": "number_of_claims",
-        "direction": "desc"
-      }
-    ],
-    "limit": 5
-  }
-}
-```
+You can also perform a semantic search on your data.

+```
+/intugle-vibe use semantic search to find columns related to 'hospital visit reasons'
+```

-This workflow allows you to stay in your creative flow, rapidly iterating on data product ideas by describing what you want in plain English.
+The assistant will use the semantic search capabilities of your `SemanticModel` to find and return relevant columns from your datasets.

+:::tip Agent Mode
+Most modern, AI-powered clients support an "agent mode" where the coding assistant can handle the entire workflow for you.
+
+For example, you can directly ask for a final output, like a CSV file:
+
+`/intugle-vibe create a CSV of the top 10 patients by claim count`
+
+The agent will understand the end goal and perform all the necessary intermediate steps for you. It will realize it needs to build the semantic model, generate the data product specification, execute it, and finally provide you with the resulting CSV file—all without you needing to manage the code or the process.
+:::

+This workflow accelerates your journey from raw data to insightful data products. Simply describe what you want in plain English and let the assistant handle the details, freeing you from the hassle of digging through documentation.
6 changes: 3 additions & 3 deletions notebooks/quickstart_fmcg.ipynb
@@ -123,15 +123,15 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "f0ecc2ba",
"metadata": {},
"outputs": [],
"source": [
"def generate_config(table_name: str) -> str:\n",
" \"\"\"Append the base URL to the table name.\"\"\"\n",
" return {\n",
" \"path\": f\"./sample_data/fmcg/{table_name}.csv\",\n",
" \"path\": f\"https://github.com/Intugle/data-tools/tree/main/sample_data/fmcg/{table_name}.csv\",\n",
" \"type\": \"csv\",\n",
" }\n",
"\n",
@@ -2528,7 +2528,7 @@
"\n",
"The semantic search feature allows you to search for columns in your datasets using natural language. \n",
"\n",
"> **Note:** To use this feature, you need to have a running Qdrant instance and an OpenAI API key. Please refer to the [README.md](README.md) for detailed setup instructions.\n",
"> **Note:** To use this feature, you need to have a running Qdrant instance and an OpenAI API key. Please refer to the [README.md](https://github.com/Intugle/data-tools/blob/main/README.md) for detailed setup instructions.\n",
">\n",
"> **Google Colab Users:** If you are running this notebook in Google Colab, you may not be able to connect to a local Qdrant instance running in Docker. In this case, you will need to use a remotely hosted Qdrant server.\n",
">\n",
2 changes: 1 addition & 1 deletion notebooks/quickstart_fmcg_snowflake.ipynb
@@ -193,7 +193,7 @@
"source": [
"## 1. LLM Configuration\n",
"\n",
"Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For detailed setup instructions, please refer to the [README.md](README.md) file.\n",
"Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For detailed setup instructions, please refer to the [README.md](https://github.com/Intugle/data-tools/blob/main/README.md) file.\n",
"\n",
"You can configure the necessary services by setting the following environment variables:\n",
"\n",
4 changes: 2 additions & 2 deletions notebooks/quickstart_healthcare.ipynb
@@ -124,15 +124,15 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"id": "76771eda",
"metadata": {},
"outputs": [],
"source": [
"def generate_config(table_name: str) -> str:\n",
" \"\"\"Append the base URL to the table name.\"\"\"\n",
" return {\n",
" \"path\": f\"./sample_data/healthcare/{table_name}.csv\",\n",
" \"path\": f\"https://github.com/Intugle/data-tools/tree/main/sample_data/healthcare/{table_name}.csv\",\n",
" \"type\": \"csv\",\n",
" }\n",
"\n",
2 changes: 1 addition & 1 deletion notebooks/quickstart_healthcare_databricks.ipynb
@@ -107,7 +107,7 @@
"source": [
"## 1. LLM Configuration\n",
"\n",
"Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For detailed setup instructions, please refer to the [README.md](README.md) file.\n",
"Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For detailed setup instructions, please refer to the [README.md](https://github.com/Intugle/data-tools/blob/main/README.md) file.\n",
"\n",
"You can configure the necessary services by setting the following environment variables:\n",
"\n",
2 changes: 1 addition & 1 deletion notebooks/quickstart_native_databricks.ipynb
@@ -61,7 +61,7 @@
"source": [
"## 1. LLM Configuration\n",
"\n",
"Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For the semantic search feature, you will also need to set up Qdrant and provide an OpenAI API key. For detailed setup instructions, please refer to the [README.md](README.md) file.\n",
"Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For the semantic search feature, you will also need to set up Qdrant and provide an OpenAI API key. For detailed setup instructions, please refer to the [README.md](https://github.com/Intugle/data-tools/blob/main/README.md) file.\n",
"\n",
"You can configure the necessary services by setting the following environment variables:\n",
"\n",
2 changes: 1 addition & 1 deletion notebooks/quickstart_native_snowflake.ipynb
@@ -110,7 +110,7 @@
"source": [
"## 1. LLM Configuration\n",
"\n",
"Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For the semantic search feature, you will also need to set up Qdrant and provide an OpenAI API key. For detailed setup instructions, please refer to the [README.md](README.md) file.\n",
"Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables. For the semantic search feature, you will also need to set up Qdrant and provide an OpenAI API key. For detailed setup instructions, please refer to the [README.md](https://github.com/Intugle/data-tools/blob/main/README.md) file.\n",
"\n",
"You can configure the necessary services by setting the following environment variables:\n",
"\n",
4 changes: 2 additions & 2 deletions notebooks/quickstart_sports_media.ipynb
@@ -124,15 +124,15 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "f7367816",
"metadata": {},
"outputs": [],
"source": [
"def generate_config(table_name: str) -> str:\n",
" \"\"\"Append the base URL to the table name.\"\"\"\n",
" return {\n",
" \"path\": f\"./sample_data/sports_media/{table_name}.csv\",\n",
" \"path\": f\"https://github.com/Intugle/data-tools/tree/main/sample_data/sports_media/{table_name}.csv\",\n",
" \"type\": \"csv\",\n",
" }\n",
"\n",
4 changes: 2 additions & 2 deletions notebooks/quickstart_tech_manufacturing.ipynb
@@ -119,14 +119,14 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def generate_config(table_name: str) -> str:\n",
" \"\"\"Append the base URL to the table name.\"\"\"\n",
" return {\n",
" \"path\": f\"./sample_data/tech_manufacturing/{table_name}.csv\",\n",
" \"path\": f\"https://github.com/Intugle/data-tools/tree/main/sample_data/tech_company/{table_name}.csv\",\n",
" \"type\": \"csv\",\n",
" }\n",
"\n",
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "intugle"
version = "1.0.4"
version = "1.0.5"
authors = [
{ name="Intugle", email="[email protected]" },
]
@@ -50,6 +50,7 @@ dependencies = [
"langchain[anthropic,google-genai,openai]>=0.3.27",
"qdrant-client>=1.15.1",
"rich>=14.1.0",
"aiohttp>=3.9.5",
]

[project.optional-dependencies]
11 changes: 0 additions & 11 deletions src/intugle/cli.py

This file was deleted.

1 change: 1 addition & 0 deletions src/intugle/core/settings.py
@@ -70,6 +70,7 @@ class Settings(BaseSettings):
CUSTOM_EMBEDDINGS_INSTANCE: Optional[Any] = None

# LP
+RELATIONSHIPS_FILE: str = "__relationships__.yml"
HALLUCINATIONS_MAX_RETRY: int = 2
UNIQUENESS_THRESHOLD: float = 0.9
INTERSECT_RATIO_THRESHOLD: float = 0.9
5 changes: 4 additions & 1 deletion src/intugle/link_predictor/predictor.py
@@ -157,11 +157,14 @@ def _predict_for_pair(
]
return pair_links

-def predict(self, filename='__relationships__.yml', save: bool = False, force_recreate: bool = False) -> 'LinkPredictor':
+def predict(self, filename: str = None, save: bool = False, force_recreate: bool = False) -> 'LinkPredictor':
"""
Iterates through all unique pairs of datasets, predicts the links for
each pair, and returns the aggregated results.
"""
+if filename is None:
+    filename = settings.RELATIONSHIPS_FILE

relationships_file = os.path.join(settings.PROJECT_BASE, filename)

if not force_recreate and os.path.exists(relationships_file):
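
As a usage sketch of the new fallback (the `predictor` instance is assumed to exist; the explicit filename shown is hypothetical):

```python
# Falls back to settings.RELATIONSHIPS_FILE ("__relationships__.yml"),
# resolved under settings.PROJECT_BASE:
predictor.predict(save=True)

# An explicit filename still takes precedence over the setting:
predictor.predict(filename="custom_relationships.yml", save=True)
```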