
Commit 652f457

Merge pull request #56 from Intugle/feature/snowflake-integration
Feature/snowflake integration
2 parents 032685e + 4027289 · commit 652f457

26 files changed: +1187 -78 lines

.gitignore

Lines changed: 3 additions & 0 deletions

@@ -215,3 +215,6 @@ models_bak
 settings.json
 archived/
 /experiments/
+
+profiles.yml
+profiles.yaml
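
These two new ignore entries keep database credentials out of version control: as the new Snowflake docs page below explains, the adapter reads connection settings from a `profiles.yml` with a top-level `snowflake:` key. A minimal sketch of reading such a profile, as an illustration rather than the adapter's actual loader:

```python
# Illustration only -- not the adapter's real loading code.
import yaml  # pyyaml is already a project dependency

with open("profiles.yml") as f:
    profiles = yaml.safe_load(f)

# The adapter looks for this top-level key (see the Snowflake docs below).
snowflake_profile = profiles["snowflake"]
print(snowflake_profile["account"], snowflake_profile["warehouse"])
```
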
Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+{
+  "label": "Connectors",
+  "position": 5
+}
Lines changed: 105 additions & 0 deletions

@@ -0,0 +1,105 @@
+---
+sidebar_position: 1
+---
+
+# Snowflake
+
+`intugle` integrates with Snowflake, allowing you to read data from Snowflake tables and deploy your `SemanticModel` as a **Semantic View** in your Snowflake account.
+
+## Installation
+
+To use `intugle` with Snowflake, you must install the optional dependencies:
+
+```bash
+pip install "intugle[snowflake]"
+```
+
+This installs the `snowflake-snowpark-python` library.
+
+## Configuration
+
+The Snowflake adapter can connect using credentials from a `profiles.yml` file or automatically use an active session when running inside a Snowflake Notebook.
+
+### Connecting from an External Environment
+
+When running `intugle` outside of a Snowflake Notebook, you must provide connection credentials in a `profiles.yml` file at the root of your project. The adapter looks for a top-level `snowflake:` key.
+
+**Example `profiles.yml`:**
+
+```yaml
+snowflake:
+  type: snowflake
+  account: <your_snowflake_account>
+  user: <your_username>
+  password: <your_password>
+  role: <your_role>
+  warehouse: <your_warehouse>
+  database: <your_database>
+  schema: <your_schema>
+```
+
+### Connecting from a Snowflake Notebook
+
+When your code is executed within a Snowflake Notebook, the adapter automatically detects and uses the notebook's active Snowpark session. **No configuration is required.**
+
+## Usage
+
+### Reading Data from Snowflake
+
+To include a Snowflake table in your `SemanticModel`, define it in your input dictionary with `type: "snowflake"` and use the `identifier` key to specify the table name.
+
+:::caution Important
+The dictionary key for your dataset (e.g., `"CUSTOMERS"`) must exactly match the table name specified in the `identifier`.
+:::
+
+```python
+from intugle import SemanticModel
+
+datasets = {
+    "CUSTOMERS": {
+        "identifier": "CUSTOMERS",  # Must match the key above
+        "type": "snowflake"
+    },
+    "ORDERS": {
+        "identifier": "ORDERS",  # Must match the key above
+        "type": "snowflake"
+    }
+}
+
+# Initialize the semantic model
+sm = SemanticModel(datasets, domain="E-commerce")
+
+# Build the model as usual
+sm.build()
+```
+
+### Materializing Data Products
+
+When you use the `DataProduct` class with a Snowflake connection, the resulting data product will be materialized as a new table directly within your Snowflake schema.
+
+### Deploying the Semantic Model
+
+Once your semantic model is built, you can deploy it to Snowflake using the `deploy()` method. This process performs two actions:
+1. **Syncs Metadata:** It updates the comments on your physical Snowflake tables and columns with the business glossaries from your `intugle` model.
+2. **Creates Semantic View:** It constructs and executes a `CREATE OR REPLACE SEMANTIC VIEW` statement in your target database and schema.
+
+```python
+# Deploy the model to Snowflake
+sm.deploy(target="snowflake")
+
+# You can also provide a custom name for the view
+sm.deploy(target="snowflake", model_name="my_custom_semantic_view")
+```
+
+:::info Required Permissions
+To successfully deploy a semantic model, the Snowflake role you are using must have the following privileges:
+* `USAGE` on the target database and schema.
+* `CREATE SEMANTIC VIEW` on the target schema.
+* `ALTER TABLE` permissions on the source tables to update their comments.
+:::
+
+:::tip Next Steps: Chat with your Data using Cortex Analyst
+Now that you have deployed a Semantic View, you can use **Snowflake Cortex Analyst** to interact with your data using natural language. Cortex Analyst leverages the relationships and context defined in your Semantic View to answer questions without requiring you to write SQL.
+
+To get started, navigate to **AI & ML -> Cortex Analyst** in the Snowflake UI and select your newly created view.
+:::
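
The "Materializing Data Products" subsection above describes the behavior without a snippet. Here is a minimal sketch under stated assumptions: `DataProduct`, `build()`, `to_df()`, and `sql_query` appear in this repo's docs, but the spec layout, field identifiers, and the exact call shape below are hypothetical.

```python
# Hypothetical sketch -- the spec layout is an assumption, not intugle's
# confirmed schema; build()/to_df()/sql_query come from the docs.
from intugle import DataProduct

spec = {
    "name": "customer_orders",   # hypothetical product name
    "fields": [
        {"id": "CUSTOMERS.ID"},  # hypothetical field identifiers
        {"id": "ORDERS.TOTAL"},
    ],
}

data_product = DataProduct()
data_product.build(spec)  # with a Snowflake connection, this materializes as a
                          # new table in the connected schema; with file-based
                          # sources, as a view in an in-memory DuckDB database

print(data_product.sql_query)  # inspect the generated SQL
print(data_product.to_df())    # fetch the result as a DataFrame
```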

docsite/docs/core-concepts/data-product/index.md

Lines changed: 9 additions & 3 deletions

@@ -26,7 +26,11 @@ The primary input for the `DataProduct` is a product specification, which you de
 
 ## Usage Example
 
-Using the `DataProduct` is straightforward. Once the `SemanticModel` has successfully built the semantic layer, you can immediately start creating data products.
+Once the `SemanticModel` is built, you can use the `DataProduct` class to generate unified data products from the semantic layer. This allows you to select fields from across different tables, and `intugle` will automatically handle the joins and generate the final, unified dataset.
+
+## Building a Data Product
+
+To build a data product, you define a product specification model that lists the fields you want, any transformations, and filters.
 
 ```python
 from intugle import DataProduct
@@ -77,9 +81,11 @@ print(data_product.to_df())
 print(data_product.sql_query)
 ```
 
-This workflow allows you to rapidly prototype and generate complex, unified datasets by simply describing what you need, letting the `DataProduct` handle the underlying SQL complexity.
+:::info Materialization with Connectors
+When using a database connector like **[Snowflake](../../connectors/snowflake)**, the `build()` method will materialize the data product as a new table directly within your connected database schema. For file-based sources, it is materialized as a view in an in-memory DuckDB database.
+:::
 
-For a detailed breakdown of all capabilities with more examples, please see the following pages:
+The `DataProduct` class provides a powerful way to query your connected data without writing complex SQL manually. For more detailed operations, see the other guides in this section:
 
 * **[Basic Operations](./basic-operations.md)**: Learn how to select, alias, and limit fields.
 * **[Sorting](./sorting.md)**: See how to order your data products.

docsite/docs/core-concepts/semantic-intelligence/semantic-model.md

Lines changed: 27 additions & 31 deletions

@@ -5,50 +5,46 @@ title: Semantic Model
 
 # Semantic Model
 
-The `SemanticModel` is the primary orchestrator of the data intelligence pipeline. It's the main user-facing class that manages many data sources and runs the end-to-end process of transforming them from raw, disconnected tables into a fully enriched and interconnected semantic layer.
+The `SemanticModel` is the core class in `intugle`. It orchestrates the entire process of profiling, link prediction, and glossary generation to build a unified semantic layer over your data.
 
-## Overview
-
-At a high level, the `SemanticModel` is responsible for:
+## Initialization
 
-1. **Initializing and Managing Datasets**: It takes your raw data sources (for example, file paths) and wraps each one in a `DataSet` object.
-2. **Executing the Semantic Model Pipeline**: It runs a series of analysis stages in a specific, logical order to build up a rich understanding of your data.
-3. **Ensuring Resilience**: The pipeline avoids redundant work. It automatically saves its progress after each major stage, letting you resume an interrupted run without losing completed work.
+You can initialize the `SemanticModel` in two ways, depending on your use case.
 
-## Initialization
+### Method 1: From a Dictionary (Recommended)
 
-You can initialize the `SemanticModel` in two ways:
+This is the simplest and most common method. You provide a dictionary where each key is a unique name for a dataset, and the value contains its configuration (like path and type).
 
-1. **With a Dictionary of File-Based Sources**: This is the most common method. You give a dictionary where keys are the desired names for your datasets and values are dictionary configurations pointing to your data. The `path` can be a local file path or a remote URL (e.g., over HTTPS). Currently, `csv`, `parquet`, and `excel` file formats are supported.
+```python
+from intugle import SemanticModel
 
-```python
-from intugle import SemanticModel
+datasets = {
+    "allergies": {"path": "path/to/allergies.csv", "type": "csv"},
+    "patients": {"path": "path/to/patients.csv", "type": "csv"},
+    "claims": {"path": "path/to/claims.csv", "type": "csv"},
+}
 
-data_sources = {
-    "customers": {"path": "path/to/customers.csv", "type": "csv"},
-    "orders": {"path": "https://example.com/orders.csv", "type": "csv"},
-}
+sm = SemanticModel(datasets, domain="Healthcare")
+```
 
-sm = SemanticModel(data_input=data_sources, domain="e-commerce")
-```
+:::info Connecting to Data Sources
+While these examples use local CSV files, `intugle` can connect to various data sources. See our **[Connectors documentation](../../connectors/snowflake)** for details on specific integrations like Snowflake.
+:::
 
-2. **With a List of `DataSet` Objects**: If you have already created `DataSet` objects, you can pass a list of them directly.
+### Method 2: From a List of DataSet Objects
 
-```python
-from intugle.analysis.models import DataSet
-from intugle import SemanticModel
+For more advanced scenarios, you can initialize the `SemanticModel` with a list of pre-configured `DataSet` objects. This is useful if you have already instantiated `DataSet` objects for other purposes.
 
-# Create DataSet objects from file-based sources
-customers_data = {"path": "path/to/customers.csv", "type": "csv"}
-orders_data = {"path": "path/to/orders.csv", "type": "csv"}
-
-dataset_one = DataSet(customers_data, name="customers")
-dataset_two = DataSet(orders_data, name="orders")
+```python
+from intugle import SemanticModel, DataSet
 
-datasets = [dataset_one, dataset_two]
+# Create DataSet objects first
+dataset_allergies = DataSet(data={"path": "path/to/allergies.csv", "type": "csv"}, name="allergies")
+dataset_patients = DataSet(data={"path": "path/to/patients.csv", "type": "csv"}, name="patients")
 
-sm = SemanticModel(data_input=datasets, domain="e-commerce")
-```
+# Initialize the SemanticModel with the list of objects
+sm = SemanticModel([dataset_allergies, dataset_patients], domain="Healthcare")
+```
 
 The `domain` parameter is an optional but highly recommended string that gives context to the underlying AI models, helping them generate more relevant business glossary terms.
 

docsite/docs/vibe-coding.md

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 title: Vibe Coding
 ---
 

pyproject.toml

Lines changed: 10 additions & 4 deletions

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "intugle"
-version = "1.0.1"
+version = "1.0.2"
 authors = [
     { name="Intugle", email="[email protected]" },
 ]
@@ -43,15 +43,21 @@ dependencies = [
     "python-dotenv>=1.1.1",
     "symspellpy>=6.9.0",
     "trieregex>=1.0.0",
-    "xgboost<=3.0.4",
+    "xgboost>=3.0.4",
     "pyyaml>=6.0.2",
     "duckdb>=1.3.2",
-    "scikit-learn<=1.7.1",
+    "scikit-learn==1.7.1",
    "langchain[anthropic,google-genai,openai]>=0.3.27",
     "qdrant-client>=1.15.1",
-    "rich>=14.1.0",
+    "rich>=14.1.0"
 ]
 
+[project.optional-dependencies]
+snowflake = [
+    "snowflake-snowpark-python[pandas]>=1.12.0"
+]
+
+
 [project.urls]
 "Homepage" = "https://github.com/Intugle/data-tools"
 "Bug Tracker" = "https://github.com/Intugle/data-tools/issues"

src/intugle/adapters/adapter.py

Lines changed: 27 additions & 6 deletions

@@ -1,6 +1,8 @@
 from abc import ABC, abstractmethod
 from typing import Any, TYPE_CHECKING
 
+import pandas as pd
+
 from intugle.adapters.models import (
     ColumnProfile,
     DataSetData,
@@ -9,6 +11,7 @@
 
 if TYPE_CHECKING:
     from intugle.analysis.models import DataSet
+    from intugle.models.manifest import Manifest
 
 
 class Adapter(ABC):
@@ -27,22 +30,40 @@ def column_profile(
         dtype_sample_limit: int = 10000,
     ) -> ColumnProfile:
         pass
-
+
     @abstractmethod
-    def load():
+    def load(self, data: Any, table_name: str):
         ...
 
     @abstractmethod
     def execute(self, query: str):
         raise NotImplementedError()
-
+
+    @abstractmethod
+    def to_df(self, data: DataSetData, table_name: str):
+        raise NotImplementedError()
+
     @abstractmethod
-    def to_df(self: DataSetData, date, table_name: str):
+    def to_df_from_query(self, query: str) -> pd.DataFrame:
         raise NotImplementedError()
-
+
+    @abstractmethod
+    def create_table_from_query(self, table_name: str, query: str):
+        raise NotImplementedError()
+
+    @abstractmethod
+    def create_new_config_from_etl(self, etl_name: str) -> DataSetData:
+        raise NotImplementedError()
+
+    def deploy_semantic_model(self, manifest: "Manifest", **kwargs):
+        """Deploys a semantic model to the target system."""
+        raise NotImplementedError()
+
     def get_details(self, _: DataSetData):
         return None
 
     @abstractmethod
-    def intersect_count(self, table1: "DataSet", column1_name: str, table2: "DataSet", column2_name: str) -> int:
+    def intersect_count(
+        self, table1: "DataSet", column1_name: str, table2: "DataSet", column2_name: str
+    ) -> int:
         raise NotImplementedError()
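
One design detail worth noting in this diff: the new capability methods are `@abstractmethod`, while `deploy_semantic_model` is a plain method with a raising default, so deployment stays an optional, per-adapter capability. A self-contained sketch of that pattern (the `ReadOnlyAdapter` here is hypothetical, not an adapter from this repo):

```python
# Self-contained sketch of the abstract-vs-optional-hook pattern above.
from abc import ABC, abstractmethod


class Adapter(ABC):
    @abstractmethod
    def execute(self, query: str):
        """Required capability: every concrete adapter must implement this."""
        raise NotImplementedError()

    def deploy_semantic_model(self, manifest, **kwargs):
        """Optional hook with a raising default; only adapters that can
        deploy (e.g. Snowflake) override it."""
        raise NotImplementedError()


class ReadOnlyAdapter(Adapter):  # hypothetical adapter for illustration
    def execute(self, query: str):
        return f"executed: {query}"


adapter = ReadOnlyAdapter()          # instantiable: all abstract methods implemented
print(adapter.execute("SELECT 1"))
try:
    adapter.deploy_semantic_model(manifest=None)
except NotImplementedError:
    print("this adapter does not support deployment")
```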

src/intugle/adapters/factory.py

Lines changed: 7 additions & 2 deletions

@@ -21,6 +21,7 @@ def import_module(name: str) -> ModuleInterface:
 DEFAULT_PLUGINS = [
     "intugle.adapters.types.pandas.pandas",
     "intugle.adapters.types.duckdb.duckdb",
+    "intugle.adapters.types.snowflake.snowflake",
 ]
 
 
@@ -35,8 +36,12 @@ def __init__(self, plugins: list[dict] = None):
         plugins.extend(DEFAULT_PLUGINS)
 
         for _plugin in plugins:
-            plugin = import_module(_plugin)
-            plugin.register(self)
+            try:
+                plugin = import_module(_plugin)
+                plugin.register(self)
+            except ImportError:
+                print(f"Warning: Could not load plugin '{_plugin}' due to missing dependencies. This adapter will not be available.")
+                pass
 
     @classmethod
     def register(
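
The effect of the new `try`/`except ImportError` is that an optional adapter whose dependencies are absent (for example Snowpark when `intugle[snowflake]` was not installed) is skipped with a warning instead of breaking the factory. A self-contained sketch of the same guarded-registration pattern (the module names here are stand-ins, not intugle's):

```python
# Stand-in module names illustrate the guarded-registration pattern above.
import importlib

PLUGINS = [
    "json",                    # stdlib module: import succeeds
    "not_a_real_adapter_dep",  # stands in for a missing optional dependency
]

registered = []
for name in PLUGINS:
    try:
        module = importlib.import_module(name)
        registered.append(module.__name__)
    except ImportError:
        print(f"Warning: Could not load plugin '{name}' due to missing dependencies.")

print(registered)  # only the importable plugins made it in -> ['json']
```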

src/intugle/adapters/models.py

Lines changed: 2 additions & 1 deletion

@@ -6,9 +6,10 @@
 from pydantic import BaseModel, Field
 
 from intugle.adapters.types.duckdb.models import DuckdbConfig
+from intugle.adapters.types.snowflake.models import SnowflakeConfig
 
 # FIXME load dynamically
-DataSetData = pd.DataFrame | DuckdbConfig
+DataSetData = pd.DataFrame | DuckdbConfig | SnowflakeConfig
 
 
 class ProfilingOutput(BaseModel):
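
Widening the `DataSetData` union is what lets downstream code accept a Snowflake config anywhere a DataFrame or DuckDB config was accepted, typically via runtime type dispatch. A minimal sketch with stand-in config models (the real ones live under `intugle.adapters.types.*.models`); the `describe()` dispatcher is hypothetical:

```python
# Stand-in models; the `|` union syntax below needs Python 3.10+.
import pandas as pd
from pydantic import BaseModel


class DuckdbConfig(BaseModel):
    path: str


class SnowflakeConfig(BaseModel):
    identifier: str


DataSetData = pd.DataFrame | DuckdbConfig | SnowflakeConfig


def describe(data: DataSetData) -> str:
    # Hypothetical dispatcher: adapters branch on the runtime type.
    if isinstance(data, pd.DataFrame):
        return f"in-memory frame with {len(data)} rows"
    if isinstance(data, DuckdbConfig):
        return f"duckdb source at {data.path}"
    return f"snowflake table {data.identifier}"


print(describe(SnowflakeConfig(identifier="CUSTOMERS")))  # -> snowflake table CUSTOMERS
```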
