
Commit 652f457

Merge pull request #56 from Intugle/feature/snowflake-integration
Feature/snowflake integration
2 parents 032685e + 4027289 · commit 652f457

26 files changed: +1187 -78 lines

.gitignore

Lines changed: 3 additions & 0 deletions

@@ -215,3 +215,6 @@ models_bak
 settings.json
 archived/
 /experiments/
+
+profiles.yml
+profiles.yaml
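
These two new ignore entries keep database credentials out of version control: as the new Snowflake docs page below explains, the adapter reads connection settings from a `profiles.yml` with a top-level `snowflake:` key. A minimal sketch of reading such a profile, as an illustration rather than the adapter's actual loader:

```python
# Illustration only -- not the adapter's real loading code.
import yaml  # pyyaml is already a project dependency

with open("profiles.yml") as f:
    profiles = yaml.safe_load(f)

# The adapter looks for this top-level key (see the Snowflake docs below).
snowflake_profile = profiles["snowflake"]
print(snowflake_profile["account"], snowflake_profile["warehouse"])
```
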
Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+{
+  "label": "Connectors",
+  "position": 5
+}
Lines changed: 105 additions & 0 deletions

@@ -0,0 +1,105 @@
+---
+sidebar_position: 1
+---
+
+# Snowflake
+
+`intugle` integrates with Snowflake, allowing you to read data from Snowflake tables and deploy your `SemanticModel` as a **Semantic View** in your Snowflake account.
+
+## Installation
+
+To use `intugle` with Snowflake, you must install the optional dependencies:
+
+```bash
+pip install "intugle[snowflake]"
+```
+
+This installs the `snowflake-snowpark-python` library.
+
+## Configuration
+
+The Snowflake adapter can connect using credentials from a `profiles.yml` file or automatically use an active session when running inside a Snowflake Notebook.
+
+### Connecting from an External Environment
+
+When running `intugle` outside of a Snowflake Notebook, you must provide connection credentials in a `profiles.yml` file at the root of your project. The adapter looks for a top-level `snowflake:` key.
+
+**Example `profiles.yml`:**
+
+```yaml
+snowflake:
+  type: snowflake
+  account: <your_snowflake_account>
+  user: <your_username>
+  password: <your_password>
+  role: <your_role>
+  warehouse: <your_warehouse>
+  database: <your_database>
+  schema: <your_schema>
+```
+
+### Connecting from a Snowflake Notebook
+
+When your code is executed within a Snowflake Notebook, the adapter automatically detects and uses the notebook's active Snowpark session. **No configuration is required.**
+
+## Usage
+
+### Reading Data from Snowflake
+
+To include a Snowflake table in your `SemanticModel`, define it in your input dictionary with `type: "snowflake"` and use the `identifier` key to specify the table name.
+
+:::caution Important
+The dictionary key for your dataset (e.g., `"CUSTOMERS"`) must exactly match the table name specified in the `identifier`.
+:::
+
+```python
+from intugle import SemanticModel
+
+datasets = {
+    "CUSTOMERS": {
+        "identifier": "CUSTOMERS",  # Must match the key above
+        "type": "snowflake"
+    },
+    "ORDERS": {
+        "identifier": "ORDERS",  # Must match the key above
+        "type": "snowflake"
+    }
+}
+
+# Initialize the semantic model
+sm = SemanticModel(datasets, domain="E-commerce")
+
+# Build the model as usual
+sm.build()
+```
+
+### Materializing Data Products
+
+When you use the `DataProduct` class with a Snowflake connection, the resulting data product will be materialized as a new table directly within your Snowflake schema.
+
+### Deploying the Semantic Model
+
+Once your semantic model is built, you can deploy it to Snowflake using the `deploy()` method. This process performs two actions:
+1. **Syncs Metadata:** It updates the comments on your physical Snowflake tables and columns with the business glossaries from your `intugle` model.
+2. **Creates Semantic View:** It constructs and executes a `CREATE OR REPLACE SEMANTIC VIEW` statement in your target database and schema.
+
+```python
+# Deploy the model to Snowflake
+sm.deploy(target="snowflake")
+
+# You can also provide a custom name for the view
+sm.deploy(target="snowflake", model_name="my_custom_semantic_view")
+```
+
+:::info Required Permissions
+To successfully deploy a semantic model, the Snowflake role you are using must have the following privileges:
+* `USAGE` on the target database and schema.
+* `CREATE SEMANTIC VIEW` on the target schema.
+* `ALTER TABLE` permissions on the source tables to update their comments.
+:::
+
+:::tip Next Steps: Chat with your Data using Cortex Analyst
+Now that you have deployed a Semantic View, you can use **Snowflake Cortex Analyst** to interact with your data using natural language. Cortex Analyst leverages the relationships and context defined in your Semantic View to answer questions without requiring you to write SQL.
+
+To get started, navigate to **AI & ML -> Cortex Analyst** in the Snowflake UI and select your newly created view.
+:::
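
The "Materializing Data Products" subsection above describes the behavior without a snippet. Here is a minimal sketch under stated assumptions: `DataProduct`, `build()`, `to_df()`, and `sql_query` appear in this repo's docs, but the spec layout, field identifiers, and the exact call shape below are hypothetical.

```python
# Hypothetical sketch -- the spec layout is an assumption, not intugle's
# confirmed schema; build()/to_df()/sql_query come from the docs.
from intugle import DataProduct

spec = {
    "name": "customer_orders",   # hypothetical product name
    "fields": [
        {"id": "CUSTOMERS.ID"},  # hypothetical field identifiers
        {"id": "ORDERS.TOTAL"},
    ],
}

data_product = DataProduct()
data_product.build(spec)  # with a Snowflake connection, this materializes as a
                          # new table in the connected schema; with file-based
                          # sources, as a view in an in-memory DuckDB database

print(data_product.sql_query)  # inspect the generated SQL
print(data_product.to_df())    # fetch the result as a DataFrame
```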

docsite/docs/core-concepts/data-product/index.md

Lines changed: 9 additions & 3 deletions

@@ -26,7 +26,11 @@ The primary input for the `DataProduct` is a product specification, which you de
 
 ## Usage Example
 
-Using the `DataProduct` is straightforward. Once the `SemanticModel` has successfully built the semantic layer, you can immediately start creating data products.
+Once the `SemanticModel` is built, you can use the `DataProduct` class to generate unified data products from the semantic layer. This allows you to select fields from across different tables, and `intugle` will automatically handle the joins and generate the final, unified dataset.
+
+## Building a Data Product
+
+To build a data product, you define a product specification model that lists the fields you want, any transformations, and filters.
 
 ```python
 from intugle import DataProduct
@@ -77,9 +81,11 @@ print(data_product.to_df())
 print(data_product.sql_query)
 ```
 
-This workflow allows you to rapidly prototype and generate complex, unified datasets by simply describing what you need, letting the `DataProduct` handle the underlying SQL complexity.
+:::info Materialization with Connectors
+When using a database connector like **[Snowflake](../../connectors/snowflake)**, the `build()` method will materialize the data product as a new table directly within your connected database schema. For file-based sources, it is materialized as a view in an in-memory DuckDB database.
+:::
 
-For a detailed breakdown of all capabilities with more examples, please see the following pages:
+The `DataProduct` class provides a powerful way to query your connected data without writing complex SQL manually. For more detailed operations, see the other guides in this section:
 
 * **[Basic Operations](./basic-operations.md)**: Learn how to select, alias, and limit fields.
 * **[Sorting](./sorting.md)**: See how to order your data products.

docsite/docs/core-concepts/semantic-intelligence/semantic-model.md

Lines changed: 27 additions & 31 deletions

@@ -5,50 +5,46 @@ title: Semantic Model
 
 # Semantic Model
 
-The `SemanticModel` is the primary orchestrator of the data intelligence pipeline. It's the main user-facing class that manages many data sources and runs the end-to-end process of transforming them from raw, disconnected tables into a fully enriched and interconnected semantic layer.
+The `SemanticModel` is the core class in `intugle`. It orchestrates the entire process of profiling, link prediction, and glossary generation to build a unified semantic layer over your data.
 
-## Overview
-
-At a high level, the `SemanticModel` is responsible for:
+## Initialization
 
-1. **Initializing and Managing Datasets**: It takes your raw data sources (for example, file paths) and wraps each one in a `DataSet` object.
-2. **Executing the Semantic Model Pipeline**: It runs a series of analysis stages in a specific, logical order to build up a rich understanding of your data.
-3. **Ensuring Resilience**: The pipeline avoids redundant work. It automatically saves its progress after each major stage, letting you resume an interrupted run without losing completed work.
+You can initialize the `SemanticModel` in two ways, depending on your use case.
 
-## Initialization
+### Method 1: From a Dictionary (Recommended)
 
-You can initialize the `SemanticModel` in two ways:
+This is the simplest and most common method. You provide a dictionary where each key is a unique name for a dataset, and the value contains its configuration (like path and type).
 
-1. **With a Dictionary of File-Based Sources**: This is the most common method. You give a dictionary where keys are the desired names for your datasets and values are dictionary configurations pointing to your data. The `path` can be a local file path or a remote URL (e.g., over HTTPS). Currently, `csv`, `parquet`, and `excel` file formats are supported.
+```python
+from intugle import SemanticModel
 
-```python
-from intugle import SemanticModel
+datasets = {
+    "allergies": {"path": "path/to/allergies.csv", "type": "csv"},
+    "patients": {"path": "path/to/patients.csv", "type": "csv"},
+    "claims": {"path": "path/to/claims.csv", "type": "csv"},
+}
 
-data_sources = {
-    "customers": {"path": "path/to/customers.csv", "type": "csv"},
-    "orders": {"path": "https://example.com/orders.csv", "type": "csv"},
-}
+sm = SemanticModel(datasets, domain="Healthcare")
+```
 
-sm = SemanticModel(data_input=data_sources, domain="e-commerce")
-```
+:::info Connecting to Data Sources
+While these examples use local CSV files, `intugle` can connect to various data sources. See our **[Connectors documentation](../../connectors/snowflake)** for details on specific integrations like Snowflake.
+:::
 
-2. **With a List of `DataSet` Objects**: If you have already created `DataSet` objects, you can pass a list of them directly.
+### Method 2: From a List of DataSet Objects
 
-```python
-from intugle.analysis.models import DataSet
-from intugle import SemanticModel
+For more advanced scenarios, you can initialize the `SemanticModel` with a list of pre-configured `DataSet` objects. This is useful if you have already instantiated `DataSet` objects for other purposes.
 
-# Create DataSet objects from file-based sources
-customers_data = {"path": "path/to/customers.csv", "type": "csv"}
-orders_data = {"path": "path/to/orders.csv", "type": "csv"}
-
-dataset_one = DataSet(customers_data, name="customers")
-dataset_two = DataSet(orders_data, name="orders")
+```python
+from intugle import SemanticModel, DataSet
 
-datasets = [dataset_one, dataset_two]
+# Create DataSet objects first
+dataset_allergies = DataSet(data={"path": "path/to/allergies.csv", "type": "csv"}, name="allergies")
+dataset_patients = DataSet(data={"path": "path/to/patients.csv", "type": "csv"}, name="patients")
 
-sm = SemanticModel(data_input=datasets, domain="e-commerce")
-```
+# Initialize the SemanticModel with the list of objects
+sm = SemanticModel([dataset_allergies, dataset_patients], domain="Healthcare")
+```
 
 The `domain` parameter is an optional but highly recommended string that gives context to the underlying AI models, helping them generate more relevant business glossary terms.
 

docsite/docs/vibe-coding.md

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 title: Vibe Coding
 ---
 

pyproject.toml

Lines changed: 10 additions & 4 deletions

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "intugle"
-version = "1.0.1"
+version = "1.0.2"
 authors = [
     { name="Intugle", email="[email protected]" },
 ]
@@ -43,15 +43,21 @@ dependencies = [
     "python-dotenv>=1.1.1",
     "symspellpy>=6.9.0",
     "trieregex>=1.0.0",
-    "xgboost<=3.0.4",
+    "xgboost>=3.0.4",
     "pyyaml>=6.0.2",
     "duckdb>=1.3.2",
-    "scikit-learn<=1.7.1",
+    "scikit-learn==1.7.1",
    "langchain[anthropic,google-genai,openai]>=0.3.27",
     "qdrant-client>=1.15.1",
-    "rich>=14.1.0",
+    "rich>=14.1.0"
 ]
 
+[project.optional-dependencies]
+snowflake = [
+    "snowflake-snowpark-python[pandas]>=1.12.0"
+]
+
+
 [project.urls]
 "Homepage" = "https://github.com/Intugle/data-tools"
 "Bug Tracker" = "https://github.com/Intugle/data-tools/issues"

src/intugle/adapters/adapter.py

Lines changed: 27 additions & 6 deletions

@@ -1,6 +1,8 @@
 from abc import ABC, abstractmethod
 from typing import Any, TYPE_CHECKING
 
+import pandas as pd
+
 from intugle.adapters.models import (
     ColumnProfile,
     DataSetData,
@@ -9,6 +11,7 @@
 
 if TYPE_CHECKING:
     from intugle.analysis.models import DataSet
+    from intugle.models.manifest import Manifest
 
 
 class Adapter(ABC):
@@ -27,22 +30,40 @@ def column_profile(
         dtype_sample_limit: int = 10000,
     ) -> ColumnProfile:
         pass
-
+
     @abstractmethod
-    def load():
+    def load(self, data: Any, table_name: str):
         ...
 
     @abstractmethod
     def execute(self, query: str):
         raise NotImplementedError()
-
+
+    @abstractmethod
+    def to_df(self, data: DataSetData, table_name: str):
+        raise NotImplementedError()
+
     @abstractmethod
-    def to_df(self: DataSetData, date, table_name: str):
+    def to_df_from_query(self, query: str) -> pd.DataFrame:
         raise NotImplementedError()
-
+
+    @abstractmethod
+    def create_table_from_query(self, table_name: str, query: str):
+        raise NotImplementedError()
+
+    @abstractmethod
+    def create_new_config_from_etl(self, etl_name: str) -> DataSetData:
+        raise NotImplementedError()
+
+    def deploy_semantic_model(self, manifest: "Manifest", **kwargs):
+        """Deploys a semantic model to the target system."""
+        raise NotImplementedError()
+
     def get_details(self, _: DataSetData):
         return None
 
     @abstractmethod
-    def intersect_count(self, table1: "DataSet", column1_name: str, table2: "DataSet", column2_name: str) -> int:
+    def intersect_count(
+        self, table1: "DataSet", column1_name: str, table2: "DataSet", column2_name: str
+    ) -> int:
         raise NotImplementedError()
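
One design detail worth noting in this diff: the new capability methods are `@abstractmethod`, while `deploy_semantic_model` is a plain method with a raising default, so deployment stays an optional, per-adapter capability. A self-contained sketch of that pattern (the `ReadOnlyAdapter` here is hypothetical, not an adapter from this repo):

```python
# Self-contained sketch of the abstract-vs-optional-hook pattern above.
from abc import ABC, abstractmethod


class Adapter(ABC):
    @abstractmethod
    def execute(self, query: str):
        """Required capability: every concrete adapter must implement this."""
        raise NotImplementedError()

    def deploy_semantic_model(self, manifest, **kwargs):
        """Optional hook with a raising default; only adapters that can
        deploy (e.g. Snowflake) override it."""
        raise NotImplementedError()


class ReadOnlyAdapter(Adapter):  # hypothetical adapter for illustration
    def execute(self, query: str):
        return f"executed: {query}"


adapter = ReadOnlyAdapter()          # instantiable: all abstract methods implemented
print(adapter.execute("SELECT 1"))
try:
    adapter.deploy_semantic_model(manifest=None)
except NotImplementedError:
    print("this adapter does not support deployment")
```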

src/intugle/adapters/factory.py

Lines changed: 7 additions & 2 deletions

@@ -21,6 +21,7 @@ def import_module(name: str) -> ModuleInterface:
 DEFAULT_PLUGINS = [
     "intugle.adapters.types.pandas.pandas",
     "intugle.adapters.types.duckdb.duckdb",
+    "intugle.adapters.types.snowflake.snowflake",
 ]
 
 
@@ -35,8 +36,12 @@ def __init__(self, plugins: list[dict] = None):
         plugins.extend(DEFAULT_PLUGINS)
 
         for _plugin in plugins:
-            plugin = import_module(_plugin)
-            plugin.register(self)
+            try:
+                plugin = import_module(_plugin)
+                plugin.register(self)
+            except ImportError:
+                print(f"Warning: Could not load plugin '{_plugin}' due to missing dependencies. This adapter will not be available.")
+                pass
 
     @classmethod
     def register(
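
The effect of the new `try`/`except ImportError` is that an optional adapter whose dependencies are absent (for example Snowpark when `intugle[snowflake]` was not installed) is skipped with a warning instead of breaking the factory. A self-contained sketch of the same guarded-registration pattern (the module names here are stand-ins, not intugle's):

```python
# Stand-in module names illustrate the guarded-registration pattern above.
import importlib

PLUGINS = [
    "json",                    # stdlib module: import succeeds
    "not_a_real_adapter_dep",  # stands in for a missing optional dependency
]

registered = []
for name in PLUGINS:
    try:
        module = importlib.import_module(name)
        registered.append(module.__name__)
    except ImportError:
        print(f"Warning: Could not load plugin '{name}' due to missing dependencies.")

print(registered)  # only the importable plugins made it in -> ['json']
```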

src/intugle/adapters/models.py

Lines changed: 2 additions & 1 deletion

@@ -6,9 +6,10 @@
 from pydantic import BaseModel, Field
 
 from intugle.adapters.types.duckdb.models import DuckdbConfig
+from intugle.adapters.types.snowflake.models import SnowflakeConfig
 
 # FIXME load dynamically
-DataSetData = pd.DataFrame | DuckdbConfig
+DataSetData = pd.DataFrame | DuckdbConfig | SnowflakeConfig
 
 
 class ProfilingOutput(BaseModel):
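
Widening the `DataSetData` union is what lets downstream code accept a Snowflake config anywhere a DataFrame or DuckDB config was accepted, typically via runtime type dispatch. A minimal sketch with stand-in config models (the real ones live under `intugle.adapters.types.*.models`); the `describe()` dispatcher is hypothetical:

```python
# Stand-in models; the `|` union syntax below needs Python 3.10+.
import pandas as pd
from pydantic import BaseModel


class DuckdbConfig(BaseModel):
    path: str


class SnowflakeConfig(BaseModel):
    identifier: str


DataSetData = pd.DataFrame | DuckdbConfig | SnowflakeConfig


def describe(data: DataSetData) -> str:
    # Hypothetical dispatcher: adapters branch on the runtime type.
    if isinstance(data, pd.DataFrame):
        return f"in-memory frame with {len(data)} rows"
    if isinstance(data, DuckdbConfig):
        return f"duckdb source at {data.path}"
    return f"snowflake table {data.identifier}"


print(describe(SnowflakeConfig(identifier="CUSTOMERS")))  # -> snowflake table CUSTOMERS
```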
