
Commit bfc1468

Merge pull request #111 from Intugle/feature/postgres-connector
Feature/postgres connector
2 parents: 2c16c6e + e490981 (commit bfc1468)

33 files changed: +951 -237 lines

docsite/docs/connectors/implementing-a-connector.md

Lines changed: 63 additions & 15 deletions
@@ -7,8 +7,8 @@ sidebar_position: 4
 :::tip Pro Tip: Use an AI Coding Assistant
 The fastest way to implement a new adapter is to use an AI coding assistant like the **Gemini CLI**, **Cursor**, or **Claude**.
 
-1. **Provide Context:** Give the assistant the code for an existing, similar adapter (e.g., `SnowflakeAdapter` or `DatabricksAdapter`).
-2. **State Your Goal:** Ask it to replicate the structure and logic for your new data source. For example: *"Using the Snowflake adapter as a reference, create a new adapter for MyConnector."*
+1. **Provide Context:** Give the assistant the code for an existing, similar adapter (e.g., `PostgresAdapter` or `DatabricksAdapter`).
+2. **State Your Goal:** Ask it to replicate the structure and logic for your new data source. For example: *"Using the Postgres adapter as a reference, create a new adapter for Redshift."*
 3. **Iterate:** The assistant can generate the boilerplate code for the models, the adapter class, and the registration functions, allowing you to focus on the specific implementation details for your database driver.
 :::
 
@@ -25,6 +25,7 @@ The core steps to create a new connector are:
 2. **Define Configuration Models:** Create Pydantic models for your connector's configuration.
 3. **Implement the Adapter Class:** Write the logic to interact with your data source.
 4. **Register the Adapter:** Make your new adapter discoverable by the `intugle` factory.
+5. **Add Optional Dependencies:** Declare the necessary driver libraries.
 
 ## Step 1: Create the Scaffolding
 
@@ -45,8 +46,8 @@ src/intugle/adapters/types/myconnector/
 
 In `src/intugle/adapters/types/myconnector/models.py`, you need to define two Pydantic models:
 
-1. **Connection Config:** Defines the parameters needed to connect to your data source (e.g., host, user, password). This will be the format that will be picked up from the profiles.yml
-2. **Data Config:** Defines how to identify a specific table or asset from that source. This will be the format that will be used to pass the datasets into the SemanticModel
+1. **Connection Config:** Defines the parameters needed to connect to your data source (e.g., host, user, password). This is the structure that will be read from `profiles.yml`.
+2. **Data Config:** Defines how to identify a specific table or asset from that source. This is the structure used when passing datasets into the `SemanticModel`.
 
 **Example `models.py`:**
 ```python
@@ -58,37 +59,40 @@ class MyConnectorConnectionConfig(SchemaBase):
     port: int
     user: str
     password: str
+    database: str
     schema: Optional[str] = None
 
 class MyConnectorConfig(SchemaBase):
     identifier: str
     type: str = "myconnector"
 ```
 
-Finally, open `src/intugle/adapters/models.py` and add your new `MyConnectorConfig` to the `DataSetData` type hint:
+Finally, open `src/intugle/adapters/models.py` and add your new `MyConnectorConfig` to the `DataSetData` type hint. This is for static type checking and improves developer experience.
 
 ```python
 # src/intugle/adapters/models.py
 
 # ... other imports
 from intugle.adapters.types.myconnector.models import MyConnectorConfig
 
-DataSetData = pd.DataFrame | DuckdbConfig | ... | MyConnectorConfig
+if TYPE_CHECKING:
+    # ... other configs
+    DataSetData = pd.DataFrame | ... | MyConnectorConfig
 ```
 
 ## Step 3: Implement the Adapter Class
 
 In `src/intugle/adapters/types/myconnector/myconnector.py`, create your adapter class. It must inherit from `Adapter` and implement its abstract methods.
 
-This is a simplified skeleton. You can look at the `DatabricksAdapter` or `SnowflakeAdapter` for a more complete example.
+This is a simplified skeleton. Refer to the `PostgresAdapter` or `DatabricksAdapter` for a complete example.
 
 **Example `myconnector.py`:**
 ```python
 from typing import Any, Optional
 import pandas as pd
 from intugle.adapters.adapter import Adapter
 from intugle.adapters.factory import AdapterFactory
-from intugle.adapters.models import ColumnProfile, ProfilingOutput
+from intugle.adapters.models import ColumnProfile, ProfilingOutput, DataSetData
 from .models import MyConnectorConfig, MyConnectorConnectionConfig
 from intugle.core import settings
 
@@ -101,15 +105,30 @@ class MyConnectorAdapter(Adapter):
         connection_params = settings.PROFILES.get("myconnector", {})
         config = MyConnectorConnectionConfig.model_validate(connection_params)
         # self.connection = myconnector_driver.connect(**config.model_dump())
+        self._database = config.database
+        self._schema = config.schema
         pass
 
-    # --- Must be implemented ---
+    # --- Properties ---
+    @property
+    def database(self) -> Optional[str]:
+        return self._database
+
+    @property
+    def schema(self) -> Optional[str]:
+        return self._schema
+
+    @property
+    def source_name(self) -> str:
+        return "my_connector_source"
+
+    # --- Abstract Method Implementations ---
 
     def profile(self, data: Any, table_name: str) -> ProfilingOutput:
         # Return table-level metadata: row count, column names, and dtypes
         raise NotImplementedError()
 
-    def column_profile(self, data: Any, table_name: str, column_name: str, total_count: int) -> Optional[ColumnProfile]:
+    def column_profile(self, data: Any, table_name: str, column_name: str, total_count: int, **kwargs) -> Optional[ColumnProfile]:
        # Return column-level statistics: null count, distinct count, samples, etc.
        raise NotImplementedError()
 
@@ -121,7 +140,7 @@ class MyConnectorAdapter(Adapter):
         # Execute a query and return the result as a pandas DataFrame
         raise NotImplementedError()
 
-    def create_table_from_query(self, table_name: str, query: str) -> str:
+    def create_table_from_query(self, table_name: str, query: str, materialize: str = "view", **kwargs) -> str:
         # Materialize a query as a new table or view
         raise NotImplementedError()
 
@@ -132,8 +151,6 @@ class MyConnectorAdapter(Adapter):
     def intersect_count(self, table1: "DataSet", column1_name: str, table2: "DataSet", column2_name: str) -> int:
         # Calculate the count of intersecting values between two columns
         raise NotImplementedError()
-
-    # --- Other required methods ---
 
     def load(self, data: Any, table_name: str):
         # For database adapters, this is often a no-op
@@ -168,7 +185,7 @@ To make `intugle` aware of your new adapter, you must register it with the facto
 def register(factory: AdapterFactory):
     # Check if the required driver is installed
     # if MYCONNECTOR_DRIVER_AVAILABLE:
-    factory.register("myconnector", can_handle_myconnector, MyConnectorAdapter)
+    factory.register("myconnector", can_handle_myconnector, MyConnectorAdapter, MyConnectorConfig)
 ```
 
 2. **Add the adapter to the default plugins list:** Open `src/intugle/adapters/factory.py` and add the path to your new adapter module.
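The commented-out `MYCONNECTOR_DRIVER_AVAILABLE` check in the hunk above is typically backed by a guarded import, so that `intugle` still loads when the optional driver is missing. A minimal sketch of that pattern, assuming a hypothetical `myconnector_driver` package (the flag and module names are illustrative, not part of this commit):

```python
# Sketch only: gate registration on the optional driver actually being importable.
# "myconnector_driver" is a placeholder name, not a real dependency of this repo.
try:
    import myconnector_driver  # noqa: F401

    MYCONNECTOR_DRIVER_AVAILABLE = True
except ImportError:
    MYCONNECTOR_DRIVER_AVAILABLE = False


def register(factory):
    # The names below (can_handle_myconnector, MyConnectorAdapter, MyConnectorConfig)
    # come from the documentation skeleton above.
    if MYCONNECTOR_DRIVER_AVAILABLE:
        factory.register("myconnector", can_handle_myconnector, MyConnectorAdapter, MyConnectorConfig)
```

Pairing a guard like this with the extra declared in Step 5 keeps the core package importable even when the driver is not installed.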
@@ -185,7 +202,7 @@ To make `intugle` aware of your new adapter, you must register it with the facto
 
 ## Step 5: Add Optional Dependencies
 
-If your adapter requires a specific driver library (like `databricks-sql-connector` for Databricks), you should add it as an optional dependency.
+If your adapter requires a specific driver library (like `asyncpg` for Postgres), you should add it as an optional dependency.
 
 1. Open the `pyproject.toml` file at the root of the project.
 2. Add a new extra under the `[project.optional-dependencies]` section.
@@ -200,4 +217,35 @@ If your adapter requires a specific driver library (like `databricks-sql-connect
 
 This allows users to install the necessary libraries by running `pip install "intugle[myconnector]"`.
 
+## Best Practices and Considerations
+
+When implementing your adapter, keep the following points in mind to ensure it is robust, secure, and efficient.
+
+### Handling Database Objects
+Your adapter should be able to interact with different types of database objects, not just tables.
+- **Tables, Views, and Materialized Views:** Ensure your `profile` method can read and `create_table_from_query` method can handle creating these different object types. The `materialize` parameter can be used to control this behavior. For example, the Postgres adapter supports `table`, `view`, and `materialized_view`.
+- **Identifier Quoting:** Always wrap table and column identifiers in quotes (e.g., `"` for Postgres and Snowflake) to handle special characters, spaces, and case-sensitivity correctly.
+
+### Secure Query Execution
+- **Parameterized Queries:** To prevent SQL injection vulnerabilities, always use parameterized queries when user-provided values are part of a SQL statement. Most database drivers provide a safe way to pass parameters (e.g., using `?` or `$1` placeholders) instead of formatting them directly into the query string.
+
+**Do this:**
+```python
+# Example with asyncpg
+await connection.fetch("SELECT * FROM users WHERE id = $1", user_id)
+```
+
+**Avoid this:**
+```python
+# Unsafe - vulnerable to SQL injection
+await connection.fetch(f"SELECT * FROM users WHERE id = {user_id}")
+```
+
+### Stability and Error Handling
+- **Network Errors and Timeouts:** Implement timeouts for both establishing connections and executing queries. This prevents your application from hanging indefinitely if the database is unresponsive. Your chosen database driver should provide options for setting these timeouts.
+- **Graceful Error Handling:** Wrap database calls in `try...except` blocks to catch potential exceptions (e.g., connection errors, permission denied) and provide clear, informative error messages to the user.
+
+### Atomicity
+- **Transactions:** For operations that involve multiple SQL statements (like dropping and then recreating a table), wrap them in a transaction. This ensures that the entire operation is atomic—it either completes successfully, or all changes are rolled back if an error occurs, preventing the database from being left in an inconsistent state.
+
 That's it! You have now implemented and registered a custom connector.
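The timeout, error-handling, and transaction bullets added above are prose-only; here is a rough sketch of how they might look together with `asyncpg` (the connection details, table names, and overall structure are illustrative assumptions, not code from this PR):

```python
import asyncio

import asyncpg


async def rebuild_top_users(user_id: int) -> None:
    # Connection timeout so an unreachable host fails fast instead of hanging.
    conn = await asyncpg.connect(
        host="localhost", port=5432, user="intugle",
        password="secret", database="analytics", timeout=10,
    )
    try:
        # Parameterized query ($1) for untrusted input, plus a per-query timeout.
        rows = await conn.fetch(
            'SELECT * FROM "users" WHERE id = $1', user_id, timeout=30
        )
        print(f"fetched {len(rows)} row(s)")

        # Drop-and-recreate wrapped in a transaction: either both statements
        # succeed or both roll back, so the schema is never left half-updated.
        async with conn.transaction():
            await conn.execute('DROP VIEW IF EXISTS "top_users"')
            await conn.execute(
                'CREATE VIEW "top_users" AS SELECT id, name FROM "users" LIMIT 10'
            )
    except (asyncpg.PostgresError, OSError) as exc:
        # Graceful error handling: surface a clear message instead of a bare traceback.
        print(f"Database operation failed: {exc}")
    finally:
        await conn.close()


if __name__ == "__main__":
    asyncio.run(rebuild_top_users(42))
```

Quoting `"users"` and `"top_users"` also illustrates the identifier-quoting point from the same section.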
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+---
+sidebar_position: 3
+---
+
+# Postgres
+
+`intugle` integrates with PostgreSQL, allowing you to read data from your tables, views, and materialized views, and deploy your `SemanticModel` by setting constraints and comments directly in your PostgreSQL database.
+
+## Installation
+
+To use `intugle` with PostgreSQL, you must install the optional dependencies:
+
+```bash
+pip install "intugle[postgres]"
+```
+
+This installs the `asyncpg` and `sqlglot` libraries.
+
+## Configuration
+
+To connect to your PostgreSQL database, you must provide connection credentials in a `profiles.yml` file at the root of your project. The adapter looks for a top-level `postgres:` key.
+
+**Example `profiles.yml`:**
+
+```yaml
+postgres:
+  host: <your_postgres_host>
+  port: 5432 # Default PostgreSQL port
+  user: <your_username>
+  password: <your_password>
+  database: <your_database_name>
+  schema: <your_schema_name>
+```
+
+## Usage
+
+### Reading Data from PostgreSQL
+
+To include a PostgreSQL table, view, or materialized view in your `SemanticModel`, define it in your input dictionary with `type: "postgres"` and use the `identifier` key to specify the object name.
+
+:::caution Important
+The dictionary key for your dataset (e.g., `"CUSTOMERS"`) must exactly match the table, view, or materialized view name specified in the `identifier`.
+:::
+
+```python
+from intugle import SemanticModel
+
+datasets = {
+    "CUSTOMERS": {
+        "identifier": "CUSTOMERS", # Must match the key above
+        "type": "postgres"
+    },
+    "ORDERS_VIEW": {
+        "identifier": "ORDERS_VIEW", # Can be a view
+        "type": "postgres"
+    },
+    "PRODUCT_MV": {
+        "identifier": "PRODUCT_MV", # Can be a materialized view
+        "type": "postgres"
+    }
+}
+
+# Initialize the semantic model
+sm = SemanticModel(datasets, domain="E-commerce")
+
+# Build the model as usual
+sm.build()
+```
+
+### Materializing Data Products
+
+When you use the `DataProduct` class with a PostgreSQL connection, the resulting data product can be materialized as a new **table**, **view**, or **materialized view** directly within your target schema.
+
+```python
+from intugle import DataProduct
+
+etl_model = {
+    "name": "top_customers",
+    "fields": [
+        {"id": "CUSTOMERS.customer_id", "name": "customer_id"},
+        {"id": "CUSTOMERS.name", "name": "customer_name"},
+    ]
+}
+
+dp = DataProduct()
+
+# Materialize as a view (default)
+dp.build(etl_model, materialize="view")
+
+# Materialize as a table
+dp.build(etl_model, materialize="table")
+
+# Materialize as a materialized view
+dp.build(etl_model, materialize="materialized_view")
+```
+
+### Deploying the Semantic Model
+
+Once your semantic model is built, you can deploy it to PostgreSQL using the `deploy()` method. This process syncs your model's intelligence to your physical tables by:
+1. **Syncing Metadata:** It updates the comments on your physical PostgreSQL tables and columns with the business glossaries from your `intugle` model.
+2. **Setting Constraints:** It sets `PRIMARY KEY` and `FOREIGN KEY` constraints on your tables based on the relationships discovered in the model.
+
+```python
+# Deploy the model to PostgreSQL
+sm.deploy(target="postgres")
+
+# You can also control which parts of the deployment to run
+sm.deploy(
+    target="postgres",
+    sync_glossary=True,
+    set_primary_keys=True,
+    set_foreign_keys=True
+)
+```
+
+:::info Required Permissions
+To successfully deploy a semantic model, the PostgreSQL user you are using must have the following privileges:
+* `USAGE` on the target schema.
+* `CREATE TABLE`, `CREATE VIEW`, `CREATE MATERIALIZED VIEW` on the target schema.
+* `COMMENT` privilege on tables and columns.
+* `ALTER TABLE` to add primary and foreign key constraints.
+:::
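Because `deploy()` writes ordinary PostgreSQL objects (comments and key constraints), you can verify the result with standard catalog queries. A quick sketch using `asyncpg` with the same credentials as `profiles.yml` (the connection values and table name are placeholders, not part of the documented API):

```python
import asyncio

import asyncpg


async def show_deployed_metadata(schema: str = "public") -> None:
    conn = await asyncpg.connect(
        host="localhost", port=5432, user="intugle",
        password="secret", database="analytics",
    )
    try:
        # PRIMARY KEY / FOREIGN KEY constraints set by deploy() appear here.
        rows = await conn.fetch(
            """
            SELECT table_name, constraint_type, constraint_name
            FROM information_schema.table_constraints
            WHERE table_schema = $1
              AND constraint_type IN ('PRIMARY KEY', 'FOREIGN KEY')
            ORDER BY table_name
            """,
            schema,
        )
        for row in rows:
            print(row["table_name"], row["constraint_type"], row["constraint_name"])

        # Table comments written during glossary sync are readable via obj_description().
        comment = await conn.fetchval(
            "SELECT obj_description(to_regclass($1), 'pg_class')",
            f'{schema}."CUSTOMERS"',  # keep the quotes if the table name is case-sensitive
        )
        print("CUSTOMERS comment:", comment)
    finally:
        await conn.close()


if __name__ == "__main__":
    asyncio.run(show_deployed_metadata("public"))
```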

docsite/docs/core-concepts/semantic-intelligence/dataset.md

Lines changed: 15 additions & 4 deletions
@@ -25,13 +25,18 @@ Currently, the library supports `csv`, `parquet`, and `excel` files. More integr
 
 ### Centralized metadata
 
-All analysis results for a data source are stored within the `dataset.source_table_model` attribute. This attribute is a structured Pydantic model that makes accessing metadata predictable and easy. For more convenient access to column-level data, the `DataSet` also provides a `columns` dictionary.
+All analysis results for a data source are stored within the `dataset.source` attribute. This attribute is a structured Pydantic model that contains the `Source` object, which in turn holds the `SourceTables` model. This makes accessing metadata predictable and easy. For more convenient access to column-level data, the `DataSet` also provides a `columns` dictionary.
 
 #### Metadata structure and access
 
 The library organizes metadata using Pydantic models, but you can access it through the `DataSet`'s attributes.
 
-- **Table-Level Metadata**: Accessed via `dataset.source_table_model`.
+- **Source-Level Metadata**: Accessed via `dataset.source`.
+  - `.name: str`
+  - `.description: str`
+  - `.schema: str`
+  - `.database: str`
+- **Table-Level Metadata**: Accessed via `dataset.source.table`.
   - `.name: str`
   - `.description: str`
   - `.key: Optional[str]`
@@ -55,9 +60,15 @@ The library organizes metadata using Pydantic models, but you can access it thro
 # Assuming 'sm' is a built SemanticModel instance
 customers_dataset = sm.datasets['customers']
 
+# Access source-level metadata
+print(f"Source Name: {customers_dataset.source.name}")
+print(f"Database: {customers_dataset.source.database}")
+print(f"Schema: {customers_dataset.source.schema}")
+
 # Access table-level metadata
-print(f"Table Name: {customers_dataset.source_table_model.name}")
-print(f"Primary Key: {customers_dataset.source_table_model.key}")
+print(f"Table Name: {customers_dataset.source.table.name}")
+print(f"Table Description: {customers_dataset.source.table.description}")
+print(f"Primary Key: {customers_dataset.source.table.key}")
 
 # Access column-level metadata using the 'columns' dictionary
 email_column = customers_dataset.columns['email']
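Since every `DataSet` exposes the same `source` / `source.table` structure described above, the metadata can also be walked in a loop. A small sketch, assuming `sm` is a built `SemanticModel` as in the example:

```python
# Print source- and table-level metadata for every dataset in the model.
for name, dataset in sm.datasets.items():
    source = dataset.source
    print(f"{name}: database={source.database}, schema={source.schema}")
    print(f"  table={source.table.name}, key={source.table.key}")
    print(f"  description={source.table.description}")
```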

docsite/docs/core-concepts/semantic-intelligence/semantic-model.md

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@ customers_dataset = sm.datasets['customers']
 link_predictor = sm.link_predictor
 
 # Now you can explore rich metadata or results
-print(f"Primary Key for customers: {customers_dataset.source_table_model.description}")
+print(f"Primary Key for customers: {customers_dataset.source.table.description}")
 print("Discovered Links:")
 print(link_predictor.get_links_df())
 ```

pyproject.toml

Lines changed: 7 additions & 1 deletion
@@ -56,13 +56,18 @@ dependencies = [
 
 [project.optional-dependencies]
 snowflake = [
-    "snowflake-snowpark-python[pandas]>=1.12.0"
+    "snowflake-snowpark-python[pandas]>=1.12.0",
+    "sqlglot>=27.20.0",
 ]
 databricks = [
     "databricks-sql-connector>=4.1.3",
     "pyspark>=3.5.0",
     "sqlglot>=27.20.0",
 ]
+postgres = [
+    "asyncpg>=0.30.0",
+    "sqlglot>=27.20.0",
+]
 
 
 [project.urls]
@@ -84,6 +89,7 @@ test = [
 ]
 lint = ["ruff"]
 dev = [
+    "asyncpg>=0.30.0",
     "databricks-sql-connector>=4.1.3",
     "ipykernel>=6.30.1",
     "pysonar>=1.2.0.2419",
