docsite/docs/connectors/implementing-a-connector.md (63 additions & 15 deletions)
@@ -7,8 +7,8 @@ sidebar_position: 4
:::tip Pro Tip: Use an AI Coding Assistant
The fastest way to implement a new adapter is to use an AI coding assistant like the **Gemini CLI**, **Cursor**, or **Claude**.

- 1. **Provide Context:** Give the assistant the code for an existing, similar adapter (e.g., `SnowflakeAdapter` or `DatabricksAdapter`).
- 2. **State Your Goal:** Ask it to replicate the structure and logic for your new data source. For example: *"Using the Snowflake adapter as a reference, create a new adapter for MyConnector."*
+ 1. **Provide Context:** Give the assistant the code for an existing, similar adapter (e.g., `PostgresAdapter` or `DatabricksAdapter`).
+ 2. **State Your Goal:** Ask it to replicate the structure and logic for your new data source. For example: *"Using the Postgres adapter as a reference, create a new adapter for Redshift."*
3. **Iterate:** The assistant can generate the boilerplate code for the models, the adapter class, and the registration functions, allowing you to focus on the specific implementation details for your database driver.
:::
@@ -25,6 +25,7 @@ The core steps to create a new connector are:
2. **Define Configuration Models:** Create Pydantic models for your connector's configuration.
3. **Implement the Adapter Class:** Write the logic to interact with your data source.
4. **Register the Adapter:** Make your new adapter discoverable by the `intugle` factory.
+ 5. **Add Optional Dependencies:** Declare the necessary driver libraries.
In `src/intugle/adapters/types/myconnector/models.py`, you need to define two Pydantic models:
- 1. **Connection Config:** Defines the parameters needed to connect to your data source (e.g., host, user, password). This will be the format that will be picked up from the profiles.yml
- 2. **Data Config:** Defines how to identify a specific table or asset from that source. This will be the format that will be used to pass the datasets into the SemanticModel
+ 1. **Connection Config:** Defines the parameters needed to connect to your data source (e.g., host, user, password). This is the structure that will be read from `profiles.yml`.
+ 2. **Data Config:** Defines how to identify a specific table or asset from that source. This is the structure used when passing datasets into the `SemanticModel`.
**Example `models.py`:**
```python
@@ -58,37 +59,40 @@ class MyConnectorConnectionConfig(SchemaBase):
    port: int
    user: str
    password: str
+   database: str
    schema: Optional[str] = None


class MyConnectorConfig(SchemaBase):
    identifier: str
    type: str = "myconnector"
```
- Finally, open `src/intugle/adapters/models.py` and add your new `MyConnectorConfig` to the `DataSetData` type hint:
+ Finally, open `src/intugle/adapters/models.py` and add your new `MyConnectorConfig` to the `DataSetData` type hint. This is for static type checking and improves developer experience.

```python
# src/intugle/adapters/models.py

# ... other imports
from intugle.adapters.types.myconnector.models import MyConnectorConfig
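# Illustrative only (not this file's actual contents): the existing members of the
# union will differ in your checkout; the change is simply to append the new config,
# e.g.
#
#   DataSetData = Union[..., MyConnectorConfig]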
In `src/intugle/adapters/types/myconnector/myconnector.py`, create your adapter class. It must inherit from `Adapter` and implement its abstract methods.
- This is a simplified skeleton. You can look at the `DatabricksAdapter` or `SnowflakeAdapter` for a more complete example.
+ This is a simplified skeleton. Refer to the `PostgresAdapter` or `DatabricksAdapter` for a complete example.

**Example `myconnector.py`:**
```python
from typing import Any, Optional
import pandas as pd
from intugle.adapters.adapter import Adapter
from intugle.adapters.factory import AdapterFactory
- from intugle.adapters.models import ColumnProfile, ProfilingOutput
+ from intugle.adapters.models import ColumnProfile, ProfilingOutput, DataSetData
from .models import MyConnectorConfig, MyConnectorConnectionConfig
from intugle.core import settings
@@ -101,15 +105,30 @@ class MyConnectorAdapter(Adapter):
2. **Add the adapter to the default plugins list:** Open `src/intugle/adapters/factory.py` and add the path to your new adapter module.
@@ -185,7 +202,7 @@ To make `intugle` aware of your new adapter, you must register it with the facto
## Step 5: Add Optional Dependencies

- If your adapter requires a specific driver library (like `databricks-sql-connector` for Databricks), you should add it as an optional dependency.
+ If your adapter requires a specific driver library (like `asyncpg` for Postgres), you should add it as an optional dependency.

1. Open the `pyproject.toml` file at the root of the project.
2. Add a new extra under the `[project.optional-dependencies]` section.
@@ -200,4 +217,35 @@ If your adapter requires a specific driver library (like `databricks-sql-connect
This allows users to install the necessary libraries by running `pip install "intugle[myconnector]"`.
+ ## Best Practices and Considerations
+
+ When implementing your adapter, keep the following points in mind to ensure it is robust, secure, and efficient.
+
+ ### Handling Database Objects
+ Your adapter should be able to interact with different types of database objects, not just tables.
+ - **Tables, Views, and Materialized Views:** Ensure your `profile` method can read these different object types and your `create_table_from_query` method can handle creating them. The `materialize` parameter can be used to control this behavior. For example, the Postgres adapter supports `table`, `view`, and `materialized_view`.
+ - **Identifier Quoting:** Always wrap table and column identifiers in quotes (e.g., `"` for Postgres and Snowflake) to handle special characters, spaces, and case-sensitivity correctly (see the sketch after this list).
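A minimal sketch of both points above, assuming Postgres-style double-quote escaping; `quote_ident` and `build_create_sql` are illustrative helper names, not part of the `intugle` API:

```python
def quote_ident(name: str) -> str:
    # Double any embedded quote characters, then wrap the identifier so spaces,
    # special characters, and case are preserved.
    return '"' + name.replace('"', '""') + '"'


def build_create_sql(target: str, query: str, materialize: str = "table") -> str:
    # Identifiers cannot be bound as query parameters, so quoting is the standard
    # way to make them safe; `materialize` picks the object type to create.
    keyword = {
        "table": "TABLE",
        "view": "VIEW",
        "materialized_view": "MATERIALIZED VIEW",
    }[materialize]
    return f"CREATE {keyword} {quote_ident(target)} AS {query}"
```

For example, `build_create_sql("order items", "SELECT 1 AS x", "view")` yields `CREATE VIEW "order items" AS SELECT 1 AS x`.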
+ ### Secure Query Execution
+ - **Parameterized Queries:** To prevent SQL injection vulnerabilities, always use parameterized queries when user-provided values are part of a SQL statement. Most database drivers provide a safe way to pass parameters (e.g., using `?` or `$1` placeholders) instead of formatting them directly into the query string.
+
+ **Do this:**
+ ```python
+ # Example with asyncpg
+ await connection.fetch("SELECT * FROM users WHERE id = $1", user_id)
+ ```
+
+ **Avoid this:**
+ ```python
+ # Unsafe - vulnerable to SQL injection
+ await connection.fetch(f"SELECT * FROM users WHERE id = {user_id}")
+ ```
+ ### Stability and Error Handling
+ - **Network Errors and Timeouts:** Implement timeouts for both establishing connections and executing queries. This prevents your application from hanging indefinitely if the database is unresponsive. Your chosen database driver should provide options for setting these timeouts.
+ - **Graceful Error Handling:** Wrap database calls in `try...except` blocks to catch potential exceptions (e.g., connection errors, permission denied) and provide clear, informative error messages to the user (see the sketch after this list).
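A hedged sketch of both points, using `asyncpg` as the example driver; the `run_query` helper and its parameters are illustrative, not part of the `intugle` API:

```python
import asyncio

import asyncpg


async def run_query(sql: str, **connect_kwargs):
    conn = None
    try:
        # Fail fast if the server is unreachable (connection timeout, in seconds).
        conn = await asyncpg.connect(timeout=10, **connect_kwargs)
        # Per-query timeout so a slow statement cannot hang the pipeline.
        return await conn.fetch(sql, timeout=60)
    except (asyncio.TimeoutError, OSError, asyncpg.PostgresError) as exc:
        # Surface a clear, actionable message instead of a bare driver traceback.
        raise RuntimeError(f"MyConnector query failed: {exc}") from exc
    finally:
        if conn is not None:
            await conn.close()
```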
+ ### Atomicity
+ - **Transactions:** For operations that involve multiple SQL statements (like dropping and then recreating a table), wrap them in a transaction. This ensures that the entire operation is atomic: it either completes successfully, or all changes are rolled back if an error occurs, preventing the database from being left in an inconsistent state. A sketch follows below.
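A minimal sketch of the drop-and-recreate case, again using `asyncpg`; `replace_table` is an illustrative helper, not part of the `intugle` API:

```python
import asyncpg


async def replace_table(conn: asyncpg.Connection, table: str, select_sql: str) -> None:
    # Both statements commit together; if the CREATE fails, the DROP is rolled back.
    async with conn.transaction():
        await conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        await conn.execute(f'CREATE TABLE "{table}" AS {select_sql}')
```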
That's it! You have now implemented and registered a custom connector.
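As a quick smoke test, the new connector can be exercised like any other source type. A hedged sketch, assuming `profiles.yml` contains a `myconnector:` entry and that `MY_TABLE` is a hypothetical table name:

```python
from intugle import SemanticModel

# "type" must match the default declared on MyConnectorConfig.
datasets = {
    "MY_TABLE": {"identifier": "MY_TABLE", "type": "myconnector"},
}

sm = SemanticModel(datasets, domain="demo")
sm.build()
```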
+ `intugle` integrates with PostgreSQL, allowing you to read data from your tables, views, and materialized views, and deploy your `SemanticModel` by setting constraints and comments directly in your PostgreSQL database.
+
+ ## Installation
+
+ To use `intugle` with PostgreSQL, you must install the optional dependencies:
+
+ ```bash
+ pip install "intugle[postgres]"
+ ```
+
+ This installs the `asyncpg` and `sqlglot` libraries.
+
+ ## Configuration
+
+ To connect to your PostgreSQL database, you must provide connection credentials in a `profiles.yml` file at the root of your project. The adapter looks for a top-level `postgres:` key.
+
+ **Example `profiles.yml`:**
+
+ ```yaml
+ postgres:
+   host: <your_postgres_host>
+   port: 5432  # Default PostgreSQL port
+   user: <your_username>
+   password: <your_password>
+   database: <your_database_name>
+   schema: <your_schema_name>
+ ```
+
+ ## Usage
+
+ ### Reading Data from PostgreSQL
+
+ To include a PostgreSQL table, view, or materialized view in your `SemanticModel`, define it in your input dictionary with `type: "postgres"` and use the `identifier` key to specify the object name.
+
+ :::caution Important
+ The dictionary key for your dataset (e.g., `"CUSTOMERS"`) must exactly match the table, view, or materialized view name specified in the `identifier`.
+ :::
+
+ ```python
+ from intugle import SemanticModel
+
+ datasets = {
+     "CUSTOMERS": {
+         "identifier": "CUSTOMERS",  # Must match the key above
+         "type": "postgres"
+     },
+     "ORDERS_VIEW": {
+         "identifier": "ORDERS_VIEW",  # Can be a view
+         "type": "postgres"
+     },
+     "PRODUCT_MV": {
+         "identifier": "PRODUCT_MV",  # Can be a materialized view
+         "type": "postgres"
+     }
+ }
+
+ # Initialize the semantic model
+ sm = SemanticModel(datasets, domain="E-commerce")
+
+ # Build the model as usual
+ sm.build()
+ ```
+
+ ### Materializing Data Products
+
+ When you use the `DataProduct` class with a PostgreSQL connection, the resulting data product can be materialized as a new **table**, **view**, or **materialized view** directly within your target schema.
+ Once your semantic model is built, you can deploy it to PostgreSQL using the `deploy()` method. This process syncs your model's intelligence to your physical tables by:
+ 1. **Syncing Metadata:** It updates the comments on your physical PostgreSQL tables and columns with the business glossaries from your `intugle` model.
+ 2. **Setting Constraints:** It sets `PRIMARY KEY` and `FOREIGN KEY` constraints on your tables based on the relationships discovered in the model.
+
+ ```python
+ # Deploy the model to PostgreSQL
+ sm.deploy(target="postgres")
+
+ # You can also control which parts of the deployment to run
+ sm.deploy(
+     target="postgres",
+     sync_glossary=True,
+     set_primary_keys=True,
+     set_foreign_keys=True
+ )
+ ```
+
+ :::info Required Permissions
+ To successfully deploy a semantic model, the PostgreSQL user you are using must have the following privileges:
+ * `USAGE` on the target schema.
+ * `CREATE TABLE`, `CREATE VIEW`, `CREATE MATERIALIZED VIEW` on the target schema.
+ * `COMMENT` privilege on tables and columns.
+ * `ALTER TABLE` to add primary and foreign key constraints.
docsite/docs/core-concepts/semantic-intelligence/dataset.md (15 additions & 4 deletions)
@@ -25,13 +25,18 @@ Currently, the library supports `csv`, `parquet`, and `excel` files. More integr
### Centralized metadata

- All analysis results for a data source are stored within the `dataset.source_table_model` attribute. This attribute is a structured Pydantic model that makes accessing metadata predictable and easy. For more convenient access to column-level data, the `DataSet` also provides a `columns` dictionary.
+ All analysis results for a data source are stored within the `dataset.source` attribute. This attribute is a structured Pydantic model that contains the `Source` object, which in turn holds the `SourceTables` model. This makes accessing metadata predictable and easy. For more convenient access to column-level data, the `DataSet` also provides a `columns` dictionary.

#### Metadata structure and access

The library organizes metadata using Pydantic models, but you can access it through the `DataSet`'s attributes.

- - **Table-Level Metadata**: Accessed via `dataset.source_table_model`.
+ - **Source-Level Metadata**: Accessed via `dataset.source`.
+   - `.name: str`
+   - `.description: str`
+   - `.schema: str`
+   - `.database: str`
+ - **Table-Level Metadata**: Accessed via `dataset.source.table`.
  - `.name: str`
  - `.description: str`
  - `.key: Optional[str]`
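A short access sketch based on the attribute paths listed above; `dataset` stands for any already-profiled `DataSet` instance, obtained however your workflow provides it:

```python
def print_dataset_metadata(dataset) -> None:
    # Source-level metadata
    print(dataset.source.name, dataset.source.schema, dataset.source.database)
    # Table-level metadata
    print(dataset.source.table.name, dataset.source.table.description, dataset.source.table.key)
    # Column-level metadata is exposed through the `columns` dictionary
    for column_name, column in dataset.columns.items():
        print(column_name, column)
```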
@@ -55,9 +60,15 @@ The library organizes metadata using Pydantic models, but you can access it thro