Commit ee5f0d4

added implementing a connector and llm injection docs

1 parent 76828fd commit ee5f0d4

3 files changed: +251 −0

Lines changed: 203 additions & 0 deletions

---
sidebar_position: 4
---

# Implementing a Connector

:::tip Pro Tip: Use an AI Coding Assistant
The fastest way to implement a new adapter is to use an AI coding assistant like the **Gemini CLI**, **Cursor**, or **Claude**.

1. **Provide Context:** Give the assistant the code for an existing, similar adapter (e.g., `SnowflakeAdapter` or `DatabricksAdapter`).
2. **State Your Goal:** Ask it to replicate the structure and logic for your new data source. For example: *"Using the Snowflake adapter as a reference, create a new adapter for MyConnector."*
3. **Iterate:** The assistant can generate the boilerplate code for the models, the adapter class, and the registration functions, allowing you to focus on the specific implementation details for your database driver.
:::

`intugle` is designed to be extensible, allowing you to connect to any data source by creating a custom adapter. This guide walks you through the process of building your own connector.

If you build a connector that could benefit the community, we strongly encourage you to [open a pull request and contribute it](https://github.com/Intugle/data-tools/blob/main/CONTRIBUTING.md) to the `intugle` project!

## Overview

An adapter is a Python class that inherits from `intugle.adapters.adapter.Adapter` and implements a set of methods for interacting with a specific data source. It handles everything from connecting to the database to profiling data and executing queries.

The core steps to create a new connector are:

1. **Create the Scaffolding:** Set up the necessary directory and files.
2. **Define Configuration Models:** Create Pydantic models for your connector's configuration.
3. **Implement the Adapter Class:** Write the logic to interact with your data source.
4. **Register the Adapter:** Make your new adapter discoverable by the `intugle` factory.

## Step 1: Create the Scaffolding

First, create a new directory for your connector within the `src/intugle/adapters/types/` directory. For a connector named `myconnector`, you would create:

```
src/intugle/adapters/types/myconnector/
├── __init__.py
├── models.py
└── myconnector.py
```

- `__init__.py`: Can be an empty file.
- `models.py`: Will contain the Pydantic configuration models.
- `myconnector.py`: Will contain the main adapter class logic.

## Step 2: Define Configuration Models

In `src/intugle/adapters/types/myconnector/models.py`, you need to define two Pydantic models:

1. **Connection Config:** Defines the parameters needed to connect to your data source (e.g., host, user, password). This is the format that will be read from your `profiles.yml` (a hypothetical entry is sketched after the example below).
2. **Data Config:** Defines how to identify a specific table or asset from that source. This is the format used to pass datasets into the `SemanticModel`.

**Example `models.py`:**

```python
from typing import Optional

from intugle.common.schema import SchemaBase


class MyConnectorConnectionConfig(SchemaBase):
    host: str
    port: int
    user: str
    password: str
    schema: Optional[str] = None


class MyConnectorConfig(SchemaBase):
    identifier: str
    type: str = "myconnector"
```
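
For reference, here is a hypothetical `profiles.yml` entry matching `MyConnectorConnectionConfig`. The exact layout is an assumption based on the `settings.PROFILES.get("myconnector", {})` lookup shown in Step 3; adjust it to however your deployment manages profiles:

```yaml
# profiles.yml (hypothetical entry; field names mirror MyConnectorConnectionConfig)
myconnector:
  host: db.example.com
  port: 5432
  user: analyst
  password: your-password
  schema: analytics
```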

Finally, open `src/intugle/adapters/models.py` and add your new `MyConnectorConfig` to the `DataSetData` type hint:

```python
# src/intugle/adapters/models.py

# ... other imports
from intugle.adapters.types.myconnector.models import MyConnectorConfig

DataSetData = pd.DataFrame | DuckdbConfig | ... | MyConnectorConfig
```

## Step 3: Implement the Adapter Class

In `src/intugle/adapters/types/myconnector/myconnector.py`, create your adapter class. It must inherit from `Adapter` and implement its abstract methods.

This is a simplified skeleton. You can look at the `DatabricksAdapter` or `SnowflakeAdapter` for a more complete example.

**Example `myconnector.py`:**

```python
from typing import Any, Optional

import pandas as pd

from intugle.adapters.adapter import Adapter
from intugle.adapters.factory import AdapterFactory
from intugle.adapters.models import ColumnProfile, DataSetData, ProfilingOutput
from intugle.core import settings

from .models import MyConnectorConfig, MyConnectorConnectionConfig

# Import your database driver
# import myconnector_driver


class MyConnectorAdapter(Adapter):
    def __init__(self):
        # Initialize your connection here
        connection_params = settings.PROFILES.get("myconnector", {})
        config = MyConnectorConnectionConfig.model_validate(connection_params)
        # self.connection = myconnector_driver.connect(**config.model_dump())
        pass

    # --- Must be implemented ---

    def profile(self, data: Any, table_name: str) -> ProfilingOutput:
        # Return table-level metadata: row count, column names, and dtypes
        raise NotImplementedError()

    def column_profile(self, data: Any, table_name: str, column_name: str, total_count: int) -> Optional[ColumnProfile]:
        # Return column-level statistics: null count, distinct count, samples, etc.
        raise NotImplementedError()

    def execute(self, query: str):
        # Execute a query and return the raw results
        raise NotImplementedError()

    def to_df_from_query(self, query: str) -> pd.DataFrame:
        # Execute a query and return the result as a pandas DataFrame
        raise NotImplementedError()

    def create_table_from_query(self, table_name: str, query: str) -> str:
        # Materialize a query as a new table or view
        raise NotImplementedError()

    def create_new_config_from_etl(self, etl_name: str) -> DataSetData:
        # Return a new MyConnectorConfig for a materialized table
        return MyConnectorConfig(identifier=etl_name)

    def intersect_count(self, table1: "DataSet", column1_name: str, table2: "DataSet", column2_name: str) -> int:
        # Calculate the count of intersecting values between two columns
        raise NotImplementedError()

    # --- Other required methods ---

    def load(self, data: Any, table_name: str):
        # For database adapters, this is often a no-op
        pass

    def to_df(self, data: DataSetData, table_name: str):
        # Read an entire table into a pandas DataFrame
        config = MyConnectorConfig.model_validate(data)
        return self.to_df_from_query(f"SELECT * FROM {config.identifier}")

    def get_details(self, data: DataSetData):
        config = MyConnectorConfig.model_validate(data)
        return config.model_dump()
```
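
Many of these methods boil down to running SQL through your driver. As a rough, hypothetical sketch (not `intugle`'s prescribed implementation), `execute` and `to_df_from_query` could be filled in like this for a driver that follows the DB-API 2.0 cursor protocol, assuming `self.connection` was created in `__init__`:

```python
# Inside MyConnectorAdapter; assumes a DB-API 2.0 style self.connection.
def execute(self, query: str):
    # Run the query and return the raw rows
    cursor = self.connection.cursor()
    cursor.execute(query)
    return cursor.fetchall()

def to_df_from_query(self, query: str) -> pd.DataFrame:
    # Run the query and wrap the rows in a DataFrame, keeping column names
    cursor = self.connection.cursor()
    cursor.execute(query)
    columns = [desc[0] for desc in cursor.description]
    return pd.DataFrame(cursor.fetchall(), columns=columns)
```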

## Step 4: Register the Adapter

To make `intugle` aware of your new adapter, you must register it with the factory.

1. **Add registration functions to `myconnector.py`:** At the bottom of your adapter file, add two functions: one to check if the adapter can handle a given data config, and one to register it with the factory.

   ```python
   # In src/intugle/adapters/types/myconnector/myconnector.py

   def can_handle_myconnector(df: Any) -> bool:
       try:
           MyConnectorConfig.model_validate(df)
           return True
       except Exception:
           return False


   def register(factory: AdapterFactory):
       # Check if the required driver is installed
       # if MYCONNECTOR_DRIVER_AVAILABLE:
       factory.register("myconnector", can_handle_myconnector, MyConnectorAdapter)
   ```

2. **Add the adapter to the default plugins list:** Open `src/intugle/adapters/factory.py` and add the path to your new adapter module.

   ```python
   # In src/intugle/adapters/factory.py

   DEFAULT_PLUGINS = [
       "intugle.adapters.types.pandas.pandas",
       # ... other adapters
       "intugle.adapters.types.myconnector.myconnector",
   ]
   ```

## Step 5: Add Optional Dependencies

If your adapter requires a specific driver library (like `databricks-sql-connector` for Databricks), you should add it as an optional dependency.

1. Open the `pyproject.toml` file at the root of the project.
2. Add a new extra under the `[project.optional-dependencies]` section.

```toml
# In pyproject.toml

[project.optional-dependencies]
# ... other dependencies
myconnector = ["myconnector-driver-library>=1.0.0"]
```

This allows users to install the necessary libraries by running `pip install "intugle[myconnector]"`.

That's it! You have now implemented and registered a custom connector.

docsite/docs/core-concepts/semantic-intelligence/semantic-search.md

Lines changed: 24 additions & 0 deletions

```bash
export AZURE_OPENAI_ENDPOINT="your-azure-openai-endpoint"
export OPENAI_API_VERSION="your-openai-api-version"
```

#### Using a Custom Embeddings Instance

If you need to use a pre-initialized embeddings model, you can directly inject the model instance.

The custom model must be an instance of `langchain_core.embeddings.embeddings.Embeddings`.

You can set the custom instance by modifying the `intugle.core.settings` module **before** you import and use the `SemanticModel`.

**Example:**

```python
# main.py
from intugle.core import settings

# This must be an object that inherits from Embeddings
my_embeddings_instance = ...

# Set the custom instance in the settings
settings.CUSTOM_EMBEDDINGS_INSTANCE = my_embeddings_instance

# Now, any intugle modules imported after this point will use your custom model
# from intugle import SemanticModel
# ...
```
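
For instance, `my_embeddings_instance` could be built from any LangChain integration. A hypothetical setup using the `langchain-huggingface` package (an illustrative assumption, not a requirement of `intugle`):

```python
from langchain_huggingface import HuggingFaceEmbeddings

# Any subclass of langchain_core's Embeddings works here
my_embeddings_instance = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```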

## Usage with SemanticModel

The simplest way to use semantic search is through the `SemanticModel` after the semantic model has been built.

docsite/docs/getting-started.md

Lines changed: 24 additions & 0 deletions

Here's an example of how to set these variables in your environment:

```bash
export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
```

### Using a Custom LLM Instance

For environments where you need to use a pre-initialized language model, you can directly inject the model instance.

The custom LLM must be an instance of `langchain_core.language_models.chat_models.BaseChatModel`.

You can set the custom instance by modifying the `intugle.core.settings` module **before** you import and use any `intugle` classes.

**Example:**

```python
# main.py
from intugle.core import settings

# This must be an object that inherits from BaseChatModel
my_llm_instance = ...

# Set the custom instance in the settings
settings.CUSTOM_LLM_INSTANCE = my_llm_instance

# Now, any intugle modules imported after this point will use your custom LLM

# ... rest of your code
```
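
As with the embeddings example above, the instance can come from any LangChain chat-model integration. A hypothetical setup using the `langchain-openai` package (an illustrative assumption):

```python
from langchain_openai import ChatOpenAI

# Any subclass of BaseChatModel works here
my_llm_instance = ChatOpenAI(model="gpt-4o", temperature=0)
```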
