Commit 87073fe

chore: Adding AGENTS.md (#1183)

### Description

Adds AGENTS.md to improve developing this project with AI.

### Checklist

- [ ] I have run this code in development and it appears to resolve the stated issue
- [ ] This PR includes tests, or tests are not required/relevant for this PR
- [ ] I have updated the `CHANGELOG.md` and added information about my change to the "dbt-databricks next" section.

1 parent affbd61 commit 87073fe

File tree: 1 file changed

AGENTS.md (300 additions, 0 deletions)

# AGENTS.md - AI Agent Guide for dbt-databricks

This guide helps AI agents quickly understand and work productively with the dbt-databricks adapter codebase.

## 🚀 Quick Start for Agents

### Project Overview

- **What**: dbt adapter for the Databricks Lakehouse platform
- **Based on**: the dbt-spark adapter, with Databricks-specific enhancements
- **Key Features**: Unity Catalog support, Delta Lake, Python models, streaming tables
- **Language**: Python 3.9+ with Jinja2 SQL macros
- **Architecture**: Inherits from the Spark adapter and extends it with Databricks-specific functionality

### Essential Files to Understand

```
dbt/adapters/databricks/
├── impl.py             # Main adapter implementation (DatabricksAdapter class)
├── connections.py      # Connection management and SQL execution
├── credentials.py      # Authentication (token, OAuth, Azure AD)
├── relation.py         # Databricks-specific relation handling
├── python_models/      # Python model execution on clusters
├── relation_configs/   # Table/view configuration management
└── catalogs/           # Unity Catalog vs Hive Metastore logic

dbt/include/databricks/macros/  # Jinja2 SQL templates
├── adapters/           # Core adapter macros
├── materializations/   # Model materialization strategies
├── relations/          # Table/view creation and management
└── utils/              # Utility macros
```

## 🛠 Development Environment

**Prerequisites**: Python 3.9+ installed on your system

**Install Hatch** (recommended):

```bash
# Install Hatch globally - see https://hatch.pypa.io/dev/install/
pip install hatch

# Create the default environment (Hatch installs any needed Python versions)
hatch env create
```

**Essential commands**:

```bash
hatch run code-quality   # Format, lint, type-check
hatch run unit           # Run unit tests
hatch run cluster-e2e    # Run functional tests
```

> 📖 **See the [Development Guide](docs/dbt-databricks-dev.md)** for comprehensive setup documentation
> 📖 **See the [Testing Guide](docs/testing.md)** for comprehensive testing documentation
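
Functional tests read workspace connection details from environment variables; the repo ships a template for them. A minimal setup sketch (the values are yours to fill in):

```bash
# Copy the committed template, then edit test.env with your workspace details
cp test.env.example test.env
```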

## 🧪 Testing Strategy

### Test Types & When to Use

1. **Unit Tests** (`tests/unit/`): Fast, isolated, no external dependencies

   - Test individual functions, utility methods, and SQL generation
   - Mock external dependencies (database calls, API calls)
   - Run with: `hatch run unit`

2. **Functional Tests** (`tests/functional/`): End-to-end against a real Databricks workspace

   - Test complete dbt workflows (run, seed, test, snapshot)
   - Require a live Databricks workspace
   - Run with: `hatch run cluster-e2e` (or `uc-cluster-e2e`, `sqlw-e2e`)

### Test Environments

- **HMS Cluster** (`databricks_cluster`): Legacy Hive Metastore
- **Unity Catalog Cluster** (`databricks_uc_cluster`): Modern UC features
- **SQL Warehouse** (`databricks_uc_sql_endpoint`): Serverless compute

### Writing Tests

#### Unit Test Example

```python
from dbt.adapters.databricks.utils import redact_credentials


def test_redact_credentials():
    sql = "WITH (credential ('KEY' = 'SECRET_VALUE'))"
    expected = "WITH (credential ('KEY' = '[REDACTED]'))"
    assert redact_credentials(sql) == expected
```

#### Macro Test Example

```python
import pytest

from tests.unit.macros.base import MacroTestBase


class TestCreateTable(MacroTestBase):
    @pytest.fixture(scope="class")
    def template_name(self) -> str:
        return "create.sql"  # File in macros/relations/table/

    @pytest.fixture(scope="class")
    def macro_folders_to_load(self) -> list:
        return ["macros", "macros/relations/table"]

    def test_create_table_sql(self, template_bundle):
        result = self.run_macro(
            template_bundle.template,
            "create_table",
            template_bundle.relation,
            "select 1",
        )
        expected = "create table `database`.`schema`.`table` as (select 1)"
        self.assert_sql_equal(result, expected)
```

#### Functional Test Example

```python
import pytest

from dbt.tests import util


class TestIncrementalModel:
    @pytest.fixture(scope="class")
    def models(self):
        return {
            "my_model.sql": """
                {{ config(materialized='incremental', unique_key='id') }}
                select 1 as id, 'test' as name
            """
        }

    def test_incremental_run(self, project):
        results = util.run_dbt(["run"])
        assert len(results) == 1
        # Verify the table exists and has the expected data
        results = project.run_sql("select count(*) from my_model", fetch="all")
        assert results[0][0] == 1
```

## 🏗 Architecture Deep Dive

### Adapter Inheritance Chain

```
DatabricksAdapter (impl.py)
  ↳ SparkAdapter (from dbt-spark)
    ↳ SQLAdapter (from dbt-core)
      ↳ BaseAdapter (from dbt-core)
```
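
To inspect the chain from a REPL, you can walk the class's method resolution order (a minimal sketch; only the `DatabricksAdapter` import comes from this repo, the rest is standard Python):

```python
from dbt.adapters.databricks.impl import DatabricksAdapter

# Print every class in the MRO, most-derived first
for cls in DatabricksAdapter.__mro__:
    print(f"{cls.__module__}.{cls.__name__}")
```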

### Key Components

#### Connection Management (`connections.py`)

- Extends the Spark connection manager for Databricks
- Manages the connection lifecycle and query execution
- Handles query comments and context tracking
- Integrates with `credentials.py` for authentication and `handle.py` for cursor operations

#### Authentication & Credentials (`credentials.py`)

- Defines the credentials dataclass covering all auth methods (token, OAuth, Azure AD)
- Handles credential validation and session properties
- Manages compute resource configuration

#### SQL Execution (`handle.py`)

- Provides a cursor wrapper around the Databricks SQL connector
- Implements retry logic and connection pooling
- Handles SQL execution details and error handling

#### Relation Handling (`relation.py`)

- Extends Spark relations with Databricks features
- Handles Unity Catalog's 3-level namespace (catalog.schema.table)
- Manages relation metadata and configuration
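
To illustrate the 3-level namespace, here is a sketch of building and rendering a relation (it assumes dbt's standard `BaseRelation.create`/`render` API, which the adapter's relation class inherits; the catalog, schema, and table names are placeholders):

```python
from dbt.adapters.databricks.relation import DatabricksRelation

# "database" holds the catalog in the 3-level namespace
relation = DatabricksRelation.create(
    database="main",
    schema="analytics",
    identifier="orders",
)
print(relation.render())  # expected: `main`.`analytics`.`orders`
```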

#### Python Models (`python_models/`)

- Executes Python models on Databricks clusters
- Supports multiple submission methods (jobs, workflows, serverless)
- Handles dependency management and result collection
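
For context, this machinery submits models written with dbt's standard Python model entry point. A minimal sketch (the upstream model name is hypothetical):

```python
import pyspark.sql.functions as F


def model(dbt, session):
    # Plays the same role as the Jinja config() block in a SQL model
    dbt.config(materialized="table")
    df = dbt.ref("upstream_model")  # hypothetical upstream model
    return df.withColumn("loaded_at", F.current_timestamp())
```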

#### Macros (`dbt/include/databricks/macros/`)

- Jinja2 templates that generate SQL
- Override Spark macros with Databricks-specific logic
- Handle materializations (table, view, incremental, snapshot)
- Implement Databricks features (liquid clustering, column masks, tags)
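
Overriding works through dbt's adapter dispatch naming convention: a macro named `databricks__<name>` takes precedence over `spark__<name>` and `default__<name>`. A schematic example (the macro name is illustrative, not one of this repo's macros):

```sql
{% macro databricks__example_clause(relation) %}
  {#- dispatch resolves example_clause() to this version when the
      active adapter is databricks -#}
  -- Databricks-specific SQL for {{ relation }}
{% endmacro %}
```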

### Configuration System

Models can be configured with Databricks-specific options:

```sql
{{ config(
    materialized='table',
    file_format='delta',
    liquid_clustered_by=['column1', 'column2'],
    tblproperties={'key': 'value'},
    column_tags={'pii_col': ['sensitive']},
    location_root='/mnt/external/'
) }}
```

## 🔧 Common Development Tasks

### Adding a New Materialization

1. Create a macro in `macros/materializations/` (skeleton sketched below)
2. Implement the SQL generation logic
3. Add configuration options to the relation configs
4. Write unit tests for the macro
5. Write functional tests for end-to-end behavior
6. Update documentation
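
A bare-bones materialization skeleton (schematic only; the name is hypothetical, and real materializations in this repo also handle hooks, grants, and incremental strategies):

```sql
{% materialization my_materialization, adapter='databricks' -%}
  {%- set target_relation = this.incorporate(type='table') -%}

  {%- call statement('main') -%}
    create or replace table {{ target_relation }} as ({{ sql }})
  {%- endcall -%}

  {{ return({'relations': [target_relation]}) }}
{%- endmaterialization %}
```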

### Adding a New Adapter Method

1. Add the method to the `DatabricksAdapter` class in `impl.py`
2. Implement the database interaction logic
3. Add a corresponding macro if SQL generation is needed
4. Write unit tests with mocked database calls
5. Write functional tests against a real database

### Modifying SQL Generation

1. Locate the relevant macro in the `macros/` directory
2. Capture current behavior with unit tests
3. Modify the macro logic
4. Update the unit tests to verify the new behavior
5. Run the affected functional tests to ensure no regressions

### Adding a Configuration Option

1. Add a field to the appropriate config class in `relation_configs/` (see the sketch below)
2. Update the macro to use the new configuration
3. Add validation logic if needed
4. Write tests for both valid and invalid configurations
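
A schematic of the dataclass-plus-validation pattern (hypothetical field names; the real base classes live in `relation_configs/`):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExampleClusteringConfig:  # hypothetical config class
    enabled: bool = False
    cluster_by: Optional[list[str]] = None

    def __post_init__(self) -> None:
        # Validate eagerly so misconfiguration fails at parse time
        if self.enabled and not self.cluster_by:
            raise ValueError("cluster_by must be set when enabled is True")
```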

## 🐛 Debugging Guide

### Common Issues

1. **SQL Generation**: Use macro unit tests with `assert_sql_equal()`
2. **Connection Problems**: Check credentials and environment variables
3. **Python Model Failures**: Check cluster configuration and dependencies
4. **Test Failures**: Review the logs in the `logs/` directory, looking for red text

### Debugging Tools

- **IDE Test Runner**: Set breakpoints and step through code
- **Log Analysis**: dbt generates detailed debug logs by default
- **SQL Inspection**: Print generated SQL in macros for debugging (see the snippet below)
- **Mock Inspection**: Verify mocked calls in unit tests
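
For SQL inspection, dbt's built-in Jinja `log()` function can echo generated SQL to the console (a sketch; `create_sql` is a stand-in for whatever variable holds the SQL in the macro you are debugging):

```sql
{# Inside a macro: info=True surfaces the message in console output #}
{% do log(create_sql, info=True) %}
```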

## 📚 Key Resources

### Documentation

- **Development**: `docs/dbt-databricks-dev.md` - Setup and workflow
- **Testing**: `docs/testing.md` - Comprehensive testing guide
- **Contributing**: `CONTRIBUTING.MD` - Code standards and PR process
- **User Docs**: [docs.getdbt.com](https://docs.getdbt.com/reference/resource-configs/databricks-configs)

### Important Files for Agents

- `pyproject.toml` - Project configuration, dependencies, tool settings
- `test.env.example` - Template for test environment variables
- `tests/conftest.py` - Global test configuration
- `tests/profiles.py` - Test database profiles

### Code Patterns to Follow

1. **Error Handling**: Use dbt's exception classes and provide helpful messages (sketch below)
2. **Logging**: Use `logger` from `dbt.adapters.databricks.logging`
3. **SQL Generation**: Prefer macros over Python string manipulation
4. **Testing**: Write both unit and functional tests for new features
5. **Configuration**: Use dataclasses with validation for new config options
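
A small sketch of patterns 1 and 2 together (the helper and its check are hypothetical; `DbtRuntimeError` assumes the exception exported by dbt-common):

```python
from dbt.adapters.databricks.logging import logger
from dbt_common.exceptions import DbtRuntimeError


def require_delta(file_format: str) -> None:  # hypothetical helper
    if file_format != "delta":
        logger.debug(f"Rejected file_format: {file_format}")
        raise DbtRuntimeError(
            f"This feature requires file_format='delta'; got '{file_format}'"
        )
```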

## 🚨 Common Pitfalls for Agents

1. **Don't modify dbt-spark behavior** without understanding the inheritance chain
2. **Always run code-quality** before committing changes
3. **Test on multiple environments** (HMS, UC cluster, SQL warehouse)
4. **Mock external dependencies** properly in unit tests
5. **Use appropriate test fixtures** from dbt-tests-adapter
6. **Follow SQL normalization** in test assertions with `assert_sql_equal()`
7. **Handle Unity Catalog vs HMS differences** in feature implementations
8. **Consider backward compatibility** when modifying existing behavior

## 🎯 Success Metrics

When working on this codebase, ensure:

- [ ] All tests pass (`hatch run code-quality && hatch run unit`)
- [ ] New features have both unit and functional tests
- [ ] SQL generation follows Databricks best practices
- [ ] Changes maintain backward compatibility
- [ ] Code follows project style guidelines

---

_This guide is maintained by the dbt-databricks team. When making significant architectural changes, update this guide to help future agents understand the codebase._
