
Commit 22b6dc8

Merge branch 'main' into 1.10.latest
2 parents 05a0b8d + d5352d7 commit 22b6dc8

File tree

20 files changed (+529 −32 lines)

AGENTS.md

Lines changed: 300 additions & 0 deletions
@@ -0,0 +1,300 @@
# AGENTS.md - AI Agent Guide for dbt-databricks

This guide helps AI agents quickly understand and work productively with the dbt-databricks adapter codebase.

## 🚀 Quick Start for Agents

### Project Overview

- **What**: dbt adapter for the Databricks Lakehouse platform
- **Based on**: dbt-spark adapter with Databricks-specific enhancements
- **Key Features**: Unity Catalog support, Delta Lake, Python models, streaming tables
- **Language**: Python 3.9+ with Jinja2 SQL macros
- **Architecture**: Inherits from the Spark adapter and extends it with Databricks-specific functionality

### Essential Files to Understand

```
dbt/adapters/databricks/
├── impl.py              # Main adapter implementation (DatabricksAdapter class)
├── connections.py       # Connection management and SQL execution
├── credentials.py       # Authentication (token, OAuth, Azure AD)
├── relation.py          # Databricks-specific relation handling
├── python_models/       # Python model execution on clusters
├── relation_configs/    # Table/view configuration management
└── catalogs/            # Unity Catalog vs Hive Metastore logic

dbt/include/databricks/macros/   # Jinja2 SQL templates
├── adapters/            # Core adapter macros
├── materializations/    # Model materialization strategies
├── relations/           # Table/view creation and management
└── utils/               # Utility macros
```
## 🛠 Development Environment

**Prerequisites**: Python 3.9+ installed on your system

**Install Hatch** (recommended):

```bash
# Install Hatch globally - see https://hatch.pypa.io/dev/install/
pip install hatch

# Create default environment (Hatch installs needed Python versions)
hatch env create
```

**Essential commands**:

```bash
hatch run code-quality   # Format, lint, type-check
hatch run unit           # Run unit tests
hatch run cluster-e2e    # Run functional tests
```

> 📖 **See [Development Guide](docs/dbt-databricks-dev.md)** for comprehensive setup documentation
> 📖 **See [Testing Guide](docs/testing.md)** for comprehensive testing documentation
## 🧪 Testing Strategy

### Test Types & When to Use

1. **Unit Tests** (`tests/unit/`): Fast, isolated, no external dependencies

   - Test individual functions, utility methods, SQL generation
   - Mock external dependencies (database calls, API calls)
   - Run with: `hatch run unit`

2. **Functional Tests** (`tests/functional/`): End-to-end with real Databricks
   - Test complete dbt workflows (run, seed, test, snapshot)
   - Require a live Databricks workspace
   - Run with: `hatch run cluster-e2e` (or `uc-cluster-e2e`, `sqlw-e2e`)

### Test Environments

- **HMS Cluster** (`databricks_cluster`): Legacy Hive Metastore
- **Unity Catalog Cluster** (`databricks_uc_cluster`): Modern UC features
- **SQL Warehouse** (`databricks_uc_sql_endpoint`): Serverless compute

### Writing Tests

#### Unit Test Example

```python
from dbt.adapters.databricks.utils import redact_credentials


def test_redact_credentials():
    sql = "WITH (credential ('KEY' = 'SECRET_VALUE'))"
    expected = "WITH (credential ('KEY' = '[REDACTED]'))"
    assert redact_credentials(sql) == expected
```
#### Macro Test Example

```python
import pytest

from tests.unit.macros.base import MacroTestBase


class TestCreateTable(MacroTestBase):
    @pytest.fixture(scope="class")
    def template_name(self) -> str:
        return "create.sql"  # File in macros/relations/table/

    @pytest.fixture(scope="class")
    def macro_folders_to_load(self) -> list:
        return ["macros", "macros/relations/table"]

    def test_create_table_sql(self, template_bundle):
        result = self.run_macro(template_bundle.template, "create_table",
                                template_bundle.relation, "select 1")
        expected = "create table `database`.`schema`.`table` as (select 1)"
        self.assert_sql_equal(result, expected)
```
#### Functional Test Example

```python
import pytest

from dbt.tests import util


class TestIncrementalModel:
    @pytest.fixture(scope="class")
    def models(self):
        return {
            "my_model.sql": """
                {{ config(materialized='incremental', unique_key='id') }}
                select 1 as id, 'test' as name
            """
        }

    def test_incremental_run(self, project):
        results = util.run_dbt(["run"])
        assert len(results) == 1
        # Verify table exists and has expected data
        results = project.run_sql("select count(*) from my_model", fetch="all")
        assert results[0][0] == 1
```
## 🏗 Architecture Deep Dive

### Adapter Inheritance Chain

```
DatabricksAdapter (impl.py)
  ↳ SparkAdapter (from dbt-spark)
      ↳ SQLAdapter (from dbt-core)
          ↳ BaseAdapter (from dbt-core)
```
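If dbt-databricks is installed in your environment, you can sanity-check this chain from the class's method resolution order (an optional check, assuming the package is importable; expect a few mixins in the output as well):

```python
from dbt.adapters.databricks.impl import DatabricksAdapter

# Walk the MRO: DatabricksAdapter -> SparkAdapter -> SQLAdapter -> BaseAdapter, plus mixins
for cls in DatabricksAdapter.__mro__:
    print(f"{cls.__module__}.{cls.__name__}")
```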
### Key Components

#### Connection Management (`connections.py`)

- Extends the Spark connection manager for Databricks
- Manages connection lifecycle and query execution
- Handles query comments and context tracking
- Integrates with `credentials.py` for authentication and `handle.py` for cursor operations

#### Authentication & Credentials (`credentials.py`)

- Defines the credentials dataclass with all auth methods (token, OAuth, Azure AD)
- Handles credential validation and session properties
- Manages compute resource configuration

#### SQL Execution (`handle.py`)

- Provides a cursor wrapper for the Databricks SQL connector
- Implements retry logic and connection pooling
- Handles SQL execution details and error handling

#### Relation Handling (`relation.py`)

- Extends Spark relations with Databricks features
- Handles the Unity Catalog 3-level namespace (catalog.schema.table)
- Manages relation metadata and configuration

#### Python Models (`python_models/`)

- Executes Python models on Databricks clusters
- Supports multiple submission methods (jobs, workflows, serverless)
- Handles dependency management and result collection

#### Macros (`dbt/include/databricks/macros/`)

- Jinja2 templates that generate SQL
- Override Spark macros with Databricks-specific logic
- Handle materializations (table, view, incremental, snapshot)
- Implement Databricks features (liquid clustering, column masks, tags)

### Configuration System

Models can be configured with Databricks-specific options:

```sql
{{ config(
    materialized='table',
    file_format='delta',
    liquid_clustering=['column1', 'column2'],
    tblproperties={'key': 'value'},
    column_tags={'pii_col': ['sensitive']},
    location_root='/mnt/external/'
) }}
```
## 🔧 Common Development Tasks

### Adding New Materialization

1. Create macro in `macros/materializations/`
2. Implement SQL generation logic
3. Add configuration options to relation configs
4. Write unit tests for the macro
5. Write functional tests for end-to-end behavior
6. Update documentation
### Adding New Adapter Method

1. Add method to the `DatabricksAdapter` class in `impl.py`
2. Implement database interaction logic
3. Add a corresponding macro if SQL generation is needed
4. Write unit tests with mocked database calls
5. Write functional tests with a real database
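To ground steps 1 and 4, the sketch below shows the usual shape of a new adapter method exposed to Jinja with dbt's `available` decorator. The method name, SQL, and tag lookup are hypothetical and assume generated SQL runs through `self.execute`; this is not part of the adapter's real API:

```python
from dbt.adapters.base import available
from dbt.adapters.spark.impl import SparkAdapter


class DatabricksAdapter(SparkAdapter):
    # ...existing implementation in dbt/adapters/databricks/impl.py...

    @available
    def list_table_tags(self, relation) -> list[tuple]:
        """Hypothetical method: fetch Unity Catalog tags for a relation."""
        sql = (
            "select tag_name, tag_value "
            "from system.information_schema.table_tags "
            f"where table_name = '{relation.identifier}'"
        )
        # BaseAdapter.execute returns an (AdapterResponse, agate.Table) pair
        _, table = self.execute(sql, fetch=True)
        return [tuple(row) for row in table.rows]
```

A unit test would patch `execute` and assert on the generated SQL; a functional test would exercise the method through a macro against a real workspace.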
### Modifying SQL Generation

1. Locate the relevant macro in the `macros/` directory
2. Test current behavior with unit tests
3. Modify the macro logic
4. Update unit tests to verify the new behavior
5. Run affected functional tests to ensure no regressions

### Adding Configuration Option

1. Add a field to the appropriate config class in `relation_configs/`
2. Update the macro to use the new configuration
3. Add validation logic if needed
4. Write tests for both valid and invalid configurations
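As a rough sketch of steps 1 and 3 (the class, field, and rule below are hypothetical, not the adapter's actual `relation_configs` classes), a new option is typically a small validated data class that macros can then read:

```python
from dataclasses import dataclass
from typing import Optional

from dbt_common.exceptions import DbtValidationError  # import path can vary across dbt versions


@dataclass(frozen=True)
class TargetLagConfig:
    """Hypothetical config component for a `target_lag` model option."""

    target_lag: Optional[str] = None

    def __post_init__(self) -> None:
        # Validate user input up front so downstream macros can trust the value
        valid_units = (" seconds", " minutes", " hours", " days")
        if self.target_lag is not None and not self.target_lag.endswith(valid_units):
            raise DbtValidationError(
                f"Invalid target_lag '{self.target_lag}'; expected something like '10 minutes'"
            )
```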
## 🐛 Debugging Guide

### Common Issues

1. **SQL Generation**: Use macro unit tests with `assert_sql_equal()`
2. **Connection Problems**: Check credentials and environment variables
3. **Python Model Failures**: Check cluster configuration and dependencies
4. **Test Failures**: Review logs in the `logs/` directory, look for red text

### Debugging Tools

- **IDE Test Runner**: Set breakpoints and step through code
- **Log Analysis**: dbt generates detailed debug logs by default
- **SQL Inspection**: Print generated SQL in macros for debugging
- **Mock Inspection**: Verify mocked calls in unit tests
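As a concrete illustration of the mock-inspection point above, the test below uses only the standard library; `refresh_table` is a toy stand-in for adapter code, not a function from this repo:

```python
from unittest import mock


def refresh_table(cursor, table_name: str) -> None:
    """Toy stand-in for adapter code that issues SQL through a cursor."""
    cursor.execute(f"REFRESH TABLE {table_name}")


def test_refresh_table_issues_expected_sql():
    cursor = mock.Mock()

    refresh_table(cursor, "main.analytics.orders")

    # Inspect the mocked call instead of hitting a real warehouse
    cursor.execute.assert_called_once_with("REFRESH TABLE main.analytics.orders")
```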
## 📚 Key Resources

### Documentation

- **Development**: `docs/dbt-databricks-dev.md` - Setup and workflow
- **Testing**: `docs/testing.md` - Comprehensive testing guide
- **Contributing**: `CONTRIBUTING.MD` - Code standards and PR process
- **User Docs**: [docs.getdbt.com](https://docs.getdbt.com/reference/resource-configs/databricks-configs)

### Important Files for Agents

- `pyproject.toml` - Project configuration, dependencies, tool settings
- `test.env.example` - Template for test environment variables
- `tests/conftest.py` - Global test configuration
- `tests/profiles.py` - Test database profiles

### Code Patterns to Follow

1. **Error Handling**: Use dbt's exception classes, provide helpful messages
2. **Logging**: Use `logger` from `dbt.adapters.databricks.logging`
3. **SQL Generation**: Prefer macros over Python string manipulation
4. **Testing**: Write both unit and functional tests for new features
5. **Configuration**: Use dataclasses with validation for new config options
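Patterns 1 and 2 combined look roughly like the helper below; the function and message are made up for illustration, and the exception import path may differ with your dbt version:

```python
from typing import Optional

from dbt_common.exceptions import DbtRuntimeError  # import path can vary across dbt versions

from dbt.adapters.databricks.logging import logger


def require_catalog(catalog: Optional[str]) -> str:
    """Illustrative helper combining error handling and logging; not a function from this repo."""
    if not catalog:
        raise DbtRuntimeError(
            "This operation requires Unity Catalog; set `catalog` (or `database`) on the model or profile."
        )
    logger.debug(f"Using Unity Catalog: {catalog}")
    return catalog
```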
## 🚨 Common Pitfalls for Agents

1. **Don't modify dbt-spark behavior** without understanding inheritance
2. **Always run code-quality** before committing changes
3. **Test on multiple environments** (HMS, UC cluster, SQL warehouse)
4. **Mock external dependencies** in unit tests properly
5. **Use appropriate test fixtures** from dbt-tests-adapter
6. **Follow SQL normalization** in test assertions with `assert_sql_equal()`
7. **Handle Unity Catalog vs HMS differences** in feature implementations
8. **Consider backward compatibility** when modifying existing behavior

## 🎯 Success Metrics

When working on this codebase, ensure:

- [ ] All tests pass (`hatch run code-quality && hatch run unit`)
- [ ] New features have both unit and functional tests
- [ ] SQL generation follows Databricks best practices
- [ ] Changes maintain backward compatibility
- [ ] Code follows project style guidelines

---

_This guide is maintained by the dbt-databricks team. When making significant architectural changes, update this guide to help future agents understand the codebase._

CHANGELOG.md

Lines changed: 12 additions & 1 deletion
@@ -1,4 +1,15 @@
-## dbt-databricks 1.10.13 (TBD)
+## dbt-databricks 1.10.13 (October 21, 2025)
+
+### Fixes
+
+- Fix issue causing MV/STs to always trigger as having their config changed ([1181](http://github.com/databricks/dbt-databricks/pull/1181))
+- Fix pydantic v2 deprecation warning "Valid config keys have changed in V2" (thanks @Korijn!) ([1194](https://github.com/databricks/dbt-databricks/pull/1194))
+- Fix snapshots not applying databricks_tags config ([1192](https://github.com/databricks/dbt-databricks/pull/1192))
+- Fix to respect varchar and char when using describe extended as json ([1220](https://github.com/databricks/dbt-databricks/pull/1220))
+
+### Under the hood
+
+- Update dependency versions ([1223](https://github.com/databricks/dbt-databricks/pull/1223))
 
 ## dbt-databricks 1.10.12 (September 8, 2025)
 

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-version = "1.10.12"
+version = "1.10.13"

dbt/adapters/databricks/column.py

Lines changed: 10 additions & 6 deletions
@@ -64,10 +64,8 @@ def _parse_type_from_json(cls, type_info: Any) -> str:
         - map: nested types handled
         - decimal: precision, scale handled
         - string: collation handled
-        - varchar: Handled just in case, but the JSON should never contain a varchar type as
-          these are just STRING types under the hood in Databricks.
-        - char: Handled just in case, but the JSON should never contain a char type as these are
-          just STRING types under the hood in Databricks.
+        - varchar: length handled - preserves varchar(n) in DDL
+        - char: length handled - preserves char(n) in DDL
 
         Complex types can have other properties in the JSON schema such as nullable, defaults, etc.
         but those are ignored as they are not part of data type DDL
@@ -122,10 +120,16 @@ def _parse_type_from_json(cls, type_info: Any) -> str:
             return "timestamp"
 
         elif type_name == "varchar":
-            return "string"
+            length = type_info.get("length")
+            if length is not None:
+                return f"varchar({length})"
+            return "varchar"
 
         elif type_name == "char":
-            return "string"
+            length = type_info.get("length")
+            if length is not None:
+                return f"char({length})"
+            return "char"
 
         else:
             # Handle primitive types and any other types

dbt/adapters/databricks/python_models/python_config.py

Lines changed: 6 additions & 1 deletion
@@ -3,6 +3,8 @@
 
 from pydantic import BaseModel, Field, validator
 
+from .util import PYDANTIC_IS_V1
+
 DEFAULT_TIMEOUT = 60 * 60 * 24
 
 JOB_PERMISSIONS = {"CAN_VIEW", "CAN_MANAGE_RUN", "CAN_MANAGE"}
@@ -84,4 +86,7 @@ def run_name(self) -> str:
         return f"{self.catalog}-{self.schema_}-{self.identifier}-{uuid.uuid4()}"
 
     class Config:
-        allow_population_by_field_name = True
+        if PYDANTIC_IS_V1:
+            allow_population_by_field_name = True
+        else:
+            populate_by_name = True
dbt/adapters/databricks/python_models/util.py

Lines changed: 16 additions & 0 deletions

@@ -0,0 +1,16 @@
from importlib.metadata import version


def is_pydantic_v1() -> bool:
    """Check if the installed version of pydantic is v1."""
    try:
        pydantic_version = version("pydantic")
        major = int(pydantic_version.split(".")[0])
        return major < 2
    except Exception:
        # If we can't determine the version, assume v1 for compatibility
        # See: https://github.com/databricks/dbt-databricks/pull/976#issuecomment-2748680090
        return True


PYDANTIC_IS_V1 = is_pydantic_v1()

dbt/adapters/databricks/relation_configs/query.py

Lines changed: 8 additions & 0 deletions
@@ -42,3 +42,11 @@ def from_relation_config(cls, relation_config: RelationConfig) -> QueryConfig:
         raise DbtRuntimeError(
             f"Cannot compile model {relation_config.identifier} with no SQL query"
         )
+
+
+class DescribeQueryProcessor(QueryProcessor):
+    @classmethod
+    def from_relation_results(cls, result: RelationResults) -> QueryConfig:
+        table = result["describe_extended"]
+        row = next(x for x in table if x[0] == "View Text")
+        return QueryConfig(query=SqlUtils.clean_sql(row[1]))
