Commit f029291

refactor: Improve python model tests success rate (#1233)

### Description

I think the repeated failure of Python model tests in our CI/CD comes from trying to launch workflows on compute whose driver is already overwhelmed. For SQL, extra statements get queued by the SQL gateway, but for Python models we eventually time out with an error saying the cluster couldn't be reached. I'm also using this PR to backport AGENTS.md, because it's really annoying to work on 1.10.x without the latest changes.

### Checklist

- [ ] I have run this code in development and it appears to resolve the stated issue
- [ ] This PR includes tests, or tests are not required/relevant for this PR
- [ ] I have updated the `CHANGELOG.md` and added information about my change to the "dbt-databricks next" section.
1 parent: cbc6d34

File tree: 106 files changed (+373, -198 lines)

`.pre-commit-config.yaml` (2 additions, 2 deletions)

```diff
@@ -1,15 +1,15 @@
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
     # Ruff version.
-    rev: v0.8.0
+    rev: v0.14.2
     hooks:
       # Run the linter.
       - id: ruff
       # Run the formatter.
       - id: ruff-format

   - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.13.0
+    rev: v1.18.2
     hooks:
       - id: mypy
         additional_dependencies: [types-requests]
```

`AGENTS.md` (96 additions, 6 deletions)

```diff
@@ -20,6 +20,7 @@ dbt/adapters/databricks/
 ├── connections.py        # Connection management and SQL execution
 ├── credentials.py        # Authentication (token, OAuth, Azure AD)
 ├── relation.py           # Databricks-specific relation handling
+├── dbr_capabilities.py   # DBR version capability system
 ├── python_models/        # Python model execution on clusters
 ├── relation_configs/     # Table/view configuration management
 └── catalogs/             # Unity Catalog vs Hive Metastore logic
```

````diff
@@ -37,20 +38,33 @@ dbt/include/databricks/macros/  # Jinja2 SQL templates
 
 **Install Hatch** (recommended):
 
+For Linux:
+
 ```bash
-# Install Hatch globally - see https://hatch.pypa.io/dev/install/
-pip install hatch
+# Download and install standalone binary
+curl -Lo hatch.tar.gz https://github.com/pypa/hatch/releases/latest/download/hatch-x86_64-unknown-linux-gnu.tar.gz
+tar -xzf hatch.tar.gz
+mkdir -p $HOME/bin
+mv hatch $HOME/bin/hatch
+chmod +x $HOME/bin/hatch
+echo 'export PATH="$HOME/bin:$PATH"' >> ~/.zshrc
+export PATH="$HOME/bin:$PATH"
 
 # Create default environment (Hatch installs needed Python versions)
 hatch env create
 ```
 
+For other platforms: see https://hatch.pypa.io/latest/install/
+
 **Essential commands**:
 
 ```bash
 hatch run code-quality  # Format, lint, type-check
 hatch run unit          # Run unit tests
 hatch run cluster-e2e   # Run functional tests
+
+# For specific tests, use pytest directly:
+hatch run pytest path/to/test_file.py::TestClass::test_method -v
 ```
 
 > 📖 **See [Development Guide](docs/dbt-databricks-dev.md)** for comprehensive setup documentation
````

````diff
@@ -113,17 +127,38 @@ class TestCreateTable(MacroTestBase):
 
 #### Functional Test Example
 
+**Important**: SQL models and YAML schemas should be defined in a `fixtures.py` file in the same directory as the test, not inline in the test class. This keeps tests clean and fixtures reusable.
+
+**fixtures.py:**
+
+```python
+my_model_sql = """
+{{ config(materialized='incremental', unique_key='id') }}
+select 1 as id, 'test' as name
+"""
+
+my_schema_yml = """
+version: 2
+models:
+  - name: my_model
+    columns:
+      - name: id
+        description: 'ID column'
+"""
+```
+
+**test_my_feature.py:**
+
 ```python
 from dbt.tests import util
+from tests.functional.adapter.my_feature import fixtures
 
 class TestIncrementalModel:
     @pytest.fixture(scope="class")
     def models(self):
         return {
-            "my_model.sql": """
-                {{ config(materialized='incremental', unique_key='id') }}
-                select 1 as id, 'test' as name
-            """
+            "my_model.sql": fixtures.my_model_sql,
+            "schema.yml": fixtures.my_schema_yml,
         }
 
     def test_incremental_run(self, project):
````

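The hunk ends just as `test_incremental_run` begins. For orientation, here is a minimal sketch of how such a test body typically continues in dbt functional tests, using the `dbt.tests.util` helper the example already imports; the two-run pattern and assertions are illustrative, not part of this commit:

```python
from dbt.tests import util


class TestIncrementalModel:
    # ...fixtures as in the diff above...

    def test_incremental_run(self, project):
        # First run builds the table from scratch
        results = util.run_dbt(["run"])
        assert len(results) == 1

        # Second run exercises the incremental code path
        results = util.run_dbt(["run"])
        assert len(results) == 1
```
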
````diff
@@ -147,6 +182,46 @@ DatabricksAdapter (impl.py)
 
 ### Key Components
 
+#### DBR Capability System (`dbr_capabilities.py`)
+
+- **Purpose**: Centralized management of DBR version-dependent features
+- **Key Features**:
+  - Per-compute caching (different clusters can have different capabilities)
+  - Named capabilities instead of magic version numbers
+  - Automatic detection of DBR version and SQL warehouse environments
+- **Supported Capabilities**:
+  - `TIMESTAMPDIFF` (DBR 10.4+): Advanced date/time functions
+  - `INSERT_BY_NAME` (DBR 12.2+): Name-based column matching in INSERT
+  - `ICEBERG` (DBR 14.3+): Apache Iceberg table format
+  - `COMMENT_ON_COLUMN` (DBR 16.1+): Modern column comment syntax
+  - `JSON_COLUMN_METADATA` (DBR 16.2+): Efficient metadata retrieval
+- **Usage in Code**:
+
+```python
+# In Python code
+if adapter.has_capability(DBRCapability.ICEBERG):
+    # Use Iceberg features
+
+# In Jinja macros
+{% if adapter.has_dbr_capability('comment_on_column') %}
+  COMMENT ON COLUMN ...
+{% else %}
+  ALTER TABLE ... ALTER COLUMN ...
+{% endif %}
+
+{% if adapter.has_dbr_capability('insert_by_name') %}
+  INSERT INTO table BY NAME SELECT ...
+{% else %}
+  INSERT INTO table SELECT ...  -- positional
+{% endif %}
+```
+
+- **Adding New Capabilities**:
+  1. Add to `DBRCapability` enum
+  2. Add `CapabilitySpec` with version requirements
+  3. Use `has_capability()` or `require_capability()` in code
+- **Important**: Each compute resource (identified by `http_path`) maintains its own capability cache
+
 #### Connection Management (`connections.py`)
 
 - Extends Spark connection manager for Databricks
````

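Since this commit backports the capability-system docs, a small sketch of the three-step "Adding New Capabilities" flow may help. Only the `DBRCapability` and `CapabilitySpec` names and the `has_capability()` call come from the text above; the field names and registry shape are assumptions for illustration:

```python
from dataclasses import dataclass
from enum import Enum


class DBRCapability(Enum):
    # Existing members elided; step 1 is adding a new member here
    ICEBERG = "iceberg"
    MY_FEATURE = "my_feature"  # hypothetical new capability


@dataclass(frozen=True)
class CapabilitySpec:
    # Step 2: pair the capability with its minimum DBR version.
    # These field names are illustrative, not the adapter's actual spec.
    capability: DBRCapability
    min_major: int
    min_minor: int


CAPABILITY_SPECS = {
    DBRCapability.MY_FEATURE: CapabilitySpec(DBRCapability.MY_FEATURE, 17, 0),
}


def emit_feature_sql(adapter) -> str:
    # Step 3: gate version-dependent behavior behind has_capability()
    if adapter.has_capability(DBRCapability.MY_FEATURE):
        return "-- new syntax"
    return "-- fallback for older DBR"
```
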
```diff
@@ -184,6 +259,7 @@ DatabricksAdapter (impl.py)
 - Override Spark macros with Databricks-specific logic
 - Handle materializations (table, view, incremental, snapshot)
 - Implement Databricks features (liquid clustering, column masks, tags)
+- **Important**: To override a `spark__macro_name` macro, create `databricks__macro_name` (NOT `spark__macro_name`)
 
 ### Configuration System
 
```

```diff
@@ -256,6 +332,7 @@ Models can be configured with Databricks-specific options:
 
 - **Development**: `docs/dbt-databricks-dev.md` - Setup and workflow
 - **Testing**: `docs/testing.md` - Comprehensive testing guide
+- **DBR Capabilities**: `docs/dbr-capability-system.md` - Version-dependent features
 - **Contributing**: `CONTRIBUTING.MD` - Code standards and PR process
 - **User Docs**: [docs.getdbt.com](https://docs.getdbt.com/reference/resource-configs/databricks-configs)
 
```

```diff
@@ -273,6 +350,11 @@ Models can be configured with Databricks-specific options:
 3. **SQL Generation**: Prefer macros over Python string manipulation
 4. **Testing**: Write both unit and functional tests for new features
 5. **Configuration**: Use dataclasses with validation for new config options
+6. **Imports**: Always import at the top of the file, never use local imports within functions or methods
+7. **Version Checks**: Use the capability system instead of direct version comparisons:
+   - Avoid: `if adapter.compare_dbr_version(16, 1) >= 0:`
+   - Prefer: `if adapter.has_capability(DBRCapability.COMMENT_ON_COLUMN):`
+   - Prefer: `{% if adapter.has_dbr_capability('comment_on_column') %}`
 
 ## 🚨 Common Pitfalls for Agents
 
```

```diff
@@ -284,12 +366,20 @@ Models can be configured with Databricks-specific options:
 6. **Follow SQL normalization** in test assertions with `assert_sql_equal()`
 7. **Handle Unity Catalog vs HMS differences** in feature implementations
 8. **Consider backward compatibility** when modifying existing behavior
+9. **Use capability system for version checks** - Never add new `compare_dbr_version()` calls
+10. **Remember per-compute caching** - Different clusters may have different capabilities in the same run
 
 ## 🎯 Success Metrics
 
 When working on this codebase, ensure:
 
 - [ ] All tests pass (`hatch run code-quality && hatch run unit`)
+- [ ] **CRITICAL: Run affected functional tests before declaring success**
+  - If you modified connection/capability logic: Run tests that use multiple computes or check capabilities
+  - If you modified incremental materializations: Run `tests/functional/adapter/incremental/`
+  - If you modified Python models: Run `tests/functional/adapter/python_model/`
+  - If you modified macros: Run tests that use those macros
+  - **NEVER declare "mission accomplished" without running functional tests for affected features**
 - [ ] New features have both unit and functional tests
 - [ ] SQL generation follows Databricks best practices
 - [ ] Changes maintain backward compatibility
```

`dbt/adapters/databricks/__init__.py` (1 addition, 0 deletions)

```diff
@@ -1,4 +1,5 @@
 from dbt.adapters.base import AdapterPlugin
+
 from dbt.adapters.databricks.credentials import DatabricksCredentials
 from dbt.adapters.databricks.impl import DatabricksAdapter
 from dbt.include import databricks
```

`dbt/adapters/databricks/behaviors/columns.py` (1 addition, 1 deletion)

```diff
@@ -1,12 +1,12 @@
 from abc import ABC, abstractmethod
 
+from dbt.adapters.sql import SQLAdapter
 from dbt_common.exceptions import DbtDatabaseError
 from dbt_common.utils.dict import AttrDict
 
 from dbt.adapters.databricks.column import DatabricksColumn
 from dbt.adapters.databricks.relation import DatabricksRelation
 from dbt.adapters.databricks.utils import handle_missing_objects
-from dbt.adapters.sql import SQLAdapter
 
 
 class GetColumnsBehavior(ABC):
```

`dbt/adapters/databricks/catalogs/_hive_metastore.py` (1 addition, 0 deletions)

```diff
@@ -2,6 +2,7 @@
 
 from dbt.adapters.catalogs import CatalogIntegration, CatalogIntegrationConfig
 from dbt.adapters.contracts.relation import RelationConfig
+
 from dbt.adapters.databricks import constants, parse_model
 from dbt.adapters.databricks.catalogs._relation import DatabricksCatalogRelation
```

`dbt/adapters/databricks/catalogs/_unity.py` (1 addition, 0 deletions)

```diff
@@ -2,6 +2,7 @@
 
 from dbt.adapters.catalogs import CatalogIntegration, CatalogIntegrationConfig
 from dbt.adapters.contracts.relation import RelationConfig
+
 from dbt.adapters.databricks import constants, parse_model
 from dbt.adapters.databricks.catalogs._relation import DatabricksCatalogRelation
```

`dbt/adapters/databricks/column.py` (2 additions, 1 deletion)

```diff
@@ -1,9 +1,10 @@
 from dataclasses import dataclass
 from typing import Any, ClassVar, Optional
 
-from dbt.adapters.databricks.utils import quote
 from dbt.adapters.spark.column import SparkColumn
 
+from dbt.adapters.databricks.utils import quote
+
 
 @dataclass
 class DatabricksColumn(SparkColumn):
```

`dbt/adapters/databricks/connections.py` (16 additions, 16 deletions)

```diff
@@ -5,13 +5,6 @@
 from multiprocessing.context import SpawnContext
 from typing import TYPE_CHECKING, Any, Optional, cast
 
-from dbt_common.events.contextvars import get_node_info
-from dbt_common.events.functions import fire_event
-from dbt_common.exceptions import DbtDatabaseError, DbtRuntimeError
-from dbt_common.utils import cast_to_str
-
-from databricks.sql import __version__ as dbsql_version
-from databricks.sql.exc import Error
 from dbt.adapters.base.query_headers import MacroQueryStringSetter
 from dbt.adapters.contracts.connection import (
     DEFAULT_QUERY_COMMENT,
@@ -22,6 +15,22 @@
     Identifier,
     LazyHandle,
 )
+from dbt.adapters.events.types import (
+    ConnectionClosedInCleanup,
+    ConnectionReused,
+    ConnectionUsed,
+    NewConnection,
+    SQLQuery,
+    SQLQueryStatus,
+)
+from dbt.adapters.spark.connections import SparkConnectionManager
+from dbt_common.events.contextvars import get_node_info
+from dbt_common.events.functions import fire_event
+from dbt_common.exceptions import DbtDatabaseError, DbtRuntimeError
+from dbt_common.utils import cast_to_str
+
+from databricks.sql import __version__ as dbsql_version
+from databricks.sql.exc import Error
 from dbt.adapters.databricks.__version__ import version as __version__
 from dbt.adapters.databricks.api_client import DatabricksApiClient
 from dbt.adapters.databricks.credentials import (
@@ -37,15 +46,6 @@
 from dbt.adapters.databricks.logging import logger
 from dbt.adapters.databricks.python_models.run_tracking import PythonRunTracker
 from dbt.adapters.databricks.utils import is_cluster_http_path, redact_credentials
-from dbt.adapters.events.types import (
-    ConnectionClosedInCleanup,
-    ConnectionReused,
-    ConnectionUsed,
-    NewConnection,
-    SQLQuery,
-    SQLQueryStatus,
-)
-from dbt.adapters.spark.connections import SparkConnectionManager
 
 if TYPE_CHECKING:
     from agate import Table
```

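A pattern runs through this and the other import reshuffles in this commit: `dbt.adapters.*` and `dbt_common.*` imports now sort with the third-party group, the `databricks` SDK/driver imports form their own block, and first-party `dbt.adapters.databricks.*` imports come last. A sketch of the resulting layout, inferred from these diffs rather than read from the repo's ruff/isort configuration:

```python
# Standard library
from typing import TYPE_CHECKING, Any, Optional

# Third-party, including the dbt framework packages
from dbt.adapters.spark.connections import SparkConnectionManager
from dbt_common.exceptions import DbtRuntimeError

# Databricks SDK and SQL driver
from databricks.sql.exc import Error

# First-party adapter code, always last
from dbt.adapters.databricks.logging import logger
```
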

`dbt/adapters/databricks/constraints.py` (2 additions, 3 deletions)

```diff
@@ -3,6 +3,8 @@
 from typing import Any, ClassVar, Optional, TypeVar
 from uuid import uuid4
 
+from dbt.adapters.base import ConstraintSupport
+from dbt.adapters.events.types import ConstraintNotEnforced, ConstraintNotSupported
 from dbt_common.contracts.constraints import (
     ColumnLevelConstraint,
     ConstraintType,
@@ -11,9 +13,6 @@
 from dbt_common.events.functions import warn_or_error
 from dbt_common.exceptions import DbtValidationError
 
-from dbt.adapters.base import ConstraintSupport
-from dbt.adapters.events.types import ConstraintNotEnforced, ConstraintNotSupported
-
 # Support constants
 CONSTRAINT_SUPPORT = {
     ConstraintType.check: ConstraintSupport.ENFORCED,
```

`dbt/adapters/databricks/credentials.py` (1 addition, 1 deletion)

```diff
@@ -5,14 +5,14 @@
 from dataclasses import dataclass, field
 from typing import Any, Callable, Optional, cast
 
+from dbt.adapters.contracts.connection import Credentials
 from dbt_common.exceptions import DbtConfigError, DbtValidationError
 from mashumaro import DataClassDictMixin
 from requests import PreparedRequest
 from requests.auth import AuthBase
 
 from databricks.sdk import WorkspaceClient
 from databricks.sdk.core import Config, CredentialsProvider
-from dbt.adapters.contracts.connection import Credentials
 from dbt.adapters.databricks.global_state import GlobalState
 from dbt.adapters.databricks.logging import logger
```
