Commit 3277ef8

feat: Expand dbt version support (1.7.x - 1.10.x) and fix dbt 1.8+ compatibility (#127)

* feat: support other dbt versions
* chore: fix linter issue

1 parent: a18ac3a

File tree: 13 files changed, +1065 additions, −83 deletions

AGENTS.md — 46 additions, 7 deletions

@@ -6,7 +6,17 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 **data-pipelines-cli** (`dp`) is a CLI tool for managing data platform workflows. It orchestrates dbt projects, cloud deployments, Docker builds, and multi-service integrations (Airbyte, DataHub, Looker). Projects are created from templates using copier, compiled with environment-specific configs, and deployed to cloud storage (GCS, S3).
 
-**Version:** 0.31.0 | **Python:** 3.9-3.12 | **License:** Apache 2.0
+**Version:** 0.32.0 (unreleased) | **Python:** 3.9-3.12 | **License:** Apache 2.0
+
+## Documentation Style
+
+Write concise, technical, minimal descriptions. Developer-to-developer communication:
+- State facts, no verbose explanations
+- Focus on what changed, not why it matters
+- Example: "Expanded dbt-core support: `>=1.7.3,<2.0.0`" (good) vs "We expanded dbt support to allow users more flexibility..." (bad)
+- CHANGELOG: List changes only, no context or justification
+- Code comments: Describe implementation, not rationale
+- Commit messages: Precise technical changes
 
 ## Quick Command Reference
 
@@ -33,6 +43,15 @@ flake8 data_pipelines_cli tests
 mypy data_pipelines_cli
 ```
 
+### Installation
+
+Must install with adapter extra:
+```bash
+pip install data-pipelines-cli[snowflake]  # Snowflake (primary)
+pip install data-pipelines-cli[bigquery]  # BigQuery
+pip install data-pipelines-cli[snowflake,docker,datahub,gcs]  # Multiple extras
+```
+
 ### CLI Workflow
 ```bash
 # Initialize global config
@@ -177,6 +196,7 @@ run_dbt_command(("run",), env, profiles_path)
 |------|-------|---------|
 | **cli_commands/compile.py** | 160+ | Orchestrates compilation: file copying, config merging, dbt compile, Docker build |
 | **cli_commands/deploy.py** | 240+ | Orchestrates deployment: Docker, DataHub, Airbyte, Looker, cloud storage |
+| **cli_commands/publish.py** | 140+ | Publish dbt package to Git; parses manifest.json as plain JSON (no dbt Python API) |
 | **config_generation.py** | 175+ | Config merging logic, profiles.yml generation |
 | **dbt_utils.py** | 95+ | dbt subprocess execution with variable aggregation |
 | **filesystem_utils.py** | 75+ | LocalRemoteSync class for cloud storage (uses fsspec) |
@@ -190,7 +210,7 @@ run_dbt_command(("run",), env, profiles_path)
 ### Core (always installed)
 - **click** (8.1.3): CLI framework
 - **copier** (7.0.1): Project templating
-- **dbt-core** (1.7.3): Data build tool
+- **dbt-core** (>=1.7.3,<2.0.0): Data build tool - supports 1.7.x through 1.10.x
 - **fsspec** (>=2024.6.0,<2025.0.0): Cloud filesystem abstraction
 - **jinja2** (3.1.2): Template rendering
 - **pyyaml** (6.0.1): Config parsing
@@ -200,11 +220,11 @@ run_dbt_command(("run",), env, profiles_path)
 
 ### Optional Extras
 ```bash
-# dbt adapters
-pip install data-pipelines-cli[bigquery]  # dbt-bigquery==1.7.2
-pip install data-pipelines-cli[snowflake]  # dbt-snowflake==1.7.1
-pip install data-pipelines-cli[postgres]  # dbt-postgres==1.7.3
-pip install data-pipelines-cli[databricks]  # dbt-databricks-factory
+# dbt adapters (version ranges support 1.7.x through 1.10.x)
+pip install data-pipelines-cli[snowflake]  # dbt-snowflake>=1.7.1,<2.0.0 (PRIMARY)
+pip install data-pipelines-cli[bigquery]  # dbt-bigquery>=1.7.2,<2.0.0
+pip install data-pipelines-cli[postgres]  # dbt-postgres>=1.7.3,<2.0.0
+pip install data-pipelines-cli[databricks]  # dbt-databricks-factory>=0.1.1
 pip install data-pipelines-cli[dbt-all]  # All adapters
 
 # Cloud/integrations
@@ -332,6 +352,25 @@ my_pipeline/ # Created by dp create
 - **Code generation** requires compilation first (needs manifest.json)
 - **Test mocking:** S3 uses moto, GCS uses gcp-storage-emulator
 
+## Recent Changes (v0.32.0 - Unreleased)
+
+**dbt Version Support Expanded**
+- All adapters: version ranges `>=1.7.x,<2.0.0` (was exact pins)
+- dbt-core removed from INSTALL_REQUIREMENTS (adapters provide it)
+- Snowflake added to test suite (primary adapter)
+- **CRITICAL:** `cli_commands/publish.py` refactored to parse `manifest.json` as plain JSON instead of using dbt Python API (fixes dbt 1.8+ compatibility)
+  - All other commands use subprocess calls to dbt CLI
+  - No dependency on unstable `dbt.contracts.*` modules
+  - Works across dbt 1.7.x through 1.10.x (verified with 70 test executions)
+  - See `design/001-dbt-manifest-api-migration.md` for full details
+
+**dbt Pre-release Installation Edge Case**
+- Stable `dbt-snowflake==1.10.3` declares `dbt-core>=1.10.0rc0` dependency
+- The `rc0` constraint allows pip to install beta versions (e.g., `dbt-core==1.11.0b4`)
+- This is PEP 440 standard behavior, not a bug
+- Added troubleshooting documentation: `pip install --force-reinstall 'dbt-core>=1.7.3,<2.0.0'`
+- No code changes needed (rare edge case, self-correcting when stable releases update)
+
 ## Recent Changes (v0.31.0)
 
 **Python 3.11/3.12 Support**
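The pre-release edge case documented above is standard PEP 440 specifier behavior. A minimal sketch, assuming the third-party `packaging` library (the same machinery pip builds on), shows why a constraint that mentions `rc0` admits beta releases while the stable range from the troubleshooting docs filters them out:

```python
# Sketch using the `packaging` library to illustrate PEP 440
# pre-release handling; the version strings mirror the edge case above.
from packaging.specifiers import SpecifierSet

# Constraint declared by the stable dbt-snowflake==1.10.3 release:
adapter_constraint = SpecifierSet(">=1.10.0rc0")
# Constraint recommended in the troubleshooting documentation:
stable_constraint = SpecifierSet(">=1.7.3,<2.0.0")

# Mentioning a pre-release inside the specifier opts that specifier
# into pre-release candidates, so the beta matches:
print(adapter_constraint.contains("1.11.0b4"))  # True

# A specifier built only from stable versions excludes pre-releases
# by default, even though 1.11.0b4 is numerically below 2.0.0:
print(stable_constraint.contains("1.11.0b4"))   # False
print(stable_constraint.contains("1.10.3"))     # True
```

This is why `pip install --force-reinstall 'dbt-core>=1.7.3,<2.0.0'` resolves back to a stable release.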

CHANGELOG.md — 17 additions, 0 deletions

@@ -2,6 +2,23 @@
 
 ## [Unreleased]
 
+### Changed
+
+- Expanded all dbt adapter version ranges to `>=1.7.x,<2.0.0` (Snowflake, BigQuery, Postgres, Redshift, Glue)
+- Added Snowflake adapter to test suite (tox.ini)
+- Removed dbt-core from base requirements (all adapters provide it as dependency)
+- Jinja2 version constraint: `==3.1.2` → `>=3.1.3,<4`
+
+### Fixed
+
+- `dp publish` compatibility with dbt 1.8+ (removed dependency on unstable Python API)
+- CLI import failure when GitPython not installed
+
+### Removed
+
+- MarkupSafe pin (managed by Jinja2)
+- Werkzeug dependency (unused)
+
 ## [0.31.0] - 2025-11-03
 
 ## [0.30.0] - 2023-12-08

CONTRIBUTING.md — 2 additions, 0 deletions

@@ -10,6 +10,8 @@ pip install -r requirements-dev.txt
 pre-commit install
 ```
 
+**Note:** A dbt adapter extra (e.g., `bigquery`, `snowflake`) is required because dbt-core is provided as a transitive dependency. Any adapter can be used for development.
+
 ## Running Tests
 
 ```bash

README.md — 34 additions, 2 deletions

@@ -4,7 +4,7 @@
 [![PyPI Version](https://badge.fury.io/py/data-pipelines-cli.svg)](https://pypi.org/project/data-pipelines-cli/)
 [![Downloads](https://pepy.tech/badge/data-pipelines-cli)](https://pepy.tech/project/data-pipelines-cli)
 [![Maintainability](https://api.codeclimate.com/v1/badges/e44ed9383a42b59984f6/maintainability)](https://codeclimate.com/github/getindata/data-pipelines-cli/maintainability)
-[![Test Coverage](https://api.codeclimate.com/v1/badges/e44ed9383a42b59984f6/test_coverage)](https://codeclimate.com/github/getindata/data-pipelines-cli/test_coverage)
+[![Test Coverage](https://img.shields.io/badge/test%20coverage-95%25-brightgreen.svg)](https://github.com/getindata/data-pipelines-cli)
 [![Documentation Status](https://readthedocs.org/projects/data-pipelines-cli/badge/?version=latest)](https://data-pipelines-cli.readthedocs.io/en/latest/?badge=latest)
 
 CLI for data platform
@@ -14,12 +14,44 @@ CLI for data platform
 Read the full documentation at [https://data-pipelines-cli.readthedocs.io/](https://data-pipelines-cli.readthedocs.io/en/latest/index.html)
 
 ## Installation
-Use the package manager [pip](https://pip.pypa.io/en/stable/) to install [dp (data-pipelines-cli)](https://pypi.org/project/data-pipelines-cli/):
+
+**Requirements:** Python 3.9-3.12
+
+### Required
+
+A dbt adapter extra must be installed:
+
+```bash
+pip install data-pipelines-cli[snowflake]   # Snowflake
+pip install data-pipelines-cli[bigquery]    # BigQuery
+pip install data-pipelines-cli[postgres]    # PostgreSQL
+pip install data-pipelines-cli[databricks]  # Databricks
+```
+
+To pin a specific dbt-core version:
+
+```bash
+pip install data-pipelines-cli[snowflake] 'dbt-core>=1.8.0,<1.9.0'
+```
+
+### Optional
+
+Additional integrations: `docker`, `datahub`, `looker`, `gcs`, `s3`, `git`
+
+### Example
 
 ```bash
 pip install data-pipelines-cli[bigquery,docker,datahub,gcs]
 ```
 
+### Troubleshooting
+
+**Pre-release dbt versions**: data-pipelines-cli requires stable dbt-core releases. If you encounter errors with beta or RC versions, reinstall with stable versions:
+
+```bash
+pip install --force-reinstall 'dbt-core>=1.7.3,<2.0.0'
+```
+
 ## Usage
 First, create a repository with a global configuration file that you or your organization will be using. The repository
 should contain `dp.yml.tmpl` file looking similar to this:

data_pipelines_cli/cli_commands/publish.py — 43 additions, 35 deletions

@@ -1,12 +1,12 @@
+from __future__ import annotations
+
 import json
 import pathlib
 import shutil
-from typing import Any, Dict, List, Tuple, cast
+from typing import Any, Dict, List, Tuple
 
 import click
 import yaml
-from dbt.contracts.graph.manifest import Manifest
-from dbt.contracts.graph.nodes import ColumnInfo, ManifestNode
 
 from ..cli_constants import BUILD_DIR
 from ..cli_utils import echo_info, echo_warning
@@ -29,43 +29,52 @@ def _get_project_name_and_version() -> Tuple[str, str]:
     return dbt_project_config["name"], dbt_project_config["version"]
 
 
-def _get_database_and_schema_name(manifest: Manifest) -> Tuple[str, str]:
-    try:
-        model = next(
-            node
-            for node in (cast(ManifestNode, n) for n in manifest.nodes.values())
-            if node.resource_type == "model"
-        )
-        return model.database, model.schema
-    except StopIteration:
-        raise DataPipelinesError("There is no model in 'manifest.json' file.")
+def _get_database_and_schema_name(manifest_dict: Dict[str, Any]) -> Tuple[str, str]:
+    nodes = manifest_dict.get("nodes")
+    if not nodes:
+        raise DataPipelinesError("Invalid manifest.json: missing 'nodes' key")
+
+    for node_id, node in nodes.items():
+        if node.get("resource_type") == "model":
+            database = node.get("database")
+            schema = node.get("schema")
+            if not database or not schema:
+                raise DataPipelinesError(
+                    f"Model {node.get('name', node_id)} missing database or schema"
+                )
+            return database, schema
+
+    raise DataPipelinesError("There is no model in 'manifest.json' file.")
 
 
-def _parse_columns_dict_into_table_list(columns: Dict[str, ColumnInfo]) -> List[DbtTableColumn]:
+def _parse_columns_dict_into_table_list(columns: Dict[str, Any]) -> List[DbtTableColumn]:
     return [
         DbtTableColumn(
-            name=column.name,
-            description=column.description,
-            meta=column.meta,
-            quote=column.quote,
-            tags=column.tags,
+            name=col_data.get("name", ""),
+            description=col_data.get("description", ""),
+            meta=col_data.get("meta", {}),
+            quote=col_data.get("quote"),
+            tags=col_data.get("tags", []),
         )
-        for column in columns.values()
+        for col_data in columns.values()
     ]
 
 
-def _parse_models_schema(manifest: Manifest) -> List[DbtModel]:
-    return [
-        DbtModel(
-            name=node.name,
-            description=node.description,
-            tags=node.tags,
-            meta=node.meta,
-            columns=_parse_columns_dict_into_table_list(node.columns),
-        )
-        for node in (cast(ManifestNode, n) for n in manifest.nodes.values())
-        if node.resource_type == "model"
-    ]
+def _parse_models_schema(manifest_dict: Dict[str, Any]) -> List[DbtModel]:
+    nodes = manifest_dict.get("nodes", {})
+    models = []
+    for node_id, node in nodes.items():
+        if node.get("resource_type") == "model":
+            models.append(
+                DbtModel(
+                    name=node.get("name", ""),
+                    description=node.get("description", ""),
+                    tags=node.get("tags", []),
+                    meta=node.get("meta", {}),
+                    columns=_parse_columns_dict_into_table_list(node.get("columns", {})),
+                )
+            )
+    return models
 
 
 def _get_dag_id() -> str:
@@ -76,15 +85,14 @@ def _get_dag_id() -> str:
 def _create_source(project_name: str) -> DbtSource:
     with open(pathlib.Path.cwd().joinpath("target", "manifest.json"), "r") as manifest_json:
         manifest_dict = json.load(manifest_json)
-        manifest = Manifest.from_dict(manifest_dict)
 
-    database_name, schema_name = _get_database_and_schema_name(manifest)
+    database_name, schema_name = _get_database_and_schema_name(manifest_dict)
 
     return DbtSource(
         name=project_name,
         database=database_name,
         schema=schema_name,
-        tables=_parse_models_schema(manifest),
+        tables=_parse_models_schema(manifest_dict),
         meta={"dag": _get_dag_id()},
         tags=[f"project:{project_name}"],
     )
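The refactor drops `Manifest.from_dict` in favor of plain dict access over the loaded JSON, which is what makes `dp publish` independent of dbt's internal classes. A self-contained sketch of that approach, using a hypothetical trimmed manifest and a plain `ValueError` in place of the project's `DataPipelinesError`:

```python
# Sketch of the plain-JSON manifest access pattern: the manifest is a
# dict, so nothing from dbt's Python API is imported and the code is
# insensitive to dbt's internal class changes in 1.8+.
from typing import Any, Dict, Tuple

def get_database_and_schema(manifest_dict: Dict[str, Any]) -> Tuple[str, str]:
    nodes = manifest_dict.get("nodes")
    if not nodes:
        raise ValueError("Invalid manifest.json: missing 'nodes' key")
    for node_id, node in nodes.items():
        if node.get("resource_type") == "model":
            database, schema = node.get("database"), node.get("schema")
            if not database or not schema:
                raise ValueError(f"Model {node.get('name', node_id)} missing database or schema")
            return database, schema
    raise ValueError("There is no model in 'manifest.json' file.")

# Hypothetical, heavily trimmed manifest.json content for illustration:
manifest = {
    "nodes": {
        "model.my_project.orders": {
            "resource_type": "model",
            "name": "orders",
            "database": "ANALYTICS",
            "schema": "PUBLIC",
        }
    }
}

print(get_database_and_schema(manifest))  # ('ANALYTICS', 'PUBLIC')
```

In the real command the dict comes from `json.load` over `target/manifest.json`, as shown in the diff above.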

data_pipelines_cli/looker_utils.py — 16 additions, 2 deletions

@@ -1,3 +1,5 @@
+from __future__ import annotations
+
 import glob
 import os
 import pathlib
@@ -6,10 +8,17 @@
 
 import requests
 import yaml
-from git import Repo
 
 from .cli_constants import BUILD_DIR
-from .cli_utils import echo_info, subprocess_run
+from .cli_utils import echo_info, echo_warning, subprocess_run
+
+try:
+    from git import Repo
+
+    GIT_EXISTS = True
+except ImportError:
+    echo_warning("Git support not installed.")
+    GIT_EXISTS = False
 from .config_generation import (
     generate_profiles_yml,
     read_dictionary_from_config_directory,
@@ -48,6 +57,11 @@ def deploy_lookML_model(key_path: str, env: str) -> None:
     :param env: Name of the environment
     :type env: str
     """
+    if not GIT_EXISTS:
+        from .errors import DependencyNotInstalledError
+
+        raise DependencyNotInstalledError("git")
+
     profiles_path = generate_profiles_yml(env, False)
     run_dbt_command(("docs", "generate"), env, profiles_path)

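The guarded import above is the standard optional-dependency pattern: importing the module never hard-fails, and the error is deferred to the command that actually needs Git. A minimal standalone sketch, where `deploy_with_git` and the error message are illustrative stand-ins rather than the project's API:

```python
# Sketch of the optional-dependency guard pattern used in
# looker_utils.py: best-effort import at module load, hard failure
# deferred to call time.
import importlib

try:
    git = importlib.import_module("git")  # provided by GitPython
    GIT_EXISTS = True
except ImportError:
    GIT_EXISTS = False

def deploy_with_git() -> str:
    """Hypothetical command that requires Git support."""
    if not GIT_EXISTS:
        # Only commands that need Git raise; everything else still works.
        raise RuntimeError("git support not installed: pip install data-pipelines-cli[git]")
    # A real implementation would use git.Repo here.
    return "deployed"
```

This is why the CLI no longer fails to import when GitPython is absent, as noted in the CHANGELOG's Fixed section.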
0 commit comments