Commit 8201ee7

paco-valdez and Copilot authored
Enhance support for Primary Keys and perf opt (#16)
* fix utf8 issue
* Support primary keys from dbt tests
* Added orjson for perf
* Update QUICK_REFERENCE.md

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>
1 parent 530a59e commit 8201ee7

11 files changed: +769 -22 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -160,3 +160,5 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+.DS_Store
+

CLAUDE.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**cube_dbt** is a Python package that converts dbt models and columns into Cube semantic layer definitions. It parses dbt manifest files and provides Jinja-compatible YAML output for integrating data models with Cube's semantic layer.

## Common Development Commands

```bash
# Testing
pdm run test                    # Run all tests (34 unit tests)
pytest tests/ -v                # Run tests with verbose output
pytest tests/test_dbt.py        # Run specific test file
pytest -k "test_model"          # Run tests matching pattern

# Development Setup
pdm install                     # Install project with dev dependencies
pdm install --prod              # Install production dependencies only
pdm lock                        # Update pdm.lock file
pdm update                      # Update all dependencies

# Building & Publishing
pdm build                       # Build distribution packages
pdm publish                     # Publish to PyPI (requires credentials)

# Development Workflow
pdm run python -m cube_dbt      # Run the module directly
python -c "from cube_dbt import Dbt; print(Dbt.version())"  # Check version
```

## High-Level Architecture

The package consists of 4 core classes that work together:

### Core Classes

**Dbt (src/cube_dbt/dbt.py)**
- Entry point for loading dbt manifest files
- Supports file paths and URLs via `from_file()` and `from_url()` class methods
- Implements chainable filtering API: `filter(paths=[], tags=[], names=[])`
- Lazy initialization - models are only loaded when accessed
- Handles manifest v1-v12 formats

**Model (src/cube_dbt/model.py)**
- Represents a single dbt model from the manifest
- Key method: `as_cube()` - exports model as Cube-compatible YAML
- Supports multiple primary keys via column tags
- Provides access to columns, description, database, schema, and alias
- Handles special characters in model names (spaces, dots, dashes)

**Column (src/cube_dbt/column.py)**
- Represents dbt columns with comprehensive type mapping
- Maps 130+ database-specific types to 5 Cube dimension types:
  - string, number, time, boolean, geo
- Database support: BigQuery, Snowflake, Redshift, generic SQL
- Primary key detection via `primary_key` tag in column metadata
- Raises RuntimeError for unknown column types (fail-fast approach)

**Dump (src/cube_dbt/dump.py)**
- Custom YAML serialization utilities
- Returns Jinja SafeString for template compatibility
- Handles proper indentation for nested structures
- Used internally by Model.as_cube() for output formatting

### Key Design Patterns

1. **Lazy Loading**: Models are loaded only when first accessed via `dbt.models` property
2. **Builder Pattern**: Filter methods return self for chaining: `dbt.filter(tags=['tag1']).filter(paths=['path1'])`
3. **Factory Methods**: `Dbt.from_file()` and `Dbt.from_url()` for different data sources
4. **Type Mapping Strategy**: Centralized database type to Cube type conversion in Column class
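
The lazy-loading and chaining patterns listed above can be illustrated with a small stand-in class. This is a minimal sketch, not the package's code: `ManifestSketch` is a hypothetical name, and the filter semantics (criteria combined with AND, any matching tag accepted) are assumptions. Only the `nodes`, `resource_type`, `name`, and `tags` keys follow the dbt manifest layout.

```python
from typing import List, Optional


class ManifestSketch:
    """Hypothetical stand-in for Dbt; not the package's actual implementation."""

    def __init__(self, manifest: dict) -> None:
        self._manifest = manifest
        self._tags: List[str] = []
        self._names: List[str] = []
        self._models: Optional[List[dict]] = None  # populated lazily

    def filter(
        self,
        tags: Optional[List[str]] = None,
        names: Optional[List[str]] = None,
    ) -> "ManifestSketch":
        # Record the criteria and return self so calls can be chained.
        self._tags.extend(tags or [])
        self._names.extend(names or [])
        self._models = None  # drop any cached result
        return self

    @property
    def models(self) -> List[dict]:
        # Lazy loading: the manifest is scanned only on first access.
        if self._models is None:
            self._models = [
                node
                for node in self._manifest["nodes"].values()
                if node.get("resource_type") == "model"
                and (not self._tags or set(self._tags) & set(node.get("tags", [])))
                and (not self._names or node["name"] in self._names)
            ]
        return self._models


manifest = {
    "nodes": {
        "model.demo.orders": {"resource_type": "model", "name": "orders", "tags": ["cube"]},
        "model.demo.staging": {"resource_type": "model", "name": "staging", "tags": []},
        "test.demo.not_null": {"resource_type": "test", "name": "not_null", "tags": []},
    }
}

sketch = ManifestSketch(manifest).filter(tags=["cube"])
print([m["name"] for m in sketch.models])
```

Returning `self` from `filter()` keeps the call chain short, at the cost of mutating the instance; a variant that returns a fresh object per call would avoid shared state between chains.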

### Data Flow

```
manifest.json → Dbt.from_file() → filter() → models → Model.as_cube() → YAML output

columns → Column.dimension_type()
```

## Testing Structure

Tests use a real dbt manifest fixture (tests/manifest.json, ~397KB) with example models:

- **test_dbt.py**: Tests manifest loading, filtering by paths/tags/names, version checking
- **test_model.py**: Tests YAML export, primary key handling, special character escaping
- **test_column.py**: Tests type mapping for different databases, primary key detection
- **test_dump.py**: Tests YAML formatting and Jinja compatibility

Run specific test scenarios:
```bash
pytest tests/test_column.py::TestColumn::test_bigquery_types -v
pytest tests/test_model.py::TestModel::test_multiple_primary_keys -v
```

## Important Implementation Details

### Primary Key Configuration
Primary keys are defined using tags in dbt column metadata:
```yaml
# In dbt schema.yml
columns:
  - name: id
    meta:
      tags: ['primary_key']
```
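
To make the tag convention concrete, here is a minimal sketch of collecting every column tagged `primary_key`. The function name `primary_key_columns` is hypothetical, and the dictionary layout simply mirrors the YAML above; where the package actually reads the tags from is not confirmed here.

```python
from typing import List


def primary_key_columns(columns: List[dict]) -> List[str]:
    """Return names of columns whose meta.tags contain 'primary_key'.

    Assumes the layout shown in the schema.yml snippet above.
    """
    return [
        col["name"]
        for col in columns
        if "primary_key" in col.get("meta", {}).get("tags", [])
    ]


columns = [
    {"name": "id", "meta": {"tags": ["primary_key"]}},
    {"name": "tenant_id", "meta": {"tags": ["primary_key", "pii"]}},
    {"name": "amount", "meta": {}},
]
print(primary_key_columns(columns))
```

Returning a list rather than a single name is what allows a composite key such as `id` plus `tenant_id`.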

### Type Mapping Behavior
- Unknown types raise RuntimeError immediately (fail-fast)
- Database-specific types are checked first, then generic SQL types
- Default mappings can be found in `src/cube_dbt/column.py` TYPE_MAP dictionaries
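
The lookup order described above can be sketched as follows. The two dictionaries are hypothetical and heavily abbreviated, as is the `dimension_type` signature; the actual tables and names in `column.py` may differ.

```python
from typing import Dict

# Hypothetical, abbreviated maps for illustration only.
DB_SPECIFIC_TYPES: Dict[str, Dict[str, str]] = {
    "bigquery": {"int64": "number", "geography": "geo", "timestamp": "time"},
    "snowflake": {"timestamp_ntz": "time", "float8": "number"},
}
GENERIC_TYPES: Dict[str, str] = {
    "varchar": "string",
    "integer": "number",
    "boolean": "boolean",
    "date": "time",
}


def dimension_type(db_type: str, adapter: str) -> str:
    """Map a database column type to a Cube dimension type."""
    key = db_type.strip().lower()
    # 1. Database-specific types take precedence.
    specific = DB_SPECIFIC_TYPES.get(adapter, {})
    if key in specific:
        return specific[key]
    # 2. Fall back to generic SQL types.
    if key in GENERIC_TYPES:
        return GENERIC_TYPES[key]
    # 3. Fail fast instead of guessing a type.
    raise RuntimeError(f"Unknown column type: {db_type!r}")


print(dimension_type("INT64", "bigquery"))
print(dimension_type("varchar", "bigquery"))
```

Raising on unknown types surfaces a missing mapping at load time rather than producing a silently mistyped dimension.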

### Jinja Template Integration
All output from `as_cube()` is wrapped in Jinja SafeString to prevent double-escaping in templates. Use the `safe` filter if needed in templates.
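
The idea behind a safe-string wrapper can be shown without a template engine. The sketch below uses the `__html__` hook that MarkupSafe-style escaping honors; `SafeText` and `escape` are hypothetical names, and the package's own SafeString may mark safety differently.

```python
import html


class SafeText(str):
    """Hypothetical str subclass marking its content as already safe."""

    def __html__(self) -> str:
        # Escaping code that honors this hook returns the value unchanged.
        return str(self)


def escape(value: object) -> str:
    # Simplified stand-in for a template engine's autoescape step.
    if hasattr(value, "__html__"):
        return value.__html__()
    return html.escape(str(value))


yaml_text = 'sql: "{CUBE}.amount > 0"'
print(escape(yaml_text))            # quotes and '>' become HTML entities
print(escape(SafeText(yaml_text)))  # passes through unchanged
```

Without the wrapper, characters such as quotes and `>` in the generated YAML would be rewritten as entities when autoescaping is on, which would corrupt the Cube definition.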

### URL Loading Authentication
When using `Dbt.from_url()`, basic authentication is supported:
```python
dbt = Dbt.from_url("https://user:[email protected]/manifest.json")
```
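
For readers unfamiliar with credentials embedded in a URL, the following sketch shows one way they can be turned into a Basic `Authorization` header using only the standard library. `build_request` is a hypothetical helper and the URL is a placeholder; how `from_url()` handles this internally is not shown in this commit.

```python
import base64
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def build_request(url: str) -> Request:
    """Move user:password from the URL into a Basic auth header."""
    parts = urlsplit(url)
    # Rebuild the URL without the credentials (IPv6 literals not handled).
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    request = Request(clean_url)
    if parts.username is not None:
        credentials = f"{parts.username}:{parts.password or ''}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder URL; no network call is made here.
request = build_request("https://user:secret@example.com/manifest.json")
print(request.full_url)
print(request.get_header("Authorization"))
```

The request object could then be passed to `urllib.request.urlopen()`. Keeping the credentials out of the final URL avoids leaking them into logs that record request URLs.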

## Recent Changes (from git history)

- Multiple primary key support (#15)
- Documentation of package properties (#14)
- Extended dbt contract data type support (#10)
- Jinja escaping protection for as_cube() (#2)

## Package Metadata

- **Version**: Defined in `src/cube_dbt/__init__.py`
- **Python Requirement**: >= 3.8
- **Production Dependency**: PyYAML >= 6.0.1
- **License**: MIT
- **Build System**: PDM with PEP 517/518 compliance

QUICK_REFERENCE.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# cube_dbt Quick Reference

## What is cube_dbt?
A Python package that converts dbt models and columns into Cube semantic layer definitions. It parses dbt manifests and provides Jinja-compatible YAML output.

## Install & Run Tests
```bash
pdm install          # Set up environment
pdm run test         # Run all tests
```

## Basic Usage
```python
from cube_dbt import Dbt

# Load and filter
dbt = Dbt.from_file('manifest.json').filter(
    paths=['marts/'],
    tags=['cube'],
    names=['model_name']
)

# Access models
model = dbt.model('my_model')
print(model.name)
print(model.sql_table)
print(model.columns)

# Export to Cube (YAML)
print(model.as_cube())
print(model.as_dimensions())
```

## Project Structure
```
src/cube_dbt/
├── dbt.py         - Dbt class (manifest loading & filtering)
├── model.py       - Model class (cube export)
├── column.py      - Column class (type mapping)
├── dump.py        - YAML utilities (Jinja-safe)
└── __init__.py    - Public exports

tests/             - 34 unit tests, all passing
```

## Key Classes

### Dbt
- `from_file(path)` - Load from JSON
- `from_url(url)` - Load from remote URL
- `filter(paths=[], tags=[], names=[])` - Chainable filtering
- `.models` - Get all filtered models
- `.model(name)` - Get single model

### Model
- `.name`, `.description`, `.sql_table` - Properties
- `.columns` - List of Column objects
- `.primary_key` - List of primary key columns
- `.as_cube()` - Export as Cube definition (YAML)
- `.as_dimensions()` - Export dimensions (YAML)

### Column
- `.name`, `.description`, `.type`, `.meta` - Properties
- `.primary_key` - Boolean
- `.as_dimension()` - Export dimension (YAML)

Type mapping: BigQuery, Snowflake, Redshift → Cube types (number, string, time, boolean, geo)

## Dependencies
- Production: PyYAML >= 6.0.1, orjson >= 3.10.15
- Note: orjson is used for fast JSON parsing. If unavailable, the package may fall back to standard libraries.
- Development: pytest >= 7.4.2
- Python: >= 3.8
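
The optional-dependency note above describes a common pattern, sketched below. `load_manifest_bytes` is a hypothetical helper; whether the package implements its fallback this way is not confirmed by this commit.

```python
import json

try:
    import orjson  # optional accelerator
except ImportError:  # fall back to the standard library
    orjson = None


def load_manifest_bytes(raw: bytes) -> dict:
    """Parse manifest JSON, preferring orjson when it is installed."""
    if orjson is not None:
        return orjson.loads(raw)
    # Decode explicitly as UTF-8 before handing the text to json.
    return json.loads(raw.decode("utf-8"))


print(load_manifest_bytes(b'{"metadata": {"dbt_version": "1.7.0"}}'))
```

Both branches return the same Python structures for valid UTF-8 JSON, so callers do not need to know which parser ran.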

## Common Tasks
| Task | Command |
|------|---------|
| Run tests | `pdm run test` |
| Run specific test | `pytest tests/test_dbt.py -v` |
| Install deps | `pdm install` |
| Lock deps | `pdm lock` |
| Build package | `pdm build` |

## Recent Changes
- v0.6.2: Multiple primary keys support
- Type support for dbt contracts
- Jinja template safe rendering

## Publishing
GitHub Actions auto-publishes to PyPI on release.
