Commit dcdf61d

author
roller100 (BearingNode)
committed
docs(dbt): Document custom facets and integrate coverage analysis
Final documentation updates for PostgreSQL migration:

1. SPECIFICATION_COVERAGE_ANALYSIS.md:
   - Updated test configuration (PostgreSQL 15, 22 events, matrix testing)
   - Added comprehensive 'Known Validation Warnings' section
   - Documented dbt_version and dbt_run custom facets
   - Explained why warnings occur (vendor extensions vs official spec)
   - Clarified impact: tests pass, events valid, warnings expected
   - Listed resolution options and current workaround status

2. README.md:
   - Distinguished local vs GitHub Actions testing workflows
   - Added 'Custom dbt Facets and Validation Warnings' section
   - Cross-referenced SPECIFICATION_COVERAGE_ANALYSIS.md at two key points
   - Clarified that validation warnings are expected behavior

These docs ensure contributors understand:
- The difference between local Docker Compose and CI/CD testing
- Why dbt events generate validation warnings (custom facets)
- That warnings are documented, expected, and acceptable
- Where to find detailed technical analysis

Ready for upstream PR to OpenLineage compatibility-tests repo.

Signed-off-by: roller100 (BearingNode) <[email protected]>
1 parent 162b607 commit dcdf61d

2 files changed

+218
-24
lines changed

producer/dbt/README.md

Lines changed: 176 additions & 22 deletions
@@ -52,7 +52,12 @@ This test validates that the `openlineage-dbt` integration correctly generates O
5252
- `dataQualityAssertions` (for dbt tests)
5353
- **Specification Compliance**: Events are validated against the OpenLineage specification schema (version `2-0-2`).
5454

55-
A detailed, facet-by-facet analysis of specification coverage is available in `SPECIFICATION_COVERAGE_ANALYSIS.md`.
55+
**For detailed coverage analysis**, see **[`SPECIFICATION_COVERAGE_ANALYSIS.md`](./SPECIFICATION_COVERAGE_ANALYSIS.md)**, which provides:
56+
- Comprehensive facet-by-facet coverage breakdown (39% overall specification coverage)
57+
- Detailed explanation of custom dbt facets and validation warnings
58+
- Analysis of what's tested vs. what's not tested and why
59+
- Recommendations for future coverage improvements
60+
- Resolution status for known validation warnings
5661

5762
## Test Structure
5863

@@ -75,54 +80,174 @@ producer/dbt/
7580

7681
## How to Run the Tests
7782

78-
To execute the test suite, you will need a local clone of the main [OpenLineage repository](https://github.com/OpenLineage/OpenLineage), as the validation tool requires access to the specification files.
83+
There are two primary ways to run the dbt compatibility tests: **locally for development and debugging**, or via **GitHub Actions for automated CI/CD validation**. Both approaches use the same underlying test framework but differ in their database setup and execution environment.
7984

80-
### Prerequisites
85+
### Running Tests via GitHub Actions (Automated CI/CD)
8186

82-
1. **Install Python Dependencies**:
87+
**This is the standard, automated test runner for the repository and community.**
88+
89+
GitHub Actions provides the canonical testing environment with:
90+
- PostgreSQL 15 service container (automatically provisioned)
91+
- Matrix testing across multiple dbt and OpenLineage versions
92+
- Automated event validation against OpenLineage specifications
93+
- Integration with the repository's reporting and compatibility tracking
94+
95+
#### Triggering GitHub Actions Workflows
96+
97+
1. **Automatic Trigger on Pull Requests**: The workflow runs automatically when changes are detected in `producer/dbt/` paths.
98+
99+
2. **Manual Trigger via Workflow Dispatch**:
100+
```bash
101+
# Trigger for specific branch
102+
gh workflow run main_pr.yml --ref feature/your-branch -f components="dbt"
103+
104+
# Watch the run
105+
gh run watch
106+
```
107+
108+
3. **Via Pull Request**: Opening a PR that modifies dbt producer files will automatically trigger the test suite.
109+
110+
The GitHub Actions workflow:
111+
- Provisions a PostgreSQL 15 container with health checks
112+
- Installs `dbt-core`, `dbt-postgres`, and `openlineage-dbt` at specified versions
113+
- Executes all scenarios defined in `scenarios/`
114+
- Validates events against OpenLineage JSON schemas
115+
- Generates compatibility reports and uploads artifacts
116+
117+
**Configuration**: See `.github/workflows/producer_dbt.yml` for the complete workflow definition.
118+
119+
---
120+
121+
### Running Tests Locally (Development & Debugging)
122+
123+
**Use this approach for iterative development, debugging, and testing changes before pushing to GitHub.**
124+
125+
Local testing provides:
126+
- Faster feedback loops for development
127+
- Direct access to event files and logs
128+
- Ability to inspect database state
129+
- Control over specific test scenarios
130+
131+
#### Prerequisites
132+
133+
1. **Start PostgreSQL Container**:
83134
```bash
84135
# From the producer/dbt/ directory
136+
docker-compose up -d
137+
138+
# Verify container is healthy
139+
docker-compose ps
140+
```
141+
142+
2. **Install Python Dependencies**:
143+
```bash
144+
# Create and activate a virtual environment (recommended)
145+
python -m venv venv
146+
source venv/bin/activate # On Windows: venv\Scripts\activate
147+
148+
# Install requirements
85149
pip install -r test_runner/requirements.txt
86150
```
87151

88-
2. **Install dbt and the PostgreSQL adapter**:
152+
3. **Install dbt and the PostgreSQL adapter**:
89153
```bash
90154
pip install dbt-core dbt-postgres
91155
```
92156

93-
3. **Install the OpenLineage dbt integration**:
157+
4. **Install the OpenLineage dbt integration**:
94158
```bash
95159
pip install openlineage-dbt
96160
```
97161

98-
### Execution
162+
5. **Verify dbt Connection**:
163+
```bash
164+
cd runner/
165+
dbt debug
166+
cd ..
167+
```
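
Before running `dbt debug`, it can help to confirm that the PostgreSQL container is actually accepting connections. The sketch below polls a TCP port until it opens. The function name `wait_for_port` is hypothetical, and the host and port you pass are assumptions: the real values depend on the project's `docker-compose.yml` and `profiles.yml`, which are not shown here.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5432)` would wait up to 30 seconds for a server on PostgreSQL's default port. An open port only shows that the server is listening; `dbt debug` remains the check that credentials and the profile are correct.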
168+
169+
#### Local Execution Options
99170

100-
Run the main test script, providing the path to your local OpenLineage repository.
171+
**Option 1: Using the Test Runner CLI (Recommended)**
101172

102-
#### Basic Example
103-
This command runs the test suite with default settings, validating against the `2-0-2` OpenLineage release and saving events to the `events/` directory.
173+
The test runner CLI provides the same orchestration used in GitHub Actions:
104174

105175
```bash
106-
# Example assuming the OpenLineage repo is cloned in a sibling directory
107-
./run_dbt_tests.sh --openlineage-directory ../OpenLineage
176+
# Run a specific scenario
177+
python test_runner/cli.py run-scenario \
178+
--scenario csv_to_postgres_local \
179+
--output-dir ./test_output/$(date +%s)
180+
181+
# List available scenarios
182+
python test_runner/cli.py list-scenarios
108183
```
109184

110-
#### Full Example
111-
This command demonstrates how to override the default settings by specifying all available arguments.
185+
**Option 2: Direct dbt-ol Execution (For debugging)**
186+
187+
For fine-grained control and debugging, run `dbt-ol` commands directly:
188+
189+
```bash
190+
cd runner/
191+
192+
# Generate events for seed operation
193+
dbt-ol seed
194+
195+
# Generate events for model execution
196+
dbt-ol run
197+
198+
# Generate events for tests
199+
dbt-ol test
200+
201+
# Inspect generated events
202+
jq '.' ../events/openlineage_events.jsonl
203+
```
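
When the event file grows past a handful of entries, a per-type count is often easier to read than the raw JSON. The following is a minimal sketch, assuming the file is newline-delimited JSON and that each event carries the standard OpenLineage `eventType` field. The helper name `summarize_events` is hypothetical and not part of the test runner.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_events(lines):
    """Count OpenLineage events by eventType from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        event = json.loads(line)
        counts[event.get("eventType", "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the README; run this from the producer/dbt/ directory.
    path = Path("events/openlineage_events.jsonl")
    if path.exists():
        with path.open() as handle:
            for event_type, count in sorted(summarize_events(handle).items()):
                print(f"{event_type}: {count}")
```

If START and COMPLETE counts differ, some run did not finish cleanly and is worth inspecting.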
204+
205+
**Option 3: Legacy Shell Script (Deprecated)**
206+
207+
The `run_dbt_tests.sh` script is deprecated but still available:
112208

113209
```bash
114210
./run_dbt_tests.sh \
115-
--openlineage-directory /path/to/your/OpenLineage \
116-
--producer-output-events-dir /tmp/dbt_events \
211+
--openlineage-directory /path/to/OpenLineage \
212+
--producer-output-events-dir ./events \
117213
--openlineage-release 2-0-2 \
118-
--report-path /tmp/dbt_report.json
214+
--report-path ./dbt_report.json
215+
```
216+
217+
#### Local vs. GitHub Actions: Key Differences
218+
219+
| Aspect | Local Testing | GitHub Actions |
220+
|--------|---------------|----------------|
221+
| **Database** | Docker Compose (manual start) | PostgreSQL service container (auto-provisioned) |
222+
| **Environment** | Uses local environment variables from `profiles.yml` | Uses workflow-defined environment variables |
223+
| **Event Output** | Writes to `events/openlineage_events.jsonl` by default | Writes to temporary directory defined by workflow |
224+
| **Validation** | Manual inspection or via test runner CLI | Automated validation against OpenLineage schemas |
225+
| **Use Case** | Development, debugging, local verification | CI/CD, PR validation, compatibility reporting |
226+
| **Cleanup** | Manual (`docker-compose down -v`) | Automatic container cleanup |
227+
228+
#### Cleaning Up Local Environment
229+
230+
```bash
231+
# Stop PostgreSQL container
232+
docker-compose down
233+
234+
# Remove PostgreSQL data volume (clean slate)
235+
docker-compose down -v
236+
237+
# Remove generated event files
238+
rm -rf events/*.jsonl test_output/
119239
```
120240

121-
### Command-Line Arguments
122-
- `--openlineage-directory` (**Required**): Path to the root of a local clone of the OpenLineage repository, which contains the `spec/` directory.
123-
- `--producer-output-events-dir`: Directory where generated OpenLineage events will be saved. (Default: `events/`)
124-
- `--openlineage-release`: The OpenLineage release version to validate against. (Default: `2-0-2`)
125-
- `--report-path`: Path where the final JSON test report will be generated. (Default: `../dbt_producer_report.json`)
241+
---
242+
243+
### Command-Line Arguments (Legacy Script)
244+
245+
For the deprecated `run_dbt_tests.sh` script:
246+
247+
- `--openlineage-directory` (**Required**): Path to a local clone of the OpenLineage repository
248+
- `--producer-output-events-dir`: Directory for generated OpenLineage events (Default: `events/`)
249+
- `--openlineage-release`: OpenLineage release version to validate against (Default: `2-0-2`)
250+
- `--report-path`: Path for the final JSON test report (Default: `../dbt_producer_report.json`)
126251

127252
## Important dbt Integration Notes
128253

@@ -135,6 +260,35 @@ This integration has several nuances that are important to understand when analy
135260
- The availability of certain dbt-specific facets may depend on the version of `dbt-core` being used.
136261
- The file transport configuration in `openlineage.yml` directly controls the location and format of the event output.
137262

263+
### Custom dbt Facets and Validation Warnings
264+
265+
**The dbt integration emits custom facets that generate expected validation warnings:**
266+
267+
The `openlineage-dbt` integration adds vendor-specific facets to OpenLineage events that are **not part of the official OpenLineage specification**:
268+
269+
1. **`dbt_version`** - Captures the dbt-core version
270+
2. **`dbt_run`** - Captures dbt execution metadata (invocation_id, profile_name, project_name, etc.)
271+
272+
These facets:
273+
- ✅ Have valid schema definitions in the OpenLineage repository
274+
- ✅ Provide valuable dbt-specific context for lineage consumers
275+
- ⚠️ Generate validation warnings: `"facet type dbt_version not recognized"` and `"facet type dbt_run not recognized"`
276+
- ℹ️ Are **expected behavior** for vendor-specific OpenLineage extensions
277+
278+
**Impact on Test Results:**
279+
- All dbt operations complete successfully (seed, run, test)
280+
- All events are generated with correct OpenLineage structure
281+
- Core facets (schema, dataSource, sql, columnLineage, etc.) validate successfully
282+
- Custom dbt facets trigger warnings during schema validation but do **not indicate test failure**
283+
284+
These warnings are **documented and accepted** as expected behavior.
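
When reviewing validation output by hand, it is useful to separate these expected warnings from anything new. The sketch below does that by string matching. It assumes the messages contain exactly the wording quoted above, and `split_warnings` is a hypothetical helper, not a function in the repository.

```python
# Facet names come from the section above; the message format is assumed to
# match the quoted warnings.
EXPECTED_CUSTOM_FACETS = ("dbt_version", "dbt_run")


def split_warnings(messages):
    """Separate expected custom-facet warnings from all other messages."""
    expected, unexpected = [], []
    for message in messages:
        is_expected = any(
            f"facet type {name} not recognized" in message
            for name in EXPECTED_CUSTOM_FACETS
        )
        (expected if is_expected else unexpected).append(message)
    return expected, unexpected
```

Anything that lands in the second list is not covered by the explanation in this section and should be investigated.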
285+
286+
**📊 For complete technical details**, see **[`SPECIFICATION_COVERAGE_ANALYSIS.md`](./SPECIFICATION_COVERAGE_ANALYSIS.md)**, which documents:
287+
- The exact structure and purpose of `dbt_version` and `dbt_run` facets
288+
- Why validation warnings occur (vendor extensions vs. official spec)
289+
- Impact assessment on test results
290+
- Current workarounds and long-term resolution options
291+
138292
## Future Enhancements
139293

140294
To support community discussions around forward and backward compatibility, the `future/` directory contains design documents exploring a potential approach to multi-spec and multi-implementation version testing.

producer/dbt/SPECIFICATION_COVERAGE_ANALYSIS.md

Lines changed: 42 additions & 2 deletions
@@ -5,12 +5,52 @@ This document analyzes the OpenLineage specification coverage achieved by our db
55

66
## Test Configuration
77
- **OpenLineage Specification**: 2-0-2 (target specification)
8-
- **dbt-openlineage Implementation**: 1.37.0
8+
- **dbt-openlineage Implementation**: 1.39.0 / 1.23.0 (matrix tested)
9+
- **Database**: PostgreSQL 15 (migrated from DuckDB)
910
- **Test Scenario**: CSV → dbt models → PostgreSQL (includes data quality tests)
10-
- **Events Generated**: 20 events total
11+
- **Events Generated**: 22 events total
1112
- 3 dbt models (START/COMPLETE pairs)
1213
- 5 data quality test suites (START/COMPLETE pairs)
1314
- 1 job orchestration wrapper (START/COMPLETE)
15+
- Additional seed operations
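
As a quick arithmetic check on the breakdown above, the snippet below assumes that each itemised entry emits exactly one START and one COMPLETE event. That assumption is ours and is not stated in the analysis. Under it, the itemised pairs account for 18 of the 22 events, which leaves 4 events that are not itemised (presumably the seed operations).

```python
TOTAL_EVENTS = 22  # total stated in the test configuration

# Counts copied from the list above; two events (START, COMPLETE) per item
# is an assumption.
itemised = {
    "dbt models": 3,
    "data quality test suites": 5,
    "job orchestration wrapper": 1,
}

paired_events = sum(count * 2 for count in itemised.values())
not_itemised = TOTAL_EVENTS - paired_events

print(f"itemised: {paired_events}, not itemised: {not_itemised}")
```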
16+
17+
## ⚠️ Known Validation Warnings
18+
19+
The dbt integration emits **custom facets that are not part of the official OpenLineage specification**. These generate validation warnings but are **expected and acceptable**:
20+
21+
### Custom dbt Facets:
22+
1. **`dbt_version`** (Run Facet)
23+
- **Purpose**: Captures the version of dbt-core being used
24+
- **Schema**: `dbt-version-run-facet.json`
25+
- **Example**: `{"version": "1.10.15"}`
26+
- **Validation Warning**: `"$.run.facets.dbt_version facet type dbt_version not recognized"`
27+
28+
2. **`dbt_run`** (Run Facet)
29+
- **Purpose**: Captures dbt-specific execution metadata
30+
- **Schema**: `dbt-run-run-facet.json`
31+
- **Fields**: `dbt_runtime`, `invocation_id`, `profile_name`, `project_name`, `project_version`
32+
- **Validation Warning**: `"$.run.facets.dbt_run facet type dbt_run not recognized"`
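
To see which of the fields listed above are actually present in a generated event, a small check like the following can be run against a parsed event. It is a sketch only: the field names are copied from this section, the `run.facets` location follows the JSON paths in the warnings, and `missing_dbt_fields` is a hypothetical helper. Whether every field is mandatory in the facet schemas is not something this document states.

```python
# Field names copied from the facet descriptions above.
DBT_RUN_FIELDS = (
    "dbt_runtime",
    "invocation_id",
    "profile_name",
    "project_name",
    "project_version",
)


def missing_dbt_fields(event):
    """Return the listed dbt facet fields that are absent from an event."""
    facets = event.get("run", {}).get("facets", {})
    missing = []
    if "version" not in facets.get("dbt_version", {}):
        missing.append("dbt_version.version")
    dbt_run = facets.get("dbt_run", {})
    missing.extend(
        f"dbt_run.{name}" for name in DBT_RUN_FIELDS if name not in dbt_run
    )
    return missing
```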
33+
34+
### Why These Warnings Occur:
35+
- The OpenLineage specification validator checks against the **official spec schemas**
36+
- Custom vendor-specific facets (like dbt's) are **extensions** to the core spec
37+
- These facets have valid schema URLs but are not included in the official OpenLineage specification
38+
- The warnings indicate the validator found facets it doesn't recognize, **not that the events are invalid**
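
The behaviour described above can be modelled in a few lines. This is a simplified illustration, not the validator's actual code: it assumes the validator holds a set of facet names from the official spec and reports any run facet outside that set, using the warning format quoted in this document. The function name and the `known_run_facets` parameter are hypothetical.

```python
def unrecognized_facet_warnings(event, known_run_facets):
    """Return one warning per run facet whose name is not in known_run_facets."""
    facets = event.get("run", {}).get("facets", {})
    return [
        f"$.run.facets.{name} facet type {name} not recognized"
        for name in facets
        if name not in known_run_facets
    ]
```

Called with a set such as `{"nominalTime", "parent", "errorMessage"}` (an illustrative subset of spec-defined run facets), an event carrying `dbt_version` yields the first warning shown above, while the rest of the event is left untouched.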
39+
40+
### Impact on Testing:
41+
- ✅ **All dbt operations execute successfully** (seed, run, test)
42+
- ✅ **All 22 events are generated correctly** with proper structure
43+
- ✅ **Core OpenLineage facets validate successfully** (schema, dataSource, sql, etc.)
44+
- ⚠️ **Custom dbt facets generate warnings** during schema validation
45+
- ℹ️ **This is expected behavior** for vendor-specific extensions to OpenLineage
46+
47+
### Resolution Status:
48+
- **Current State**: Warnings are documented and accepted as expected behavior
49+
- **Workaround**: `fail-for-new-failures` temporarily disabled in GitHub Actions for feature branch testing
50+
- **Long-term Options**:
51+
1. Update validation to allow custom facets with valid schema URLs
52+
2. Propose dbt facets for inclusion in official OpenLineage specification
53+
3. Accept warnings as documented known behavior after merge to main
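
The first long-term option could look roughly like the sketch below. It assumes that "valid schema URL" means the facet carries an `_schemaURL` value (the field OpenLineage facets use to point at their schema) that starts with `http://` or `https://`. That interpretation, the function name, and the `known_run_facets` parameter are all assumptions for illustration; a real implementation might also fetch the schema and validate the facet against it.

```python
def warnings_allowing_custom_facets(event, known_run_facets):
    """Warn only about unknown run facets that do not declare a schema URL."""
    facets = event.get("run", {}).get("facets", {})
    warnings = []
    for name, body in facets.items():
        if name in known_run_facets:
            continue  # facet defined by the official spec
        schema_url = body.get("_schemaURL", "") if isinstance(body, dict) else ""
        if isinstance(schema_url, str) and schema_url.startswith(("http://", "https://")):
            continue  # custom facet that declares its schema: tolerated
        warnings.append(f"$.run.facets.{name} facet type {name} not recognized")
    return warnings
```

With this policy, vendor facets that identify their schema pass silently, while an unknown facet with no schema reference is still reported.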
1454

1555
## Facet Coverage Analysis
1656
