From 0c15e786abcf9130253a14e8dae639ada8034cd1 Mon Sep 17 00:00:00 2001 From: Hayden Date: Tue, 4 Nov 2025 22:43:46 -0700 Subject: [PATCH] Add auto-generated documentation --- docs/autogenerated_docs/1-overview.md | 232 +++++++++ .../1.1-architecture-overview.md | 266 ++++++++++ .../2-core-execution-system.md | 252 +++++++++ .../2.1-data-freshness-management.md | 301 +++++++++++ .../2.2-semantic-interfaces-and-metricflow.md | 212 ++++++++ .../3-project-parsing-system.md | 179 +++++++ ...onfiguration-validation-and-json-schema.md | 274 ++++++++++ .../3.2-model-configuration-processing.md | 165 ++++++ docs/autogenerated_docs/4-cli-system.md | 491 ++++++++++++++++++ .../4.1-command-interface-and-deprecations.md | 350 +++++++++++++ .../5-configuration-system.md | 282 ++++++++++ .../5.1-project-configuration-and-schema.md | 288 ++++++++++ .../5.2-hierarchical-configuration-parsing.md | 188 +++++++ .../6-event-and-logging-system.md | 221 ++++++++ .../6.1-deprecation-management.md | 187 +++++++ 15 files changed, 3888 insertions(+) create mode 100644 docs/autogenerated_docs/1-overview.md create mode 100644 docs/autogenerated_docs/1.1-architecture-overview.md create mode 100644 docs/autogenerated_docs/2-core-execution-system.md create mode 100644 docs/autogenerated_docs/2.1-data-freshness-management.md create mode 100644 docs/autogenerated_docs/2.2-semantic-interfaces-and-metricflow.md create mode 100644 docs/autogenerated_docs/3-project-parsing-system.md create mode 100644 docs/autogenerated_docs/3.1-configuration-validation-and-json-schema.md create mode 100644 docs/autogenerated_docs/3.2-model-configuration-processing.md create mode 100644 docs/autogenerated_docs/4-cli-system.md create mode 100644 docs/autogenerated_docs/4.1-command-interface-and-deprecations.md create mode 100644 docs/autogenerated_docs/5-configuration-system.md create mode 100644 docs/autogenerated_docs/5.1-project-configuration-and-schema.md create mode 100644 
docs/autogenerated_docs/5.2-hierarchical-configuration-parsing.md create mode 100644 docs/autogenerated_docs/6-event-and-logging-system.md create mode 100644 docs/autogenerated_docs/6.1-deprecation-management.md diff --git a/docs/autogenerated_docs/1-overview.md b/docs/autogenerated_docs/1-overview.md new file mode 100644 index 00000000000..87bb88ac077 --- /dev/null +++ b/docs/autogenerated_docs/1-overview.md @@ -0,0 +1,232 @@ +# Overview + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.changes/0.0.0.md](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/0.0.0.md) + +
+ + + +This page introduces dbt-core, its purpose, architecture, and key components. It provides a high-level understanding of the system to help developers and contributors understand how dbt-core works. + +## What is dbt-core? + +dbt-core (data build tool) is a transformation framework that enables data analysts and engineers to transform data using software engineering best practices. It allows users to transform data by writing SQL select statements while dbt handles converting these statements into tables and views in a data warehouse. + +The core functionality of dbt includes: + +- Building and managing relationships between models +- Providing a templating system (Jinja) for SQL generation +- Testing data quality +- Documentation generation +- Dependency management +- Incremental model updates +- Snapshots for slowly changing dimensions + +Sources: [README.md:10-18](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/README.md#L10-L18) + +## High-Level Architecture + +The following diagram illustrates the high-level architecture of dbt-core: + +```mermaid +graph TD + subgraph "UserInterface" + CLI["CLI System\n(entry points)"] + dbtRunner["dbtRunner\n(programmatic API)"] + end + + subgraph "CoreProcessing" + Parser["ManifestLoader\n(builds project graph)"] + Compiler["CompilationContext\n(renders SQL)"] + RunTask["RunTask\n(orchestrates execution)"] + NodeRunner["NodeExecutor\n(executes SQL)"] + end + + subgraph "DataRepresentation" + Manifest["Manifest\n(graph representation)"] + GraphQueue["GraphQueue\n(execution order)"] + Results["Results\n(execution results)"] + end + + subgraph "Configuration" + ProjectConfig["dbt_project.yml"] + ProfilesConfig["profiles.yml"] + SelectorConfig["selectors.yml"] + end + + subgraph "AdapterSystem" + BaseAdapter["BaseAdapter"] + DatabaseAdapters["Specific DB Adapters"] + end + + CLI -->|commands| RunTask + dbtRunner -->|invokes| RunTask + + ProjectConfig --> Parser + ProfilesConfig --> Parser + SelectorConfig --> RunTask + 
+ Parser -->|builds| Manifest + RunTask -->|reads| Manifest + RunTask -->|creates| GraphQueue + RunTask -->|uses| NodeRunner + NodeRunner -->|returns| Results + + Compiler -->|provides context to| Parser + Compiler -->|provides context to| NodeRunner + + NodeRunner -->|uses| DatabaseAdapters + DatabaseAdapters -.->|implements| BaseAdapter +``` + +Sources: [core/setup.py:45-47](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/setup.py#L45-L47), [README.md:10-21](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/README.md#L10-L21) + +## Core Execution Flow + +The following sequence diagram shows how dbt-core executes a typical command: + +```mermaid +sequenceDiagram + participant User + participant CLI as "CLI System" + participant Task as "Task (e.g. RunTask)" + participant ML as "ManifestLoader" + participant Queue as "GraphQueue" + participant Runner as "NodeExecutor" + participant Adapter as "Database Adapter" + + User->>CLI: "run dbt command" + CLI->>Task: "initialize task" + Task->>ML: "get_full_manifest()" + ML->>ML: "parse project files" + ML->>ML: "resolve refs & sources" + ML->>ML: "build dependency graph" + ML-->>Task: "manifest" + + Task->>Task: "select nodes" + Task->>Queue: "create execution queue" + + loop "Until queue is empty" + Task->>Queue: "get next node" + Queue-->>Task: "node" + Task->>Runner: "execute node" + Runner->>Adapter: "compile SQL" + Runner->>Adapter: "execute SQL" + Adapter-->>Runner: "results" + Runner-->>Task: "execution status" + Task->>Queue: "mark node complete" + end + + Task-->>CLI: "execution results" + CLI-->>User: "display results" +``` + +Sources: [core/setup.py:45-47](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/setup.py#L45-L47) + +## Key Components + +### 1. Command-Line Interface (CLI) + +The CLI is the primary way users interact with dbt. It provides commands like `run`, `test`, `docs`, etc., and is implemented using the Click library. The entry point for the CLI is defined in `dbt.cli.main:cli`. 
+ +``` +dbt [options] +``` + +For programmatic usage, dbt provides `dbtRunner`, a Python API to invoke dbt commands from code. + +For more details on the CLI system, see [CLI System](#4). + +Sources: [core/setup.py:45-47](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/setup.py#L45-L47) + +### 2. ManifestLoader and Parser + +The ManifestLoader is responsible for parsing all project files (SQL models, YAML files, macros, etc.) and building a comprehensive graph representation called the Manifest. This component: + +- Parses SQL files to extract model definitions +- Processes schema YAML files for sources, tests, and documentation +- Loads macros and makes them available for rendering +- Resolves dependencies between models using `ref()` and `source()` calls +- Builds the node dependency graph + +For more details on the parsing system, see [Project Parsing System](#3). + +Sources: [README.md:16-18](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/README.md#L16-L18) + +### 3. Compilation and Execution System + +Once the manifest is built, dbt's execution system: + +- Selects nodes based on user criteria (e.g., specific models, tags) +- Determines the execution order based on dependencies +- Compiles SQL by rendering Jinja templates +- Executes the compiled SQL against the target database +- Collects and reports execution results + +For more details, see [Core Execution System](#2) and [RunTask and Node Execution](#2.1). + +### 4. Database Adapters + +dbt uses adapters to communicate with different database systems. Each adapter implements: + +- Connection management +- SQL compilation specific to the database dialect +- Database-specific materialization strategies +- Metadata operations (e.g., creating schemas, listing tables) + +The adapter system allows dbt to support many different databases while maintaining consistent behavior. + +### 5. Context System + +The Context system provides variables and functions for use in Jinja templates. 
Key functions include: + +- `ref()`: Reference other models +- `source()`: Reference source tables +- `config()`: Set model-specific configurations +- `env_var()`: Access environment variables + +For more details, see [Jinja Templating and Parsing](#3.2). + +## Configuration + +dbt projects are configured through several files: + +### dbt_project.yml + +The main project configuration file that defines: + +- Project name and version +- Source directories +- Model materialization settings +- Custom configurations + +### profiles.yml + +Defines database connection details: + +- Target database type (e.g., Snowflake, BigQuery) +- Authentication credentials +- Default schemas +- Connection parameters + +For more details, see [Configuration System](#5). + +## Version and Compatibility + +dbt-core is currently at version 1.10.0b2. It requires Python 3.9 or higher and supports Python versions up to 3.13. + +Key dependencies include: +- Jinja2 for templating +- Click for CLI functionality +- Various other libraries for data processing and configuration + +Sources: [core/setup.py:28](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/setup.py#L28), [core/dbt/version.py:229](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/version.py#L229), [.bumpversion.cfg:2](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.bumpversion.cfg#L2) + +## Conclusion + +dbt-core provides a powerful framework for data transformation, enabling analysts and engineers to work with SQL using software engineering best practices. Its architecture supports extensibility, modularity, and reliability through components like the manifest system, adapters, and context providers. 
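
The configuration files described above can be made concrete with a short sketch. The following is a minimal, hypothetical `dbt_project.yml`; the project name, profile name, paths, and materialization settings are placeholders rather than dbt defaults:

```yaml
# dbt_project.yml -- hypothetical minimal project configuration
name: my_project
version: "1.0.0"
profile: my_profile          # must match a profile defined in profiles.yml

model-paths: ["models"]

models:
  my_project:
    +materialized: view      # project-wide default materialization
    staging:
      +materialized: table   # override for models under models/staging/
```

With a matching profile in place, `dbt run` would materialize every model under `models/` using these defaults.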
\ No newline at end of file diff --git a/docs/autogenerated_docs/1.1-architecture-overview.md b/docs/autogenerated_docs/1.1-architecture-overview.md new file mode 100644 index 00000000000..3601837d6ae --- /dev/null +++ b/docs/autogenerated_docs/1.1-architecture-overview.md @@ -0,0 +1,266 @@ +# Architecture Overview + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.bumpversion.cfg](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.bumpversion.cfg) +- [.changes/README.md](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/README.md) + +
+ + + +## Purpose and Scope + +This document provides a high-level overview of dbt-core's system architecture, focusing on the major subsystems and their interactions. It covers the release management pipeline, configuration validation systems, and core processing components that enable dbt's data transformation capabilities. + +For detailed information about specific subsystems, see: +- Core execution systems: [Core Execution System](#2) +- Configuration validation: [Configuration Validation and JSON Schema](#3.1) +- Release management details: [Release Process and Version Management](#11.1) +- Changelog automation: [Changelog Automation](#11.2) + +## Overall System Architecture + +dbt-core is structured as a modular system with distinct layers for release management, configuration validation, data processing, and external integrations. + +### High-Level Component Architecture + +```mermaid +graph TB + subgraph "Release Management Layer" + bumpversion[".bumpversion.cfg
Version Control"] + changie["changie
Changelog Automation"] + changes_dir["/.changes/
Change Fragments"] + end + + subgraph "Configuration Layer" + json_schema["JSON Schema
Validation"] + dbt_project["dbt_project.yml
Schema"] + model_config["Model Configuration
Processing"] + end + + subgraph "Core Processing Layer" + freshness_mgmt["Source/Model
Freshness Management"] + semantic_layer["Semantic Interfaces
MetricFlow Integration"] + node_selection["Node Selection
Graph Processing"] + end + + subgraph "Interface Layer" + cli_system["CLI System
Command Processing"] + manifest_artifacts["Manifest/Artifacts
Management"] + testing_framework["Testing Framework
Unit/Generic Tests"] + end + + subgraph "External Dependencies" + pydantic["pydantic v1/v2
Data Validation"] + jsonschema_lib["jsonschema 4.19.1+
Schema Validation"] + dbt_common["dbt-common 1.25.1+
Shared Components"] + end + + bumpversion --> changie + changie --> changes_dir + + json_schema --> dbt_project + json_schema --> model_config + + freshness_mgmt --> semantic_layer + node_selection --> manifest_artifacts + + cli_system --> freshness_mgmt + cli_system --> node_selection + testing_framework --> model_config + + json_schema --> jsonschema_lib + model_config --> pydantic + freshness_mgmt --> dbt_common +``` + +**Sources:** [.bumpversion.cfg:1-38](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.bumpversion.cfg#L1-L38), [.changes/README.md:1-54](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/README.md#L1-L54) + +## Release and Version Management + +The release management system uses automated tooling to maintain version consistency and generate documentation. + +### Version Management Pipeline + +```mermaid +flowchart TD + dev_change["Developer Change"] --> change_fragment["changie new
/.changes/unreleased/*.yaml"] + change_fragment --> batch_cmd["changie batch "] + batch_cmd --> merge_cmd["changie merge"] + merge_cmd --> changelog["CHANGELOG.md
Generation"] + + version_bump["Version Update Needed"] --> bumpversion_cmd["bumpversion"] + bumpversion_cmd --> setup_py["core/setup.py
Version Update"] + bumpversion_cmd --> version_py["core/dbt/version.py
Version Update"] + + subgraph "Version Components" + major_minor_patch["major.minor.patch"] + prerelease["prekind: a|b|rc"] + prerelease_num["num: numeric"] + nightly["nightly: dev builds"] + end + + bumpversion_cmd --> major_minor_patch + bumpversion_cmd --> prerelease + bumpversion_cmd --> prerelease_num + bumpversion_cmd --> nightly + + changelog --> release["Final Release"] + setup_py --> release + version_py --> release +``` + +The version management system supports semantic versioning with pre-release and nightly build capabilities. The `bumpversion` tool manages version strings across multiple files using a regex pattern that parses version components [.bumpversion.cfg:3-12](). + +**Sources:** [.bumpversion.cfg:1-38](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.bumpversion.cfg#L1-L38), [.changes/README.md:13-47](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/README.md#L13-L47) + +## Core Processing Systems + +The core processing layer handles data transformation, freshness tracking, and semantic layer integration. + +### Data Processing Flow + +```mermaid +graph LR + subgraph "Input Processing" + external_sources["External Data
Sources"] + source_config["Source
Configuration"] + end + + subgraph "Freshness Management" + source_freshness["Source Freshness
Tracking"] + model_freshness["Model Freshness
Tracking"] + loaded_at_query["loaded_at_query
Processing"] + end + + subgraph "Model Processing" + model_parsing["Model SQL
Parsing"] + model_validation["Configuration
Validation"] + model_execution["Model
Execution"] + end + + subgraph "Semantic Layer" + semantic_interfaces["Semantic
Interfaces"] + metricflow["MetricFlow
Integration"] + time_spine["Time Spine
Processing"] + end + + external_sources --> source_freshness + source_config --> source_freshness + source_freshness --> loaded_at_query + + model_parsing --> model_validation + model_validation --> model_execution + model_freshness --> loaded_at_query + + source_freshness --> semantic_interfaces + model_execution --> semantic_interfaces + semantic_interfaces --> metricflow + metricflow --> time_spine +``` + +**Sources:** [.changes/README.md:1-54](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/README.md#L1-L54) + +## Configuration and Validation Architecture + +The configuration system uses JSON schema validation to ensure project consistency and provide structured validation across different configuration types. + +### Configuration Validation Flow + +```mermaid +graph TB + subgraph "Configuration Sources" + dbt_project_yml["dbt_project.yml
Project Schema"] + model_sql_files["Model SQL Files
Inline Config"] + source_yaml["Source YAML
Configuration"] + end + + subgraph "Validation Engine" + json_schema_validator["JSON Schema
Validator"] + adapter_validation["Adapter-Specific
Validation"] + deprecation_warnings["Deprecation
Management"] + end + + subgraph "Configuration Types" + freshness_config["Freshness
Configuration"] + test_config["Test
Configuration"] + model_config_props["Model Config
Properties"] + end + + subgraph "Processing Systems" + model_processor["Model
Processing"] + source_processor["Source
Processing"] + test_processor["Test
Processing"] + end + + dbt_project_yml --> json_schema_validator + model_sql_files --> json_schema_validator + source_yaml --> json_schema_validator + + json_schema_validator --> adapter_validation + json_schema_validator --> deprecation_warnings + + adapter_validation --> freshness_config + adapter_validation --> test_config + adapter_validation --> model_config_props + + freshness_config --> source_processor + test_config --> test_processor + model_config_props --> model_processor +``` + +**Sources:** [.changes/README.md:1-54](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/README.md#L1-L54) + +## Integration and Dependency Architecture + +dbt-core integrates with external dependencies and provides extension points for adapters and plugins. + +### Dependency Integration + +| Component | Dependency | Version Requirement | Purpose | +|-----------|------------|-------------------|---------| +| Schema Validation | `jsonschema` | 4.19.1+ | JSON schema validation | +| Data Validation | `pydantic` | v1/v2 support | Configuration parsing | +| Shared Components | `dbt-common` | 1.25.1+ | Core utilities | +| Semantic Layer | `dbt-semantic-interfaces` | 0.9.0+ | MetricFlow integration | + +### Extension Architecture + +```mermaid +graph TD + subgraph "dbt-core" + core_engine["Core Engine"] + adapter_interface["Adapter
Interface"] + plugin_system["Plugin
System"] + end + + subgraph "External Adapters" + postgres_adapter["dbt-postgres"] + snowflake_adapter["dbt-snowflake"] + bigquery_adapter["dbt-bigquery"] + other_adapters["Other
Adapters"] + end + + subgraph "Extensions" + semantic_layer_ext["Semantic Layer
Extensions"] + custom_tests["Custom
Tests"] + macros["Custom
Macros"] + end + + core_engine --> adapter_interface + adapter_interface --> postgres_adapter + adapter_interface --> snowflake_adapter + adapter_interface --> bigquery_adapter + adapter_interface --> other_adapters + + core_engine --> plugin_system + plugin_system --> semantic_layer_ext + plugin_system --> custom_tests + plugin_system --> macros +``` + +**Sources:** [.changes/README.md:1-54](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/README.md#L1-L54) \ No newline at end of file diff --git a/docs/autogenerated_docs/2-core-execution-system.md b/docs/autogenerated_docs/2-core-execution-system.md new file mode 100644 index 00000000000..0ab0a39833d --- /dev/null +++ b/docs/autogenerated_docs/2-core-execution-system.md @@ -0,0 +1,252 @@ +# Core Execution System + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.changes/unreleased/Fixes-20250605-110645.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250605-110645.yaml) + +
+ + + +## Purpose and Scope + +The Core Execution System encompasses the fundamental components responsible for executing dbt operations, processing data transformations, and managing the lifecycle of models, sources, and tests within a dbt project. This system coordinates the parsing, validation, and execution of dbt resources while maintaining data freshness tracking and semantic layer integrations. + +This document covers the high-level architecture and coordination mechanisms of dbt's execution engine. For detailed information about data freshness tracking and validation, see [Data Freshness Management](#2.1). For semantic layer processing and metric flow integration, see [Semantic Interfaces and MetricFlow](#2.2). + +## System Architecture + +The Core Execution System operates through several interconnected subsystems that handle different aspects of dbt's execution pipeline: + +### Execution Flow Architecture + +```mermaid +flowchart TD + subgraph "Parsing Layer" + ProjectParser["project_parser"] + ModelParser["model_parser"] + SourceParser["source_parser"] + TestParser["test_parser"] + end + + subgraph "Validation Layer" + ConfigValidator["config_validator"] + SchemaValidator["schema_validator"] + FreshnessValidator["freshness_validator"] + end + + subgraph "Execution Engine" + ModelExecutor["model_executor"] + TestExecutor["test_executor"] + FreshnessChecker["freshness_checker"] + SemanticProcessor["semantic_processor"] + end + + subgraph "State Management" + ManifestBuilder["manifest_builder"] + StateComparator["state_comparator"] + ArtifactWriter["artifact_writer"] + end + + ProjectParser --> ConfigValidator + ModelParser --> SchemaValidator + SourceParser --> FreshnessValidator + TestParser --> SchemaValidator + + ConfigValidator --> ModelExecutor + SchemaValidator --> ModelExecutor + FreshnessValidator --> FreshnessChecker + + ModelExecutor --> ManifestBuilder + TestExecutor --> ManifestBuilder + FreshnessChecker --> SemanticProcessor + 
SemanticProcessor --> ArtifactWriter + + ManifestBuilder --> StateComparator + StateComparator --> ArtifactWriter +``` + +Sources: Inferred from system architecture diagrams and change log references to parsing systems + +### Resource Processing Pipeline + +The execution system processes different types of dbt resources through specialized pipelines: + +```mermaid +graph LR + subgraph "Input Sources" + SqlFiles["*.sql files"] + YmlFiles["*.yml files"] + ProjectYml["dbt_project.yml"] + end + + subgraph "Resource Types" + Models["models"] + Sources["sources"] + Tests["tests"] + Metrics["metrics"] + Exposures["exposures"] + end + + subgraph "Processing Systems" + ModelProcessor["model_processing_system"] + SourceProcessor["source_processing_system"] + TestProcessor["test_processing_system"] + MetricProcessor["metric_processing_system"] + end + + subgraph "Execution Outputs" + CompiledSQL["compiled_sql"] + TestResults["test_results"] + FreshnessResults["freshness_results"] + SemanticArtifacts["semantic_artifacts"] + end + + SqlFiles --> Models + YmlFiles --> Sources + YmlFiles --> Tests + ProjectYml --> Metrics + YmlFiles --> Exposures + + Models --> ModelProcessor + Sources --> SourceProcessor + Tests --> TestProcessor + Metrics --> MetricProcessor + + ModelProcessor --> CompiledSQL + TestProcessor --> TestResults + SourceProcessor --> FreshnessResults + MetricProcessor --> SemanticArtifacts +``` + +Sources: Inferred from system architecture showing model, source, and test processing systems + +## Core Components + +### Parsing and Validation Systems + +The execution system begins with comprehensive parsing and validation of project resources: + +| Component | Responsibility | Key Validations | +|-----------|---------------|-----------------| +| `project_parser` | Parse `dbt_project.yml` configuration | Schema compliance, dependency resolution | +| `model_parser` | Process model SQL files and configurations | Syntax validation, reference resolution | +| 
`source_parser` | Handle source definitions and freshness configs | Freshness settings, `build_after` presence | +| `test_parser` | Parse generic and singular tests | Test argument validation, column references | + +The parsing layer ensures that `build_after` configurations are present in model freshness definitions during parsing, preventing runtime errors in freshness validation. + +Sources: `.changes/unreleased/Fixes-20250605-110645.yaml` + +### Execution Coordination + +The execution engine coordinates the running of dbt operations through several key systems: + +#### Model Execution System +- Handles compilation of model SQL +- Manages dependency resolution and execution ordering +- Coordinates incremental model updates +- Integrates with adapter-specific execution logic + +#### Test Execution System +- Executes generic tests with proper argument binding +- Runs singular tests with compiled SQL +- Manages test failure handling and reporting +- Coordinates test result artifact generation + +#### Freshness Execution System +- Executes source freshness checks using configured queries +- Validates `loaded_at_field` and `loaded_at_query` configurations +- Generates freshness results and warnings +- Updates freshness metadata in project artifacts + +Sources: Inferred from system diagrams showing execution components and freshness tracking + +### State Management and Artifacts + +The execution system maintains comprehensive state through the manifest and artifact generation: + +```mermaid +graph TB + subgraph "State Inputs" + CurrentManifest["current_manifest"] + PreviousManifest["previous_manifest"] + RunResults["run_results"] + end + + subgraph "State Processing" + StateComparator["state_comparator"] + ManifestDiffer["manifest_differ"] + NodeSelector["node_selector"] + end + + subgraph "Execution Coordination" + ExecutionPlan["execution_plan"] + DependencyGraph["dependency_graph"] + TaskScheduler["task_scheduler"] + end + + subgraph "Output Artifacts" + 
UpdatedManifest["updated_manifest.json"] + RunResultsArtifact["run_results.json"] + CatalogArtifact["catalog.json"] + FreshnessArtifact["sources.json"] + end + + CurrentManifest --> StateComparator + PreviousManifest --> StateComparator + RunResults --> ManifestDiffer + + StateComparator --> NodeSelector + ManifestDiffer --> ExecutionPlan + NodeSelector --> DependencyGraph + + ExecutionPlan --> TaskScheduler + DependencyGraph --> TaskScheduler + + TaskScheduler --> UpdatedManifest + TaskScheduler --> RunResultsArtifact + TaskScheduler --> CatalogArtifact + TaskScheduler --> FreshnessArtifact +``` + +Sources: Inferred from system architecture showing state management and manifest processing + +## Integration Points + +### CLI Integration +The execution system receives commands and configuration from the CLI system, processing flags and arguments to determine execution scope and behavior. Model selection flags are processed to build the appropriate execution plan. + +### Configuration System Integration +Execution components consume validated project configuration from the Configuration System, ensuring all execution parameters are properly validated before operations begin. + +### Node Selection Integration +The execution system works closely with Node Selection to determine which resources should be processed based on selection criteria and state comparison results. + +### Adapter Integration +Each execution component interfaces with database adapters to translate generic dbt operations into adapter-specific SQL and execution patterns. 
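
The adapter that execution components dispatch to is selected through the profile configuration. As a hypothetical sketch of a `profiles.yml` targeting the Postgres adapter — host, credentials, and schema names are placeholders:

```yaml
# profiles.yml -- hypothetical profile selecting the dbt-postgres adapter
my_profile:
  target: dev                # default output used when --target is not given
  outputs:
    dev:
      type: postgres         # adapter type; routes execution to adapter-specific SQL
      host: localhost
      port: 5432
      user: analyst
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: dev_schema
      threads: 4             # degree of parallel node execution
```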
+ +Sources: Inferred from system architecture showing interconnections between major systems + +## Error Handling and Recovery + +The execution system implements comprehensive error handling: + +- **Parse-time Validation**: Ensures required configuration like `build_after` is present before execution begins +- **Execution-time Validation**: Validates runtime conditions and dependencies during execution +- **Graceful Degradation**: Skips freshness definitions with missing required configuration rather than failing completely +- **State Recovery**: Maintains execution state to enable resumption of interrupted operations + +Sources: `.changes/unreleased/Fixes-20250605-110645.yaml` + +## Performance Considerations + +The execution system optimizes performance through: + +- **Parallel Execution**: Coordinates parallel processing of independent resources +- **Incremental Processing**: Leverages state comparison to process only changed resources when possible +- **Lazy Loading**: Defers resource loading until required for execution +- **Caching**: Maintains compiled SQL and metadata caches to avoid recomputation \ No newline at end of file diff --git a/docs/autogenerated_docs/2.1-data-freshness-management.md b/docs/autogenerated_docs/2.1-data-freshness-management.md new file mode 100644 index 00000000000..58c0f554d4c --- /dev/null +++ b/docs/autogenerated_docs/2.1-data-freshness-management.md @@ -0,0 +1,301 @@ +# Data Freshness Management + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.changes/unreleased/Features-20250623-113130.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250623-113130.yaml) +- [.changes/unreleased/Fixes-20250530-005804.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250530-005804.yaml) +- [.changes/unreleased/Fixes-20250605-110645.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250605-110645.yaml) +- [.changes/unreleased/Fixes-20250609-175239.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250609-175239.yaml) +- [.changes/unreleased/Fixes-20250610-211241.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250610-211241.yaml) +- [.changes/unreleased/Fixes-20250616-085600.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250616-085600.yaml) + +
+ + + +The Data Freshness Management system in dbt-core provides functionality for tracking and validating the recency of data in both source tables and derived models. This system enables users to configure freshness checks that monitor when data was last loaded or updated, ensuring data quality and timeliness in data pipelines. + +For information about configuration validation and JSON schema processing, see [Configuration Validation and JSON Schema](#3.1). For project-level configuration management, see [Project Configuration and Schema](#5.1). + +## System Overview + +The freshness management system operates on two primary entities: source tables and models. It supports configurable freshness checks through `loaded_at_query` and `loaded_at_field` configurations, allowing flexible monitoring strategies for different data patterns. + +```mermaid +graph TB + subgraph "Freshness Management Core" + SFS["SourceFreshnessSystem"] + MFS["ModelFreshnessSystem"] + FCC["FreshnessConfigController"] + end + + subgraph "Configuration Sources" + SC["source_config"] + TC["table_config"] + MC["model_config"] + PYC["project_yml_config"] + end + + subgraph "Freshness Queries" + LAQ["loaded_at_query"] + LAF["loaded_at_field"] + BAF["build_after_field"] + end + + subgraph "Validation Layer" + FCV["FreshnessConfigValidator"] + NPV["NullParsingValidator"] + ICV["InlineConfigValidator"] + end + + SC --> SFS + TC --> SFS + MC --> MFS + PYC --> FCC + + SFS --> LAQ + SFS --> LAF + MFS --> LAQ + MFS --> BAF + + FCC --> FCV + FCV --> NPV + FCV --> ICV + + SFS --> FCV + MFS --> FCV +``` + +Sources: [.changes/unreleased/Features-20250623-113130.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250623-113130.yaml#L1-L7), [.changes/unreleased/Fixes-20250530-005804.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250530-005804.yaml#L1-L7) + +## Source Freshness Management + +Source freshness tracking monitors the 
recency of data in external source tables. The system supports configuration through both `loaded_at_query` and `loaded_at_field` options, providing flexibility for different source table schemas and data loading patterns. + +### Source Configuration Processing + +The source freshness system processes configurations from multiple levels, with proper handling of explicit null values and configuration inheritance. + +```mermaid +graph LR + subgraph "Source Node Processing" + SN["SourceNode"] + SF["source.freshness"] + SCF["source.config.freshness"] + end + + subgraph "Configuration Resolution" + CR["ConfigResolver"] + NH["NullHandler"] + CI["ConfigInheritance"] + end + + subgraph "Freshness Execution" + FE["FreshnessExecutor"] + QG["QueryGenerator"] + FC["FreshnessChecker"] + end + + SN --> SF + SN --> SCF + SF --> CR + SCF --> CR + + CR --> NH + CR --> CI + NH --> FE + CI --> FE + + FE --> QG + FE --> FC +``` + +The system ensures that `source.freshness` equals `source.config.freshness` for consistency across the node representation. + +Sources: [.changes/unreleased/Fixes-20250530-005804.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250530-005804.yaml#L1-L7), [.changes/unreleased/Fixes-20250609-175239.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250609-175239.yaml#L1-L7) + +## Model Freshness Management + +Model freshness tracking monitors the recency of derived models and materialized tables. This system has undergone recent architectural changes to standardize configuration handling and improve validation. + +### Model Freshness Configuration Evolution + +Recent changes have moved model freshness configuration from property-level to config-level support, providing better consistency with the overall dbt configuration system. 
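
The change fragments referenced above do not show the full schema, but a config-level model freshness block with the required `build_after` key might look like the following hypothetical sketch (the `count`/`period` shape is assumed by analogy with source freshness intervals):

```yaml
# models/schema.yml -- hypothetical config-level model freshness
models:
  - name: orders_rollup
    config:
      freshness:
        build_after:         # required; definitions without it are skipped at parse time
          count: 4
          period: hour
```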
+ +```mermaid +graph TB + subgraph "Legacy System (Deprecated)" + MFP["model_freshness_property"] + PLS["property_level_support"] + end + + subgraph "Current System" + MFC["model_freshness_config"] + CLS["config_level_support"] + ICV["inline_config_validation"] + end + + subgraph "Parsing Layer" + MFP_PARSER["ModelFreshnessPropertyParser"] + MFC_PARSER["ModelFreshnessConfigParser"] + BAP["build_after_presence_check"] + end + + subgraph "Validation Rules" + VR["ValidationRules"] + ICC["invalid_config_check"] + SFD["skip_freshness_definition"] + end + + MFP -.-> MFC + PLS -.-> CLS + + MFC --> MFC_PARSER + MFC_PARSER --> BAP + MFC_PARSER --> ICV + + BAP --> VR + ICV --> VR + VR --> ICC + VR --> SFD +``` + +The parsing system now ensures that `build_after` is present in model freshness definitions, otherwise it skips the freshness definition entirely. + +Sources: [.changes/unreleased/Fixes-20250605-110645.yaml:1-8](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250605-110645.yaml#L1-L8), [.changes/unreleased/Fixes-20250616-085600.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250616-085600.yaml#L1-L7), [.changes/unreleased/Fixes-20250610-211241.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250610-211241.yaml#L1-L7) + +## Configuration System Architecture + +The freshness management system integrates with dbt-core's hierarchical configuration system, supporting both source and table-level configuration inheritance. 
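+
+As a concrete illustration of source- and table-level inheritance, a source definition might look like this. Source, table, and column names are invented, and the `loaded_at_query` placement follows the feature change cited in this section:
+
+```yaml
+# models/sources.yml (names are illustrative)
+sources:
+  - name: raw_events
+    loaded_at_field: _loaded_at              # inherited by the tables below
+    freshness:
+      warn_after: {count: 12, period: hour}
+      error_after: {count: 24, period: hour}
+    tables:
+      - name: events                         # inherits source-level settings
+      - name: snapshots
+        loaded_at_query: "select max(loaded_ts) from {{ this }}"  # table-level override
+```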
+ +### Configuration Hierarchy + +| Configuration Level | Source Freshness | Model Freshness | Table Freshness | +|-------------------|------------------|-----------------|-----------------| +| Project Level | ✓ | ✓ | ✓ | +| Source Level | ✓ | - | - | +| Model Level | - | ✓ | - | +| Table Level | - | - | ✓ | +| Inline Config | - | ✓ | ✓ | + +### Supported Configuration Parameters + +The system supports the following freshness configuration parameters across different entity types: + +```mermaid +graph LR + subgraph "Freshness Config Parameters" + LAQ["loaded_at_query"] + LAF["loaded_at_field"] + WI["warn_after"] + EI["error_after"] + BA["build_after"] + end + + subgraph "Source Support" + S_LAQ["source.loaded_at_query"] + S_LAF["source.loaded_at_field"] + S_WI["source.warn_after"] + S_EI["source.error_after"] + end + + subgraph "Table Support" + T_LAQ["table.loaded_at_query"] + T_LAF["table.loaded_at_field"] + T_WI["table.warn_after"] + T_EI["table.error_after"] + end + + subgraph "Model Support" + M_LAQ["model.loaded_at_query"] + M_LAF["model.loaded_at_field"] + M_BA["model.build_after"] + end + + LAQ --> S_LAQ + LAQ --> T_LAQ + LAQ --> M_LAQ + + LAF --> S_LAF + LAF --> T_LAF + LAF --> M_LAF + + WI --> S_WI + WI --> T_WI + + EI --> S_EI + EI --> T_EI + + BA --> M_BA +``` + +Sources: [.changes/unreleased/Features-20250623-113130.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250623-113130.yaml#L1-L7) + +## Validation and Error Handling + +The freshness management system includes robust validation and error handling mechanisms to ensure configuration correctness and provide clear feedback for invalid configurations. 
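+
+One invalid configuration these mechanisms guard against is a model freshness block that omits the required `build_after` key; such a definition is skipped rather than executed (model name hypothetical):
+
+```yaml
+models:
+  - name: orders
+    config:
+      freshness: {}   # no build_after, so the freshness definition is skipped
+```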
+ +### Validation Pipeline + +```mermaid +graph TB + subgraph "Input Validation" + CV["ConfigValidator"] + NV["NullValidator"] + TV["TypeValidator"] + end + + subgraph "Business Logic Validation" + BLV["BusinessLogicValidator"] + BAV["BuildAfterValidator"] + FDV["FreshnessDefinitionValidator"] + end + + subgraph "Error Handling" + EH["ErrorHandler"] + ICI["InvalidConfigIgnore"] + SFD["SkipFreshnessDefinition"] + VE["ValidationError"] + end + + CV --> BLV + NV --> BLV + TV --> BLV + + BLV --> BAV + BLV --> FDV + + BAV --> EH + FDV --> EH + + EH --> ICI + EH --> SFD + EH --> VE +``` + +The validation system handles several key scenarios: +- Explicit null handling in source freshness configurations +- Missing `build_after` fields in model freshness definitions +- Invalid inline model configuration validation +- Configuration consistency checks between node properties and config objects + +Sources: [.changes/unreleased/Fixes-20250530-005804.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250530-005804.yaml#L1-L7), [.changes/unreleased/Fixes-20250605-110645.yaml:1-8](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250605-110645.yaml#L1-L8), [.changes/unreleased/Fixes-20250610-211241.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250610-211241.yaml#L1-L7), [.changes/unreleased/Fixes-20250609-175239.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250609-175239.yaml#L1-L7) + +## Recent System Improvements + +The freshness management system has undergone significant improvements to enhance reliability and standardize configuration handling: + +### Key Enhancements + +1. **Extended Configuration Support**: Added `loaded_at_query` and `loaded_at_field` support for both source and table configurations, providing greater flexibility in freshness monitoring strategies. + +2. 
**Improved Null Handling**: Enhanced the system's ability to handle explicit null values in source freshness configurations, preventing configuration resolution errors. + +3. **Standardized Model Configuration**: Moved from property-level to config-level support for model freshness, aligning with dbt-core's configuration patterns and improving consistency. + +4. **Enhanced Validation**: Implemented stricter validation rules for model freshness definitions, ensuring required fields are present before processing. + +5. **Configuration Consistency**: Established consistency between node freshness properties and configuration objects, eliminating discrepancies in node representation. \ No newline at end of file diff --git a/docs/autogenerated_docs/2.2-semantic-interfaces-and-metricflow.md b/docs/autogenerated_docs/2.2-semantic-interfaces-and-metricflow.md new file mode 100644 index 00000000000..fc8efa8d9e3 --- /dev/null +++ b/docs/autogenerated_docs/2.2-semantic-interfaces-and-metricflow.md @@ -0,0 +1,212 @@ +# Semantic Interfaces and MetricFlow + +
+
**Relevant source files**

The following files were used as context for generating this wiki page:

- [.changes/unreleased/Dependencies-20250709-132213.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Dependencies-20250709-132213.yaml)
- [.changes/unreleased/Fixes-20250528-092055.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250528-092055.yaml)

+ + + +## Purpose and Scope + +This document covers dbt-core's integration with semantic interfaces and MetricFlow systems, including saved query support and time spine processing. This system enables dbt to work with metric definitions and semantic layer functionality through the `dbt-semantic-interfaces` dependency. + +For information about data freshness management that integrates with semantic interfaces, see [Data Freshness Management](#2.1). For details on semantic layer dependencies, see [Semantic Dependencies](#9.2). + +## System Architecture + +The semantic interfaces system provides integration between dbt-core and external semantic layer tools, primarily through the `dbt-semantic-interfaces` library. This system handles metric definitions, saved queries, and time spine configurations. + +```mermaid +graph TB + subgraph "dbt-core Semantic Integration" + SemanticInt["Semantic Interfaces"] + SavedQuery["Saved Query Support"] + MetricFlow["MetricFlow Integration"] + TimeSpine["Time Spine Processing"] + end + + subgraph "External Dependencies" + DBTSemantic["dbt-semantic-interfaces"] + MetricFlowExt["External MetricFlow"] + end + + subgraph "Core dbt Systems" + FreshnessSystem["Freshness Management"] + ProjectConfig["Project Configuration"] + ManifestSystem["Manifest System"] + end + + SemanticInt --> SavedQuery + SemanticInt --> MetricFlow + MetricFlow --> TimeSpine + + SemanticInt --> DBTSemantic + MetricFlow --> MetricFlowExt + + FreshnessSystem --> SemanticInt + ProjectConfig --> SemanticInt + SemanticInt --> ManifestSystem +``` + +**Sources:** `.changes/unreleased/Dependencies-20250709-132213.yaml`, system architecture analysis + +## Saved Query Support + +The semantic interfaces system provides support for saved queries through integration with `dbt-semantic-interfaces`. This functionality has been enhanced with the upgrade to version 0.9.0. 
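+
+As an illustration, a saved query takes roughly the following shape in YAML. The metric, dimension, and export names here are invented, and exact keys depend on the installed `dbt-semantic-interfaces` version:
+
+```yaml
+saved_queries:
+  - name: weekly_revenue
+    query_params:
+      metrics:
+        - revenue
+      group_by:
+        - TimeDimension('metric_time', 'week')
+      where:
+        - "{{ Dimension('customer__region') }} = 'EMEA'"
+    exports:
+      - name: weekly_revenue_export
+        config:
+          export_as: table
+```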
+ +### Saved Query Integration + +```mermaid +graph LR + subgraph "Saved Query Processing" + QueryDef["Query Definitions"] + QueryValidation["Query Validation"] + QueryExecution["Query Execution Support"] + end + + subgraph "Integration Layer" + SemanticInterface["Semantic Interface Layer"] + DBTSemantic["dbt-semantic-interfaces 0.9.0+"] + end + + QueryDef --> QueryValidation + QueryValidation --> QueryExecution + + QueryDef --> SemanticInterface + SemanticInterface --> DBTSemantic + DBTSemantic --> QueryExecution +``` + +The system handles saved query definitions and provides validation and execution support through the semantic interfaces integration. The recent upgrade to `dbt-semantic-interfaces==0.9.0` provides more robust saved query functionality. + +**Sources:** `.changes/unreleased/Dependencies-20250709-132213.yaml` + +## MetricFlow Integration + +MetricFlow integration enables dbt to work with metric definitions and time-based data processing. The system includes specific handling for time spine configurations. + +### Time Spine Processing + +The system provides specific handling for `metricflow_time_spine` configurations with different granularities. A recent fix addressed warning issues for non-day grain time spine configurations. + +```mermaid +graph TD + subgraph "Time Spine System" + TimeSpineConfig["metricflow_time_spine Configuration"] + GrainValidation["Grain Validation"] + WarningSystem["Warning Management"] + end + + subgraph "Grain Types" + DayGrain["Day Grain"] + NonDayGrain["Non-Day Grain"] + CustomGrain["Custom Grain"] + end + + TimeSpineConfig --> GrainValidation + GrainValidation --> DayGrain + GrainValidation --> NonDayGrain + GrainValidation --> CustomGrain + + NonDayGrain --> WarningSystem + WarningSystem --> TimeSpineConfig +``` + +The fix ensures that `metricflow_time_spine` configurations with non-day grains do not trigger inappropriate warnings, improving the user experience when working with different time granularities. 
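+
+In recent dbt versions a time spine is declared in model YAML rather than relying on the `metricflow_time_spine` model name alone. The sketch below uses illustrative model and column names, and keys may differ across versions:
+
+```yaml
+models:
+  - name: all_periods            # historically named metricflow_time_spine
+    time_spine:
+      standard_granularity_column: date_day
+      custom_granularities:      # non-day grains, which should not warn
+        - name: fiscal_quarter
+          column_name: fiscal_quarter
+    columns:
+      - name: date_day
+        granularity: day
+```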
+ +**Sources:** `.changes/unreleased/Fixes-20250528-092055.yaml` + +### MetricFlow Configuration Processing + +```mermaid +graph TB + subgraph "Configuration Processing" + ProjectYAML["dbt_project.yml"] + MetricConfig["Metric Configurations"] + TimeSpineConfig["Time Spine Settings"] + end + + subgraph "Validation Layer" + SchemaValidation["Schema Validation"] + GrainValidation["Grain Type Validation"] + ConfigValidation["Configuration Validation"] + end + + subgraph "Processing Output" + ProcessedMetrics["Processed Metrics"] + TimeSpineModels["Time Spine Models"] + ManifestIntegration["Manifest Integration"] + end + + ProjectYAML --> MetricConfig + ProjectYAML --> TimeSpineConfig + + MetricConfig --> SchemaValidation + TimeSpineConfig --> GrainValidation + SchemaValidation --> ConfigValidation + GrainValidation --> ConfigValidation + + ConfigValidation --> ProcessedMetrics + ConfigValidation --> TimeSpineModels + ProcessedMetrics --> ManifestIntegration + TimeSpineModels --> ManifestIntegration +``` + +**Sources:** System architecture analysis based on change files + +## Dependency Management + +The semantic interfaces system relies on the `dbt-semantic-interfaces` external dependency, which provides the core functionality for metric definitions and semantic layer integration. 
+ +| Component | Version | Purpose | +|-----------|---------|---------| +| dbt-semantic-interfaces | 0.9.0+ | Core semantic layer functionality | +| MetricFlow | External | Metric processing and time spine handling | + +### Version Upgrades + +The system has been upgraded to use `dbt-semantic-interfaces==0.9.0`, which provides: +- More robust saved query support +- Enhanced metric definition handling +- Improved integration stability + +**Sources:** `.changes/unreleased/Dependencies-20250709-132213.yaml` + +## Integration Points + +The semantic interfaces system integrates with several other dbt-core systems: + +```mermaid +graph TB + subgraph "Integration Architecture" + SemanticCore["Semantic Interfaces Core"] + + subgraph "Input Systems" + ProjectSystem["Project Configuration"] + FreshnessSystem["Freshness Management"] + ConfigSystem["Configuration Validation"] + end + + subgraph "Output Systems" + ManifestSystem["Manifest Generation"] + ExecutionSystem["Execution Engine"] + ValidationSystem["Validation Framework"] + end + end + + ProjectSystem --> SemanticCore + FreshnessSystem --> SemanticCore + ConfigSystem --> SemanticCore + + SemanticCore --> ManifestSystem + SemanticCore --> ExecutionSystem + SemanticCore --> ValidationSystem +``` + +The semantic interfaces system serves as a bridge between dbt's configuration and execution systems and external semantic layer tools, enabling metric-driven data transformation workflows. \ No newline at end of file diff --git a/docs/autogenerated_docs/3-project-parsing-system.md b/docs/autogenerated_docs/3-project-parsing-system.md new file mode 100644 index 00000000000..45d734a06b5 --- /dev/null +++ b/docs/autogenerated_docs/3-project-parsing-system.md @@ -0,0 +1,179 @@ +# Project Parsing System + +
+
**Relevant source files**

The following files were used as context for generating this wiki page:

- [.changes/unreleased/Features-20250625-151818.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250625-151818.yaml)

+ + + +## Purpose and Scope + +The Project Parsing System is responsible for parsing project configurations, validating schemas, and processing model definitions within dbt-core. This system serves as the foundation for transforming raw configuration files and SQL model definitions into validated, structured representations that can be consumed by the execution engine. + +The system handles parsing from multiple configuration sources including `dbt_project.yml`, model SQL files, source configurations, and table configurations. For information about the underlying JSON schema validation mechanisms, see [Configuration Validation and JSON Schema](#3.1). For details about model-specific configuration processing, see [Model Configuration Processing](#3.2). + +## System Architecture + +The Project Parsing System operates as a multi-stage pipeline that ingests various configuration sources, applies validation rules, and produces structured configuration objects for downstream processing. + +### Core Parsing Pipeline + +```mermaid +graph TD + subgraph "Configuration Sources" + DBT_PROJECT["dbt_project.yml"] + MODEL_SQL["Model SQL Files"] + SOURCE_YAML["Source YAML Files"] + TABLE_CONFIG["Table Configurations"] + end + + subgraph "Parser Components" + PROJ_PARSER["ProjectParser"] + MODEL_PARSER["ModelParser"] + SOURCE_PARSER["SourceParser"] + CONFIG_PARSER["ConfigParser"] + end + + subgraph "Validation Layer" + SCHEMA_VALIDATOR["SchemaValidator"] + CONFIG_VALIDATOR["ConfigValidator"] + SQL_CONFIG_VALIDATOR["SQLConfigValidator"] + end + + subgraph "Output Artifacts" + PROJECT_CONFIG["ProjectConfig"] + MODEL_CONFIGS["ModelConfigs"] + SOURCE_CONFIGS["SourceConfigs"] + PARSED_MANIFEST["ParsedManifest"] + end + + DBT_PROJECT --> PROJ_PARSER + MODEL_SQL --> MODEL_PARSER + SOURCE_YAML --> SOURCE_PARSER + TABLE_CONFIG --> CONFIG_PARSER + + PROJ_PARSER --> SCHEMA_VALIDATOR + MODEL_PARSER --> CONFIG_VALIDATOR + SOURCE_PARSER --> SCHEMA_VALIDATOR + CONFIG_PARSER --> 
SQL_CONFIG_VALIDATOR + + SCHEMA_VALIDATOR --> PROJECT_CONFIG + CONFIG_VALIDATOR --> MODEL_CONFIGS + SCHEMA_VALIDATOR --> SOURCE_CONFIGS + SQL_CONFIG_VALIDATOR --> PARSED_MANIFEST +``` + +Sources: *.changes/unreleased/Features-20250625-151818.yaml* + +### Configuration Source Processing + +The system processes configuration data from multiple sources, each with distinct parsing requirements and validation rules: + +| Configuration Source | Parser Component | Primary Validation | Output Type | +|---------------------|------------------|-------------------|-------------| +| `dbt_project.yml` | `ProjectParser` | JSON Schema | `ProjectConfig` | +| Model SQL Files | `ModelParser` | SQL Config Validation | `ModelConfig` | +| Source YAML Files | `SourceParser` | Source Schema | `SourceConfig` | +| Table Configurations | `ConfigParser` | Config Schema | `TableConfig` | + +## Schema Validation Integration + +```mermaid +graph LR + subgraph "Input Processing" + RAW_CONFIG["Raw Configuration"] + PARSED_CONFIG["Parsed Configuration"] + end + + subgraph "Validation Pipeline" + JSON_SCHEMA_VAL["JSONSchemaValidator"] + ADAPTER_SPECIFIC_VAL["AdapterSpecificValidator"] + DEPRECATION_CHECK["DeprecationChecker"] + end + + subgraph "Validation Rules" + DATA_TEST_PROPS["DataTestProperties"] + FRESHNESS_CONFIG["FreshnessConfig"] + EXPOSURE_CONFIG["ExposureConfig"] + LOADED_AT_CONFIG["LoadedAtConfig"] + end + + subgraph "Validation Output" + VALIDATED_CONFIG["ValidatedConfig"] + VALIDATION_ERRORS["ValidationErrors"] + DEPRECATION_WARNINGS["DeprecationWarnings"] + end + + RAW_CONFIG --> JSON_SCHEMA_VAL + JSON_SCHEMA_VAL --> PARSED_CONFIG + PARSED_CONFIG --> ADAPTER_SPECIFIC_VAL + ADAPTER_SPECIFIC_VAL --> DEPRECATION_CHECK + + JSON_SCHEMA_VAL --> DATA_TEST_PROPS + JSON_SCHEMA_VAL --> FRESHNESS_CONFIG + JSON_SCHEMA_VAL --> EXPOSURE_CONFIG + JSON_SCHEMA_VAL --> LOADED_AT_CONFIG + + DEPRECATION_CHECK --> VALIDATED_CONFIG + ADAPTER_SPECIFIC_VAL --> VALIDATION_ERRORS + DEPRECATION_CHECK --> 
DEPRECATION_WARNINGS
+```
+
+Sources: *.changes/unreleased/Features-20250625-151818.yaml*
+
+## Model SQL Configuration Processing
+
+Recent developments in the parsing system add validation of configurations defined directly in model SQL files, allowing the system to extract and validate configuration parameters embedded within SQL model definitions.
+
+### SQL Configuration Extraction Flow
+
+```mermaid
+flowchart TD
+    SQL_FILE["Model SQL File"] --> SQL_PARSER["SQLParser"]
+    SQL_PARSER --> CONFIG_EXTRACTOR["ConfigExtractor"]
+    CONFIG_EXTRACTOR --> INLINE_CONFIG["InlineConfig"]
+
+    INLINE_CONFIG --> SQL_CONFIG_VALIDATOR["SQLConfigValidator"]
+    SQL_CONFIG_VALIDATOR --> VALIDATION_RESULT["ValidationResult"]
+
+    VALIDATION_RESULT --> MERGED_CONFIG["MergedModelConfig"]
+    YAML_CONFIG["YAML Model Config"] --> CONFIG_MERGER["ConfigMerger"]
+    MERGED_CONFIG --> CONFIG_MERGER
+    CONFIG_MERGER --> FINAL_MODEL_CONFIG["FinalModelConfig"]
+```
+
+The system now processes configuration data embedded within SQL files, enabling more cohesive model definitions where configuration and logic coexist within the same file.
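+
+An inline configuration of the kind this flow extracts sits at the top of the model file as a Jinja `config()` call, for example (model and column names are hypothetical):
+
+```sql
+-- models/orders.sql (illustrative)
+{{ config(
+    materialized='incremental',
+    unique_key='order_id',
+    tags=['finance']
+) }}
+
+select order_id, customer_id, ordered_at
+from {{ ref('stg_orders') }}
+```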
+ +Sources: *.changes/unreleased/Features-20250625-151818.yaml* + +## Error Handling and Validation Reporting + +The parsing system implements comprehensive error handling to provide meaningful feedback when configuration parsing or validation fails: + +- **Schema Validation Errors**: Reported when configurations don't conform to expected JSON schemas +- **SQL Configuration Errors**: Reported when inline SQL configurations are malformed or invalid +- **Deprecation Warnings**: Generated when deprecated configuration patterns are detected +- **Adapter Compatibility Warnings**: Issued when configurations may not be compatible with specific database adapters + +## Integration Points + +The Project Parsing System integrates with several other dbt-core systems: + +- **Manifest System**: Provides parsed configurations for manifest generation (see [Manifest and Artifacts](#8)) +- **Node Selection**: Supplies configuration data for node filtering and selection (see [Node Selection](#7)) +- **Execution System**: Provides validated configurations for model execution (see [Core Execution System](#2)) +- **CLI System**: Receives configuration overrides and flags from command-line interface (see [CLI System](#4)) + +## Performance Considerations + +The parsing system implements several optimization strategies: + +- **Incremental Parsing**: Only re-parses configurations that have changed since the last run +- **Schema Caching**: Caches JSON schemas to avoid repeated validation overhead +- **Parallel Processing**: Processes independent configuration sources concurrently when possible +- **Lazy Loading**: Defers parsing of unused configurations until they are explicitly requested \ No newline at end of file diff --git a/docs/autogenerated_docs/3.1-configuration-validation-and-json-schema.md b/docs/autogenerated_docs/3.1-configuration-validation-and-json-schema.md new file mode 100644 index 00000000000..c9452bb1b38 --- /dev/null +++ 
b/docs/autogenerated_docs/3.1-configuration-validation-and-json-schema.md @@ -0,0 +1,274 @@ +# Configuration Validation and JSON Schema + +
+
**Relevant source files**

The following files were used as context for generating this wiki page:

- [.changes/unreleased/Features-20250714-232524.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250714-232524.yaml)
- [.changes/unreleased/Fixes-20250707-103418.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250707-103418.yaml)
- [.changes/unreleased/Fixes-20250710-170148.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250710-170148.yaml)

+ + + +This document covers dbt-core's JSON schema validation system, which provides structured validation for project configurations, model definitions, and other dbt artifacts. The system includes adapter-specific validation gating, schema definition management, and deprecation handling for configuration validation processes. + +For information about hierarchical configuration parsing and project-level settings, see [Hierarchical Configuration Parsing](#5.2). For model-specific configuration processing, see [Model Configuration Processing](#3.2). + +## Purpose and Scope + +The Configuration Validation and JSON Schema system serves as the validation layer for dbt project configurations, ensuring that YAML configurations, SQL model configurations, and other project artifacts conform to expected schemas. The system provides: + +- JSON schema-based validation for multiple configuration types +- Adapter-specific validation gating to customize validation behavior +- Deprecation management for evolving configuration standards +- Integration with the broader project parsing pipeline + +## System Architecture + +**Configuration Validation System Architecture** + +```mermaid +graph TB + subgraph ConfigSources["Configuration Sources"] + dbtProject["dbt_project.yml"] + modelSQL["Model SQL Files"] + sourceYAML["Source YAML Files"] + tableConfigs["Table Configurations"] + end + + subgraph ValidationCore["Validation Core"] + jsonSchemaValidator["JSON Schema Validator"] + adapterGate["Adapter-Specific Gate"] + deprecationHandler["GenericJSONSchemaValidationDeprecation"] + end + + subgraph SchemaRules["Schema Rules"] + dataTestProps["Data Test Properties"] + exposureConfigs["Exposure Configurations"] + freshnessConfigs["Freshness Configurations"] + loadedAtConfigs["loaded_at Configurations"] + end + + subgraph ProcessingSystems["Processing Systems"] + modelProcessor["Model Processing"] + sourceProcessor["Source Processing"] + testProcessor["Test Processing"] + end + + dbtProject 
--> jsonSchemaValidator + modelSQL --> jsonSchemaValidator + sourceYAML --> jsonSchemaValidator + tableConfigs --> jsonSchemaValidator + + jsonSchemaValidator --> adapterGate + adapterGate --> deprecationHandler + + jsonSchemaValidator --> dataTestProps + jsonSchemaValidator --> exposureConfigs + jsonSchemaValidator --> freshnessConfigs + jsonSchemaValidator --> loadedAtConfigs + + dataTestProps --> modelProcessor + dataTestProps --> testProcessor + freshnessConfigs --> sourceProcessor + loadedAtConfigs --> sourceProcessor + loadedAtConfigs --> modelProcessor + + deprecationHandler -.-> modelProcessor + deprecationHandler -.-> sourceProcessor + deprecationHandler -.-> testProcessor +``` + +Sources: Based on architecture diagrams and change file patterns from `.changes/unreleased/Features-20250714-232524.yaml`, `.changes/unreleased/Fixes-20250707-103418.yaml` + +## Core Validation Components + +### JSON Schema Validator + +The JSON Schema Validator serves as the primary validation engine for dbt configurations. This component processes various configuration sources and applies appropriate schema validation rules. 
+ +**Validation Flow** + +```mermaid +flowchart TD + configInput["Configuration Input"] + schemaLoad["Load JSON Schema"] + validate["Validate Against Schema"] + adapterCheck["Check Adapter Compatibility"] + errorHandle["Handle Validation Errors"] + success["Validation Success"] + + configInput --> schemaLoad + schemaLoad --> validate + validate --> adapterCheck + adapterCheck -->|"Valid"| success + adapterCheck -->|"Invalid"| errorHandle + validate -->|"Schema Error"| errorHandle +``` + +The validator handles multiple configuration types including: +- Project-level configurations from `dbt_project.yml` +- Model SQL configurations embedded in SQL files +- Source and table configuration definitions +- Test and exposure configurations + +Sources: `.changes/unreleased/Features-20250714-232524.yaml`, `.changes/unreleased/Fixes-20250707-103418.yaml` + +### Adapter-Specific Validation Gating + +The system implements adapter-specific validation gating to customize validation behavior based on the target database adapter. This allows different adapters to have different validation requirements. 
+ +| Component | Purpose | Gating Behavior | +|-----------|---------|----------------| +| `adapterGate` | Controls validation scope | Enables/disables validations per adapter | +| `jsonSchemaValidator` | Core validation logic | Respects adapter gating decisions | +| Configuration processors | Apply validated configs | Receive adapter-filtered results | + +**Adapter Gating Process** + +```mermaid +stateDiagram-v2 + [*] --> ConfigurationInput + ConfigurationInput --> AdapterDetection + AdapterDetection --> GateEvaluation + + state GateEvaluation { + [*] --> CheckAdapterRules + CheckAdapterRules --> ValidationEnabled + CheckAdapterRules --> ValidationSkipped + } + + ValidationEnabled --> JSONSchemaValidation + ValidationSkipped --> DirectPass + JSONSchemaValidation --> ConfigurationOutput + DirectPass --> ConfigurationOutput + ConfigurationOutput --> [*] +``` + +Sources: `.changes/unreleased/Features-20250714-232524.yaml` + +### Deprecation Management + +The `GenericJSONSchemaValidationDeprecation` component manages the deprecation of configuration validation features, providing controlled migration paths for evolving validation standards. 
+ +Key aspects of deprecation management: +- Preview deprecation status for gradual migration +- Warning generation for deprecated validation patterns +- Compatibility maintenance during transition periods + +Sources: `.changes/unreleased/Fixes-20250710-170148.yaml` + +## Configuration Sources and Processing + +### Configuration Input Types + +The validation system processes multiple types of configuration inputs: + +**Configuration Source Mapping** + +```mermaid +graph LR + subgraph ProjectLevel["Project Level"] + dbtProjectYml["dbt_project.yml"] + globalConfigs["Global Configurations"] + end + + subgraph ModelLevel["Model Level"] + modelSqlConfigs["SQL Model Configs"] + modelYamlConfigs["Model YAML Configs"] + end + + subgraph SourceLevel["Source Level"] + sourceConfigurations["Source Configurations"] + tableConfigurations["Table Configurations"] + end + + subgraph TestLevel["Test Level"] + testProperties["Test Properties"] + dataTestConfigs["Data Test Configs"] + end + + dbtProjectYml --> ValidationPipeline["Validation Pipeline"] + globalConfigs --> ValidationPipeline + modelSqlConfigs --> ValidationPipeline + modelYamlConfigs --> ValidationPipeline + sourceConfigurations --> ValidationPipeline + tableConfigurations --> ValidationPipeline + testProperties --> ValidationPipeline + dataTestConfigs --> ValidationPipeline +``` + +### Schema Definition Updates + +Recent updates to the schema definitions include: +- Nested configuration definitions for complex configuration structures +- Cloud integration information handling +- Removal of deprecated source override configurations + +Sources: `.changes/unreleased/Fixes-20250707-103418.yaml` + +## Validation Rules and Processing + +### Rule Categories + +The system applies different validation rules based on configuration type: + +| Rule Category | Configuration Target | Validation Focus | +|---------------|---------------------|------------------| +| Data Test Properties | Model and source tests | Test argument 
validation | +| Exposure Configurations | Exposure definitions | Exposure metadata validation | +| Freshness Configurations | Source freshness settings | Freshness parameter validation | +| loaded_at Configurations | Source and model loading | Loading timestamp validation | + +### Processing Integration + +**Validation to Processing Flow** + +```mermaid +graph TD + subgraph ValidationResults["Validation Results"] + validatedDataTests["Validated Data Tests"] + validatedExposures["Validated Exposures"] + validatedFreshness["Validated Freshness"] + validatedLoadedAt["Validated loaded_at"] + end + + subgraph ProcessingEngines["Processing Engines"] + modelProcessor["Model Processor"] + sourceProcessor["Source Processor"] + testProcessor["Test Processor"] + end + + validatedDataTests --> modelProcessor + validatedDataTests --> testProcessor + validatedExposures --> modelProcessor + validatedFreshness --> sourceProcessor + validatedLoadedAt --> sourceProcessor + validatedLoadedAt --> modelProcessor + + modelProcessor --> ProcessedModels["Processed Models"] + sourceProcessor --> ProcessedSources["Processed Sources"] + testProcessor --> ProcessedTests["Processed Tests"] +``` + +Sources: Based on architecture patterns from provided diagrams and change file references + +## Recent System Evolution + +### Recent Changes + +The configuration validation system has undergone several recent improvements: + +1. **Adapter Gating Implementation** - Introduction of adapter-specific validation controls +2. **Schema Definition Updates** - Enhanced schema definitions with nested configurations and cloud integration +3. 
**Deprecation Management** - Conversion of `GenericJSONSchemaValidationDeprecation` to preview status + +### Migration Considerations + +The system maintains backward compatibility while evolving validation standards: +- Deprecated source overrides are being phased out +- Generic JSON schema validation deprecations provide migration warnings +- Adapter-specific gating allows gradual rollout of new validation rules \ No newline at end of file diff --git a/docs/autogenerated_docs/3.2-model-configuration-processing.md b/docs/autogenerated_docs/3.2-model-configuration-processing.md new file mode 100644 index 00000000000..a5c3aef08c7 --- /dev/null +++ b/docs/autogenerated_docs/3.2-model-configuration-processing.md @@ -0,0 +1,165 @@ +# Model Configuration Processing + +
+
**Relevant source files**

The following files were used as context for generating this wiki page:

- [.changes/unreleased/Features-20250625-151818.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250625-151818.yaml)

+ + + +## Purpose and Scope + +The Model Configuration Processing system handles the parsing, validation, and processing of configuration defined within dbt model SQL files. This system extracts configuration directives from model definitions and validates them against established schemas as part of the broader project parsing workflow. + +For information about JSON schema validation and adapter-specific validation, see [Configuration Validation and JSON Schema](#3.1). For project-level configuration management, see [Project Configuration and Schema](#5.1). + +## Configuration Sources and Processing Flow + +Model configurations in dbt-core can be specified in multiple locations, with SQL file configurations being processed as part of the model parsing pipeline. + +### Configuration Processing Pipeline + +```mermaid +flowchart TD + ModelSQL["Model SQL Files"] --> ConfigExtraction["Configuration Extraction"] + ConfigExtraction --> ConfigValidation["Configuration Validation"] + ConfigValidation --> ConfigMerging["Configuration Merging"] + ConfigMerging --> ModelProcessing["Model Processing"] + + ProjectYML["dbt_project.yml"] --> ConfigMerging + ModelProperties["Model Properties"] --> ConfigMerging + + ConfigValidation --> SchemaValidation["JSON Schema Validation"] + SchemaValidation --> ValidationResults["Validation Results"] + ValidationResults --> ModelProcessing + + subgraph "Validation Layer" + SchemaValidation + ValidationResults + end + + subgraph "Configuration Sources" + ModelSQL + ProjectYML + ModelProperties + end +``` + +Sources: `.changes/unreleased/Features-20250625-151818.yaml` + +## SQL Configuration Validation + +Recent enhancements to the model configuration processing system include the introduction of configuration validation directly from model SQL files. This represents a significant improvement in configuration management capabilities. 
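+
+For instance, inline-config validation can surface mistakes such as a misspelled key that would previously pass through silently. The model is hypothetical, and whether the problem is reported as a warning or an error depends on dbt version and adapter-specific gating:
+
+```sql
+-- models/orders.sql (illustrative)
+-- `materialised` is a typo for `materialized`; schema validation can flag it
+{{ config(materialised='table') }}
+
+select * from {{ ref('stg_orders') }}
+```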
+ +### Validation Implementation + +```mermaid +graph TB + SQLParser["SQL Configuration Parser"] --> ConfigExtractor["Configuration Extractor"] + ConfigExtractor --> Validator["SQL Config Validator"] + Validator --> ValidationEngine["Validation Engine"] + + Validator --> ErrorHandler["Error Handler"] + ValidationEngine --> ResultProcessor["Result Processor"] + ErrorHandler --> ResultProcessor + + subgraph "Validation Process" + Validator + ValidationEngine + ErrorHandler + end + + subgraph "Output Processing" + ResultProcessor + ValidatedConfig["Validated Configuration"] + ValidationErrors["Validation Errors"] + end + + ResultProcessor --> ValidatedConfig + ResultProcessor --> ValidationErrors +``` + +The validation system processes configuration directives embedded within model SQL files, ensuring they conform to expected schemas and constraints before being integrated into the broader model processing pipeline. + +Sources: `.changes/unreleased/Features-20250625-151818.yaml` + +## Integration with Project Parsing + +Model configuration processing operates as a component within the larger project parsing system, coordinating with other parsing subsystems to build comprehensive model definitions. 
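The validation stage described above can be approximated with a small hand-rolled checker. The rule table below is invented for illustration — dbt's real checks are driven by JSON schemas — but it demonstrates the type and allowed-value checks being applied:

```python
# Hypothetical rule table; dbt's actual validation is JSON-schema driven.
MODEL_CONFIG_RULES = {
    "materialized": {"type": str, "choices": {"view", "table", "incremental", "ephemeral"}},
    "enabled": {"type": bool},
    "tags": {"type": list},
}

def validate_config(config: dict) -> list:
    """Return human-readable validation errors; an empty list means valid."""
    errors = []
    for key, value in config.items():
        rule = MODEL_CONFIG_RULES.get(key)
        if rule is None:
            continue  # unknown keys are left for adapter-specific validation
        if not isinstance(value, rule["type"]):
            errors.append(f"{key}: expected {rule['type'].__name__}, got {type(value).__name__}")
        elif "choices" in rule and value not in rule["choices"]:
            errors.append(f"{key}: {value!r} is not one of {sorted(rule['choices'])}")
    return errors

print(validate_config({"materialized": "table", "enabled": True}))    # []
print(validate_config({"materialized": "tabel", "enabled": "maybe"})) # two errors
```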
+ +### Configuration Processing Workflow + +```mermaid +sequenceDiagram + participant Parser as "Project Parser" + participant SQLProcessor as "SQL Config Processor" + participant Validator as "Config Validator" + participant Schema as "Schema Engine" + participant ModelBuilder as "Model Builder" + + Parser->>SQLProcessor: "Process Model SQL" + SQLProcessor->>SQLProcessor: "Extract Config Directives" + SQLProcessor->>Validator: "Validate Extracted Config" + Validator->>Schema: "Apply Schema Validation" + Schema-->>Validator: "Validation Results" + Validator-->>SQLProcessor: "Validated Config" + SQLProcessor-->>Parser: "Processed Configuration" + Parser->>ModelBuilder: "Build Model with Config" +``` + +This workflow ensures that SQL-based configurations are properly validated and integrated into the model definition process, maintaining consistency with other configuration sources. + +Sources: `.changes/unreleased/Features-20250625-151818.yaml` + +## Configuration Validation Features + +The model configuration processing system includes several key validation capabilities: + +| Validation Type | Description | Implementation Status | +|-----------------|-------------|----------------------| +| Schema Validation | JSON schema-based validation of configuration structure | Active | +| Type Checking | Validation of configuration value types | Active | +| SQL Config Validation | Direct validation of configs from model SQL files | Recently Added | +| Cross-Reference Validation | Validation of configuration references to other models/sources | Active | + +### Recent Enhancements + +The system has been enhanced to begin validating configurations directly from model SQL files, addressing issue #11727. This enhancement expands the validation coverage to include configuration directives that are embedded within the model definition itself. 
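The merging of the three configuration sources shown in the pipeline can be sketched with `collections.ChainMap`. The precedence used here — in-file `config()` over model properties over `dbt_project.yml` — follows dbt's documented behavior, though the values themselves are invented for illustration:

```python
from collections import ChainMap

# Invented values for one model, drawn from the three configuration sources.
project_config = {"materialized": "view", "tags": ["core"]}       # dbt_project.yml
properties_config = {"materialized": "table"}                     # model properties (YAML)
sql_config = {"materialized": "incremental", "unique_key": "id"}  # {{ config(...) }} in SQL

# ChainMap resolves keys left to right, so the most specific source wins.
effective = dict(ChainMap(sql_config, properties_config, project_config))
print(effective["materialized"])  # incremental
print(effective["tags"])          # ['core']
```

Keys set at only one level (like `tags` here) pass through unchanged, while conflicting keys resolve to the most specific source.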
+ +Sources: `.changes/unreleased/Features-20250625-151818.yaml` + +## System Architecture Integration + +The Model Configuration Processing system integrates with several other dbt-core subsystems: + +```mermaid +graph LR + ModelConfigProcessor["Model Config Processor"] --> JSONSchemaValidator["JSON Schema Validator"] + ModelConfigProcessor --> ConfigMerger["Configuration Merger"] + ModelConfigProcessor --> ProjectParser["Project Parser"] + + JSONSchemaValidator --> ValidationResults["Validation Results"] + ConfigMerger --> MergedConfig["Merged Configuration"] + ProjectParser --> ModelDefinitions["Model Definitions"] + + subgraph "Validation Subsystem" + JSONSchemaValidator + ValidationResults + end + + subgraph "Configuration Subsystem" + ConfigMerger + MergedConfig + end + + subgraph "Parsing Subsystem" + ProjectParser + ModelDefinitions + end +``` + +This architecture ensures that model configuration processing operates seamlessly within the broader dbt-core parsing and validation ecosystem, maintaining consistency and reliability across all configuration sources. \ No newline at end of file diff --git a/docs/autogenerated_docs/4-cli-system.md b/docs/autogenerated_docs/4-cli-system.md new file mode 100644 index 00000000000..ae83207dc5e --- /dev/null +++ b/docs/autogenerated_docs/4-cli-system.md @@ -0,0 +1,491 @@ +# CLI System + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.changes/unreleased/Features-20250611-160217.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250611-160217.yaml) + +
+ + + +The CLI System in dbt-core provides the command-line interface that users interact with as well as a programmatic API through the `dbtRunner` class. It handles command parsing, flag validation, environment setup, and command execution. + +This document focuses on how dbt's CLI system is implemented: the code structure, key components, and execution flow. For information about the programmatic API, see [dbtRunner and Programmatic Interface](#4.1). + +## Command Structure + +The CLI system is built using the Click library and follows a hierarchical command structure. The CLI system is primarily implemented in these files: + +- `core/dbt/cli/main.py`: Defines the CLI commands and the `dbtRunner` class +- `core/dbt/cli/params.py`: Defines the command-line parameters and flags +- `core/dbt/cli/flags.py`: Processes and validates flags +- `core/dbt/cli/requires.py`: Provides decorators for setting up the execution environment +- `core/dbt/cli/option_types.py`: Defines custom parameter types + +### CLI Architecture + +```mermaid +graph TD + subgraph "CLI Entry Point" + EP["dbt.cli.main:cli"] + end + + subgraph "Command Structure" + MAIN["Main Commands\n(run, build, compile, etc)"] + GROUP["Command Groups\n(docs, source)"] + SUB["Subcommands\n(docs generate, source freshness)"] + end + + subgraph "Command Processing" + PARSE["Command Parsing\n(Click library)"] + FLAG["Flag Processing\n(Flags class)"] + ENV["Environment Setup\n(requires decorators)"] + TASK["Task Instantiation\n(RunTask, BuildTask, etc)"] + EXEC["Task Execution\n(task.run())"] + RES["Result Interpretation\n(task.interpret_results())"] + end + + EP --> MAIN + EP --> GROUP + GROUP --> SUB + + MAIN --> PARSE + SUB --> PARSE + PARSE --> FLAG + FLAG --> ENV + ENV --> TASK + TASK --> EXEC + EXEC --> RES +``` + +Sources: [core/dbt/cli/main.py:153-169](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L153-L169), 
[core/setup.py:41-47](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/setup.py#L41-L47) + +### Entry Point + +The CLI entry point is defined in `setup.py` using Python's entry points system: + +```python +# From core/setup.py +entry_points={ + "console_scripts": ["dbt = dbt.cli.main:cli"], +}, +``` + +This maps the `dbt` command to the `cli` function in `dbt.cli.main`. The `cli` function is a Click command group that contains all dbt commands. + +Sources: [core/setup.py:41-47](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/setup.py#L41-L47) + +## Commands and Task Mapping + +Each CLI command is defined as a function decorated with `@cli.command()` and follows a consistent pattern: + +```mermaid +graph TD + subgraph "Command Definition Pattern" + CMD["@cli.command('command_name')"] + CTX["@click.pass_context"] + GFLAGS["@global_flags"] + CFLAGS["Command-specific flags"] + REQP["@requires.postflight"] + REQPF["@requires.preflight"] + REQPROF["@requires.profile"] + REQPROJ["@requires.project"] + REQRC["@requires.runtime_config"] + REQMAN["@requires.manifest"] + FN["def command_name(ctx, **kwargs)"] + TASK["task = TaskClass(...)"] + RUN["results = task.run()"] + INT["success = task.interpret_results(results)"] + RET["return results, success"] + end + + CMD --> CTX + CTX --> GFLAGS + GFLAGS --> CFLAGS + CFLAGS --> REQP + REQP --> REQPF + REQPF --> REQPROF + REQPROF --> REQPROJ + REQPROJ --> REQRC + REQRC --> REQMAN + REQMAN --> FN + FN --> TASK + TASK --> RUN + RUN --> INT + INT --> RET +``` + +Sources: [core/dbt/cli/main.py:169-837](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L169-L837) + +Each command maps to a corresponding task class: + +| Command | Task Class | File | +|------------|------------------------|----------------------------| +| run | RunTask | dbt/task/run.py | +| build | BuildTask | dbt/task/build.py | +| compile | CompileTask | dbt/task/compile.py | +| test | TestTask | dbt/task/test.py | +| seed | SeedTask 
| dbt/task/seed.py | +| snapshot | SnapshotTask | dbt/task/snapshot.py | +| parse | (returns manifest) | N/A | +| docs | GenerateTask/ServeTask | dbt/task/docs/*.py | +| debug | DebugTask | dbt/task/debug.py | +| deps | DepsTask | dbt/task/deps.py | +| clean | CleanTask | dbt/task/clean.py | + +Sources: [core/dbt/cli/main.py:169-837](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L169-L837) + +## Command Execution Flow + +```mermaid +sequenceDiagram + participant User + participant CLI as dbt.cli.main:cli + participant Flags as dbt.cli.flags:Flags + participant Requires as dbt.cli.requires + participant Task as Task Class + + User->>CLI: dbt run + CLI->>Flags: Initialize Flags + Flags->>CLI: Return processed flags + CLI->>Requires: preflight + Requires->>Requires: Set up environment + Requires->>Requires: Load profile + Requires->>Requires: Load project + Requires->>Requires: Create runtime config + Requires->>Requires: Load/parse manifest + CLI->>Task: Instantiate RunTask + CLI->>Task: task.run() + Task->>CLI: Return results + CLI->>Task: interpret_results(results) + CLI->>Requires: postflight + Requires->>Requires: Handle exceptions/cleanup + Requires->>User: Return success/failure +``` + +Sources: [core/dbt/cli/main.py:169-587](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L169-L587), [core/dbt/cli/requires.py:58-410](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/requires.py#L58-L410) + +When a user executes a dbt command, the following happens: + +1. The command is parsed by Click +2. The `Flags` class processes and validates flags +3. The `requires` decorators set up the execution environment +4. The appropriate task class is instantiated +5. The task's `run()` method is called +6. The task's `interpret_results()` method processes the results +7. The results and success status are returned + +## Flag System + +The flag system handles command-line options and their validation. 
+ +### Flag Definition + +Flags are defined in `params.py` as Click options: + +```python +# Example from core/dbt/cli/params.py +debug = click.option( + "--debug/--no-debug", + "-d/ ", + envvar="DBT_DEBUG", + help="Display debug logging during dbt execution. Useful for debugging and making bug reports.", +) +``` + +The `global_flags` decorator in `main.py` applies common flags to all commands: + +```python +# From core/dbt/cli/main.py +def global_flags(func): + @p.cache_selected_only + @p.debug + @p.defer + # ... many more flags ... + def wrapper(*args, **kwargs): + return func(*args, **kwargs) + return wrapper +``` + +Sources: [core/dbt/cli/params.py](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/params.py), [core/dbt/cli/main.py:101-150](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L101-L150) + +### Flag Deprecations + +**Deprecated Model Selection Flags** + +As of recent versions, several model selection flags have been deprecated: + +| Deprecated Flag | Status | Recommended Alternative | +|-----------------|--------|------------------------| +| `--models` | Deprecated | Use `--select` instead | +| `--model` | Deprecated | Use `--select` instead | +| `-m` | Deprecated | Use `--select` instead | + +**CLI Flag Deprecation System** + +```mermaid +graph TD + subgraph "Flag Deprecation Flow" + OLD_FLAG["Deprecated Flag Usage\n(--models, --model, -m)"] + DETECT["Flag Detection\nFLAGS_DEFAULTS"] + WARN["Deprecation Warning\nfire_event()"] + MAP["Flag Mapping\nto --select"] + PROCESS["Normal Processing\nwith mapped flag"] + end + + OLD_FLAG --> DETECT + DETECT --> WARN + WARN --> MAP + MAP --> PROCESS +``` + +The deprecation system ensures backward compatibility while guiding users toward the current flag conventions. When deprecated flags are used, warning messages are emitted through the event system. 
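The deprecation flow above can be sketched as a small argv-rewriting pass. The helper below is hypothetical — dbt performs the equivalent inside its `Flags` class and emits the warnings through the event system via `fire_event()` rather than returning them:

```python
# Hypothetical sketch of mapping deprecated selection flags to --select.
DEPRECATED_SELECTION_FLAGS = {"--models", "--model", "-m"}

def remap_deprecated_flags(argv: list) -> tuple:
    """Rewrite deprecated model-selection flags to --select, collecting warnings."""
    rewritten, warnings = [], []
    for arg in argv:
        if arg in DEPRECATED_SELECTION_FLAGS:
            warnings.append(f"The {arg} flag is deprecated; use --select instead")
            rewritten.append("--select")
        else:
            rewritten.append(arg)
    return rewritten, warnings

argv, warnings = remap_deprecated_flags(["run", "--models", "my_model"])
print(argv)           # ['run', '--select', 'my_model']
print(len(warnings))  # 1
```

This sketch only handles the space-separated form (`--models my_model`); a real implementation also has to cover `--models=my_model` and repeated flags.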
+ +Sources: [.changes/unreleased/Features-20250611-160217.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250611-160217.yaml), [core/dbt/cli/flags.py](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/flags.py) + +### Flag Processing + +**Flag Processing Pipeline** + +```mermaid +graph TD + subgraph "Flag_Sources" + DEFAULT["FLAGS_DEFAULTS\ndefault values"] + CMD["ctx.params\ncommand line args"] + ENV["os.environ\nenvironment variables"] + PROJ["ProjectFlags\ndbt_project.yml"] + end + + subgraph "Flag_Processing" + FLAGS["Flags.__init__()\nflag consolidation"] + VALID["_assert_mutually_exclusive()\nvalidation logic"] + DEP["Deprecated Flag Detection\nmodel selection flags"] + OVER["Project Override\nfrom dbt_project.yml"] + end + + DEFAULT --> FLAGS + CMD --> FLAGS + ENV --> FLAGS + PROJ --> FLAGS + + FLAGS --> DEP + DEP --> VALID + VALID --> OVER + + OVER --> FINAL["Final Flag Values\nused by task classes"] +``` + +The `Flags` class in `flags.py` processes flags through these steps: + +1. **Default Value Application**: Uses `FLAGS_DEFAULTS` for initial values +2. **Command Line Processing**: Extracts `ctx.params` from Click context +3. **Environment Variable Processing**: Reads from `os.environ` with `DBT_` prefixes +4. **Deprecated Flag Handling**: Maps deprecated model selection flags to `--select` +5. **Project Override Application**: Applies settings from `dbt_project.yml` +6. **Mutual Exclusion Validation**: Validates incompatible flag combinations using `_assert_mutually_exclusive` +7. 
**Final Validation**: Performs additional validation (e.g., event time flags) + +Sources: [core/dbt/cli/flags.py:87-390](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/flags.py#L87-L390) + +## Environment Setup Decorators + +The `requires.py` module provides decorators that set up the execution environment: + +```mermaid +graph TD + subgraph "Decorator Order" + POST["@requires.postflight"] + PRE["@requires.preflight"] + PROF["@requires.profile"] + PROJ["@requires.project"] + RC["@requires.runtime_config"] + MAN["@requires.manifest"] + CMD["Command Function"] + end + + subgraph "Context Population" + CTX["Context Object\nctx.obj"] + FLAGS["flags"] + PROFILE["profile"] + PROJECT["project"] + RTCONFIG["runtime_config"] + MANIFEST["manifest"] + end + + POST --> PRE + PRE --> PROF + PROF --> PROJ + PROJ --> RC + RC --> MAN + MAN --> CMD + + PRE --> |populates| FLAGS + PROF --> |populates| PROFILE + PROJ --> |populates| PROJECT + RC --> |populates| RTCONFIG + MAN --> |populates| MANIFEST + + FLAGS --> CTX + PROFILE --> CTX + PROJECT --> CTX + RTCONFIG --> CTX + MANIFEST --> CTX +``` + +Sources: [core/dbt/cli/requires.py:58-410](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/requires.py#L58-L410) + +These decorators: + +1. `@requires.postflight`: Handles exception management and result processing +2. `@requires.preflight`: Sets up the execution environment (logging, tracking, etc.) +3. `@requires.profile`: Loads the profile configuration +4. `@requires.project`: Loads the project configuration +5. `@requires.runtime_config`: Creates a runtime configuration from the profile and project +6. 
`@requires.manifest`: Loads or creates the project manifest

## Programmatic Interface - dbtRunner

The `dbtRunner` class in `main.py` provides a programmatic interface to dbt:

```python
# Example usage
from dbt.cli.main import dbtRunner

# Initialize the runner
runner = dbtRunner()

# Invoke a command (--select replaces the deprecated --models flag)
result = runner.invoke(["run", "--select", "my_model"])

# Check the result
if result.success:
    print("Command succeeded!")
    print(result.result)  # The return value of the command
else:
    print("Command failed!")
    print(result.exception)  # The exception that caused the failure
```

The `dbtRunner` class:
- Takes an optional `manifest` parameter to reuse a previously parsed manifest
- Takes optional `callbacks` to receive events
- Has an `invoke()` method that takes command args as a list of strings
- Returns a `dbtRunnerResult` object with success status and result/exception

Sources: [core/dbt/cli/main.py:23-97](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L23-L97), [tests/functional/dbt_runner/test_dbt_runner.py:14-166](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/tests/functional/dbt_runner/test_dbt_runner.py#L14-L166)

## Special Features

### Sample Mode

The CLI system includes a "sample mode" feature, which allows running commands on a subset of data:

```python
# From core/dbt/cli/params.py
sample = click.option(
    "--sample",
    envvar="DBT_SAMPLE",
    help="Run in sample mode with given SAMPLE_WINDOW spec, such that ref/source calls are sampled by the sample window.",
    default=None,
    type=SampleType(),
    hidden=True,  # TODO: Unhide
)
```

Sample mode works with models, seeds, and snapshots that have an `event_time` configuration by filtering data based on a time window:

```mermaid
graph TD
    subgraph "Sample Mode Flow"
        CMD["dbt Command\nwith --sample flag"]
        PARSER["SampleType Parser\nConverts to SampleWindow"]
        REF["ref() Function"]
        SOURCE["source() Function"]
RESOLVER["BaseResolver.resolve_event_time_filter()"] + FILTER["EventTimeFilter\nAdded to relation query"] + SUBSET["Subset of Data\nFiltered by time window"] + end + + CMD --> PARSER + PARSER --> |sample window| REF + PARSER --> |sample window| SOURCE + REF --> RESOLVER + SOURCE --> RESOLVER + RESOLVER --> FILTER + FILTER --> SUBSET +``` + +Sources: [core/dbt/cli/params.py:527-534](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/params.py#L527-L534), [core/dbt/context/providers.py:245-301](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/context/providers.py#L245-L301), [tests/functional/sample_mode/test_sample_mode.py:110-366](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/tests/functional/sample_mode/test_sample_mode.py#L110-L366), [core/dbt/cli/option_types.py:97-121](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/option_types.py#L97-L121) + +### Version Information + +The CLI system provides version information through the `--version` flag, displaying the installed version of dbt-core and any plugins: + +```python +# From core/dbt/version.py +def get_version_information() -> str: + installed = get_installed_version() + latest = get_latest_version() + + core_msg_lines, core_info_msg = _get_core_msg_lines(installed, latest) + core_msg = _format_core_msg(core_msg_lines) + plugin_version_msg = _get_plugins_msg() + + msg_lines = [core_msg] + + if core_info_msg != "": + msg_lines.append(core_info_msg) + + msg_lines.append(plugin_version_msg) + msg_lines.append("") + + return "\n\n".join(msg_lines) +``` + +Sources: [core/dbt/version.py:16-33](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/version.py#L16-L33), [core/dbt/cli/params.py:720-740](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/params.py#L720-L740) + +## Integration with Other Systems + +```mermaid +graph TD + subgraph "CLI System" + CLI["CLI Commands\ndbt.cli.main"] + FLAGS["Flag System\ndbt.cli.flags"] + REQS["Environment 
Setup\ndbt.cli.requires"] + end + + subgraph "Core Systems" + CONFIG["Configuration\ndbt.config.*"] + ADAPTER["Adapters\ndbt.adapters.*"] + TASK["Tasks\ndbt.task.*"] + PARSER["Parser\ndbt.parser.*"] + EVENT["Events\ndbt.events.*"] + TRACK["Tracking\ndbt.tracking"] + end + + CLI --> FLAGS + CLI --> REQS + CLI --> TASK + + REQS --> CONFIG + REQS --> ADAPTER + REQS --> PARSER + REQS --> EVENT + REQS --> TRACK + + TASK --> ADAPTER + TASK --> PARSER + TASK --> EVENT +``` + +Sources: [core/dbt/cli/requires.py:58-410](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/requires.py#L58-L410), [core/dbt/cli/main.py:169-837](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L169-L837) + +The CLI system interacts with several other systems in dbt-core: +- **Configuration System**: Loads profiles and project settings +- **Adapter System**: Connects to the database +- **Task System**: Executes commands +- **Parser/Manifest System**: Parses project files and builds the manifest +- **Event System**: Logs events and handles tracking \ No newline at end of file diff --git a/docs/autogenerated_docs/4.1-command-interface-and-deprecations.md b/docs/autogenerated_docs/4.1-command-interface-and-deprecations.md new file mode 100644 index 00000000000..c8fbc0a082f --- /dev/null +++ b/docs/autogenerated_docs/4.1-command-interface-and-deprecations.md @@ -0,0 +1,350 @@ +# dbtRunner and Programmatic Interface + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.changes/unreleased/Features-20250611-160217.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250611-160217.yaml) +- [.changes/unreleased/Features-20250703-175341.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250703-175341.yaml) + +
+ + + +This document covers the programmatic interface to dbt-core, specifically the `dbtRunner` class that allows users to invoke dbt commands from Python code rather than through the command line. This enables integration of dbt into larger Python applications, automated workflows, testing frameworks, or custom tooling built around dbt. + +## Overview of dbtRunner + +The `dbtRunner` class provides a programmatic way to execute dbt commands and capture their results. It serves as an alternative to using the command-line interface directly, allowing developers to: + +1. Execute dbt commands from Python code +2. Capture structured results of command execution +3. Register callbacks to process events during execution +4. Reuse parsed manifest objects for improved performance + +The class is designed to be a thin wrapper around the CLI system, providing the same functionality but with a Python interface instead of a command-line interface. + +Sources: [core/dbt/cli/main.py:40-97](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L40-L97) + +## Architecture + +The `dbtRunner` sits in the User Interface Layer of dbt-core, functioning as an alternative entry point to the CLI interface. Both interfaces ultimately invoke the same internal dbt components. 
+ +```mermaid +graph TD + subgraph "User Interface Layer" + User["User Python Code"] + dbtRunner["dbtRunner Class"] + CLI["CLI System\n(Command-line Interface)"] + end + + subgraph "Core Processing Layer" + Parser["ManifestLoader & Parser"] + Runner["RunTask & Node Execution"] + Tasks["Task Classes\n(BuildTask, RunTask, etc.)"] + end + + subgraph "Infrastructure Systems" + Events["Event System"] + Manifest["Manifest"] + end + + User -->|"invoke()"| dbtRunner + dbtRunner -->|"calls"| CLI + CLI -->|"initializes"| Tasks + Tasks -->|"uses"| Parser + Tasks -->|"uses"| Runner + Events -->|"calls"| dbtRunner + dbtRunner -->|"registers callbacks"| Events + dbtRunner -->|"can provide"| Manifest + Tasks -->|"reads/writes"| Manifest +``` + +When a user invokes a command through `dbtRunner`, it: +1. Creates a CLI context with the provided arguments +2. Registers any user-provided callbacks with the event system +3. Provides any pre-loaded manifest if one was given +4. Executes the requested command through the CLI system +5. Captures the result and returns it in a structured format + +Sources: [core/dbt/cli/main.py:42-97](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L42-L97), [core/dbt/cli/requires.py:58-218](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/requires.py#L58-L218) + +## dbtRunner Class + +The `dbtRunner` class is defined in the `dbt.cli.main` module and provides the main entry point for programmatic use of dbt. 
+ +```mermaid +classDiagram + class dbtRunner { + +manifest: Optional[Manifest] + +callbacks: List[Callable] + +__init__(manifest, callbacks) + +invoke(args, **kwargs): dbtRunnerResult + } + + class dbtRunnerResult { + +success: bool + +exception: Optional[BaseException] + +result: Union[bool, CatalogArtifact, List[str], Manifest, None, RunExecutionResult] + +__init__(success, exception, result) + } + + dbtRunner --> dbtRunnerResult : returns +``` + +### Constructor + +The `dbtRunner` constructor accepts two optional parameters: + +- `manifest`: An optional pre-loaded `Manifest` object, which can be reused from a previous run to avoid reparsing the project +- `callbacks`: An optional list of callback functions that will be called when events occur during execution + +### Method: invoke + +The primary method of `dbtRunner` is `invoke`, which executes a dbt command: + +```python +def invoke(self, args: List[str], **kwargs) -> dbtRunnerResult: + """ + Executes a dbt command with the given arguments and returns a result object. + + Parameters: + args: A list of command-line arguments (e.g., ["run", "--select", "my_model"]) + **kwargs: Additional parameters to override CLI arguments + + Returns: + A dbtRunnerResult object containing success status, exception (if any), and result + """ +``` + +The `invoke` method: +1. Creates a click context with the provided arguments +2. Adds the manifest and callbacks to the context object +3. Applies any kwargs as parameter overrides +4. Invokes the command +5. 
Captures and returns the result in a `dbtRunnerResult` object + +Sources: [core/dbt/cli/main.py:53-97](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L53-L97) + +### Exception Handling + +The `invoke` method handles various types of exceptions that might occur during command execution: + +- `requires.ResultExit`: Returns the result with success=False +- `requires.ExceptionExit`: Returns the exception with success=False +- Click exceptions (`BadOptionUsage`, `NoSuchOption`, `UsageError`): Converted to `DbtUsageException` +- `ClickExit`: Handled based on exit code +- Other exceptions: Returned as-is with success=False + +Sources: [core/dbt/cli/main.py:71-97](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L71-L97) + +## dbtRunnerResult Class + +The `dbtRunnerResult` class encapsulates the result of invoking a dbt command: + +```python +@dataclass +class dbtRunnerResult: + success: bool # Whether the command executed successfully + exception: Optional[BaseException] = None # Any exception that occurred + result: Union[ # Command-specific result object + bool, # debug + CatalogArtifact, # docs generate + List[str], # list/ls + Manifest, # parse + None, # clean, deps, init, source + RunExecutionResult, # build, compile, run, seed, snapshot, test, run-operation + ] = None +``` + +The type of `result` depends on the command executed: +- `debug`: Boolean indicating success +- `docs generate`: `CatalogArtifact` object +- `list`/`ls`: List of strings +- `parse`: `Manifest` object +- `clean`/`deps`/`init`/`source`: None +- `build`/`compile`/`run`/`seed`/`snapshot`/`test`/`run-operation`: `RunExecutionResult` object + +Sources: [core/dbt/cli/main.py:23-37](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/main.py#L23-L37) + +## Usage Examples + +### Basic Usage + +```python +from dbt.cli.main import dbtRunner + +# Create a dbtRunner instance +dbt = dbtRunner() + +# Run a command +result = dbt.invoke(["run"]) + +# 
Check if the command succeeded +if result.success: + print("Command executed successfully!") +else: + print(f"Command failed: {result.exception}") +``` + +### Passing Command Arguments + +```python +# Run with model selection +result = dbt.invoke(["run", "--select", "my_model"]) + +# Run with full-refresh +result = dbt.invoke(["run", "--full-refresh"]) +``` + +### Passing Keyword Arguments + +You can pass additional parameters as keyword arguments, which will override any CLI arguments: + +```python +# Override configuration with kwargs +result = dbt.invoke( + ["run"], + log_format="json", # Override --log-format + log_path="logs/dbt.log", # Override --log-path + target="dev", # Override --target +) +``` + +Sources: [tests/functional/dbt_runner/test_dbt_runner.py:58-72](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/tests/functional/dbt_runner/test_dbt_runner.py#L58-L72) + +### Error Handling + +```python +# Handle potential errors +result = dbt.invoke(["run"]) + +if not result.success: + if result.exception: + if isinstance(result.exception, DbtUsageException): + print("Usage error:", result.exception) + elif isinstance(result.exception, DbtProjectError): + print("Project error:", result.exception) + else: + print("Error:", result.exception) +``` + +Sources: [tests/functional/dbt_runner/test_dbt_runner.py:26-45](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/tests/functional/dbt_runner/test_dbt_runner.py#L26-L45) + +## Event Handling with Callbacks + +The `dbtRunner` allows registering callback functions that will be called when events occur during execution: + +```mermaid +sequenceDiagram + participant User as "User Code" + participant dbtRunner as "dbtRunner" + participant EventSystem as "Event System" + participant Task as "dbt Task" + + User->>dbtRunner: create with callbacks=[callback_fn] + User->>dbtRunner: invoke(["run"]) + dbtRunner->>EventSystem: register callbacks + dbtRunner->>Task: execute command + Task->>EventSystem: emit event + 
EventSystem->>User: call callback_fn(event) + Task-->>dbtRunner: return result + dbtRunner-->>User: return dbtRunnerResult +``` + +### Defining a Callback Function + +A callback function takes an `EventMsg` object and processes it: + +```python +from dbt_common.events.base_types import EventMsg + +def my_callback(event: EventMsg) -> None: + # Process the event + print(f"Event: {event.info}") + +# Register the callback +dbt = dbtRunner(callbacks=[my_callback]) + +# Run a command +result = dbt.invoke(["run"]) +``` + +Sources: [tests/functional/dbt_runner/test_dbt_runner.py:50-56](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/tests/functional/dbt_runner/test_dbt_runner.py#L50-L56) + +## Advanced Usage + +### Reusing Manifests for Performance + +One powerful feature of `dbtRunner` is the ability to reuse a previously parsed manifest, which can significantly improve performance for subsequent commands: + +```python +# Parse the project once +parse_result = dbt.invoke(["parse"]) +manifest = parse_result.result + +# Use the manifest for subsequent commands +dbt_with_manifest = dbtRunner(manifest=manifest) +run_result = dbt_with_manifest.invoke(["run"]) +test_result = dbt_with_manifest.invoke(["test"]) +``` + +This avoids reparsing the project for each command, which can be a significant performance improvement for large projects. 
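The callback mechanism shown earlier pairs well with this kind of automation: a common pattern is a collector callback that filters events for later inspection. The stub event type below stands in for dbt's `EventMsg` so the pattern is runnable on its own; a real callback would inspect fields such as `event.info.name` instead:

```python
from dataclasses import dataclass

# Stub standing in for dbt's EventMsg; only the fields the callback needs.
@dataclass
class StubEvent:
    name: str
    msg: str

class EventCollector:
    """Callback object that keeps only events whose name matches a filter."""

    def __init__(self, name_filter: str):
        self.name_filter = name_filter
        self.collected = []

    def __call__(self, event: StubEvent) -> None:
        if self.name_filter in event.name:
            self.collected.append(event.msg)

collector = EventCollector("Error")
for event in [StubEvent("MainReportVersion", "1.0"), StubEvent("RunError", "boom")]:
    collector(event)

print(collector.collected)  # ['boom']
```

Because `dbtRunner(callbacks=[...])` accepts any callable, an instance like `collector` can be registered directly and queried after `invoke()` returns.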
+ +Sources: [tests/functional/dbt_runner/test_dbt_runner.py:90-99](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/tests/functional/dbt_runner/test_dbt_runner.py#L90-L99) + +### Integration with Custom Applications + +The `dbtRunner` can be integrated into larger Python applications, allowing for sophisticated workflows: + +```python +def run_dbt_pipeline(): + dbt = dbtRunner() + + # Run deps to ensure packages are installed + deps_result = dbt.invoke(["deps"]) + if not deps_result.success: + raise Exception("Failed to install dependencies") + + # Parse the project + parse_result = dbt.invoke(["parse"]) + if not parse_result.success: + raise Exception("Failed to parse project") + + # Reuse the manifest for run and test + manifest = parse_result.result + dbt_with_manifest = dbtRunner(manifest=manifest) + + # Run models + run_result = dbt_with_manifest.invoke(["run"]) + if not run_result.success: + raise Exception("Failed to run models") + + # Test models + test_result = dbt_with_manifest.invoke(["test"]) + if not test_result.success: + raise Exception("Tests failed") + + return "Pipeline completed successfully" +``` + +## Sample Mode Usage + +The dbtRunner interface also supports advanced features like sample mode, which allows testing or previewing data within specific time windows: + +```python +# Run in sample mode with a relative time window +result = dbt.invoke(["run", "--sample=1 day"]) + +# Run in sample mode with specific start and end dates +result = dbt.invoke(["run", "--sample={'start': '2023-01-01', 'end': '2023-01-02'}"]) +``` + +The sample mode is particularly useful for testing incremental models or working with time-series data. 
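The effect of a sample window on `ref()`/`source()` data can be illustrated in plain Python. This is not dbt's implementation — dbt pushes an `EventTimeFilter` into the generated SQL — but the filtering semantics (keep only rows whose event time falls inside the window) look roughly like this:

```python
from datetime import datetime, timedelta

def sample_rows(rows, event_time_field, window_end, lookback):
    """Keep rows whose event time falls within [window_end - lookback, window_end)."""
    start = window_end - lookback
    return [row for row in rows if start <= row[event_time_field] < window_end]

# Invented rows for a table whose event_time column is "ordered_at".
orders = [
    {"id": 1, "ordered_at": datetime(2023, 1, 1)},
    {"id": 2, "ordered_at": datetime(2023, 1, 2)},
    {"id": 3, "ordered_at": datetime(2023, 1, 3)},
]
sampled = sample_rows(orders, "ordered_at", datetime(2023, 1, 3), timedelta(days=1))
print([row["id"] for row in sampled])  # [2]
```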
+ +Sources: [tests/functional/sample_mode/test_sample_mode.py:149-174](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/tests/functional/sample_mode/test_sample_mode.py#L149-L174), [core/dbt/cli/option_types.py:97-122](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/core/dbt/cli/option_types.py#L97-L122) + +## Conclusion + +The `dbtRunner` class provides a powerful, programmatic interface to dbt that enables integration with Python applications and automated workflows. By understanding its architecture, methods, and usage patterns, developers can leverage dbt's capabilities in more flexible and automated ways beyond the command line. \ No newline at end of file diff --git a/docs/autogenerated_docs/5-configuration-system.md b/docs/autogenerated_docs/5-configuration-system.md new file mode 100644 index 00000000000..27875125cc7 --- /dev/null +++ b/docs/autogenerated_docs/5-configuration-system.md @@ -0,0 +1,282 @@ +# Configuration System + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.changes/unreleased/Features-20250617-142516.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250617-142516.yaml) + +
+ + + +## Purpose and Scope + +The Configuration System manages all project-level configuration in dbt-core, including schema validation, configuration parsing, and the hierarchical resolution of settings across multiple sources. This system processes `dbt_project.yml` files, validates configurations against JSON schemas, and provides a unified interface for accessing configuration data throughout the dbt execution pipeline. + +For information about project-specific configuration schemas and catalog integration, see [Project Configuration and Schema](#5.1). For details about hierarchical configuration parsing and nested configuration management, see [Hierarchical Configuration Parsing](#5.2). + +## Configuration Architecture Overview + +The Configuration System serves as the foundation for all dbt operations by managing how configuration data flows from various sources into the execution environment. It handles validation, parsing, and resolution of configuration conflicts across multiple hierarchical levels. 
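Hierarchical conflict resolution of this kind maps naturally onto Python's `collections.ChainMap`, where earlier maps shadow later ones. A minimal sketch with invented setting values, not dbt's actual resolver:

```python
from collections import ChainMap

# Highest precedence first: CLI flags > env vars > dbt_project.yml > built-in defaults
defaults  = {"threads": 1, "fail_fast": False, "target": "dev"}
project   = {"threads": 4}
env_vars  = {"target": "ci"}
cli_flags = {"fail_fast": True}

# Lookups walk the maps left to right and stop at the first hit
resolved = ChainMap(cli_flags, env_vars, project, defaults)
print(resolved["threads"], resolved["target"], resolved["fail_fast"])  # 4 ci True
```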
+ +```mermaid +graph TB + subgraph "Configuration Sources" + dbt_project["dbt_project.yml"] + env_vars["Environment Variables"] + cli_flags["CLI Flags"] + profiles["profiles.yml"] + end + + subgraph "Configuration Processing" + schema_validator["JSON Schema Validator"] + config_parser["Configuration Parser"] + hierarchical_resolver["Hierarchical Resolver"] + end + + subgraph "Configuration Storage" + project_config["Project Configuration"] + runtime_config["Runtime Configuration"] + validation_cache["Validation Cache"] + end + + subgraph "Consumer Systems" + model_processor["Model Processing"] + source_processor["Source Processing"] + test_processor["Test Processing"] + cli_interface["CLI Interface"] + end + + dbt_project --> schema_validator + env_vars --> config_parser + cli_flags --> hierarchical_resolver + profiles --> config_parser + + schema_validator --> project_config + config_parser --> runtime_config + hierarchical_resolver --> validation_cache + + project_config --> model_processor + runtime_config --> source_processor + validation_cache --> test_processor + project_config --> cli_interface +``` + +Sources: `.changes/unreleased/Features-20250617-142516.yaml` + +## Configuration Types and Schema Validation + +The system manages multiple configuration types through a comprehensive JSON schema validation framework. Each configuration type has specific validation rules and processing requirements. 
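One way to picture per-type validation rules is a dispatch table mapping each configuration section to its own checks. All names and rules below are simplified illustrations, not dbt's actual validator classes:

```python
def require_keys(*keys):
    """Build a check that reports which required keys are missing."""
    def check(cfg: dict) -> list:
        return [k for k in keys if k not in cfg]
    return check

# Hypothetical per-section rules; real JSON schemas are far richer
VALIDATORS = {
    "models":  require_keys("name"),
    "sources": require_keys("name", "tables"),
}

def validate_section(kind: str, cfg: dict) -> list:
    """Run the validator registered for this configuration type."""
    return VALIDATORS[kind](cfg)

print(validate_section("sources", {"name": "raw"}))  # ['tables']
```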
+ +```mermaid +graph LR + subgraph "Schema Types" + project_schema["dbt_project.yml Schema"] + model_schema["Model Config Schema"] + source_schema["Source Config Schema"] + test_schema["Test Properties Schema"] + exposure_schema["Exposure Config Schema"] + end + + subgraph "Validation Engine" + json_validator["JSONSchemaValidator"] + builtin_validator["BuiltinDataTestValidator"] + deprecation_validator["DeprecationValidator"] + end + + subgraph "Configuration Objects" + ProjectConfig["ProjectConfig"] + ModelConfig["ModelConfig"] + SourceConfig["SourceConfig"] + TestConfig["TestConfig"] + ExposureConfig["ExposureConfig"] + end + + project_schema --> json_validator + model_schema --> builtin_validator + source_schema --> json_validator + test_schema --> builtin_validator + exposure_schema --> deprecation_validator + + json_validator --> ProjectConfig + builtin_validator --> ModelConfig + json_validator --> SourceConfig + builtin_validator --> TestConfig + deprecation_validator --> ExposureConfig +``` + +Sources: `.changes/unreleased/Features-20250617-142516.yaml` + +## Configuration Processing Pipeline + +The configuration processing follows a structured pipeline that validates, parses, and resolves configuration data from multiple sources with proper precedence handling. 
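Read as code, the pipeline is a chain of functions where each stage consumes the previous stage's output. A minimal sketch with invented stage names, not dbt internals:

```python
def validate(cfg: dict) -> dict:
    """Stage 1: schema-style validation (here reduced to one required key)."""
    if "name" not in cfg:
        raise ValueError("dbt_project.yml must define 'name'")
    return cfg

def merge(cfg: dict, overrides: dict) -> dict:
    """Stage 2: hierarchical resolution -- overrides win on conflict."""
    return {**cfg, **overrides}

def apply_defaults(cfg: dict) -> dict:
    """Stage 3: fill in runtime defaults without clobbering explicit values."""
    return {"threads": 1, **cfg}

project = validate({"name": "jaffle_shop"})
resolved = apply_defaults(merge(project, {"threads": 8}))
print(resolved["name"], resolved["threads"])  # jaffle_shop 8
```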
+ +| Processing Stage | Input Sources | Validation Type | Output | +|-----------------|---------------|-----------------|---------| +| Schema Validation | `dbt_project.yml`, model configs | JSON Schema | Validated config objects | +| Hierarchical Resolution | CLI flags, env vars, project configs | Precedence rules | Merged configuration | +| Runtime Application | Merged config, context data | Runtime validation | Active configuration | +| Deprecation Checking | All config sources | Deprecation rules | Warning notifications | + +```mermaid +flowchart TD + input_stage["Configuration Input Stage"] + + subgraph "Input Processing" + read_project["Read dbt_project.yml"] + parse_env["Parse Environment Variables"] + extract_cli["Extract CLI Arguments"] + end + + subgraph "Validation Stage" + schema_check["JSON Schema Validation"] + builtin_check["Builtin Data Test Validation"] + exposure_check["Exposure Config Validation"] + end + + subgraph "Resolution Stage" + hierarchy_merge["Hierarchical Merge"] + conflict_resolve["Conflict Resolution"] + default_apply["Apply Defaults"] + end + + subgraph "Output Stage" + runtime_config["Runtime Configuration"] + validation_errors["Validation Errors"] + deprecation_warnings["Deprecation Warnings"] + end + + input_stage --> read_project + input_stage --> parse_env + input_stage --> extract_cli + + read_project --> schema_check + parse_env --> builtin_check + extract_cli --> exposure_check + + schema_check --> hierarchy_merge + builtin_check --> conflict_resolve + exposure_check --> default_apply + + hierarchy_merge --> runtime_config + conflict_resolve --> validation_errors + default_apply --> deprecation_warnings +``` + +Sources: `.changes/unreleased/Features-20250617-142516.yaml` + +## Configuration Precedence and Hierarchical Resolution + +The system implements a multi-level precedence system where configuration values are resolved based on their source priority and specificity. + +### Configuration Precedence Order + +1. 
**CLI Flags** - Highest precedence, runtime-specific +2. **Environment Variables** - System-level overrides +3. **Project Configuration** - Project-specific settings in `dbt_project.yml` +4. **Default Values** - Built-in system defaults + +### Hierarchical Configuration Structure + +```mermaid +graph TB + subgraph "Global Level" + global_defaults["Global Defaults"] + system_env["System Environment"] + end + + subgraph "Project Level" + project_yml["dbt_project.yml"] + project_env["Project Environment Variables"] + end + + subgraph "Model Level" + model_config["Model-specific Config"] + model_overrides["Model Overrides"] + end + + subgraph "Runtime Level" + cli_overrides["CLI Flag Overrides"] + runtime_context["Runtime Context"] + end + + global_defaults --> project_yml + system_env --> project_env + project_yml --> model_config + project_env --> model_overrides + model_config --> cli_overrides + model_overrides --> runtime_context +``` + +Sources: `.changes/unreleased/Features-20250617-142516.yaml` + +## JSON Schema Integration and Validation + +The Configuration System integrates deeply with JSON Schema validation to ensure configuration accuracy and provide meaningful error messages for invalid configurations. 
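A meaningful-error-message validator can be sketched by collecting every problem instead of stopping at the first one. The schema shape and messages below are illustrative assumptions, not dbt's real JSON Schema machinery:

```python
def validate_config(config: dict, schema: dict) -> list:
    """Collect structured error messages rather than failing on the first problem."""
    errors = []
    for key in schema.get("required", []):
        if key not in config:
            errors.append(f"missing required property: {key}")
    for key, expected in schema.get("types", {}).items():
        if key in config and not isinstance(config[key], expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(config[key]).__name__}"
            )
    return errors

SCHEMA = {"required": ["name", "version"], "types": {"name": str, "threads": int}}
print(validate_config({"name": "jaffle_shop", "threads": "four"}, SCHEMA))
# ['missing required property: version', 'threads: expected int, got str']
```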
+ +### Schema Validation Components + +- **Built-in Data Test Properties**: Validates test configuration properties against schema definitions +- **Exposure Configuration Validation**: Ensures exposure configurations conform to expected structure +- **Deprecation-Aware Validation**: Provides warnings for deprecated configuration patterns + +### Validation Error Handling + +The system provides structured validation error reporting that includes: +- Schema violation details +- Suggested corrections +- Deprecation warnings with migration guidance +- Context information for debugging + +```mermaid +graph LR + subgraph "Schema Sources" + builtin_schemas["Built-in Schemas"] + custom_schemas["Custom Schemas"] + adapter_schemas["Adapter Schemas"] + end + + subgraph "Validation Process" + schema_loader["Schema Loader"] + validation_engine["Validation Engine"] + error_formatter["Error Formatter"] + end + + subgraph "Validation Results" + valid_config["Valid Configuration"] + validation_errors["Validation Errors"] + deprecation_warnings["Deprecation Warnings"] + end + + builtin_schemas --> schema_loader + custom_schemas --> schema_loader + adapter_schemas --> validation_engine + + schema_loader --> validation_engine + validation_engine --> error_formatter + + error_formatter --> valid_config + error_formatter --> validation_errors + error_formatter --> deprecation_warnings +``` + +Sources: `.changes/unreleased/Features-20250617-142516.yaml` + +## Integration with Other Systems + +The Configuration System serves as a central hub that provides configuration data to all other major systems in dbt-core. 
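This hub-and-spoke pattern can be sketched as one validated, immutable object handed to every consumer. `RuntimeConfig` below is a toy stand-in for dbt's runtime configuration object, with invented fields:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: consumers read the config but cannot mutate it
class RuntimeConfig:
    project_name: str
    threads: int

def run_models(cfg: RuntimeConfig) -> str:
    return f"running {cfg.project_name} with {cfg.threads} threads"

def check_sources(cfg: RuntimeConfig) -> str:
    return f"checking source freshness for {cfg.project_name}"

# The same validated object is distributed to each consumer system
cfg = RuntimeConfig(project_name="jaffle_shop", threads=4)
print(run_models(cfg))
print(check_sources(cfg))
```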
+ +### System Integration Points + +| Consumer System | Configuration Data Used | Integration Method | +|----------------|------------------------|-------------------| +| Model Processing | Model configs, project settings | Direct config object access | +| Source Processing | Source configs, freshness settings | Configuration injection | +| Test Processing | Test properties, validation rules | Schema-validated configs | +| CLI Interface | Runtime flags, environment settings | Hierarchical resolution | + +### Configuration Distribution Pattern + +The system uses a centralized configuration distribution pattern where validated configuration objects are passed to consuming systems rather than having each system parse configuration independently. \ No newline at end of file diff --git a/docs/autogenerated_docs/5.1-project-configuration-and-schema.md b/docs/autogenerated_docs/5.1-project-configuration-and-schema.md new file mode 100644 index 00000000000..f45946b5c05 --- /dev/null +++ b/docs/autogenerated_docs/5.1-project-configuration-and-schema.md @@ -0,0 +1,288 @@ +# Project Configuration and Schema + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.changes/unreleased/Features-20250529-085311.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250529-085311.yaml) +- [.changes/unreleased/Features-20250617-142516.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250617-142516.yaml) +- [.changes/unreleased/Features-20250623-113130.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250623-113130.yaml) + +
+ + + +## Purpose and Scope + +This document covers dbt-core's project-level configuration management system, focusing on the `dbt_project.yml` schema validation, catalog integration configurations, and the underlying JSON schema framework that validates project configurations. This includes the schema definitions for data test properties, exposure configurations, and source/table-level settings. + +For information about hierarchical configuration parsing and nested configuration management, see [Hierarchical Configuration Parsing](#5.2). For CLI-level configuration handling, see [Command Interface and Deprecations](#4.1). + +## dbt_project.yml Schema System Overview + +The dbt project configuration system centers around the `dbt_project.yml` file, which serves as the primary configuration entry point for dbt projects. The system uses JSON schema validation to ensure configuration correctness and provides structured configuration parsing. + +```mermaid +graph TB + subgraph "Configuration Entry Points" + dbt_project["dbt_project.yml"] + model_configs["Model Configurations"] + source_configs["Source Configurations"] + table_configs["Table Configurations"] + end + + subgraph "Schema Validation Layer" + json_validator["JSON Schema Validator"] + builtin_schemas["Builtin Schema Definitions"] + data_test_schema["Data Test Properties Schema"] + exposure_schema["Exposure Config Schema"] + end + + subgraph "Configuration Processing" + config_parser["Configuration Parser"] + catalog_integration["Catalog Integration Config"] + freshness_config["Freshness Configuration"] + end + + subgraph "Validation Output" + validated_config["Validated Configuration"] + deprecation_warnings["Deprecation Warnings"] + schema_errors["Schema Validation Errors"] + end + + dbt_project --> json_validator + model_configs --> json_validator + source_configs --> json_validator + table_configs --> json_validator + + json_validator --> builtin_schemas + json_validator --> data_test_schema + 
json_validator --> exposure_schema + + builtin_schemas --> config_parser + data_test_schema --> config_parser + exposure_schema --> config_parser + + config_parser --> catalog_integration + config_parser --> freshness_config + + config_parser --> validated_config + json_validator --> deprecation_warnings + json_validator --> schema_errors +``` + +Sources: `.changes/unreleased/Features-20250617-142516.yaml` + +## JSON Schema Validation Framework + +The configuration validation system relies on JSON schemas to enforce structure and data types for various configuration sections. This framework has been enhanced to include builtin data test properties and exposure configurations directly in the `dbt_project.yml` schema definitions. + +### Schema Validation Components + +| Component | Purpose | Key Features | +|-----------|---------|--------------| +| **JSON Schema Validator** | Core validation engine | Type checking, required fields, format validation | +| **Builtin Data Test Properties** | Validates test configurations | Supports standard dbt test properties | +| **Exposure Config Schema** | Validates exposure definitions | Ensures proper exposure configuration structure | +| **Deprecation Detection** | Identifies deprecated configurations | Provides migration guidance | + +```mermaid +graph LR + subgraph "Schema Definitions" + project_schema["dbt_project.yml Schema"] + test_properties["Data Test Properties"] + exposure_config["Exposure Configurations"] + source_schema["Source Schema"] + end + + subgraph "Validation Process" + schema_loader["Schema Loader"] + validator["JSON Validator"] + deprecation_checker["Deprecation Checker"] + end + + subgraph "Configuration Types" + builtin_tests["Builtin Test Configs"] + custom_tests["Custom Test Configs"] + exposure_defs["Exposure Definitions"] + catalog_configs["Catalog Integration Configs"] + end + + project_schema --> schema_loader + test_properties --> schema_loader + exposure_config --> schema_loader + source_schema --> 
schema_loader + + schema_loader --> validator + validator --> deprecation_checker + + validator --> builtin_tests + validator --> custom_tests + validator --> exposure_defs + validator --> catalog_configs +``` + +Sources: `.changes/unreleased/Features-20250617-142516.yaml` + +## Catalog Integration Configuration + +The catalog integration system has been extended to support additional configuration options, including file format specifications. This allows dbt to better integrate with various catalog systems and data discovery tools. + +### Catalog Configuration Properties + +The catalog integration configuration includes: + +- **File Format Configuration**: Specifies the format for catalog output files +- **Integration Settings**: Controls how dbt interacts with external catalog systems +- **Output Formatting**: Manages the structure and format of catalog data + +```mermaid +graph TD + subgraph "Catalog Configuration" + file_format["file_format Property"] + integration_config["Integration Config"] + output_settings["Output Settings"] + end + + subgraph "Catalog Processing" + catalog_generator["Catalog Generator"] + format_handler["Format Handler"] + integration_manager["Integration Manager"] + end + + subgraph "Output Formats" + json_format["JSON Format"] + yaml_format["YAML Format"] + csv_format["CSV Format"] + custom_format["Custom Format"] + end + + file_format --> catalog_generator + integration_config --> catalog_generator + output_settings --> catalog_generator + + catalog_generator --> format_handler + catalog_generator --> integration_manager + + format_handler --> json_format + format_handler --> yaml_format + format_handler --> csv_format + format_handler --> custom_format +``` + +Sources: `.changes/unreleased/Features-20250529-085311.yaml` + +## Source and Table Configuration Properties + +The configuration system supports freshness tracking properties at both source and table levels. 
This includes `loaded_at_query` and `loaded_at_field` configurations that enable dbt to determine data freshness automatically. + +### Freshness Configuration Structure + +| Configuration | Level | Purpose | Example Use Case | +|---------------|-------|---------|------------------| +| `loaded_at_query` | Source/Table | Custom SQL query to determine load time | Complex timestamp logic | +| `loaded_at_field` | Source/Table | Field name containing load timestamp | Simple timestamp column | +| **Freshness Thresholds** | Source/Table | Warning and error thresholds | SLA monitoring | +| **Freshness Filters** | Source/Table | Additional filtering conditions | Conditional freshness checks | + +```mermaid +graph TB + subgraph "Configuration Sources" + source_yml["sources.yml"] + dbt_project_yml["dbt_project.yml"] + table_properties["Table Properties"] + end + + subgraph "Freshness Properties" + loaded_at_query["loaded_at_query"] + loaded_at_field["loaded_at_field"] + freshness_thresholds["Freshness Thresholds"] + freshness_filters["Freshness Filters"] + end + + subgraph "Processing Components" + freshness_parser["Freshness Parser"] + query_builder["Query Builder"] + threshold_validator["Threshold Validator"] + end + + subgraph "Runtime Execution" + freshness_check["Freshness Check Execution"] + timestamp_extraction["Timestamp Extraction"] + sla_monitoring["SLA Monitoring"] + end + + source_yml --> loaded_at_query + source_yml --> loaded_at_field + dbt_project_yml --> freshness_thresholds + table_properties --> freshness_filters + + loaded_at_query --> freshness_parser + loaded_at_field --> freshness_parser + freshness_thresholds --> threshold_validator + freshness_filters --> freshness_parser + + freshness_parser --> query_builder + threshold_validator --> freshness_check + query_builder --> timestamp_extraction + freshness_check --> sla_monitoring +``` + +Sources: `.changes/unreleased/Features-20250623-113130.yaml` + +## Data Test and Exposure Configuration Schema + +The 
schema validation system includes comprehensive support for data test properties and exposure configurations within the `dbt_project.yml` file. This enables more accurate deprecation warnings and better configuration validation. + +### Configuration Validation Flow + +The validation process ensures that both builtin and custom configurations conform to expected schemas: + +1. **Schema Loading**: Load appropriate schema definitions for the configuration type +2. **Property Validation**: Validate individual properties against their schema definitions +3. **Deprecation Checking**: Identify deprecated configuration patterns +4. **Error Reporting**: Provide detailed validation error messages + +```mermaid +graph LR + subgraph "Test Configuration" + builtin_tests["Builtin Data Tests"] + custom_tests["Custom Data Tests"] + test_properties["Test Properties"] + end + + subgraph "Exposure Configuration" + exposure_definitions["Exposure Definitions"] + exposure_properties["Exposure Properties"] + exposure_meta["Exposure Metadata"] + end + + subgraph "Validation Engine" + property_validator["Property Validator"] + schema_matcher["Schema Matcher"] + deprecation_detector["Deprecation Detector"] + end + + subgraph "Validation Results" + valid_config["Valid Configuration"] + deprecation_warnings["Deprecation Warnings"] + validation_errors["Validation Errors"] + end + + builtin_tests --> property_validator + custom_tests --> property_validator + test_properties --> property_validator + + exposure_definitions --> schema_matcher + exposure_properties --> schema_matcher + exposure_meta --> schema_matcher + + property_validator --> deprecation_detector + schema_matcher --> deprecation_detector + + deprecation_detector --> valid_config + deprecation_detector --> deprecation_warnings + property_validator --> validation_errors + schema_matcher --> validation_errors +``` \ No newline at end of file diff --git a/docs/autogenerated_docs/5.2-hierarchical-configuration-parsing.md 
b/docs/autogenerated_docs/5.2-hierarchical-configuration-parsing.md new file mode 100644 index 00000000000..8ee24fba012 --- /dev/null +++ b/docs/autogenerated_docs/5.2-hierarchical-configuration-parsing.md @@ -0,0 +1,188 @@ +# Hierarchical Configuration Parsing + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.changes/unreleased/Fixes-20250612-145159.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250612-145159.yaml) + +
+ + + +## Purpose and Scope + +This document covers dbt-core's hierarchical configuration parsing system, which manages how configuration values are resolved across multiple levels of the project hierarchy. The system handles configuration precedence, inheritance, and overrides for settings like `store_failures`, test configurations, and other model properties. + +For information about project-level configuration schemas, see [Project Configuration and Schema](#5.1). For general configuration validation, see [Configuration Validation and JSON Schema](#3.1). + +## Overview + +dbt-core implements a hierarchical configuration system that allows configurations to be defined at multiple levels and resolved according to precedence rules. This enables flexible configuration management where settings can be defined broadly at the project level and selectively overridden at more specific levels. + +```mermaid +graph TB + subgraph "Configuration Hierarchy" + PL["dbt_project.yml
(Project Level)"] + DL["Directory Configs
(Directory Level)"] + ML["Model/Test Configs
(Node Level)"] + end + + subgraph "Parsing Engine" + HP["HierarchicalConfigParser"] + CR["ConfigResolver"] + VM["ValueMerger"] + end + + subgraph "Configuration Types" + SF["store_failures"] + TC["TestConfigs"] + MC["ModelConfigs"] + SC["SourceConfigs"] + end + + PL --> HP + DL --> HP + ML --> HP + + HP --> CR + CR --> VM + + VM --> SF + VM --> TC + VM --> MC + VM --> SC +``` + +*Sources: .changes/unreleased/Fixes-20250612-145159.yaml* + +## Configuration Levels and Precedence + +The hierarchical configuration parsing system resolves configurations across multiple levels, with more specific levels taking precedence over general ones: + +| Level | Scope | Configuration Source | Precedence | +|-------|--------|---------------------|------------| +| Project | Global | `dbt_project.yml` | Lowest | +| Directory | Path-based | Directory-level configs | Medium | +| Node | Individual models/tests | In-file configs, schema.yml | Highest | + +### Configuration Resolution Process + +```mermaid +flowchart TD + Start["Configuration Request"] --> LoadProject["Load Project Config
(dbt_project.yml)"] + LoadProject --> LoadDirectory["Load Directory Config
(if applicable)"] + LoadDirectory --> LoadNode["Load Node Config
(model/test specific)"] + LoadNode --> Merge["Merge Configurations
(Hierarchical Resolution)"] + Merge --> Validate["Validate Final Config"] + Validate --> Return["Return Resolved Config"] + + subgraph "Merge Process" + BaseConfig["Base: Project Config"] + OverrideDir["Override: Directory Config"] + OverrideNode["Override: Node Config"] + + BaseConfig --> OverrideDir + OverrideDir --> OverrideNode + end +``` + +*Sources: .changes/unreleased/Fixes-20250612-145159.yaml* + +## Store Failures Configuration Example + +The `store_failures` configuration demonstrates hierarchical parsing behavior. This configuration can be defined at multiple levels and needs proper precedence resolution: + +### Configuration Levels for store_failures + +```mermaid +graph LR + subgraph "Project Level" + ProjSF["dbt_project.yml:
tests:
+store_failures: true"] + end + + subgraph "Model/Test Level" + ModelSF["Individual Test:
config:
store_failures: false"] + end + + subgraph "Resolution" + Resolver["ConfigResolver"] + FinalSF["Final Value:
store_failures: false
(node-level override)"] + end + + ProjSF --> Resolver + ModelSF --> Resolver + Resolver --> FinalSF +``` + +### Common Issues and Fixes + +The hierarchical parsing system has encountered issues with proper precedence resolution, particularly with the `store_failures` configuration: + +**Issue #10165**: Fix for `store_failures` hierarchical config parsing ensures that node-level configurations properly override project-level defaults. + +*Sources: .changes/unreleased/Fixes-20250612-145159.yaml* + +## Implementation Architecture + +The hierarchical configuration parsing system integrates with dbt-core's broader configuration validation framework: + +```mermaid +graph TB + subgraph "Input Sources" + DBT_PROJECT["dbt_project.yml"] + SCHEMA_FILES["schema.yml files"] + MODEL_CONFIGS["Model in-file configs"] + end + + subgraph "Parsing Layer" + PROJECT_PARSER["ProjectConfigParser"] + SCHEMA_PARSER["SchemaParser"] + CONFIG_PARSER["ConfigParser"] + end + + subgraph "Resolution Engine" + HIERARCHY_RESOLVER["HierarchyResolver"] + PRECEDENCE_ENGINE["PrecedenceEngine"] + CONFIG_MERGER["ConfigMerger"] + end + + subgraph "Validation" + JSON_VALIDATOR["JSONSchemaValidator"] + TYPE_CHECKER["TypeChecker"] + CONSTRAINT_VALIDATOR["ConstraintValidator"] + end + + DBT_PROJECT --> PROJECT_PARSER + SCHEMA_FILES --> SCHEMA_PARSER + MODEL_CONFIGS --> CONFIG_PARSER + + PROJECT_PARSER --> HIERARCHY_RESOLVER + SCHEMA_PARSER --> HIERARCHY_RESOLVER + CONFIG_PARSER --> HIERARCHY_RESOLVER + + HIERARCHY_RESOLVER --> PRECEDENCE_ENGINE + PRECEDENCE_ENGINE --> CONFIG_MERGER + + CONFIG_MERGER --> JSON_VALIDATOR + JSON_VALIDATOR --> TYPE_CHECKER + TYPE_CHECKER --> CONSTRAINT_VALIDATOR +``` + +*Sources: .changes/unreleased/Fixes-20250612-145159.yaml* + +## Configuration Merging Strategies + +The system employs different merging strategies depending on the configuration type: + +| Configuration Type | Merge Strategy | Behavior | +|-------------------|----------------|----------| +| Simple Values | 
Override | Later values completely replace earlier ones | +| Lists | Append/Override | Configurable - either append or replace | +| Dictionaries | Deep Merge | Recursive merging of nested structures | +| Boolean Flags | Override | Direct replacement (e.g., `store_failures`) | + +### Deep Merge Example + +For complex configurations, the system performs deep merging to combine settings from different levels while preserving granular overrides. \ No newline at end of file diff --git a/docs/autogenerated_docs/6-event-and-logging-system.md b/docs/autogenerated_docs/6-event-and-logging-system.md new file mode 100644 index 00000000000..ac6a751a474 --- /dev/null +++ b/docs/autogenerated_docs/6-event-and-logging-system.md @@ -0,0 +1,221 @@ +# Event and Logging System + +
+Relevant source files + +The following files were used as context for generating this wiki page: + +- [.changes/unreleased/Features-20250701-164957.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250701-164957.yaml) + +
+ + + +## Purpose and Scope + +The Event and Logging System in dbt-core provides centralized event handling, structured logging, and comprehensive deprecation management across all system components. This system coordinates event emission, log message formatting, and deprecation warning delivery to ensure consistent user communication and system observability. + +For information about configuration validation events, see [Configuration Validation and JSON Schema](#3.1). For details about CLI command deprecation handling, see [Command Interface and Deprecations](#4.1). + +## System Architecture + +The Event and Logging System operates as a cross-cutting concern that integrates with all major dbt-core subsystems to provide unified event handling and user communication. + +### Event and Logging System Architecture + +```mermaid +graph TB + subgraph "Event Sources" + ConfigSys["Configuration System"] + CLISys["CLI System"] + ParseSys["Parser System"] + ExecSys["Execution System"] + end + + subgraph "Event Processing Layer" + EventDispatcher["Event Dispatcher"] + EventTypes["Event Type Registry"] + LogFormatter["Log Formatter"] + end + + subgraph "Deprecation Management" + DeprecationTracker["Deprecation Tracker"] + WarningEmitter["Warning Emitter"] + MigrationGuides["Migration Guides"] + end + + subgraph "Output Channels" + ConsoleLogger["Console Logger"] + FileLogger["File Logger"] + StructuredEvents["Structured Event Output"] + end + + ConfigSys --> EventDispatcher + CLISys --> EventDispatcher + ParseSys --> EventDispatcher + ExecSys --> EventDispatcher + + EventDispatcher --> EventTypes + EventDispatcher --> LogFormatter + EventDispatcher --> DeprecationTracker + + DeprecationTracker --> WarningEmitter + WarningEmitter --> MigrationGuides + + LogFormatter --> ConsoleLogger + LogFormatter --> FileLogger + EventDispatcher --> StructuredEvents +``` + +Sources: System architecture inferred from overall dbt-core system design patterns + +## Event Handling Infrastructure 
+ +The event handling system processes events from across the dbt-core codebase, providing structured logging and user notifications through multiple output channels. + +### Event Processing Flow + +```mermaid +flowchart TD + EventSource["Event Source
(Parser, CLI, Config, etc.)"] --> EventEmission["Event Emission"] + EventEmission --> EventClassification["Event Classification"] + + EventClassification --> InfoEvents["Info Events"] + EventClassification --> WarningEvents["Warning Events"] + EventClassification --> ErrorEvents["Error Events"] + EventClassification --> DeprecationEvents["Deprecation Events"] + + InfoEvents --> LogFormatting["Log Formatting"] + WarningEvents --> LogFormatting + ErrorEvents --> LogFormatting + DeprecationEvents --> DeprecationHandler["Deprecation Handler"] + + DeprecationHandler --> DeprecationRegistry["Deprecation Registry"] + DeprecationHandler --> LogFormatting + + LogFormatting --> ConsoleOutput["Console Output"] + LogFormatting --> FileOutput["File Output"] + LogFormatting --> StructuredOutput["Structured JSON Output"] +``` + +Sources: Event flow patterns inferred from deprecation management requirements + +## Deprecation Management System + +The deprecation management system provides structured handling of deprecated features across dbt-core, ensuring users receive appropriate warnings and migration guidance. 
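A common implementation detail of such systems is warn-once behavior, so that a deprecated feature used many times in one run produces a single warning. A stdlib sketch — the function and registry names are invented, not dbt's deprecation API:

```python
import warnings

_seen: set = set()  # names of deprecations already emitted this process

def deprecation_warning(name: str, message: str) -> None:
    """Emit each named deprecation at most once per process."""
    if name in _seen:
        return
    _seen.add(name)
    warnings.warn(f"[{name}] {message}", DeprecationWarning, stacklevel=2)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    deprecation_warning("models-flag", "--models is deprecated; use --select")
    deprecation_warning("models-flag", "--models is deprecated; use --select")

print(len(caught))  # 1 -- the second call was deduplicated
```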
+ +### Deprecation Categories and Handling + +| Deprecation Category | Example Features | Warning Level | Migration Timeline | +|---------------------|------------------|---------------|-------------------| +| CLI Flags | `--models`, `--model`, `-m` | High Priority | Next Major Version | +| Configuration Properties | `overrides` for sources | Medium Priority | 2-3 Minor Versions | +| Module Imports | `modules.itertools` | Low Priority | Long-term | +| Validation Methods | `GenericJSONSchemaValidationDeprecation` | Technical | Internal Refactoring | + +### Deprecation Warning System + +```mermaid +graph LR + subgraph "Deprecation Sources" + CLIFlags["CLI Flag Usage"] + ConfigProps["Config Property Usage"] + ModuleImports["Module Import Usage"] + ValidationMethods["Validation Method Usage"] + end + + subgraph "Deprecation Processing" + DeprecationDetector["Deprecation Detector"] + WarningGenerator["Warning Generator"] + MigrationAdvice["Migration Advice Generator"] + end + + subgraph "Warning Output" + CLIWarnings["CLI Warnings"] + LogWarnings["Log File Warnings"] + DocumentationLinks["Documentation Links"] + end + + CLIFlags --> DeprecationDetector + ConfigProps --> DeprecationDetector + ModuleImports --> DeprecationDetector + ValidationMethods --> DeprecationDetector + + DeprecationDetector --> WarningGenerator + WarningGenerator --> MigrationAdvice + + WarningGenerator --> CLIWarnings + WarningGenerator --> LogWarnings + MigrationAdvice --> DocumentationLinks +``` + +Sources: [.changes/unreleased/Features-20250701-164957.yaml:1-7](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250701-164957.yaml#L1-L7) + +## Integration with System Components + +The Event and Logging System integrates with all major dbt-core components to provide consistent event handling and user communication. 
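Much of that consistency comes down to rendering one event payload into each output channel's format. A stdlib-only sketch, where the event dict shape is an assumption rather than dbt's actual event schema:

```python
import json

def emit(event: dict, json_output: bool = False) -> str:
    """Render one event as a human-readable line or as structured JSON."""
    if json_output:
        return json.dumps(event, sort_keys=True)
    return f"{event['level'].upper():5s} {event['name']}: {event['msg']}"

evt = {"name": "Deprecation", "level": "warn", "msg": "'overrides' on sources is deprecated"}
print(emit(evt))                    # console-style line
print(emit(evt, json_output=True))  # structured output for log files
```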

### Component Integration Matrix

| System Component | Event Types | Log Levels | Deprecation Features |
|-----------------|-------------|------------|---------------------|
| Configuration System | Config validation, schema errors | INFO, WARN, ERROR | Property deprecations |
| CLI System | Command execution, flag warnings | INFO, WARN, ERROR | Flag deprecations |
| Parser System | Model parsing, validation | INFO, WARN, ERROR, DEBUG | Syntax deprecations |
| Execution System | Query execution, test results | INFO, WARN, ERROR | Runtime deprecations |
| Dependency System | Version conflicts, updates | INFO, WARN | Dependency deprecations |

### Cross-System Event Flow

```mermaid
sequenceDiagram
    participant ConfigSys as "Configuration System"
    participant EventSys as "Event System"
    participant LogSys as "Logging System"
    participant User as "User Interface"

    ConfigSys->>EventSys: "Emit deprecation event"
    Note right of EventSys: "overrides property detected"

    EventSys->>EventSys: "Classify as deprecation"
    EventSys->>LogSys: "Format deprecation warning"

    LogSys->>User: "Display warning message"
    LogSys->>User: "Provide migration guidance"

    Note over ConfigSys,User: "Consistent deprecation handling across all systems"
```

Sources: Deprecation management patterns inferred from system architecture

## Event Types and Categorization

The system maintains a registry of event types that correspond to different operational states and user actions throughout the dbt-core lifecycle.
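The event-type registry described here can be sketched as a small class hierarchy: subclassing encodes the categorization, and a name-to-class mapping gives handlers a lookup point. All names are hypothetical, chosen to match this page's categories rather than dbt-core's real event classes.

```python
from typing import Dict, Type


class DBTEvent: ...                         # hypothetical root event type
class SystemEvent(DBTEvent): ...
class UserEvent(DBTEvent): ...
class ValidationEvent(DBTEvent): ...
class DeprecationEvent(DBTEvent): ...

class CLIDeprecation(DeprecationEvent): ...
class ConfigDeprecation(DeprecationEvent): ...
class APIDeprecation(DeprecationEvent): ...


def registry() -> Dict[str, Type[DBTEvent]]:
    # Map event-type names to classes so emitters and handlers
    # can look up a concrete event type by name.
    return {cls.__name__: cls for cls in (
        SystemEvent, UserEvent, ValidationEvent,
        CLIDeprecation, ConfigDeprecation, APIDeprecation,
    )}


def is_deprecation(event: DBTEvent) -> bool:
    # One isinstance check covers every deprecation subtype,
    # so routing logic never enumerates individual deprecations.
    return isinstance(event, DeprecationEvent)
```

The payoff of the hierarchy is in `is_deprecation`: new deprecation subtypes route correctly without touching the dispatch code.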

### Event Type Hierarchy

```mermaid
graph TD
    RootEvent["dbt-core Events"] --> SystemEvents["System Events"]
    RootEvent --> UserEvents["User Events"]
    RootEvent --> ValidationEvents["Validation Events"]
    RootEvent --> DeprecationEvents["Deprecation Events"]

    SystemEvents --> StartupEvents["Startup Events"]
    SystemEvents --> ShutdownEvents["Shutdown Events"]
    SystemEvents --> ResourceEvents["Resource Events"]

    UserEvents --> CommandEvents["Command Events"]
    UserEvents --> ConfigEvents["Configuration Events"]
    UserEvents --> ExecutionEvents["Execution Events"]

    ValidationEvents --> SchemaEvents["Schema Validation Events"]
    ValidationEvents --> ModelEvents["Model Validation Events"]
    ValidationEvents --> SourceEvents["Source Validation Events"]

    DeprecationEvents --> CLIDeprecations["CLI Deprecations"]
    DeprecationEvents --> ConfigDeprecations["Config Deprecations"]
    DeprecationEvents --> APIDeprecations["API Deprecations"]
```

Sources: Event categorization inferred from system components and deprecation requirements

diff --git a/docs/autogenerated_docs/6.1-deprecation-management.md b/docs/autogenerated_docs/6.1-deprecation-management.md
new file mode 100644
index 00000000000..ab703022cc2
--- /dev/null
+++ b/docs/autogenerated_docs/6.1-deprecation-management.md

# Deprecation Management

Relevant source files

The following files were used as context for generating this wiki page:

- [.changes/unreleased/Features-20250701-164957.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250701-164957.yaml)
- [.changes/unreleased/Features-20250721-173100.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250721-173100.yaml)
- [.changes/unreleased/Features-20250728-115443.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Features-20250728-115443.yaml)
- [.changes/unreleased/Fixes-20250710-170148.yaml](https://github.com/dbt-labs/dbt-core/blob/64b58ec6/.changes/unreleased/Fixes-20250710-170148.yaml)

This document covers dbt-core's deprecation management system, which handles the lifecycle of deprecated features, warning mechanisms, and migration paths for end users. The system provides structured deprecation warnings, version-based deprecation policies, and integration with the release management process.

For information about the broader event and logging infrastructure, see [Event and Logging System](#6). For release management and version control, see [Release Process and Version Management](#11.1).

## Deprecation Types and Lifecycle

dbt-core uses a structured approach to feature deprecations, with different deprecation types corresponding to different stages of the removal lifecycle.

```mermaid
graph TD
    A["Feature"] --> B{"Deprecation Decision"}
    B --> C["Preview Deprecation"]
    B --> D["Standard Deprecation"]
    B --> E["Breaking Change Deprecation"]

    C --> F["Issue Warning in dbt Cloud<br/>Development Environment"]
    D --> G["Issue Warning to All Users"]
    E --> H["Issue Warning + Breaking Change Notice"]

    F --> I["Next Major Version<br/>Consider Standard Deprecation"]
    G --> J["Future Major Version<br/>Feature Removal"]
    H --> K["Next Major Version<br/>Feature Removal"]

    subgraph "Warning Systems"
        L["GenericJSONSchemaValidationDeprecation"]
        M["CLI Flag Deprecation Warnings"]
        N["Configuration Property Warnings"]
    end

    F --> L
    G --> M
    G --> N
```

**Sources:** `.changes/unreleased/Fixes-20250710-170148.yaml`

### Preview Deprecations

Preview deprecations are used for features that are being evaluated for removal but may still be retained based on user feedback. The `GenericJSONSchemaValidationDeprecation` is implemented as a "preview" deprecation, meaning it primarily affects dbt Cloud development environments rather than all users.

### Standard Deprecations

Standard deprecations apply to features that will be removed in a future major version. They generate warnings for all users and are typically maintained for at least one major version before removal.

## Current Active Deprecations

The following features are currently in various stages of deprecation:

| Feature | Type | Status | Target Removal | Issue |
|---------|------|--------|----------------|-------|
| `overrides` property for sources | Standard | Active | Future Major | #11566 |
| Top-level argument properties in generic tests | Standard | Active | Future Major | #11847 |
| `{{ modules.itertools }}` usage | Standard | Active | Future Major | #11725 |
| `GenericJSONSchemaValidationDeprecation` | Preview | Active | TBD | #11814 |

**Sources:** `.changes/unreleased/Features-20250701-164957.yaml`, `.changes/unreleased/Features-20250721-173100.yaml`, `.changes/unreleased/Features-20250728-115443.yaml`, `.changes/unreleased/Fixes-20250710-170148.yaml`

## Deprecation Implementation Architecture

```mermaid
graph TB
    subgraph "Change Management Integration"
        CF["Change Fragments<br/>(.changes/unreleased/*.yaml)"]
        CH["Changie Processing"]
        CL["CHANGELOG.md Generation"]
    end

    subgraph "Deprecation Warning System"
        DW["Deprecation Warnings"]
        GD["GenericJSONSchemaValidationDeprecation"]
        CLI_W["CLI Flag Warnings"]
        CFG_W["Configuration Warnings"]
    end

    subgraph "Affected Components"
        SRC["Source Configuration<br/>(overrides property)"]
        TST["Generic Test Framework<br/>(argument properties)"]
        TPL["Template System<br/>(modules.itertools)"]
        SCHEMA["JSON Schema Validation"]
    end

    subgraph "User-Facing Systems"
        CLI["CLI Interface"]
        LOGS["Event Logging"]
        ERR["Error Messages"]
    end

    CF --> CH
    CH --> CL

    DW --> GD
    DW --> CLI_W
    DW --> CFG_W

    SRC --> CFG_W
    TST --> CFG_W
    TPL --> CLI_W
    SCHEMA --> GD

    CLI_W --> CLI
    CFG_W --> LOGS
    GD --> ERR

    CLI --> LOGS
    LOGS --> ERR
```

**Sources:** `.changes/unreleased/Features-20250701-164957.yaml`, `.changes/unreleased/Features-20250721-173100.yaml`, `.changes/unreleased/Features-20250728-115443.yaml`, `.changes/unreleased/Fixes-20250710-170148.yaml`

## Specific Deprecation Cases

### Source Configuration Overrides

The `overrides` property for sources has been deprecated as part of configuration system improvements. This deprecation affects source configuration parsing and validation.

### Generic Test Argument Properties

Top-level argument properties in generic tests are being deprecated in favor of a more structured approach to test configuration. This impacts the testing framework and test definition parsing.

### Template System Modules

The `{{ modules.itertools }}` usage pattern in dbt templates is being deprecated. Users should migrate to alternative approaches for iteration and collection handling in Jinja templates.

### JSON Schema Validation

The `GenericJSONSchemaValidationDeprecation` represents a preview deprecation for certain JSON schema validation patterns. It is implemented as a preview deprecation to minimize disruption while the team evaluates its impact.

## Integration with Release Process

Deprecation management is closely integrated with dbt-core's release management system through changie-based changelog automation. Each deprecation generates a change fragment that becomes part of the official release documentation.

```mermaid
graph LR
    subgraph "Deprecation Workflow"
        DEP["Deprecation Decision"]
        FRAG["Create Change Fragment<br/>(Features-*.yaml)"]
        IMPL["Implement Warning System"]
        TEST["Validate Deprecation Behavior"]
    end

    subgraph "Release Integration"
        CHANGIE["Changie Processing"]
        CHANGELOG["CHANGELOG.md Update"]
        REL["Version Release"]
        DOC["Documentation Update"]
    end

    DEP --> FRAG
    FRAG --> IMPL
    IMPL --> TEST

    FRAG --> CHANGIE
    CHANGIE --> CHANGELOG
    CHANGELOG --> REL
    REL --> DOC

    TEST --> CHANGIE
```

**Sources:** `.changes/unreleased/Features-20250701-164957.yaml`, `.changes/unreleased/Features-20250721-173100.yaml`, `.changes/unreleased/Features-20250728-115443.yaml`, `.changes/unreleased/Fixes-20250710-170148.yaml`

## Warning System Implementation

The deprecation warning system operates through multiple channels to ensure users are informed about deprecated functionality:

1. **CLI Warnings**: Generated during command execution when deprecated flags or options are used
2. **Configuration Warnings**: Issued during project parsing when deprecated configuration properties are encountered
3. **Runtime Warnings**: Triggered during template rendering or model execution when deprecated patterns are detected
4. **Preview Environment Warnings**: Specific warnings that primarily target development environments in dbt Cloud

The warning system integrates with dbt-core's event logging infrastructure to ensure consistent message formatting and appropriate log levels for different deprecation types.
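The multi-channel delivery described above can be sketched as a small router that fans one deprecation message out to every channel registered for a given stage. The names are hypothetical and the channels are stand-in callables, not dbt-core's real logging interfaces.

```python
from typing import Callable, Dict, List

# A channel is anything that can consume a formatted warning string,
# e.g. a console printer, a log-file writer, or a structured-JSON emitter.
Channel = Callable[[str], None]


class WarningRouter:
    """Hypothetical fan-out of deprecation warnings to per-stage channels."""

    def __init__(self) -> None:
        self._channels: Dict[str, List[Channel]] = {}
        self.delivered: List[str] = []

    def register(self, stage: str, channel: Channel) -> None:
        # Stages mirror the channels listed above: "cli", "parsing", "runtime", ...
        self._channels.setdefault(stage, []).append(channel)

    def warn(self, stage: str, message: str) -> None:
        # Deliver the same message to every channel registered for this stage.
        for channel in self._channels.get(stage, []):
            channel(message)


router = WarningRouter()
router.register("parsing", lambda m: router.delivered.append(f"log: {m}"))
router.register("parsing", lambda m: router.delivered.append(f"cli: {m}"))
router.warn("parsing", "`overrides` for sources is deprecated")
```

Because channels register per stage, a preview deprecation could attach only a dbt Cloud development-environment channel while a standard deprecation attaches CLI and log-file channels, without any change to the emitting code.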