dbt-labs · QMalcolm · Apr 2, 2026 · Apr 1, 2026 · Apr 1, 2026 · Apr 1, 2026
@@ -59,6 +59,19 @@ Import order should follow isort conventions:
 3. Third-party
 4. dbt-internal (`dbt`, `dbt_common`, `dbt_adapters`, `dbt_extractor`, `dbt_semantic_interfaces`)
 
+## Architecture Documentation
+
+Before investigating parsing bugs or adding new resource types, read the relevant doc in `docs/arch/`:
+
+| Doc | Covers |
+|---|---|
+| `3_Parsing.md` | Full parse flow, `ManifestLoader`, `SchemaParser`, parser hierarchy |
+| `3.1_Partial_Parsing.md` | Partial parse internals, `PartialParsing` class, file diff and change detection |
+| `3.2_Deferral.md` | State-based deferral |
+| `3.3_Semantic_Models.md` | Semantic model parsing (v1 standalone vs v2 inline), partial parsing edge cases, key files |
+
+These docs describe where things live and how they connect — read them before doing exploratory code search.
+
 ## Key Architectural Conventions
 
 ### Artifact Resources: Import from `dbt.artifacts.resources`, Not Versioned Paths

@@ -0,0 +1,185 @@
+# Semantic Model Parsing
+
+## Overview
+
+Semantic models are first-class resources in dbt-core that expose model data to MetricFlow for metric computation. They define the *entities*, *dimensions*, and *measures* of a model in terms the Semantic Layer can query. Parsing produces `SemanticModel` nodes in the manifest, which are later validated by `dbt_semantic_interfaces`.
+
+## Two Authoring Formats
+
+dbt-core supports two YAML formats for defining semantic models. Understanding the distinction is essential when debugging parsing or partial parsing issues.
+
+### v1: Standalone (top-level `semantic_models:` key)
+
+Defined as an independent entry under a top-level `semantic_models:` key in any schema YAML file:
+
+```yaml
+semantic_models:
+  - name: revenue
+    model: ref('fct_revenue')
+    entities:
+      - name: transaction
+        type: primary
+    dimensions:
+      - name: ds
+        type: time
+        type_params:
+          time_granularity: day
+    measures:
+      - name: revenue
+        agg: sum
+        expr: amount
+```
+
+Parsed by `SemanticModelParser.parse()` in `schema_yaml_readers.py`. The semantic model is a fully independent entry in the YAML; its `model: ref('...')` field links it to the referenced model node via `depends_on`.
+
+### v2: Inline (on the `models:` entry)
+
+Defined directly on a model entry under the `models:` key, with column-level `dimension` and `entity` annotations:
+
+```yaml
+models:
+  - name: fct_revenue
+    semantic_model: true       # or a config dict: {name: custom_sm_name, enabled: true, ...}
+    agg_time_dimension: ds
+    columns:
+      - name: transaction_id
+        entity:
+          name: transaction
+          type: primary
+      - name: ds
+        granularity: day
+        dimension:
+          name: ds
+          type: time
+      - name: revenue
+        # no dimension/entity — becomes a measure candidate
+```
+
+The semantic model is **not** a standalone YAML entry. It is created as a side effect of model patching during `SchemaParser.patch_node_properties()` in `schemas.py`, which calls `SemanticModelParser.parse_v2_semantic_model_from_dbt_model_patch()`. The v2 SM has no entry under `dict_from_yaml["semantic_models"]`.
+
+**Key difference:** v1 SMs are elements of the `semantic_models:` key diff; v2 SMs are a byproduct of the `models:` key diff. This distinction matters for partial parsing (see below).
+
+## Key Files
+
+| File | Role |
+|---|---|
+| `core/dbt/contracts/graph/unparsed.py` | `UnparsedSemanticModel` (v1 contract), `UnparsedSemanticModelConfig` / `UnparsedModelUpdate` (v2 contract) |
+| `core/dbt/parser/schema_yaml_readers.py` | `SemanticModelParser` — `parse()` for v1, `parse_v2_semantic_model_from_dbt_model_patch()` for v2, shared `_parse_semantic_model_helper()` |
+| `core/dbt/parser/schemas.py` | `SchemaParser.patch_node_properties()` — triggers v2 SM creation; `MetricParser.parse_v2_metrics_from_dbt_model_patch()` |
+| `core/dbt/contracts/files.py` | `SchemaSourceFile` — tracks SM unique IDs and metrics per file |
+| `core/dbt/parser/partial.py` | `PartialParsing` — handles SM lifecycle during incremental re-parse |
+| `core/dbt/artifacts/resources/v1/semantic_layer_components.py` | `SemanticModel`, `Dimension`, `Entity`, `Measure` artifact definitions |
+
+## `SchemaSourceFile` Tracking Fields
+
+`SchemaSourceFile` (in `files.py`) maintains per-file lists of parsed resource IDs. For semantic models and metrics:
+
+- **`semantic_models: List[str]`** — unique IDs of all SMs in this file, both v1 and v2. v2 SM unique IDs are appended here when `_parse_semantic_model_helper()` runs.
+- **`node_patches: List[str]`** (alias `ndp`) — unique IDs of model/seed/snapshot nodes patched by this file. A model with `semantic_model: true` will have its model node ID here.
+- **`metrics_from_measures: Dict[str, List[str]]`** — auto-generated metrics keyed by semantic model name. Populated when `create_metric: true` (v1) or v2 simple metrics are generated from measures.
+- **`metrics: List[str]`** — unique IDs of explicitly declared metrics in this file.
+- **`generated_metrics: List[str]`** — legacy field; use `fix_metrics_from_measures()` to migrate to `metrics_from_measures`.
+
+## Parsing Flow
+
+### v1 Standalone
+
+```
+SchemaParser.parse_yaml()
+  └── SemanticModelParser.parse()
+        ├── reads UnparsedSemanticModel from YAML
+        ├── calls _parse_semantic_model_helper()
+        │     └── adds SemanticModel to manifest.semantic_models
+        │     └── appends unique_id to schema_file.semantic_models
+        └── optionally: MetricParser for create_metric measures
+              └── appends to schema_file.metrics_from_measures[sm_name]
+```
+
+### v2 Inline
+
+```
+SchemaParser.parse_yaml()
+  └── ModelPatcher.parse_patch()
+        └── patch_node_properties(node, patch)      [schemas.py]
+              ├── sets node.access, node.version, etc.
+              ├── if semantic_model_enabled:
+              │     SemanticModelParser.parse_v2_semantic_model_from_dbt_model_patch()
+              │       ├── _parse_v2_column_dimensions(patch.columns)
+              │       ├── _parse_v2_column_entities(patch.columns)
+              │       └── _parse_semantic_model_helper(model=f"ref('{patch.name}')", ...)
+              └── MetricParser.parse_v2_metrics_from_dbt_model_patch(patch)
+```
+
+The v2 SM's `model` field is always set to `f"ref('{model_name}')"` — this is the reliable way to identify which model a v2 SM was derived from.
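
A minimal sketch of working with that `ref('...')` string. Both helpers are hypothetical and are not part of dbt-core; they only assume the exact single-quoted shape described above.

```python
import re
from typing import Optional

# Matches only the exact shape described above: ref('<model_name>')
_REF_PATTERN = re.compile(r"^ref\('([^']+)'\)$")


def ref_for_model(model_name: str) -> str:
    """Build the string a v2 SM stores in its `model` field."""
    return f"ref('{model_name}')"


def model_name_from_ref(model_field: str) -> Optional[str]:
    """Return the model name inside ref('...'), or None if the shape differs."""
    match = _REF_PATTERN.match(model_field)
    return match.group(1) if match else None
```

A hand-written v1 `model:` value may use a different shape (for example double quotes), in which case `model_name_from_ref` returns `None` in this sketch.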
+
+## Partial Parsing Considerations
+
+### v1 SMs — handled correctly
+
+v1 SMs are diffed via the `semantic_models:` key in `handle_schema_file_changes()`. Changed and deleted v1 SM entries invoke `delete_schema_semantic_model()`, which removes the SM from the manifest and from `schema_file.semantic_models` and cleans up `metrics_from_measures`. Changed and added entries are then scheduled for re-parsing.
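
To make the key diff concrete, here is a minimal sketch of a name-based diff over one top-level YAML key. `diff_by_name` is a hypothetical helper written for illustration; it is not the diff logic in `PartialParsing`, and the sample dictionaries are made up.

```python
from typing import Any, Dict, List


def diff_by_name(
    saved: List[Dict[str, Any]], new: List[Dict[str, Any]]
) -> Dict[str, List[Dict[str, Any]]]:
    """Compare two lists of YAML entries, matching entries on their `name`."""
    saved_by_name = {entry["name"]: entry for entry in saved}
    new_by_name = {entry["name"]: entry for entry in new}
    return {
        "added": [e for n, e in new_by_name.items() if n not in saved_by_name],
        "deleted": [e for n, e in saved_by_name.items() if n not in new_by_name],
        "changed": [
            e
            for n, e in new_by_name.items()
            if n in saved_by_name and saved_by_name[n] != e
        ],
    }


# v1: the SM is an entry under `semantic_models:`, so an edit shows up in the diff.
v1_saved = {"semantic_models": [{"name": "revenue", "model": "ref('fct_revenue')"}]}
v1_new = {"semantic_models": [{"name": "revenue", "model": "ref('fct_orders')"}]}
v1_diff = diff_by_name(v1_saved["semantic_models"], v1_new["semantic_models"])

# v2: the SM lives on the model entry, so the `semantic_models:` diff stays empty.
v2_saved = {"models": [{"name": "fct_revenue", "semantic_model": True}]}
v2_new = {"models": [{"name": "fct_revenue", "semantic_model": False}]}
v2_diff = diff_by_name(
    v2_saved.get("semantic_models", []), v2_new.get("semantic_models", [])
)
```

In this sketch `v1_diff["changed"]` holds the edited entry, while every list in `v2_diff` is empty even though the v2 SM was switched off.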
+
+### v2 SMs — require special handling (DI-3697)
+
+v2 SMs are **not** represented under `dict_from_yaml["semantic_models"]`, so the normal `semantic_models:` key diff never processes them. When a model entry is changed or deleted, `_delete_schema_mssa_links()` is called, which handles the model node and tests — but historically did not clean up the associated v2 SM.
+
+**The fix (merged in DI-3697):** `_delete_schema_mssa_links()` now calls `_delete_v2_semantic_model_for_model()` for `dict_key == "models"`. This method:
+
+1. Computes `model_ref = f"ref('{model_name}')"` — the string `_parse_semantic_model_helper` stores in `sm.model`
+2. Collects names of v1 SMs from `schema_file.dict_from_yaml["semantic_models"]` to avoid touching them
+3. Iterates `schema_file.semantic_models`, finds entries where `sm.model == model_ref and sm.name not in v1_sm_names`, removes them and cleans up `metrics_from_measures`
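
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in `partial.py`: `FakeSemanticModel`, `FakeSchemaFile`, and `delete_v2_semantic_models_for_model` are hypothetical stand-ins, and the manifest is reduced to a plain dict keyed by unique ID.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeSemanticModel:  # hypothetical stand-in for a manifest SemanticModel
    unique_id: str
    name: str
    model: str


@dataclass
class FakeSchemaFile:  # hypothetical stand-in for SchemaSourceFile
    dict_from_yaml: Dict[str, Any]
    semantic_models: List[str] = field(default_factory=list)
    metrics_from_measures: Dict[str, List[str]] = field(default_factory=dict)


def delete_v2_semantic_models_for_model(
    schema_file: FakeSchemaFile,
    manifest_sms: Dict[str, FakeSemanticModel],
    model_name: str,
) -> List[str]:
    """Remove v2 SMs derived from `model_name`; return the removed unique IDs."""
    # Step 1: the string stored in sm.model for a v2 SM of this model.
    model_ref = f"ref('{model_name}')"
    # Step 2: names declared under the top-level key are v1 and must survive.
    v1_sm_names = {
        entry["name"]
        for entry in schema_file.dict_from_yaml.get("semantic_models", [])
    }
    removed: List[str] = []
    # Step 3: iterate over a copy, because the list is mutated inside the loop.
    for unique_id in list(schema_file.semantic_models):
        sm = manifest_sms.get(unique_id)
        if sm is None or sm.model != model_ref or sm.name in v1_sm_names:
            continue
        del manifest_sms[unique_id]
        schema_file.semantic_models.remove(unique_id)
        schema_file.metrics_from_measures.pop(sm.name, None)
        removed.append(unique_id)
    return removed
```

The name check in step 2 is what keeps a v1 SM alive when it happens to reference the same model as the v2 SM being removed.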
+
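+**Distinguishing v1 from v2 SMs in the manifest:** An SM in `schema_file.semantic_models` is v2 if its name does **not** appear among the entries of `schema_file.dict_from_yaml.get("semantic_models", [])`. A v2 SM's `sm.model` also matches `ref('<the_model_name>')`, but that match alone is not sufficient, because a v1 SM can reference the same model; this is why step 2 above excludes v1 names.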
+
+### `_schedule_for_parsing` limitation
+
+`schedule_nodes_for_parsing()` can schedule SMs for re-parse when their dependencies change (via `child_map`). However, it uses `_schedule_for_parsing("semantic_models", ...)` which looks up the SM in `schema_file.dict_from_yaml["semantic_models"]` — a lookup that silently fails for v2 SMs. If a v2 SM's children (e.g. saved queries) change and trigger a re-parse of the SM, this path will not find the SM to re-merge. This is a known limitation as of dbt 1.12.
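
A minimal sketch of why that lookup comes back empty for a v2 SM. `find_yaml_entry` is a hypothetical helper, not the body of `_schedule_for_parsing`, and the example assumes the v2 SM is named after its model.

```python
from typing import Any, Dict, Optional


def find_yaml_entry(
    dict_from_yaml: Dict[str, Any], dict_key: str, name: str
) -> Optional[Dict[str, Any]]:
    """Look up a named entry under one top-level YAML key, or return None."""
    for entry in dict_from_yaml.get(dict_key, []):
        if entry.get("name") == name:
            return entry
    return None


dict_from_yaml = {
    "semantic_models": [{"name": "revenue", "model": "ref('fct_revenue')"}],
    "models": [{"name": "fct_orders", "semantic_model": True}],
}

# The v1 SM is found under its own key.
v1_entry = find_yaml_entry(dict_from_yaml, "semantic_models", "revenue")
# The v2 SM has no entry there, so the same lookup returns None without an error.
v2_entry = find_yaml_entry(dict_from_yaml, "semantic_models", "fct_orders")
```

Because the miss is silent, a caller that skips `None` results would drop the v2 SM from the re-parse without any warning.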
+
+## Testing Patterns
+
+### Test locations
+
+| Test type | Location |
+|---|---|
+| v1 parsing (full parse) | `tests/functional/semantic_models/test_semantic_model_parsing.py` |
+| v1 partial parsing | `tests/functional/semantic_models/test_semantic_model_parsing.py` — `TestSemanticModelPartialParsing*` |
+| v2 parsing (full parse) | `tests/functional/semantic_models/test_semantic_model_v2_parsing.py` |
+| v2 partial parsing | `tests/functional/semantic_models/test_semantic_model_v2_parsing.py` — `TestV2SemanticModel*PartialParsing*` |
+| v2 column-level parsing | `tests/unit/parser/test_v2_column_semantic_parsing.py` |
+| Partial parsing with metrics + SMs | `tests/functional/partial_parsing/test_pp_metrics.py` |
+
+### Functional test pattern for partial parsing
+
+```python
+import pytest
+
+from dbt.tests.util import write_file
+from tests.functional.assertions.test_runner import dbtTestRunner
+
+# some_v2_fixture_yml, modified_yml, and the *_sql names are placeholders for
+# fixture strings (see tests/functional/semantic_models/fixtures.py).
+
+
+class TestV2SemanticModelPartialParsingChanged:
+    @pytest.fixture(scope="class")
+    def models(self):
+        return {
+            "schema.yml": some_v2_fixture_yml,
+            "fct_revenue.sql": fct_revenue_sql,
+            "metricflow_time_spine.sql": metricflow_time_spine_sql,
+        }
+
+    def test_partial_parsing_does_not_duplicate(self, project):
+        runner = dbtTestRunner()
+        result = runner.invoke(["parse"])  # full parse
+        assert result.success
+        assert len(result.result.semantic_models) == 1
+
+        # Change the YAML on disk so the next parse takes the partial path.
+        write_file(modified_yml, project.project_root, "models", "schema.yml")
+
+        result = runner.invoke(["parse"])  # partial parse
+        assert result.success
+        assert len(result.result.semantic_models) == 1  # not 2
+```
+
+Key: the second `runner.invoke(["parse"])` uses the saved `partial_parse.msgpack` from the first run. Changing the YAML file on disk triggers partial parsing of that file's changed elements.
+
+### Fixtures
+
+Shared YAML and SQL fixtures live in `tests/functional/semantic_models/fixtures.py`. v2 fixtures are named with the `_v2` suffix (e.g. `semantic_model_schema_yml_v2`, `base_schema_yml_v2`). The template fixture `semantic_model_schema_yml_v2_template_for_model_configs` uses a `{semantic_model_value}` placeholder for parameterizing the `semantic_model:` field value.
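
A minimal sketch of how a `{semantic_model_value}` placeholder can be filled with `str.format`. The template below is a shortened, hypothetical stand-in for the real fixture, and the list of values is made up for illustration.

```python
# Hypothetical, shortened template; the real fixture in fixtures.py is longer.
schema_yml_v2_template = """\
models:
  - name: fct_revenue
    semantic_model: {semantic_model_value}
    agg_time_dimension: ds
"""

# Values to substitute: bare booleans and a config dict in YAML flow style.
semantic_model_values = [
    "true",
    "false",
    "{name: custom_sm_name, enabled: true}",
]

# Braces inside the substituted value are not re-interpreted by str.format,
# but any literal brace in the template itself would need to be doubled.
rendered = [
    schema_yml_v2_template.format(semantic_model_value=value)
    for value in semantic_model_values
]
```

Each rendered string can then be passed to a test as its `schema.yml` content, for example through `pytest.mark.parametrize`.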
+
+## See Also
+
+- [Troubleshooting: Semantic Layer Parse Failures](../troubleshooting/semantic_layer_parse_failures.md) — common causes of `dbt parse` errors for semantic models and metrics, and how to improve the error messages they produce.
@@ -0,0 +1,85 @@
+# Troubleshooting: Semantic Layer Parse Failures
+
+This document covers common causes of `dbt parse` failures related to semantic
+models and metrics, and how to fix or improve the errors produced.
+
+## Extra fields on YAML config objects produce vague errors
+
+When a user adds an unrecognised field to a YAML config object (e.g. inside
+`semantic_model:`, a `dimension:`, or a `metric:`), dbt's JSON Schema validator
+rejects it but the default error message is unhelpful — it names the whole
+object rather than the offending key:
+
+```
+Invalid models config given in models/schema.yml @ models: {...} - at path
+['semantic_model']: {...} is not valid under any of the given schemas
+```
+
+**How to improve the error:** Add a `validate()` classmethod to the relevant
+`Unparsed*` dataclass in `core/dbt/contracts/graph/unparsed.py`. Compare
+`cls.__dataclass_fields__` against the incoming `data` dict before calling
+`super().validate(data)`, and raise a `ValidationError` that names the unknown
+field(s) and lists the valid ones. `UnparsedSemanticModelConfig.validate()` is
+the reference implementation.
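
A minimal sketch of the `validate()` override pattern described above. `ValidationError`, `BaseContract`, and `UnparsedConfigSketch` are hypothetical stand-ins defined here so the example runs on its own; they are not dbt's classes, and the field names are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


class ValidationError(Exception):
    """Stand-in for the validation error type (an assumption in this sketch)."""


class BaseContract:
    """Stand-in for the base class that would run JSON Schema validation."""

    @classmethod
    def validate(cls, data: Dict[str, Any]) -> None:
        # The real base class would validate `data` against a schema here.
        return None


@dataclass
class UnparsedConfigSketch(BaseContract):  # hypothetical Unparsed* dataclass
    name: Optional[str] = None
    enabled: bool = True
    description: Optional[str] = None

    @classmethod
    def validate(cls, data: Dict[str, Any]) -> None:
        # Compare incoming keys with the dataclass fields before the generic
        # validation runs, so the error can name the offending key(s).
        valid_fields = sorted(cls.__dataclass_fields__)
        unknown = sorted(set(data) - set(valid_fields))
        if unknown:
            raise ValidationError(
                f"Unknown field(s) {unknown} in {cls.__name__}; "
                f"valid fields are {valid_fields}"
            )
        super().validate(data)
```

With this in place, a misspelled key such as `descripton` produces a message that names the key and lists the accepted fields, instead of a report against the whole object.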
+
+When adding such a `validate()` override, cover it with a unit test. Use
+`ContractTestCase.assert_fails_validation_with_message()` (in
+`tests/unit/utils/__init__.py`) to assert both that validation fails *and*
+that the error message is actionable.
+
+For a worked example, see PR #12766.
+
+## Union-typed fields produce even vaguer errors
+
+Several fields in `unparsed.py` use `Union[SomeConfig, bool, None]` (e.g.
+`UnparsedModelUpdate.semantic_model`). When validation fails on the `SomeConfig`
+branch, JSON Schema exhausts all branches of the `anyOf` and reports failure
+against the union as a whole — giving no indication of which branch failed or
+why:
+
+```
+at path ['semantic_model']: {'enabled': True, 'name': 'purchases', 'description':
+'...'} is not valid under any of the given schemas
+```
+
+**How to improve the error:** The same `validate()` override approach works here.
+By checking the sub-object's fields before `super().validate(data)` runs, the
+specific error fires first and the opaque union failure is never reached.
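
A minimal sketch of the ordering argument, using plain functions rather than JSON Schema. `CONFIG_FIELDS`, `validate_union`, and `validate_union_with_precheck` are hypothetical names; the first function only mimics how an `anyOf` failure is reported against the whole value.

```python
from typing import Any, Set

CONFIG_FIELDS: Set[str] = {"name", "enabled", "description"}  # hypothetical


def matches_config(value: Any) -> bool:
    """True if `value` is a dict whose keys are all known config fields."""
    return isinstance(value, dict) and set(value) <= CONFIG_FIELDS


def validate_union(value: Any) -> None:
    """Mimic anyOf: pass if any branch matches, else blame the whole value."""
    if value is None or isinstance(value, bool) or matches_config(value):
        return
    raise ValueError(f"{value!r} is not valid under any of the given schemas")


def validate_union_with_precheck(value: Any) -> None:
    """Name the offending keys before the generic union check can fire."""
    if isinstance(value, dict):
        unknown = sorted(set(value) - CONFIG_FIELDS)
        if unknown:
            raise ValueError(
                f"Unknown field(s) {unknown}; valid fields are {sorted(CONFIG_FIELDS)}"
            )
    validate_union(value)
```

Both functions reject the same inputs; the only difference is which error a dict with an unknown key reaches first.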
+
+## Standalone simple metrics must be nested under the model entry
+
+Simple v2 metrics must be written under the model entry (`models[].metrics`),
+not as a top-level `metrics:` key. A top-level `metrics:` key is valid for
+derived, conversion, and cumulative metrics — but **not** for simple ones. Using
+it for a simple metric raises:
+
+```
+simple metrics in v2 YAML must be attached to semantic_model
+```
+
+To fix this, move each metric with `type: simple` into a `metrics:` list nested
+under the model entry (at the same level as `columns:`):
+
+```yaml
+# Wrong — top-level metrics: key
+models:
+  - name: fct_revenue
+    semantic_model: true
+    columns: ...
+
+metrics:
+  - name: total_revenue   # fails: simple metric cannot be standalone
+    type: simple
+    agg: sum
+    expr: revenue
+
+# Right — metrics nested under the model entry
+models:
+  - name: fct_revenue
+    semantic_model: true
+    columns: ...
+    metrics:
+      - name: total_revenue
+        type: simple
+        agg: sum
+        expr: revenue
+```