Add semantic model parsing architecture doc (docs/arch/3.3_Semantic_Models.md)#12765

Open
theyostalservice wants to merge 3 commits into main from patricky/di-3697-semantic-model-arch-docs


@theyostalservice theyostalservice commented Apr 1, 2026

Why

docs/arch/ has detailed architecture docs for parsing (3_Parsing.md, 3.1_Partial_Parsing.md) but nothing covering semantic model parsing. Investigating DI-3697 required ~20 minutes of exploratory code reading to reconstruct knowledge that would have taken 2 minutes with a reference doc. Adding this now so future contributors (and AI agents) can orient quickly.

What

  • New: docs/arch/3.3_Semantic_Models.md — covers:

    • v1 standalone vs v2 inline authoring formats and their parsing paths
    • Key files with a table (unparsed.py, schema_yaml_readers.py, schemas.py, files.py, partial.py)
    • SchemaSourceFile tracking fields (semantic_models, node_patches, metrics_from_measures, etc.)
    • Full parsing flow traces for both v1 and v2
    • Partial parsing considerations including the v2 gap fixed in DI-3697 and a known remaining limitation
    • Testing patterns, test locations, and fixture conventions
  • Updated: AGENTS.md — adds an "Architecture Documentation" section at the top pointing to docs/arch/ with a quick-reference table of key docs

  • New: a second doc for troubleshooting SL parsing issues. It addresses a number of user requests and came out of the work done for "Improve error message for unknown fields in semantic_model config" (#12766).

Refs

Drafted by claude-sonnet-4-6 under the direction of @theyostalservice

… AGENTS.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cla-bot cla-bot bot added the cla:yes label Apr 1, 2026

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.49%. Comparing base (eee9587) to head (36a9727).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12765      +/-   ##
==========================================
+ Coverage   91.41%   91.49%   +0.07%     
==========================================
  Files         203      203              
  Lines       25844    25945     +101     
==========================================
+ Hits        23626    23739     +113     
+ Misses       2218     2206      -12     
| Flag | Coverage Δ |
| --- | --- |
| integration | 88.38% <ø> (+0.09%) ⬆️ |
| unit | 65.75% <ø> (+0.19%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| --- | --- |
| Unit Tests | 65.75% <ø> (+0.19%) ⬆️ |
| Integration Tests | 88.38% <ø> (+0.09%) ⬆️ |

theyostalservice and others added 2 commits April 1, 2026 16:01
@theyostalservice theyostalservice marked this pull request as ready for review April 1, 2026 23:15
@theyostalservice theyostalservice requested a review from a team as a code owner April 1, 2026 23:15
@QMalcolm QMalcolm reopened this Apr 1, 2026

@QMalcolm QMalcolm left a comment

This is on the right track! Thank you for putting this together. One thing we should consider before moving forward is the status of the v1 specification. Is it supported but not encouraged? That is, we shouldn't break it, but it is not where improvements or changes should happen.


## Overview

Semantic models are first-class resources in dbt-core that expose model data to MetricFlow for metric computation. They define the *entities*, *dimensions*, and *measures* of a model in terms the Semantic Layer can query. Parsing produces `SemanticModel` nodes in the manifest, which are later validated by `dbt_semantic_interfaces`.

which are later validated by dbt_semantic_interfaces

Soon to be out of date 😂 No change needed here yet, just found it entertaining

Comment on lines +13 to +33
Defined as an independent entry under a top-level `semantic_models:` key in any schema YAML file:

```yaml
semantic_models:
  - name: revenue
    model: ref('fct_revenue')
    entities:
      - name: transaction
        type: primary
    dimensions:
      - name: ds
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: revenue
        agg: sum
        expr: amount
```

Parsed by `SemanticModelParser.parse()` in `schema_yaml_readers.py`. The semantic model is a fully independent entry in the YAML; its `model: ref('...')` field links it to the referenced model node via `depends_on`.
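
To make the `model: ref('...')` link concrete, here is a minimal sketch of pulling the target model name out of that field. It is an illustration only: `parse_ref` and `_REF_PATTERN` are hypothetical names, and this regex approach is an assumption for the example, not how `SemanticModelParser` or dbt-core actually resolves refs.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern, not taken from dbt-core: matches ref('model') or
# ref('package', 'model') with single or double quotes.
_REF_PATTERN = re.compile(
    r"""^\s*ref\(\s*(['"])(?P<first>[^'"]+)\1\s*(?:,\s*(['"])(?P<second>[^'"]+)\3\s*)?\)\s*$"""
)


def parse_ref(expr: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (package, model_name) for a ref(...) string, or None if it does not match."""
    match = _REF_PATTERN.match(expr)
    if match is None:
        return None
    first = match.group("first")
    second = match.group("second")
    if second is None:
        # One-argument form: only the model name is given.
        return (None, first)
    # Two-argument form: the first argument is the package, the second the model.
    return (first, second)
```

Under these assumptions, `parse_ref("ref('fct_revenue')")` yields the model name that a `depends_on` entry would need to point at, while a plain string such as `"fct_revenue"` is rejected.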

Is v1 deprecated? I.e. do we want to no longer encourage the authoring of v1 metrics? If so we should probably note that in this file.


I'll update it with a note. V2 YAML should be the default going forward, but there are several specific situations where v1 supports things v2 does not, so we are not able to deprecate v1 at this time.
