SAO doc improvements #8234
Conversation
> You can use the following optional parameters to customize your state-aware orchestration:
>
> - `loaded_at_query`: Define a custom freshness condition in SQL to account for partial loading or streaming data.
>
> |Parameter | Description | Allowed values | Supports Jinja |
Converted the parameter descriptions to a table format
reubenmc left a comment
Thanks @luna-bianca! @evabgood and I just added some feedback. Things are starting to look great!
> - 🕐 For example, if most of our records for `2022-01-30` come in the raw schema of our warehouse on the morning of `2022-01-31`, but a handful don’t get loaded until `2022-02-02`, how might we tackle that? There will already be `max(updated_at)` timestamps of `2022-01-31` in the warehouse, filtering out those late records. **They’ll never make it to our model.**
> - 🪟 To mitigate this, we can add a **lookback window** to our **cutoff** point. By **subtracting a few days** from the `max(updated_at)`, we would capture any late data within the window of what we subtracted.
> - 👯 As long as we have a **`unique_key` defined in our config**, we’ll simply update existing rows and avoid duplication. We process more data this way, but in a fixed way, and it keeps our model hewing closer to the source data.
> - If you're using state-aware orchestration, make sure its freshness detection logic accounts for late-arriving data. By default, dbt uses warehouse metadata, which is updated whenever new rows arrive, even if their event timestamps are in the past. However, if you configure a `loaded_at_field` or `loaded_at_query` that uses an event timestamp (for example, `event_date`), late-arriving data may not increase the `loaded_at` value. In this case, state-aware orchestration may skip rebuilding the incremental model, even though your lookback window would normally pick up those records. To ensure late-arriving data is detected, configure your `loaded_at_field` or `loaded_at_query` to align with the same lookback window used in your incremental filter.
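To make that alignment concrete, here is a sketch of a `loaded_at_query` that applies the same lookback idea. A three-day window and the `ingested_at` column are assumptions for illustration, not from the diff above:

```yaml
freshness:
  # Only consider data "fresh" based on rows inside the same
  # three-day lookback window the incremental filter uses
  loaded_at_query: |
    select max(ingested_at)
    from {{ this }}
    where ingested_at >= current_timestamp - interval '3 days'
```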
I feel like this needs to be split into three cases as it's currently confusing. These are suggestions so please edit!
Using State-aware orchestration with Incremental Models
- By default, SAO uses dbt warehouse metadata to determine source freshness. This means that dbt will consider a source to have new data whenever a new row arrives. This could lead to running your models more often than ideal.
- To avoid this issue, you can instead tell dbt exactly which field to look at for freshness by configuring a `loaded_at_field` for a specific column or a `loaded_at_query` with custom SQL (LINK TO DOCS ON LOADED AT OPTIONS).
- Even with a `loaded_at_field` or `loaded_at_query`, late-arriving records may have an earlier event timestamp. To ensure late-arriving data is detected, configure your `loaded_at_field` or `loaded_at_query` to align with the same lookback window used in your incremental filter.
Awesome!
> - Every macro, variable, or templated logic is resolved before state-aware orchestration checks for changes.
> - If you use dynamic content (for example, `{{ run_started_at }}`), state-aware orchestration may detect that as a change even if the “static” SQL template hasn’t changed. This may result in more frequent model rebuilds.
> - Any change to a macro definition or templated logic will be treated as a code change, even if the underlying data or SQL structure remains the same.
> - If you want to leave comments in your source code but don’t want to trigger rebuilds, it is recommended to use regular SQL comments (for example, `-- This is a single-line comment in SQL`) in your query. State-aware orchestration ignores comment-only changes; such annotations will not force model rebuilds across the DAG.
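As an illustration of the distinction above (the model and column names here are hypothetical):

```sql
-- Editing this comment alone will not trigger a rebuild:
-- state-aware orchestration ignores comment-only changes.
select
    order_id,
    amount,
    -- By contrast, a templated value such as {{ run_started_at }} used in
    -- live SQL resolves to a new value on every run, so it can register as
    -- a code change and force more frequent rebuilds.
    current_timestamp as processed_at
from {{ ref('stg_orders') }}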
This is currently true, however this should change in a couple of weeks, so it's probably not worth updating right now. Instead, this should (once it goes out) be added to reflect the new behavior.
Detecting code changes
- We first look for changes in the pre-rendered SQL (like Mantle/Core does)
- iff there is a change, we look at the post-compiled SQL (with whitespace and comments stripped out, like we do for Fusion currently)
Removed Detecting code changes section for now
Sounds good
> ### Handling concurrent jobs
>
> If two separate jobs both depend on the same downstream model (for example, `model_ab`), and both jobs detect upstream changes (`updates_on = any`), then `model_ab` may run twice — once per job.
Clarify: only if something has changed, though. If nothing has changed, then the second job will simply reuse `model_ab`.
> Under state-aware orchestration, each job independently evaluates whether a model needs rebuilding based on the model’s compiled code and upstream data state. It does not enforce a single build per model across different jobs.
I don't like this. This is really more like:
Under state-aware orchestration, all jobs read from and write to the same shared state, and a model is built only when either the code or the data state has changed. This means that each job individually evaluates whether a model needs rebuilding based on the model’s compiled code and upstream data state.
Could also add: If you want to prevent a model from being rebuilt too frequently even when the code or data state has changed, you can slow down any model by using the `build_after` config (LINK TO DOCS ON HOW TO DO THIS).
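A sketch of what that `build_after` configuration might look like in a model's YAML. The model name and values are hypothetical, and the exact schema should be checked against the dbt docs:

```yaml
models:
  - name: model_ab
    config:
      freshness:
        build_after:
          count: 4          # wait at least 4 of the periods below between rebuilds
          period: hour
          updates_on: any   # rebuild when any upstream input has changed
```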
And added build_after paragraph here: https://github.com/dbt-labs/docs.getdbt.com/pull/8234/changes#diff-ad798a159c003c98c28f29456ba1d0e295b58d33c976f5ed18c07c567f822080R54
> - Upstream data changes at runtime and model-level freshness settings
> - Shared state across jobs
>
> This helps avoid unnecessary rebuilds when underlying source files change without changing the compiled logic, while still rebuilding when upstream data changes require it.
To add: While Core did these for a single run in a single job, SAO with Fusion does this in real time across every job in the environment to manage state and ensure you're not building any models when things haven't changed, no matter which job a model is built in.
reubenmc left a comment
This looks great! Thanks @luna-bianca! Good to merge this anytime. cc: @evabgood for vis.
> What happens when jobs overlap:
>
> - If both jobs reach the same model at exactly the same time, one job waits until the other finishes. This is to prevent collisions in the data warehouse when two jobs try to build the same model at the same time.
> - After the first job finishes, the second job still checks whether a rebuild for the model is needed. The job may choose to reuse the existing result or perform another build, depending on changes detected.
I might not use the language "may choose" here, as it's really more like a rule.
How about something like:
After the first job finishes building the model, the second job still checks whether a rebuild for the model is needed. If there are new data or code changes to incorporate, the model will be rebuilt; if there are no changes and building it would produce the same result, the model is reused.
> |Parameter | Description | Allowed values | Supports Jinja |
> |----------|-------------| -------------- | -------------- |
> | `loaded_at_field` | Specifies the column to use from the source data. | Name of a timestamp column. For example, `created_at`, `"CAST(created_at AS TIMESTAMP)"`. | ✅ |
> | `loaded_at_query` | Defines a custom freshness condition in SQL to account for partial loading or streaming data. | SQL string. For example, `"select {{ current_timestamp() }}"`. | ✅ |
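On the multi-line question: a multi-line query can be written as a YAML block scalar (`|`), which sidesteps quoting entirely. A sketch, with a hypothetical source name and column, and the exact config placement to be confirmed against the dbt freshness docs:

```yaml
sources:
  - name: raw_app_data        # hypothetical source name
    freshness:
      # Block scalar: no surrounding quotes needed for the SQL
      loaded_at_query: |
        select max(ingested_at)
        from {{ this }}
```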
for loaded_at_query - does the sql string need to be wrapped in quotes? how does it support multilines?
Added multi-line example here: aca2786
Co-authored-by: Mirna Wong <[email protected]>
> ### Handling late-arriving data
>
> If your incremental models use a lookback window to capture late-arriving data, make sure your freshness logic aligns with that window.
should we link to here to connect the two concepts? https://docs-getdbt-com-git-sao-doc-improvements-dbt-labs.vercel.app/best-practices/materializations/4-incremental-models#late-arriving-facts
> When you use a `loaded_at_field` or `loaded_at_query`, state-aware orchestration uses that value to determine whether new data has arrived. When the `loaded_at` value reflects an event timestamp (for example, `event_date`), late-arriving records may not update this value if the event occurred in the past. In these cases, state-aware orchestration may not trigger a rebuild, even though your incremental model’s lookback window would normally include those rows.
>
> To ensure late-arriving data is detected by state-aware orchestration, your `loaded_at_field` or `loaded_at_query` should align with the same lookback window used in your incremental filter. See the following sample values for `loaded_at_field` and `loaded_at_query`:
should we show what the corresponding incremental filter would look like?
> <TabItem value="loaded_at_query" label="loaded_at_query">
>
> ```yaml
> loaded_at_query: |
can we keep the same example from the incremental guide page? should it be `{{ this }}` instead of `source_table`?

```sql
select max(ingested_at)
from {{ this }}
where ingested_at >= current_timestamp - interval '3 days'
```
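For context, an incremental model filter using the same three-day lookback might look like the following sketch. The `stg_events` model and `ingested_at` column are hypothetical stand-ins:

```sql
select * from {{ ref('stg_events') }}

{% if is_incremental() %}
-- Lookback window: reprocess the trailing 3 days so late-arriving
-- rows are picked up and deduplicated via the model's unique_key
where ingested_at >= (
    select max(ingested_at) - interval '3 days'
    from {{ this }}
)
{% endif %}
```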
> <TabItem value="loaded_at_field" label="loaded_at_field">
>
> ```yaml
> loaded_at_field: ingested_at
this doesn't account for the lookback period? i think the only really valid way to handle this is with `loaded_at_query`, no?
Oh, sorry to have missed that! Modified the docs to instruct users to use loaded_at_query.