Commit 57f20a8

Merge pull request #19 from DataRecce/feature/drc-1392-explain-4-special-diffs-and-link-to-source-code
Add SQL execution explanations for special diff types
2 parents cc885fc + 1b6d84f commit 57f20a8

1 file changed: +26 -1 lines changed

docs/features/lineage.md

Lines changed: 26 additions & 1 deletion
@@ -75,7 +75,8 @@ The node details panel shows information about a node, such as node type, schema
Schema Diff shows added, removed, and renamed columns. Click a model in the Lineage Diff to open the node details and view the Schema Diff.

!!! Note
-Schema Diff requires `catalog.json` in both environments.
+
+Schema Diff requires `catalog.json` in both environments.

<figure markdown>
![Recce Schema Diff](../assets/images/features/schema-diff.gif){: .shadow}
@@ -130,6 +131,12 @@ View mismatched values at the row level by clicking the `show mismatched values`

![](../assets/images/features/value-diff-detail.gif){: .shadow}

+#### SQL Execution
+
+Value Diff generates SQL queries using Jinja templates to compare data between your base and current environments. The queries perform a FULL OUTER JOIN on primary keys to identify added, removed, and mismatched records.
+
+You can review the exact SQL templates in the [ValueDiffTask class](https://github.com/DataRecce/recce/blob/main/recce/tasks/valuediff.py#L80).
+
### Profile Diff

Profile Diff compares the basic statistic (e.g. count, distinct count, min, max, average) for each column in models between two environments.
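
Note on the Value Diff hunk above: the FULL OUTER JOIN on primary keys can be pictured with a small Python sketch. This is not Recce's SQL template. The function name `value_diff`, the row format (a list of dicts), and the sample data are hypothetical and only illustrate how a join on the key separates added, removed, and mismatched records.

```python
def value_diff(base_rows, current_rows, primary_key):
    """Classify records by joining base and current rows on primary_key."""
    base = {row[primary_key]: row for row in base_rows}
    current = {row[primary_key]: row for row in current_rows}

    added, removed, mismatched = [], [], []
    # The union of keys plays the role of a FULL OUTER JOIN on the primary key.
    for key in sorted(base.keys() | current.keys()):
        if key not in base:
            added.append(key)        # present only in current
        elif key not in current:
            removed.append(key)      # present only in base
        elif base[key] != current[key]:
            # Record which columns differ for this key.
            columns = sorted(
                col
                for col in base[key].keys() | current[key].keys()
                if base[key].get(col) != current[key].get(col)
            )
            mismatched.append((key, columns))
    return {"added": added, "removed": removed, "mismatched": mismatched}


if __name__ == "__main__":
    base_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
    current_rows = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]
    print(value_diff(base_rows, current_rows, "id"))
```

In the real feature the comparison runs as SQL in the warehouse; the sketch only mirrors the classification logic described in the added paragraph.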
@@ -138,6 +145,12 @@ Profile Diff compares the basic statistic (e.g. count, distinct count, min, max,
2. Click the `Expore Change` button.
3. Click `Profile Diff`.

+#### SQL Execution
+
+Profile Diff generates SQL queries using Jinja templates to calculate statistical measures for each column in your models. The queries analyze data distribution, null values, uniqueness, and numerical statistics.
+
+You can review the exact [SQL templates](https://github.com/DataRecce/recce/blob/main/recce/tasks/profile.py#L14).
+
<figure markdown>
![Recce Profile Diff](../assets/images/features/profile-diff.png)
<figcaption>Profile Diff</figcaption>
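
Note on the Profile Diff hunk above: the per-column statistics it mentions (null values, uniqueness, numerical statistics) can be sketched in plain Python. The names `profile_column` and `profile_diff`, and the exact set of statistics, are assumptions for illustration. Recce computes its profile in SQL, and its actual columns may differ.

```python
def profile_column(values):
    """Compute a few basic statistics for one column of values."""
    non_null = [v for v in values if v is not None]
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    distinct = set(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(distinct),
        "is_unique": len(distinct) == len(non_null),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "avg": sum(numeric) / len(numeric) if numeric else None,
    }


def profile_diff(base_values, current_values):
    """Return only the statistics that differ, as (base, current) pairs."""
    base = profile_column(base_values)
    current = profile_column(current_values)
    return {
        stat: (base[stat], current[stat])
        for stat in base
        if base[stat] != current[stat]
    }


if __name__ == "__main__":
    print(profile_diff([1, 2, 2, None], [1, 2, 3, 4]))
```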
@@ -186,6 +199,12 @@ A Histogram Diff can be generated in two ways.
<figcaption>Generate a Recce Histogram Diff from the column options</figcaption>
</figure>

+#### SQL Execution
+
+Histogram Diff generates SQL queries to create distribution histograms for numeric and date columns. The queries use binning strategies to group values and count occurrences in each bin, supporting both integer and floating-point data types.
+
+You can review the exact SQL generation functions in the [HistogramDiffTask class](https://github.com/DataRecce/recce/blob/main/recce/tasks/histogram.py#L160).
+
### Top-K Diff

Top-K Diff compares the distribution of a categorical column. The top 10 elements are shown by default, which can be expanded to the top 50 elements.
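
Note on the Histogram Diff hunk above: the binning it describes can be sketched as equal-width bins shared by both environments, so the two sets of counts are directly comparable. The function `histogram_diff`, the default of five bins, and the equal-width strategy are assumptions; the actual SQL may bin differently and also handles date columns, which this sketch does not.

```python
def histogram_diff(base_values, current_values, num_bins=5):
    """Count values per bin using bin edges shared by both environments."""
    combined = list(base_values) + list(current_values)
    low, high = min(combined), max(combined)
    # Fall back to a width of 1 when every value is identical.
    width = (high - low) / num_bins or 1

    def count_bins(values):
        counts = [0] * num_bins
        for value in values:
            # Clamp so the maximum value lands in the last bin.
            index = min(int((value - low) / width), num_bins - 1)
            counts[index] += 1
        return counts

    return {
        "bin_edges": [low + i * width for i in range(num_bins + 1)],
        "base": count_bins(base_values),
        "current": count_bins(current_values),
    }


if __name__ == "__main__":
    print(histogram_diff([0, 1, 2, 3, 4], [5, 6, 7, 8, 10]))
```

Sharing the bin edges is the design choice that matters here: binning each environment separately would produce counts that cannot be compared bin by bin.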
@@ -217,6 +236,12 @@ A Top-K Diff can be generated in two ways.
<figcaption>Generate a Recce Top-K Diff </figcaption>
</figure>

+#### SQL Execution
+
+Top-K Diff generates SQL queries using FULL OUTER JOIN to compare the most frequent values in categorical columns between environments. The queries group by column values and count occurrences to identify the top K categories.
+
+You can review the exact SQL templates in the [TopKDiffTask class](https://github.com/DataRecce/recce/blob/main/recce/tasks/top_k.py#L15).
+
## Multi-Node Selection

Multiple nodes can be selected in the Lineage DAG. This enables actions to be performed on multiple nodes at the same time such as Row Count Diff, or Value Diff.
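
Note on the Top-K Diff hunk above: grouping by value, counting, and joining the two sets of counts can be sketched with `collections.Counter`. The function `top_k_diff` and the ranking rule (combined count, ties broken by name) are assumptions for illustration; the source does not state how Recce orders categories.

```python
from collections import Counter


def top_k_diff(base_values, current_values, k=10):
    """Return (category, base_count, current_count) for the top k categories."""
    base_counts = Counter(base_values)        # GROUP BY value, COUNT(*) in base
    current_counts = Counter(current_values)  # same for current
    # The union of categories mirrors a FULL OUTER JOIN: a category missing
    # from one side is kept, with a count of 0 on that side.
    categories = base_counts.keys() | current_counts.keys()
    ranked = sorted(
        categories,
        key=lambda c: (-(base_counts[c] + current_counts[c]), str(c)),
    )
    return [(c, base_counts[c], current_counts[c]) for c in ranked[:k]]


if __name__ == "__main__":
    base = ["a", "a", "b", "c"]
    current = ["a", "b", "b", "d", "d", "d"]
    for row in top_k_diff(base, current, k=3):
        print(row)
```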