Value Diff generates SQL queries using Jinja templates to compare data between your base and current environments. The queries perform a FULL OUTER JOIN on primary keys to identify added, removed, and mismatched records.

You can review the exact SQL templates in the [ValueDiffTask class](https://github.com/DataRecce/recce/blob/main/recce/tasks/valuediff.py#L80).
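To make the join semantics concrete, here is a minimal Python sketch of how a FULL OUTER JOIN on a primary key sorts records into added, removed, and mismatched. It is an illustration only and not Recce's SQL: the function `value_diff`, the sample rows, and the `id` key are all hypothetical.

```python
def value_diff(base, current, primary_key):
    """Classify rows the way a FULL OUTER JOIN on the primary key would."""
    base_by_key = {row[primary_key]: row for row in base}
    current_by_key = {row[primary_key]: row for row in current}

    # Keys present on only one side of the join.
    added = sorted(current_by_key.keys() - base_by_key.keys())
    removed = sorted(base_by_key.keys() - current_by_key.keys())

    # Keys present on both sides whose column values differ.
    mismatched = sorted(
        key
        for key in base_by_key.keys() & current_by_key.keys()
        if base_by_key[key] != current_by_key[key]
    )
    return {"added": added, "removed": removed, "mismatched": mismatched}


# Hypothetical sample data for the two environments.
base_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
current_rows = [
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 35},
    {"id": 4, "amount": 40},
]

result = value_diff(base_rows, current_rows, "id")
print(result)
```

In this sample, key 4 exists only in the current environment (added), key 1 exists only in base (removed), and key 3 exists in both with a different `amount` (mismatched).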
### Profile Diff
Profile Diff compares basic statistics (e.g. count, distinct count, min, max, average) for each column of a model between two environments.

Profile Diff generates SQL queries using Jinja templates to calculate statistical measures for each column in your models. The queries analyze data distribution, null values, uniqueness, and numerical statistics.

You can review the exact [SQL templates](https://github.com/DataRecce/recce/blob/main/recce/tasks/profile.py#L14).
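As a rough illustration of the kind of per-column measures described above, the following Python sketch computes a small profile for one column in each environment. It does not reproduce Recce's templates: the helper `profile_column`, the metric names, and the sample data are assumptions, and the sketch assumes the non-null values in a column are mutually comparable.

```python
def profile_column(rows, column):
    """Compute a few basic statistics for one column (illustrative only)."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    # Averages only make sense for numeric values; bool is excluded on purpose.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    distinct = set(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(distinct),
        "is_unique": len(distinct) == len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "avg": sum(numeric) / len(numeric) if numeric else None,
    }


# Hypothetical sample data for the two environments.
base_rows = [{"amount": 10}, {"amount": 20}, {"amount": None}, {"amount": 30}]
current_rows = [{"amount": 10}, {"amount": 20}, {"amount": 30}, {"amount": 40}]

base_profile = profile_column(base_rows, "amount")
current_profile = profile_column(current_rows, "amount")

# Report only the metrics that differ between the environments.
changed = {
    name: (base_profile[name], current_profile[name])
    for name in base_profile
    if base_profile[name] != current_profile[name]
}
print(changed)
```

Comparing the two profiles metric by metric is what surfaces a change such as a null value disappearing or the maximum shifting between environments.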
@@ -186,6 +199,12 @@ A Histogram Diff can be generated in two ways.
<figcaption>Generate a Recce Histogram Diff from the column options</figcaption>
</figure>

#### SQL Execution

Histogram Diff generates SQL queries to create distribution histograms for numeric and date columns. The queries use binning strategies to group values and count occurrences in each bin, supporting both integer and floating-point data types.

You can review the exact SQL generation functions in the [HistogramDiffTask class](https://github.com/DataRecce/recce/blob/main/recce/tasks/histogram.py#L160).
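The binning idea can be sketched in a few lines of Python. This sketch assumes equal-width bins over a range shared by both environments, which is one possible binning strategy and not necessarily the one Recce uses. The function `histogram`, the bin count, and the sample values are hypothetical.

```python
def histogram(values, low, high, num_bins):
    """Count values in equal-width bins over [low, high] (illustrative only)."""
    counts = [0] * num_bins
    width = (high - low) / num_bins
    for value in values:
        if value is None:
            continue  # nulls are not binned
        if width == 0:
            index = 0  # all values identical: everything goes in the first bin
        else:
            # Clamp so that value == high lands in the last bin.
            index = min(int((value - low) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sample data for the two environments.
base_values = [1, 2, 2, 9]
current_values = [1, 5, 8, 10]

# Use one shared range so the bins of both histograms line up.
low = min(base_values + current_values)
high = max(base_values + current_values)

base_hist = histogram(base_values, low, high, num_bins=3)
current_hist = histogram(current_values, low, high, num_bins=3)
print(base_hist, current_hist)
```

Sharing the bin edges between the two environments is the design choice that makes the histograms directly comparable bin by bin.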
### Top-K Diff
Top-K Diff compares the distribution of a categorical column. The top 10 elements are shown by default, which can be expanded to the top 50 elements.
@@ -217,6 +236,12 @@ A Top-K Diff can be generated in two ways.
<figcaption>Generate a Recce Top-K Diff </figcaption>
</figure>

#### SQL Execution

Top-K Diff generates SQL queries using FULL OUTER JOIN to compare the most frequent values in categorical columns between environments. The queries group by column values and count occurrences to identify the top K categories.

You can review the exact SQL templates in the [TopKDiffTask class](https://github.com/DataRecce/recce/blob/main/recce/tasks/top_k.py#L15).
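The group, count, and join steps can be approximated in Python as follows. This is a sketch under stated assumptions and not Recce's query: the function `top_k_diff` is hypothetical, and ordering the result by the current environment's count is an assumed choice.

```python
from collections import Counter


def top_k_diff(base_values, current_values, k=10):
    """Join per-category counts from both environments and keep the top k."""
    base_counts = Counter(base_values)        # GROUP BY value, COUNT(*) on base
    current_counts = Counter(current_values)  # same on current

    # FULL OUTER JOIN on the category: keep categories from either side,
    # using 0 where a category is missing from one environment.
    categories = base_counts.keys() | current_counts.keys()
    joined = [
        (category, base_counts.get(category, 0), current_counts.get(category, 0))
        for category in categories
    ]

    # Assumed ordering: current count, then base count, then name for stability.
    joined.sort(key=lambda row: (-row[2], -row[1], str(row[0])))
    return joined[:k]


# Hypothetical sample data for the two environments.
base_values = ["a", "a", "b", "c"]
current_values = ["a", "b", "b", "d"]

top = top_k_diff(base_values, current_values, k=3)
print(top)
```

Each tuple is `(category, base_count, current_count)`. The outer join is what keeps a category such as `"d"`, which appears only in the current environment, in the comparison with a base count of 0.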
## Multi-Node Selection
Multiple nodes can be selected in the Lineage DAG. This enables actions such as Row Count Diff or Value Diff to be performed on multiple nodes at the same time.