Commit 0d60f9a

equality test can now compare columns with different names
Each entry in compare_columns may now optionally be a list of two column names. This allows the column names to differ between the base model and the other model when comparing two columns.

The body of the macro is also refactored to make it more DRY and to stop interleaving the different steps and capabilities:

1. First, we gather the set of numeric column names that need rounding. This is referenced later.
2. Next, we build two lists of column names to compare, one list per model. If compare_columns was given, we build the lists from that. Otherwise, we read the column names from the first model, filter out the excluded column names, and use the result for both models, as we did before this commit.
3. Now that we have the lists of column names, we build a comma-separated list for each model. At this point, we emit the rounding expression for any column whose name is found in the set from step 1.

Note that this refactoring also cleans up some odd inconsistencies that resulted from duplicated logic. For example, suppose a model's column name was upper-case and your database is case-sensitive. If you did not specify a "precision" argument, you needed to provide an upper-case "compare_columns" argument. However, if you _did_ specify a "precision" argument, the "compare_columns" argument was _NOT_ case-sensitive. It's pretty unexpected for the case sensitivity of one argument to depend on a seemingly unrelated argument.
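The three steps described in the commit message can be modelled outside of Jinja. The following is a minimal Python sketch, not the macro itself: `split_compare_columns` and `build_select_lists` are hypothetical names, the numeric column names are passed in by hand rather than read from the database, and `numeric` stands in for whatever `dbt.type_numeric()` returns on a given adapter.

```python
from typing import Iterable, List, Sequence, Tuple, Union

ColumnSpec = Union[str, Sequence[str]]


def split_compare_columns(compare_columns: Iterable[ColumnSpec]) -> Tuple[List[str], List[str]]:
    """Step 2: turn mixed entries into one list of column names per model."""
    model_cols: List[str] = []
    other_cols: List[str] = []
    for entry in compare_columns:
        if isinstance(entry, str):
            # A plain string means the same column name in both models.
            model_cols.append(entry)
            other_cols.append(entry)
        elif len(entry) == 2:
            # A pair is [name_in_model, name_in_compare_model].
            model_cols.append(entry[0])
            other_cols.append(entry[1])
        else:
            raise ValueError("compare_columns must be a string or a list of 2 strings")
    return model_cols, other_cols


def build_select_lists(compare_columns, numeric_columns, precision=None, numeric_type="numeric"):
    """Steps 1 and 3: round numeric columns and build one select list per model."""
    # Step 1: lower-cased set of numeric column names, only needed when rounding.
    numeric = {name.lower() for name in numeric_columns} if precision else set()
    model_cols, other_cols = split_compare_columns(compare_columns)
    # A column is rounded in the second model if the column at the same
    # position is numeric in the first model.
    numeric_positions = {i for i, name in enumerate(model_cols) if name.lower() in numeric}

    def render(cols):
        parts = []
        for i, name in enumerate(cols):
            if i in numeric_positions:
                parts.append(f"round(cast({name} as {numeric_type}),{precision}) as {name}")
            else:
                parts.append(name)
        return ", ".join(parts)

    return render(model_cols), render(other_cols)


a_sql, b_sql = build_select_lists(
    ["col_a", ["tbl_a_col_float", "tbl_b_col_float"]],
    numeric_columns=["TBL_A_COL_FLOAT"],  # upper-case on purpose: the lookup is case-insensitive
    precision=4,
)
print(a_sql)
print(b_sql)
```

Because the rounding decision is keyed on position in the first model's list, the second model's column is rounded even though its name never appears in the numeric set. This mirrors the assumption stated in the macro that a numeric column in the first model is also numeric in the second.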
1 parent 2285509 commit 0d60f9a

5 files changed: +123 -49 lines changed

README.md

Lines changed: 12 additions & 0 deletions
@@ -145,6 +145,18 @@ models:
           compare_model: ref('other_table_name')
           exclude_columns:
             - third_column
+
+  # if the columns to be compared have different names, you can match them up like this
+  - name: model_name_different_names
+    tests:
+      - dbt_utils.equality:
+          compare_model: ref('other_table_name')
+          compare_columns:
+            - first_column
+            # This will compare `model_name_different_names.second_column_in_model`
+            # and `other_table_name.second_column_in_other_table`
+            - [second_column_in_model, second_column_in_other_table]
+          precision: 4
 ```

 ### expression_is_true ([source](macros/generic_tests/expression_is_true.sql))
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+col_a,tbl_a_col_b,tbl_a_col_float
+1,1,1.100005
+1,2,1.200005
+2,3,1.300005
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+tbl_b_col_b,col_a,tbl_b_col_float
+1,1,1.100006
+2,1,1.200007
+3,2,1.300008
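
The float columns in the two seed files above differ only from the sixth decimal place onward, which is what lets the same pair of seeds exercise both a passing and a failing comparison. As a rough illustration (a Python sketch using `decimal`, not the SQL the macro generates; `rounded` and `mismatches` are hypothetical helper names), rounding to 4 places makes the rows match, while rounding to 8 places leaves every row different:

```python
from decimal import Decimal, ROUND_HALF_UP

# (col_a, float column) pairs copied from the two seed files in this commit.
rows_a = [(1, "1.100005"), (1, "1.200005"), (2, "1.300005")]
rows_b = [(1, "1.100006"), (1, "1.200007"), (2, "1.300008")]


def rounded(rows, precision):
    """Round the float column to `precision` decimal places and return a set of rows."""
    quantum = Decimal(10) ** -precision  # e.g. Decimal('0.0001') for precision=4
    return {
        (col_a, Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP))
        for col_a, value in rows
    }


def mismatches(precision):
    """Rows present in one table but not the other, in either direction."""
    a, b = rounded(rows_a, precision), rounded(rows_b, precision)
    return (a - b) | (b - a)


print(len(mismatches(4)))  # no rows differ once rounded to 4 places
print(len(mismatches(8)))  # every row differs at 8 places
```

The set differences play the role of the `a_minus_b` and `b_minus_a` CTEs in the macro. None of the seed values is an exact tie at 4 decimal places, so the choice of rounding mode does not affect the result here.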

integration_tests/models/generic_tests/schema.yml

Lines changed: 22 additions & 0 deletions
@@ -164,6 +164,28 @@ seeds:
           exclude_columns:
             - col_c

+  - name: data_test_equality_different_column_names_a
+    data_tests:
+      - dbt_utils.equality:
+          compare_model: ref('data_test_equality_different_column_names_b')
+          compare_columns:
+            - col_a
+            - [tbl_a_col_b, tbl_b_col_b]
+      - dbt_utils.equality:
+          compare_model: ref('data_test_equality_different_column_names_b')
+          compare_columns:
+            - col_a
+            - [tbl_a_col_float, tbl_b_col_float]
+          precision: 4
+      - dbt_utils.equality:
+          compare_model: ref('data_test_equality_different_column_names_b')
+          compare_columns:
+            - col_a
+            - [tbl_a_col_float, tbl_b_col_float]
+          precision: 8
+          error_if: "<1" #sneaky way to ensure that the test is returning failing rows
+          warn_if: "<0"
+
   - name: data_test_equality_floats_a
     data_tests:
       # test precision only
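
The third test added above relies on dbt's `error_if` and `warn_if` configs, which are conditions applied to the number of failing rows a test returns. The sketch below is a simplified Python model of that behaviour, assuming the default `severity: error` and the default thresholds of `!=0`; `threshold_met` and `outcome` are hypothetical names, and real dbt evaluates these conditions in SQL rather than in Python.

```python
import operator
import re

_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "!=": operator.ne,
}


def threshold_met(condition: str, failures: int) -> bool:
    """Evaluate a condition such as '<1' or '!=0' against a failure count."""
    match = re.fullmatch(r"\s*(<=|>=|!=|<|>|=)\s*(\d+)\s*", condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    op, limit = match.groups()
    return _OPS[op](failures, int(limit))


def outcome(failures: int, error_if: str = "!=0", warn_if: str = "!=0") -> str:
    """Simplified result of a test: the error condition is checked before the warn condition."""
    if threshold_met(error_if, failures):
        return "error"
    if threshold_met(warn_if, failures):
        return "warn"
    return "pass"


# With the thresholds from the third test, the usual meaning is inverted:
print(outcome(0, error_if="<1", warn_if="<0"))  # no failing rows -> "error"
print(outcome(6, error_if="<1", warn_if="<0"))  # failing rows present -> "pass"
```

In other words, `error_if: "<1"` makes the test error when the comparison finds no differences, and `warn_if: "<0"` can never be true, so the test passes only when the equality check at `precision: 8` returns failing rows.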

macros/generic_tests/equality.sql

Lines changed: 81 additions & 49 deletions
@@ -63,42 +63,15 @@
     {%- endif -%}

     {% if compare_columns_set != compare_model_columns_set %}
-        {{ exceptions.raise_compiler_error(compare_model ~" has less columns than " ~ model ~ ", please ensure they have the same columns or use the `compare_columns` or `exclude_columns` arguments to subset them.") }}
+        {{ exceptions.raise_compiler_error(compare_model ~" has different columns than " ~ model ~ ", please ensure they have the same columns or use the `compare_columns` or `exclude_columns` arguments to subset them.") }}
     {% endif %}


 {% endif %}

-{%- if not precision -%}
-    {%- if not compare_columns -%}
-        {#
-            You cannot get the columns in an ephemeral model (due to not existing in the information schema),
-            so if the user does not provide an explicit list of columns we must error in the case it is ephemeral
-        #}
-        {%- do dbt_utils._is_ephemeral(model, 'test_equality') -%}
-        {%- set compare_columns = adapter.get_columns_in_relation(model)-%}
-
-        {%- if exclude_columns -%}
-            {#-- Lower case ignore columns for easier comparison --#}
-            {%- set exclude_columns = exclude_columns | map("lower") | list %}
-
-            {# Filter out the excluded columns #}
-            {%- set include_columns = [] %}
-            {%- for column in compare_columns -%}
-                {%- if column.name | lower not in exclude_columns -%}
-                    {% do include_columns.append(column) %}
-                {%- endif %}
-            {%- endfor %}
-
-            {%- set compare_columns = include_columns | map(attribute='quoted') %}
-        {%- else -%} {# Compare columns provided #}
-            {%- set compare_columns = compare_columns | map(attribute='quoted') %}
-        {%- endif -%}
-    {%- endif -%}
-
-    {% set compare_cols_csv = compare_columns | join(', ') %}
-
-{% else %} {# Precision required #}
+{# If testing with precision, then find out which columns in the main input model are numeric #}
+{%- set numeric_columns = {} -%}
+{%- if precision -%}
     {#-
         If rounding is required, we need to get the types, so it cannot be ephemeral even if they provide column names
     -#}
@@ -107,23 +80,82 @@

     {% set columns_list = [] %}
     {%- for col in columns -%}
-        {%- if (
-                (col.name|lower in compare_columns|map('lower') or not compare_columns) and
-                (col.name|lower not in exclude_columns|map('lower') or not exclude_columns)
-        ) -%}
-            {# Databricks double type is not picked up by any number type checks in dbt #}
-            {%- if col.is_float() or col.is_numeric() or col.data_type == 'double' -%}
-                {# Cast is required due to postgres not having round for a double precision number #}
-                {%- do columns_list.append('round(cast(' ~ col.quoted ~ ' as ' ~ dbt.type_numeric() ~ '),' ~ precision ~ ') as ' ~ col.quoted) -%}
-            {%- else -%} {# Non-numeric type #}
-                {%- do columns_list.append(col.quoted) -%}
-            {%- endif -%}
-        {% endif %}
+        {# Databricks double type is not picked up by any number type checks in dbt #}
+        {%- if col.is_float() or col.is_numeric() or col.data_type == 'double' -%}
+            {#- Lower case the column name for easier case-insensitive comparison -#}
+            {%- do numeric_columns.update({col.name|lower: true}) -%}
+            {#- Also include the quoted version, since we may see it as well. -#}
+            {%- do numeric_columns.update({col.quoted|lower: true}) -%}
+        {%- endif -%}
     {%- endfor -%}
+{%- endif -%}

-    {% set compare_cols_csv = columns_list | join(', ') %}
+{# If compare_columns is provided, sort any given arrays into lists of columns for each model #}
+{%- if compare_columns -%}
+    {%- set compare_columns__model = [] %}
+    {%- set compare_columns__compare_model = [] %}
+
+    {%- for column in compare_columns -%}
+        {%- if column is string -%}
+            {# A simple string was given. Assume the same column name in both models. #}
+            {%- do compare_columns__model.append(column) -%}
+            {%- do compare_columns__compare_model.append(column) -%}
+        {%- elif column is iterable and column | length == 2 -%}
+            {%- do compare_columns__model.append(column[0]) -%}
+            {%- do compare_columns__compare_model.append(column[1]) -%}
+        {%- else -%}
+            {{ exceptions.raise_compiler_error("compare_columns must be a string or a list of 2 strings") }}
+        {%- endif -%}
+    {%- endfor -%}
+{%- else -%}
+    {#
+        You cannot get the columns in an ephemeral model (due to not existing in the information schema),
+        so if the user does not provide an explicit list of columns we must error in the case it is ephemeral
+    #}
+    {%- do dbt_utils._is_ephemeral(model, 'test_equality') -%}
+    {%- set model_columns = adapter.get_columns_in_relation(model)-%}

-{% endif %}
+    {%- if exclude_columns -%}
+        {#-- Lower case ignore columns for easier comparison --#}
+        {%- set exclude_columns = exclude_columns | map("lower") | list %}
+    {%- endif -%}
+
+    {# Filter out the excluded columns #}
+    {%- set include_columns = [] %}
+    {%- for column in model_columns -%}
+        {%- if (not exclude_columns) or (column.name | lower not in exclude_columns) -%}
+            {% do include_columns.append(column) %}
+        {%- endif %}
+    {%- endfor %}
+
+    {# Assume same column names in the comparison model, since no alternates were given using compare_columns. #}
+    {%- set compare_columns__model = include_columns | map(attribute='quoted') | list %}
+    {%- set compare_columns__compare_model = compare_columns__model %}
+{%- endif -%}
+
+{# Build comma-delimited lists of column names for each input model. Round numeric types as needed. #}
+{%- set compare_columns_csv = [] -%}
+{%- set numeric_column_indexes_in_first_model = [] -%}
+{%- for this_model_compare_columns in [compare_columns__model, compare_columns__compare_model] -%}
+    {%- set columns_list = [] %}
+    {%- set is_first_model = loop.first -%}
+
+    {%- for this_compare_column in this_model_compare_columns -%}
+        {# NOTE: We assume any numeric columns in the first model are also numeric in the second model #}
+        {%- if (is_first_model and this_compare_column|lower in numeric_columns) or (loop.index0 in numeric_column_indexes_in_first_model) -%}
+            {# Cast is required due to postgres not having round for a double precision number #}
+            {%- do columns_list.append('round(cast(' ~ this_compare_column ~ ' as ' ~ dbt.type_numeric() ~ '),' ~ precision ~ ') as ' ~ this_compare_column) -%}
+
+            {%- if is_first_model -%}
+                {%- do numeric_column_indexes_in_first_model.append(loop.index0) -%}
+            {%- endif -%}
+        {%- else -%} {# Non-numeric type #}
+            {%- do columns_list.append(this_compare_column) -%}
+        {%- endif -%}
+    {%- endfor -%}
+
+    {%- do compare_columns_csv.append(columns_list | join(', ')) -%}
+{%- endfor -%}

 with a as (

@@ -139,17 +171,17 @@ b as (

 a_minus_b as (

-    select {{compare_cols_csv}} from a
+    select {{compare_columns_csv[0]}} from a
     {{ dbt.except() }}
-    select {{compare_cols_csv}} from b
+    select {{compare_columns_csv[1]}} from b

 ),

 b_minus_a as (

-    select {{compare_cols_csv}} from b
+    select {{compare_columns_csv[1]}} from b
     {{ dbt.except() }}
-    select {{compare_cols_csv}} from a
+    select {{compare_columns_csv[0]}} from a

 ),
