Commit 84a984f

Add optional flags argument to regex tests (#247)
* feat: add optional flags argument to regex tests
* fix: disable flags tests on BigQuery
* fix: downgrade regex config errors to warnings
1 parent bd7746d commit 84a984f

7 files changed: +81 -22 lines changed

README.md

Lines changed: 20 additions & 4 deletions
@@ -600,7 +600,10 @@ tests:
 ### [expect_column_values_to_match_regex](macros/schema_tests/string_matching/expect_column_values_to_match_regex.sql)
 
 Expect column entries to be strings that match a given regular expression. Valid matches can be found anywhere in the string, for example "[at]+" will identify the following strings as expected: "cat", "hat", "aa", "a", and "t", and the following strings as unexpected: "fish", "dog".
-Optionally, `is_raw` indicates the `regex` pattern is a "raw" string and should be escaped. The default is `False`.
+
+Optional (keyword) arguments:
+- `is_raw` indicates the `regex` pattern is a "raw" string and should be escaped. The default is `False`.
+- `flags` is a string of one or more characters that are passed to the regex engine as flags (or parameters). Allowed flags are adapter-specific. A common flag is `i`, for case-insensitive matching. The default is no flags.
 
 *Applies to:* Column
 
@@ -610,12 +613,16 @@ tests:
       regex: "[at]+"
       row_condition: "id is not null" # (Optional)
       is_raw: True # (Optional)
+      flags: i # (Optional)
 ```
 
 ### [expect_column_values_to_not_match_regex](macros/schema_tests/string_matching/expect_column_values_to_not_match_regex.sql)
 
 Expect column entries to be strings that do NOT match a given regular expression. The regex must not match any portion of the provided string. For example, "[at]+" would identify the following strings as expected: "fish”, "dog”, and the following as unexpected: "cat”, "hat”.
-Optionally, `is_raw` indicates the `regex` pattern is a "raw" string and should be escaped. The default is `False`.
+
+Optional (keyword) arguments:
+- `is_raw` indicates the `regex` pattern is a "raw" string and should be escaped. The default is `False`.
+- `flags` is a string of one or more characters that are passed to the regex engine as flags (or parameters). Allowed flags are adapter-specific. A common flag is `i`, for case-insensitive matching. The default is no flags.
 
 *Applies to:* Column
 
@@ -625,12 +632,16 @@ tests:
       regex: "[at]+"
       row_condition: "id is not null" # (Optional)
       is_raw: True # (Optional)
+      flags: i # (Optional)
 ```
 
 ### [expect_column_values_to_match_regex_list](macros/schema_tests/string_matching/expect_column_values_to_match_regex_list.sql)
 
 Expect the column entries to be strings that can be matched to either any of or all of a list of regular expressions. Matches can be anywhere in the string.
-Optionally, `is_raw` indicates the `regex` patterns are "raw" strings and should be escaped. The default is `False`.
+
+Optional (keyword) arguments:
+- `is_raw` indicates the `regex` pattern is a "raw" string and should be escaped. The default is `False`.
+- `flags` is a string of one or more characters that are passed to the regex engine as flags (or parameters). Allowed flags are adapter-specific. A common flag is `i`, for case-insensitive matching. The default is no flags.
 
 *Applies to:* Column
 
@@ -641,12 +652,16 @@ tests:
       match_on: any # (Optional. Default is 'any', which applies an 'OR' for each regex. If 'all', it applies an 'AND' for each regex.)
       row_condition: "id is not null" # (Optional)
       is_raw: True # (Optional)
+      flags: i # (Optional)
 ```
 
 ### [expect_column_values_to_not_match_regex_list](macros/schema_tests/string_matching/expect_column_values_to_not_match_regex_list.sql)
 
 Expect the column entries to be strings that do not match any of a list of regular expressions. Matches can be anywhere in the string.
-Optionally, `is_raw` indicates the `regex` patterns are "raw" strings and should be escaped. The default is `False`.
+
+Optional (keyword) arguments:
+- `is_raw` indicates the `regex` pattern is a "raw" string and should be escaped. The default is `False`.
+- `flags` is a string of one or more characters that are passed to the regex engine as flags (or parameters). Allowed flags are adapter-specific. A common flag is `i`, for case-insensitive matching. The default is no flags.
 
 *Applies to:* Column
 
@@ -657,6 +672,7 @@ tests:
       match_on: any # (Optional. Default is 'any', which applies an 'OR' for each regex. If 'all', it applies an 'AND' for each regex.)
       row_condition: "id is not null" # (Optional)
       is_raw: True # (Optional)
+      flags: i # (Optional)
 ```
 
 ### [expect_column_values_to_match_like_pattern](macros/schema_tests/string_matching/expect_column_values_to_match_like_pattern.sql)
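
The README text in this diff describes unanchored matching and the optional `i` flag. As a rough illustration only, the Python sketch below uses the standard `re` module as a stand-in for a warehouse regex engine. That is an assumption: each adapter has its own engine and flag alphabet, and `matches` is a hypothetical helper, not part of dbt-expectations.

```python
import re


def matches(value: str, pattern: str, flags: str = "") -> bool:
    """Return True if `pattern` matches anywhere in `value`.

    Only the `i` flag is modeled; other flags are adapter-specific and
    are rejected here rather than guessed at.
    """
    unsupported = set(flags) - {"i"}
    if unsupported:
        raise ValueError(f"flags not modeled in this sketch: {sorted(unsupported)}")
    re_flags = re.IGNORECASE if "i" in flags else 0
    # re.search (not re.match) mirrors "matches can be found anywhere in the string".
    return re.search(pattern, value, re_flags) is not None


# The README's "[at]+" examples, filtered down to the values that match.
samples = ["cat", "hat", "aa", "a", "t", "fish", "dog"]
expected = [v for v in samples if matches(v, "[at]+")]
print(expected)  # ['cat', 'hat', 'aa', 'a', 't']

# Without flags the match is case-sensitive; `i` relaxes that.
print(matches("CAT", "[at]+"))             # False
print(matches("CAT", "[at]+", flags="i"))  # True
```

The same pattern therefore accepts or rejects an uppercase value depending only on `flags`, which is the behavior the new README bullet describes for `i`.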

integration_tests/models/schema_tests/schema.yml

Lines changed: 14 additions & 0 deletions
@@ -6,12 +6,26 @@ models:
         tests:
           - dbt_expectations.expect_column_values_to_match_regex:
               regex: "@[^.]*"
+          - dbt_expectations.expect_column_values_to_match_regex:
+              regex: "[A-Z]"
+              flags: i
+              config:
+                enabled: "{{ target.type in ['postgres', 'snowflake', 'redshift' ] }}"
           - dbt_expectations.expect_column_values_to_not_match_regex:
               regex: "&[^.]*"
+          - dbt_expectations.expect_column_values_to_not_match_regex:
+              regex: "[A-Z]"
           - dbt_expectations.expect_column_values_to_match_regex_list:
               regex_list: ["@[^.]*", "&[^.]*"]
+          - dbt_expectations.expect_column_values_to_match_regex_list:
+              regex_list: ["[A-G]", "[H-Z]"]
+              flags: i
+              config:
+                enabled: "{{ target.type in ['postgres', 'snowflake', 'redshift' ] }}"
           - dbt_expectations.expect_column_values_to_not_match_regex_list:
               regex_list: ["@[^.]*", "&[^.]*"]
+          - dbt_expectations.expect_column_values_to_not_match_regex_list:
+              regex_list: ["[A-G]", "[H-Z]"]
           - dbt_expectations.expect_column_values_to_match_like_pattern:
               like_pattern: "%@%"
           - dbt_expectations.expect_column_values_to_not_match_like_pattern:

macros/regex/regexp_instr.sql

Lines changed: 35 additions & 10 deletions
@@ -1,33 +1,58 @@
-{% macro regexp_instr(source_value, regexp, position=1, occurrence=1, is_raw=False) %}
+{% macro regexp_instr(source_value, regexp, position=1, occurrence=1, is_raw=False, flags="") %}
 
     {{ adapter.dispatch('regexp_instr', 'dbt_expectations')(
-        source_value, regexp, position, occurrence, is_raw
+        source_value, regexp, position, occurrence, is_raw, flags
     ) }}
 
 {% endmacro %}
 
-{% macro default__regexp_instr(source_value, regexp, position, occurrence, is_raw) %}
+{% macro default__regexp_instr(source_value, regexp, position, occurrence, is_raw, flags) %}
+{# unclear if other databases support raw strings or flags #}
+{% if is_raw or flags %}
+    {{ exceptions.warn(
+            "is_raw and flags options are not supported for this adapter "
+            ~ "and are being ignored."
+    ) }}
+{% endif %}
 regexp_instr({{ source_value }}, '{{ regexp }}', {{ position }}, {{ occurrence }})
 {% endmacro %}
 
 {# Snowflake uses $$...$$ to escape raw strings #}
-{% macro snowflake__regexp_instr(source_value, regexp, position, occurrence, is_raw) %}
+{% macro snowflake__regexp_instr(source_value, regexp, position, occurrence, is_raw, flags) %}
 {%- set regexp = "$$" ~ regexp ~ "$$" if is_raw else "'" ~ regexp ~ "'" -%}
-regexp_instr({{ source_value }}, {{ regexp }}, {{ position }}, {{ occurrence }})
+{% if flags %}{{ dbt_expectations._validate_flags(flags, 'cimes') }}{% endif %}
+regexp_instr({{ source_value }}, {{ regexp }}, {{ position }}, {{ occurrence }}, 0, '{{ flags }}')
 {% endmacro %}
 
 {# BigQuery uses "r" to escape raw strings #}
-{% macro bigquery__regexp_instr(source_value, regexp, position, occurrence, is_raw) %}
+{% macro bigquery__regexp_instr(source_value, regexp, position, occurrence, is_raw, flags) %}
+{% if flags %}
+    {{ exceptions.warn(
+            "The flags option is not supported for BigQuery and is being ignored."
+    ) }}
+{% endif %}
 {%- set regexp = "r'" ~ regexp ~ "'" if is_raw else "'" ~ regexp ~ "'" -%}
 regexp_instr({{ source_value }}, {{ regexp }}, {{ position }}, {{ occurrence }})
 {% endmacro %}
 
 {# Postgres does not need to escape raw strings #}
-{% macro postgres__regexp_instr(source_value, regexp, position, occurrence, is_raw) %}
-array_length((select regexp_matches({{ source_value }}, '{{ regexp }}')), 1)
+{% macro postgres__regexp_instr(source_value, regexp, position, occurrence, is_raw, flags) %}
+{% if flags %}{{ dbt_expectations._validate_flags(flags, 'bcegimnpqstwx') }}{% endif %}
+array_length((select regexp_matches({{ source_value }}, '{{ regexp }}', '{{ flags }}')), 1)
 {% endmacro %}
 
 {# Unclear what Redshift does to escape raw strings #}
-{% macro redshift__regexp_instr(source_value, regexp, position, occurrence, is_raw) %}
-regexp_instr({{ source_value }}, '{{ regexp }}', {{ position }}, {{ occurrence }})
+{% macro redshift__regexp_instr(source_value, regexp, position, occurrence, is_raw, flags) %}
+{% if flags %}{{ dbt_expectations._validate_flags(flags, 'ciep') }}{% endif %}
+regexp_instr({{ source_value }}, '{{ regexp }}', {{ position }}, {{ occurrence }}, 0, '{{ flags }}')
 {% endmacro %}
+
+{% macro _validate_flags(flags, alphabet) %}
+{% for flag in flags %}
+    {% if flag not in alphabet %}
+        {{ exceptions.raise_compiler_error(
+            "flag " ~ flag ~ " not in list of allowed flags for this adapter: " ~ alphabet | join(", ")
+        ) }}
+    {% endif %}
+{% endfor %}
+{% endmacro %}
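
To make the flag handling in `regexp_instr.sql` easier to follow, here is a minimal Python sketch of the same logic for the three adapters that accept flags. It is not how dbt executes the macros: the names `ALLOWED_FLAGS`, `validate_flags`, and `render_regexp_instr` are hypothetical, `ValueError` stands in for `exceptions.raise_compiler_error`, `is_raw` handling is left out, and the returned strings match the rendered SQL only up to whitespace. The flag alphabets are copied from the diff.

```python
# Allowed flag alphabets, copied from the adapter macros in this diff.
ALLOWED_FLAGS = {
    "snowflake": "cimes",
    "postgres": "bcegimnpqstwx",
    "redshift": "ciep",
}


def validate_flags(flags: str, alphabet: str) -> None:
    """Raise if any flag character falls outside the adapter's alphabet."""
    for flag in flags:
        if flag not in alphabet:
            raise ValueError(
                f"flag {flag} not in list of allowed flags for this adapter: "
                + ", ".join(alphabet)
            )


def render_regexp_instr(adapter, source_value, regexp,
                        position=1, occurrence=1, flags=""):
    """Build the SQL text for one adapter (is_raw handling omitted)."""
    alphabet = ALLOWED_FLAGS[adapter]  # KeyError for adapters not sketched here
    if flags:
        validate_flags(flags, alphabet)
    if adapter == "postgres":
        # Postgres passes flags as the third argument of regexp_matches.
        return (
            f"array_length((select regexp_matches("
            f"{source_value}, '{regexp}', '{flags}')), 1)"
        )
    # Snowflake and Redshift pass 0 for the option argument, then the flags.
    return (
        f"regexp_instr({source_value}, '{regexp}', "
        f"{position}, {occurrence}, 0, '{flags}')"
    )


print(render_regexp_instr("snowflake", "email", "[A-Z]", flags="i"))
# regexp_instr(email, '[A-Z]', 1, 1, 0, 'i')
print(render_regexp_instr("postgres", "email", "[A-Z]", flags="i"))
# array_length((select regexp_matches(email, '[A-Z]', 'i')), 1)
```

Validation happens character by character, so a string such as `ix` on Redshift fails on `x` even though `i` is allowed.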

macros/schema_tests/string_matching/expect_column_values_to_match_regex.sql

Lines changed: 3 additions & 2 deletions
@@ -1,11 +1,12 @@
 {% test expect_column_values_to_match_regex(model, column_name,
     regex,
     row_condition=None,
-    is_raw=False
+    is_raw=False,
+    flags=""
     ) %}
 
 {% set expression %}
-{{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw) }} > 0
+{{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw, flags=flags) }} > 0
 {% endset %}
 
 {{ dbt_expectations.expression_is_true(model,

macros/schema_tests/string_matching/expect_column_values_to_match_regex_list.sql

Lines changed: 3 additions & 2 deletions
@@ -2,12 +2,13 @@
     regex_list,
     match_on="any",
     row_condition=None,
-    is_raw=False
+    is_raw=False,
+    flags=""
     ) %}
 
 {% set expression %}
     {% for regex in regex_list %}
-    {{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw) }} > 0
+    {{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw, flags=flags) }} > 0
     {%- if not loop.last %}
     {{ " and " if match_on == "all" else " or "}}
     {% endif -%}
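
The loop in the list macro joins one `regexp_instr(...) > 0` clause per pattern with `and` or `or`, depending on `match_on`. A minimal Python sketch of that joining step follows. `build_list_expression` is a hypothetical name, the clause text assumes the Snowflake/Redshift call form from this commit, and the `comparison` parameter is only there to show that the negated list test uses `= 0` instead.

```python
def build_list_expression(column_name, regex_list, match_on="any",
                          comparison="> 0", flags=""):
    """Join one regexp_instr clause per pattern, as the Jinja loop does."""
    # match_on == "all" requires every clause to hold; anything else means "or".
    joiner = " and " if match_on == "all" else " or "
    clauses = [
        f"regexp_instr({column_name}, '{regex}', 1, 1, 0, '{flags}') {comparison}"
        for regex in regex_list
    ]
    return joiner.join(clauses)


print(build_list_expression("email", ["[A-G]", "[H-Z]"], flags="i"))
# regexp_instr(email, '[A-G]', 1, 1, 0, 'i') > 0 or regexp_instr(email, '[H-Z]', 1, 1, 0, 'i') > 0
```

Note that the same `flags` string is applied to every pattern in the list; the macro has no per-pattern flags.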

macros/schema_tests/string_matching/expect_column_values_to_not_match_regex.sql

Lines changed: 3 additions & 2 deletions
@@ -1,11 +1,12 @@
 {% test expect_column_values_to_not_match_regex(model, column_name,
     regex,
     row_condition=None,
-    is_raw=False
+    is_raw=False,
+    flags=""
     ) %}
 
 {% set expression %}
-{{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw) }} = 0
+{{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw, flags=flags) }} = 0
 {% endset %}
 
 {{ dbt_expectations.expression_is_true(model,

macros/schema_tests/string_matching/expect_column_values_to_not_match_regex_list.sql

Lines changed: 3 additions & 2 deletions
@@ -2,12 +2,13 @@
     regex_list,
     match_on="any",
     row_condition=None,
-    is_raw=False
+    is_raw=False,
+    flags=""
     ) %}
 
 {% set expression %}
     {% for regex in regex_list %}
-    {{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw) }} = 0
+    {{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw, flags=flags) }} = 0
     {%- if not loop.last %}
     {{ " and " if match_on == "all" else " or "}}
     {% endif -%}
