Elementary test tries to cast metric_value to numeric(28,6) which fails on our larger datatypes #1778

@jan-benisek

Description

Describe the bug
When running Elementary tests on a model

dbt test -s foobar --vars "{'elementary_enabled': true}" -t prod

We get an error:

Numeric data overflow (scale float to decimal)

To Reproduce
When running the dbt command above, I tried to localise which query causes it.
In dbt.log I found this query:

[0m09:56:43.769788 [debug] [Thread-2 (]: On test.bigcorp.elementary_all_columns_anomalies_foobar_.656a1159e6: /* {"app": "dbt", "dbt_version": "1.8.6", "profile_name": "bigcorp", "target_name": "prod", "node_id": "test.bigcorp.elementary_all_columns_anomalies_foobar_.656a1159e6"} */

    with anomaly_scores as (
        select
            id,
...

when column_name is not null then 
    'In column ' || column_name || ', the last ' || metric_name || ' value is ' || cast(round(cast(metric_value as numeric(28,6)), 3) as varchar(4096)) ||
    '. The average for this metric is ' || cast(round(cast(training_avg as numeric(28,6)), 3) as varchar(4096)) || '.'

        else null
    end as anomaly_description
...

This is the problematic part: cast(round(cast(metric_value as numeric(28,6)), 3) as varchar(4096)).

I also saw that Elementary runs some inserts, and one of the subqueries computes variance(cast("custom_int" as float)):

insert into "prod"."elementary"."test_656a1159e6_elementary_all_columns_anomalies_foobar___metrics__tmp_20250110085236933131"
with tmp_table as (

...

column_monitors as (
    select
        edr_bucket_start as bucket_start,
        edr_bucket_end as bucket_end,
        datediff(hour, edr_bucket_start, edr_bucket_end) as bucket_duration_hours,
        cast('custom_int' as varchar(4096)) as edr_column_name,
        coalesce(sum(case when "custom_int" is null then 1 else 0 end), 0) as null_count,
        round(cast(cast(coalesce(sum(case when "custom_int" is null then 1 else 0 end), 0) as float)
            / nullif(cast(count(*) as float), 0) * 100.0 as numeric(28,6)), 3) as null_percent,
        null as not_null_percent,
        max(cast("custom_int" as float)) as max,
        min(cast("custom_int" as float)) as min,
        avg(cast("custom_int" as float)) as average,
        coalesce(sum(case when "custom_int" is null then 1 when cast("custom_int" as float) = 0 then 1 else 0 end), 0) as zero_count,
        round(cast(cast(coalesce(sum(case when "custom_int" is null then 1 when cast("custom_int" as float) = 0 then 1 else 0 end), 0) as float)
            / nullif(cast(count(*) as float), 0) * 100.0 as numeric(28,6)), 3) as zero_percent,
        null as not_zero_percent,
        stddev(cast("custom_int" as float)) as standard_deviation,
        variance(cast("custom_int" as float)) as variance,
        null as max_length,
        null as min_length,
        null as average_length,
        null as missing_count,
        null as missing_percent,
        null as count_true,
        null as count_false,
        null as not_missing_percent,
        null as sum
    from filtered_monitored_table
        left join buckets on (edr_bucket_start = start_bucket_in_data)
    group by 1,2,3,4

That variance, in our case, yields a very large number:

select variance(cast("custom_int" as float)) from production.foobart where ts >= '2024-12-31' and ts <= '2025-01-10';
-- 3636013502843636000000

This value cannot be cast to numeric(28,6) later.
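
To confirm it is this cast (and not the string concatenation) that overflows, the expression can be isolated. A minimal sketch, inlining the variance value observed above as a literal:

-- Sketch: same cast as the generated test SQL, with the observed variance value inlined.
select cast(round(cast(3636013502843636000000::float as numeric(28,6)), 3) as varchar(4096));
-- Fails with: Numeric data overflow (scale float to decimal)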

Expected behavior
The test should handle metric values that overflow numeric(28,6), such as the variance of a large integer column, instead of failing.
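
One possible direction (a sketch only, not Elementary's current code): widen the intermediate cast so large float metrics such as this variance still fit. Redshift supports numeric precision up to 38:

-- Sketch of a possible mitigation (assumption, not the package's actual macro):
-- numeric(38,6) leaves 32 integer digits, enough for values around 3.6e21.
select cast(round(cast(metric_value as numeric(38,6)), 3) as varchar(4096)) as rendered_metric_value
from anomaly_scores;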

Screenshots
N/A

Environment (please complete the following information):

  • Elementary CLI (edr) version: 0.15.1
  • Elementary dbt package version: 0.16.0
  • dbt version you're using: 1.8.6
  • Data warehouse: Redshift
  • Infrastructure details: NA

Additional context
NA

Would you be willing to contribute a fix for this issue?
Unlikely, unless it is easy?
