-
Notifications
You must be signed in to change notification settings - Fork 67
Description
What type of enhancement is this?
API improvement
What subsystems and features will be improved?
Query executor
What does the enhancement do?
Currently, a StatsSummary1D object includes information to make it easy to compute sum, average and count operations (among others).
For example:
postgres=# SELECT m1 FROM test;
m1
----
1
2
3
(3 rows)
postgres=# WITH stats AS (
SELECT
stats_agg(m1) AS stats_m1
FROM test
)
SELECT
average(rollup(stats_m1)) AS avg_m1,
sum(rollup(stats_m1)) AS sum_m1,
num_vals(rollup(stats_m1)) AS count_m1
FROM stats;
avg_m1 | sum_m1 | count_m1
--------+--------+----------
2 | 6 | 3
(1 row)
Would it make sense to include min/max values in StatsSummary1D?
These would be exposed by accessor functions (e.g. max(StatsSummary1D) and min(StatsSummary1D)), allowing users to leverage a single call to stats_agg to compute all the typical aggregation operations: sum, count, avg, min and max.
Thanks,
Diego
Implementation challenges
From what I can see, a StatsSummary1D has a version field, along with n, sx, sx2, sx3 and sx4 that store count, sum, variance/skewness (?) and avg:
postgres=# WITH stats AS (
SELECT
stats_agg(m1) AS stats_m1
FROM test
)
SELECT * FROM stats;
stats_m1
----------------------------------------
(version:1,n:3,sx:6,sx2:2,sx3:0,sx4:2)
(1 row)
Perhaps, adding two more fields sx5 and sx6 to compute the min/max values, respectively? It seems changes are relative to this file: stats1d.rs.
In addition, version field could be used to display a NULL or NaN for version=1 and the actual values if version=2 (for backwards-compatibility).