-
Notifications
You must be signed in to change notification settings - Fork 566
Description
Backend
VL (Velox)
Bug description
Spark sql fragment that has 35 aggregate functions:
SELECT
supply_id,
mt_main_poi_id,
SUM(CASE WHEN partition_date = '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_1d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_7d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_14d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_30d,
SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_60d,
SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_90d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) / 7.0 AS trade_coupon_cnt_7d_average,
SUM(CASE WHEN partition_date = '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_1d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_7d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_14d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_30d,
SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_60d,
SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_90d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) / 7.0 AS trade_order_cnt_7d_average,
SUM(CASE WHEN partition_date = '2025-08-02' THEN gtv ELSE 0 END) AS gtv_1d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_7d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_14d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_30d,
SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_60d,
SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_90d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN gtv ELSE 0 END) / 7.0 AS gtv_7d_average,
SUM(CASE WHEN partition_date = '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_1d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_7d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_14d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_30d,
SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_60d,
SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_90d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) / 7.0 AS order_fact_pay_amt_7d_average,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date = '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_1d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_7d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_14d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_30d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_60d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_90d
FROM
mart_aiosupply.daocan_app_aggr_supply_poi_deal_cube_d_extnl_view
WHERE
partition_date <= '2025-08-02'
AND partition_date >= '2025-05-05'
AND dim_type = 'poi#deal#type'
AND product_type IN (1, 2)
GROUP BY
supply_id,
mt_main_poi_id
vanilla spark aggregate metrics:
HashAggregate
spill size total (min, med, max (stageId: taskId))
0.0 B (0.0 B, 0.0 B, 0.0 B (stage 6.0: task 15017))
time in aggregation build total (min, med, max (stageId: taskId))
192.60 h (6.1 m, 11.6 m, 19.6 m (stage 6.0: task 15015))
peak memory total (min, med, max (stageId: taskId))
965.3 GiB (987.5 MiB, 988.4 MiB, 988.4 MiB (stage 6.0: task 15322))
number of output rows: 3,440,719,610
avg hash probe bucket list iters (min, med, max (stageId: taskId)):
(1.6, 1.6, 1.6 (stage 6.0: task 15017))
gluten aggregate metrics that are 3.5x total time compared to vanilla:
RegularHashAggregateExecTransformer
time of extraction total (min, med, max (stageId: taskId))
49.0 m (23 ms, 205 ms, 517 ms (stage 2.0: task 966))
number of memory allocations: 614,976,915
number of output vectors: 44,714,272
number of spilled bytes total (min, med, max (stageId: taskId))
1864.2 GiB (0.0 B, 131.6 MiB, 141.5 MiB (stage 2.0: task 491))
number of output rows: 182,915,536,712
number of output bytes total (min, med, max (stageId: taskId))
101.6 TiB (1179.8 MiB, 7.2 GiB, 7.6 GiB (stage 2.0: task 491))
number of spilled files: 612,736
time of aggregation total (min, med, max (stageId: taskId))
675.75 h (5.6 s, 2.8 m, 8.3 m (stage 2.0: task 9949))
peak memory bytes total (min, med, max (stageId: taskId))
53.4 TiB (1624.6 MiB, 3.8 GiB, 4.3 GiB (stage 2.0: task 12750))
number of spilled rows: 182,905,342,963
cpu wall time count: 478,741,203
number of spilled partitions: 115,360
extraction cpu wall time count: 402,803,446
Gluten version
Gluten-1.3
Spark version
Spark-3.5.x
Spark configurations
No response
System information
No response