Commit 3ead998

Fix: Prevent excessive sampling on QEs by restricting ComputeExtStatisticsRows to QD

In `do_analyze_rel`, the function `ComputeExtStatisticsRows` calculates the minimum number of sample rows needed for extended statistics (e.g., dependencies, ndistinct). This calculation is only meaningful on the Query Dispatcher (QD), because only the QD coordinates the final extended statistics generation.

Previously, all segments, including the Query Executors (QEs), executed this logic, which resulted in excessive sampling. For large tables, the QD received more sample rows than it could handle, leading to the error:

ERROR: too many sample rows received from gp_acquire_sample_rows

1 parent 39ef2bd commit 3ead998

4 files changed: +42 -2 lines changed

src/backend/commands/analyze.c

Lines changed: 4 additions & 2 deletions
@@ -717,8 +717,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
      * statistics target. So we may need to sample more rows and then build
      * the statistics with enough detail.
      */
-    minrows = ComputeExtStatisticsRows(onerel, attr_cnt, vacattrstats);
-
+    if (IS_QD_OR_SINGLENODE())
+        minrows = ComputeExtStatisticsRows(onerel, attr_cnt, vacattrstats);
+    else
+        minrows = 0;
 
     if (targrows < minrows)
         targrows = minrows;
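
The effect of this guard can be illustrated outside the server. The following is a minimal standalone sketch, not Cloudberry code: `NodeRole`, `is_qd_or_singlenode`, `compute_ext_stats_rows_stub`, and `choose_target_rows` are hypothetical stand-ins for `IS_QD_OR_SINGLENODE()`, `ComputeExtStatisticsRows`, and the surrounding logic in `do_analyze_rel`, and the value returned by the stub is an arbitrary number chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the node role that the real macro inspects. */
typedef enum { ROLE_QD, ROLE_SINGLENODE, ROLE_QE } NodeRole;

/* Hypothetical equivalent of IS_QD_OR_SINGLENODE(). */
static bool is_qd_or_singlenode(NodeRole role)
{
    return role == ROLE_QD || role == ROLE_SINGLENODE;
}

/* Stub: pretend extended statistics ask for this many sample rows. */
static int compute_ext_stats_rows_stub(void)
{
    return 30000;   /* arbitrary illustrative value, not a real default */
}

/* Mirrors the patched logic: only the QD or a single node may raise targrows. */
static int choose_target_rows(NodeRole role, int targrows)
{
    int minrows;

    if (is_qd_or_singlenode(role))
        minrows = compute_ext_stats_rows_stub();
    else
        minrows = 0;    /* a QE never enlarges its sample for extended stats */

    if (targrows < minrows)
        targrows = minrows;

    return targrows;
}
```

Under these assumptions, `choose_target_rows(ROLE_QE, 3000)` leaves the target at 3000, while `choose_target_rows(ROLE_QD, 3000)` raises it to the stubbed minimum of 30000.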

src/test/regress/expected/stats_ext.out

Lines changed: 13 additions & 0 deletions
@@ -3240,3 +3240,16 @@ NOTICE: drop cascades to 2 other objects
 DETAIL: drop cascades to table tststats.priv_test_tbl
 drop cascades to view tststats.priv_test_view
 DROP USER regress_stats_user1;
+-- test analyze with extended statistics
+CREATE TABLE tbl_issue1293 (col1 int, col2 int);
+NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'col1' as the Apache Cloudberry data distribution key for this table.
+HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
+INSERT INTO tbl_issue1293
+SELECT i / 10000, i / 100000
+FROM generate_series(1, 1000000) s(i);
+ANALYZE tbl_issue1293;
+-- Create extended statistics on col1, col2
+CREATE STATISTICS s1 (dependencies) ON col1, col2 FROM tbl_issue1293;
+-- Trigger extended stats collection
+ANALYZE tbl_issue1293;
+DROP TABLE tbl_issue1293;
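
The shape of the test data can be checked with a little arithmetic. The sketch below is an illustration only, assuming that C integer division behaves like the SQL integer `/` for these positive values; the function names are hypothetical. It reproduces the two expressions from the `INSERT` and confirms that `col1` fully determines `col2`, which is the kind of relationship a `dependencies` statistics object describes.

```c
#include <stdbool.h>

/* Returns true when col1 determines col2 for every generated row 1..nrows. */
static bool col1_determines_col2(int nrows)
{
    for (int i = 1; i <= nrows; i++)
    {
        int col1 = i / 10000;    /* first expression in the INSERT */
        int col2 = i / 100000;   /* second expression in the INSERT */

        /* If col2 can be derived from col1 alone, col1 determines col2. */
        if (col2 != col1 / 10)
            return false;
    }
    return true;
}

/*
 * Number of distinct values of i / divisor for i in 1..nrows.
 * The quotient never decreases and grows by at most 1 per step,
 * so the count is (largest - smallest + 1).
 */
static int count_distinct_quotients(int nrows, int divisor)
{
    return nrows / divisor - 1 / divisor + 1;
}
```

For the one million rows used by the test, `col1` takes 101 distinct values (0 through 100) and `col2` takes 11 (0 through 10).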

src/test/regress/expected/stats_ext_optimizer.out

Lines changed: 13 additions & 0 deletions
@@ -3275,3 +3275,16 @@ NOTICE: drop cascades to 2 other objects
 DETAIL: drop cascades to table tststats.priv_test_tbl
 drop cascades to view tststats.priv_test_view
 DROP USER regress_stats_user1;
+-- test analyze with extended statistics
+CREATE TABLE tbl_issue1293 (col1 int, col2 int);
+NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'col1' as the Apache Cloudberry data distribution key for this table.
+HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
+INSERT INTO tbl_issue1293
+SELECT i / 10000, i / 100000
+FROM generate_series(1, 1000000) s(i);
+ANALYZE tbl_issue1293;
+-- Create extended statistics on col1, col2
+CREATE STATISTICS s1 (dependencies) ON col1, col2 FROM tbl_issue1293;
+-- Trigger extended stats collection
+ANALYZE tbl_issue1293;
+DROP TABLE tbl_issue1293;

src/test/regress/sql/stats_ext.sql

Lines changed: 12 additions & 0 deletions
@@ -1651,3 +1651,15 @@ DROP FUNCTION op_leak(int, int);
 RESET SESSION AUTHORIZATION;
 DROP SCHEMA tststats CASCADE;
 DROP USER regress_stats_user1;
+
+-- test analyze with extended statistics
+CREATE TABLE tbl_issue1293 (col1 int, col2 int);
+INSERT INTO tbl_issue1293
+SELECT i / 10000, i / 100000
+FROM generate_series(1, 1000000) s(i);
+ANALYZE tbl_issue1293;
+-- Create extended statistics on col1, col2
+CREATE STATISTICS s1 (dependencies) ON col1, col2 FROM tbl_issue1293;
+-- Trigger extended stats collection
+ANALYZE tbl_issue1293;
+DROP TABLE tbl_issue1293;
