Commit 23aef44
fast-analyze: implement fast ANALYZE for append-optimized tables
Prior to this patch, ANALYZE on large AO/CO tables in GPDB was a time-consuming process,
because PostgreSQL's two-stage sampling method didn't work well on AO/CO tables:
GPDB had to unpack all varblocks up to the target tuples, which could easily amount to
an almost-full table scan if the sampled tuples fell near the end of the table.
Denis Smirnov's <sd@picodata.io> PR greenplum-db#11190 introduced
a `logical` block concept containing a fixed number of tuples to support PG's two-stage
sampling mechanism; it also sped up fetching target tuples by skipping decompression of
varblock content. Thanks to Denis Smirnov for the great contribution!
Also, thanks to Ashwin Agrawal <aashwin@vmware.com> for the advice on leveraging the
AO Block Directory to locate the target sample row without scanning unnecessary varblocks,
which brings another significant performance improvement once the cache is warmed up.
In addition:
- GPDB has an AO/CO-specific feature that stores the total tuple count in an auxiliary
  table, from which it can be obtained without much overhead.
- GPDB has `fetch` facilities that support finding a varblock based on an AOTupleId
  without decompressing unnecessary varblocks.
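The payoff of these two properties can be illustrated outside the patch itself. The
following is a minimal Python sketch, not code from this commit, with invented names
(`rows_per_varblock`, the `(first_row, last_row)` range tuples): once the sampled row
numbers are known up front and kept sorted, a single forward scan can assign each sampled
row to its varblock, and any varblock whose range contains no sampled row never needs to
be decompressed at all.

```python
def rows_per_varblock(varblocks, sampled_rows):
    """Map each sampled row number to the varblock that holds it.

    varblocks: list of (first_row, last_row) tuples, contiguous and sorted,
               covering the table's row numbers (a hypothetical stand-in for
               the metadata a block directory provides).
    sampled_rows: sorted row numbers chosen by the sampler.

    Returns {varblock_index: [rows]}; varblocks absent from the result
    contain no sampled row and can be skipped without decompression.
    """
    hits = {}
    i = 0
    for b, (first, last) in enumerate(varblocks):
        rows = []
        # Consume every sampled row that falls inside this varblock's range.
        while i < len(sampled_rows) and sampled_rows[i] <= last:
            rows.append(sampled_rows[i])
            i += 1
        if rows:
            hits[b] = rows
        if i == len(sampled_rows):
            break  # no more sampled rows; remaining varblocks are skipped
    return hits

# Example: only varblocks 0, 1 and 3 would be opened; varblock 2 is skipped.
blocks = [(0, 99), (100, 199), (200, 299), (300, 399)]
picked = rows_per_varblock(blocks, [5, 150, 160, 350])
```

Because both inputs are sorted, the assignment is a single O(blocks + samples) pass, which
mirrors why a sorted sample combines so well with sequential varblock metadata.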
Based on the above work and properties, this patch re-implements AO/CO ANALYZE sampling
by combining Knuth's Algorithm S with varblock skipping, to address the time-consuming
problem. We didn't implement two-stage sampling for AO/CO, as the total size of the data
set (the total tuple count) is known in advance, so Algorithm S is sufficient to satisfy
the sampling requirement.
Special thanks to Zhenghua Lyu (https://kainwen.com/) for the detailed analysis of Algorithm S:
[Analysis of Algorithm S](https://kainwen.com/2022/11/06/analysis-of-algorithm-s) and the
follow-up [discussion](https://stackoverflow.com/questions/74345921/performance-comparsion-algorithm-s-and-algorithm-z?noredirect=1#comment131292564_74345921)
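For readers unfamiliar with Algorithm S (Knuth, selection sampling): it draws exactly n
items from a population of known size N in one sequential pass, which is exactly the
situation here since the total tuple count is known in advance. A minimal Python sketch,
not the patch's C implementation:

```python
import random

def algorithm_s(n, N):
    """Knuth's Algorithm S: pick exactly n of N items in one sequential pass.

    Item t (of the N - t not yet examined) is selected with probability
    (n - chosen) / (N - t), which guarantees exactly n selections and a
    uniformly random sample. The returned indices are naturally sorted,
    which is what makes sequential varblock skipping possible.
    """
    sample = []
    chosen = 0
    for t in range(N):
        if (N - t) * random.random() < n - chosen:
            sample.append(t)
            chosen += 1
            if chosen == n:
                break  # sample complete; rest of the population is skipped
    return sample

# Example: 10 row numbers sampled from a table of 1000 rows.
rows = algorithm_s(10, 1000)
```

Note the output comes back already sorted, so no extra sort is needed before walking the
varblocks in order.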
Here is a simple example to show the optimization effect:
[AO with compression, with Fast Analyze enabled]
create table ao (a int, b inet, c inet) with (appendonly=true, orientation=row, compresstype=zlib, compresslevel=3);
insert into ao select i, (select ((i%255)::text || '.' || (i%255)::text || '.' || (i%255)::text || '.' ||
(i%255)::text))::inet, (select ((i%255)::text || '.' || (i%255)::text || '.' || (i%255)::text || '.' ||
(i%255)::text))::inet from generate_series(1,10000000)i;
insert into ao select * from ao;
insert into ao select * from ao;
insert into ao select * from ao;
insert into ao select * from ao;
insert into ao select * from ao;
insert into ao select * from ao;
insert into ao select * from ao;
select count(*) from ao;
count
------------
1280000000
(1 row)
gpadmin=# analyze ao;
ANALYZE
Time: 2814.939 ms (00:02.815)
gpadmin=#
[with block directory and caching warmed]
gpadmin=# analyze ao;
ANALYZE
Time: 1605.342 ms (00:01.605)
gpadmin=#
[Legacy Analyze]
gpadmin=# analyze ao;
ANALYZE
Time: 59711.905 ms (00:59.712)
gpadmin=#
[Heap without compression]
create table heap (a int, b inet, c inet);
-- insert the same data set as above
gpadmin=# analyze heap;
ANALYZE
Time: 2087.694 ms (00:02.088)
gpadmin=#
Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Reviewed by: Ashwin Agrawal, Soumyadeep Chakraborty, Zhenglong Li, Qing Ma
1 parent d5967fd
22 files changed (+2912, -212 lines)