Open
Labels: component/statistics, sig/planner, type/enhancement
Description
Enhancement
Currently, when we use the analyze command to collect statistics, we run into several problems, especially with large tables:
- Analyze is slow. Since analyze needs to scan the full table, the analyze job for a large table may take hours or even days to finish.
- Analyze may consume a lot of resources. Some users increase concurrency (for example, `tidb_build_stats_concurrency` and `tidb_distsql_scan_concurrency`) to speed up analyze. However, that may consume a lot of CPU/memory/IO on TiKV (when scanning the table and sampling) and a lot of CPU/memory on TiDB (when merging samples and building stats).
- When the table has many columns, or some columns are large (for example, TEXT/BLOB/JSON columns), the samples may take up a lot of memory. When merging samples and building stats, TiDB may OOM, or analyze may be killed by the global memory control mechanism. Maybe we can skip collecting statistics for columns whose stats are rarely used, such as JSON columns.
- The execution of analyze is not fault-tolerant. If one analyze request to some region fails (for example, because the region is unavailable), the whole analyze job fails and we need to rerun it from the very beginning, which is unfriendly to users.
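The suggestion of skipping statistics for rarely used columns could be applied as a filter before any sampling happens, so those columns' samples never reach the merge step in TiDB. This is a minimal Go sketch, not TiDB's code: `ColumnType`, `Column`, and `columnsToAnalyze` are hypothetical names, and which types to skip is left to the caller (JSON is the example the issue gives).

```go
package main

// ColumnType is a hypothetical, simplified stand-in for a column's SQL type.
type ColumnType int

const (
	TypeInt ColumnType = iota
	TypeVarchar
	TypeText
	TypeBlob
	TypeJSON
)

// Column is a hypothetical column descriptor; the fields are illustrative only.
type Column struct {
	Name string
	Type ColumnType
}

// columnsToAnalyze returns the columns that should be sampled, dropping every
// column whose type is in skip so that its (possibly large) samples are never
// collected, transferred, or merged.
func columnsToAnalyze(cols []Column, skip map[ColumnType]bool) []Column {
	kept := make([]Column, 0, len(cols))
	for _, c := range cols {
		if skip[c.Type] {
			continue
		}
		kept = append(kept, c)
	}
	return kept
}
```

Keeping the skip set as a parameter leaves the policy (JSON only, or also BLOB/TEXT) as a separate decision from the mechanism.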
Here is the related issue in the TiKV repo:
tikv/tikv#14231
Tasks
Use faster murmur3 hash function for FMSketch calculation
Reduce encoding cost
- coprocessor: avoid unnecessary vec allocation in collect_column_stats tikv/tikv#14280
- coprocessor: reuse EvalContext in collect_column_stats tikv/tikv#14376
- coprocessor: avoid unnecessary encode when collect_column_stats tikv/tikv#14365
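The first two TiKV changes follow a common pattern: create per-batch state (buffers, contexts) once and reset it for each row, instead of allocating it for every row. TiKV is written in Rust, so the Go sketch below only illustrates the pattern; `encodeRow` is a hypothetical fixed-width encoding standing in for the real datum encoding, and a reused `hash/fnv` hasher stands in for the reused context.

```go
package main

import (
	"encoding/binary"
	"hash/fnv"
)

// encodeRow appends a fixed-width big-endian encoding of every column value to
// dst and returns the extended slice (hypothetical stand-in for datum encoding).
func encodeRow(dst []byte, row []int64) []byte {
	var tmp [8]byte
	for _, v := range row {
		binary.BigEndian.PutUint64(tmp[:], uint64(v))
		dst = append(dst, tmp[:]...)
	}
	return dst
}

// hashRowsAllocating allocates a fresh buffer and hasher for every row.
func hashRowsAllocating(rows [][]int64) []uint64 {
	out := make([]uint64, 0, len(rows))
	for _, row := range rows {
		buf := encodeRow(nil, row) // new allocation per row
		h := fnv.New64a()          // new hasher per row
		h.Write(buf)
		out = append(out, h.Sum64())
	}
	return out
}

// hashRowsReusing keeps one buffer and one hasher for the whole batch and
// resets them between rows, producing the same hashes with fewer allocations.
func hashRowsReusing(rows [][]int64) []uint64 {
	out := make([]uint64, 0, len(rows))
	buf := make([]byte, 0, 64)
	h := fnv.New64a()
	for _, row := range rows {
		buf = encodeRow(buf[:0], row) // truncate but keep capacity
		h.Reset()
		h.Write(buf)
		out = append(out, h.Sum64())
	}
	return out
}
```

Both functions return identical hashes; the reusing version simply stops paying for an allocation per row once the buffer has grown to the widest row.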
Avoid FMSketch calculation for single-column index
- statistics: avoid fmsketch calculation for single-column index #41931
- coprocessor: avoid fmsketch calculation for single-column index tikv/tikv#14345
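Our reading of the rationale is that a single-column index has the same distinct values as its column (only the encoding differs), so the column's NDV can be reused for the index instead of maintaining a second FMSketch. A minimal Go sketch under that assumption; `columnStats`, `indexInfo`, `indexNDVs`, and the `buildIndexNDV` callback are hypothetical, not TiDB types.

```go
package main

// columnStats is a hypothetical, simplified per-column statistics record.
type columnStats struct {
	NDV int64
}

// indexInfo is a hypothetical index descriptor listing its column names.
type indexInfo struct {
	Name    string
	Columns []string
}

// indexNDVs computes an NDV for each index. For a single-column index whose
// column already has stats, the column's NDV is reused and no separate sketch
// is built; every other index falls back to buildIndexNDV.
func indexNDVs(indexes []indexInfo, cols map[string]columnStats,
	buildIndexNDV func(indexInfo) int64) map[string]int64 {
	out := make(map[string]int64, len(indexes))
	for _, idx := range indexes {
		if len(idx.Columns) == 1 {
			if cs, ok := cols[idx.Columns[0]]; ok {
				out[idx.Name] = cs.NDV
				continue
			}
		}
		out[idx.Name] = buildIndexNDV(idx)
	}
	return out
}
```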
Sample-based NDV calculation
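The issue does not say which estimator it has in mind. As one example of estimating NDV from samples rather than from a full-table FMSketch, here is a Go sketch of the Guaranteed-Error Estimator (GEE) of Charikar et al., which scales up the values seen exactly once in the sample and counts repeated values once. The function name and signature are hypothetical, and the sample is assumed to be uniform.

```go
package main

import "math"

// estimateNDVByGEE estimates the number of distinct values in a table of
// tableRows rows from a uniform random sample, using the GEE estimator:
//
//	D = sqrt(N/n) * f1 + sum_{j>=2} f_j
//
// where N is the table size, n the sample size, and f_j the number of values
// that occur exactly j times in the sample.
func estimateNDVByGEE(sample []string, tableRows int64) int64 {
	n := len(sample)
	if n == 0 {
		return 0
	}
	// Count occurrences of each value in the sample.
	counts := make(map[string]int, n)
	for _, v := range sample {
		counts[v]++
	}
	var f1, rest int64
	for _, c := range counts {
		if c == 1 {
			f1++ // singletons may stand for many values missing from the sample
		} else {
			rest++ // values seen at least twice are counted once
		}
	}
	scale := math.Sqrt(float64(tableRows) / float64(n))
	return int64(math.Round(scale*float64(f1))) + rest
}
```

For example, a sample of 4 rows from a 400-row table containing `x`, `y`, `z`, `z` gives f1 = 2 and one repeated value, so the estimate is sqrt(400/4) * 2 + 1 = 21. When the "sample" is the whole table, the scale factor is 1 and the result is the exact distinct count.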