Improve Analyze Performance and Stability #41930

@xuyifangreeneyes

Description

Enhancement

Currently, when we use the analyze command to collect statistics, we run into several problems, especially for large tables:

  • Analyze is slow. Since analyze needs to scan the full table, an analyze job on a large table may take hours or even days to finish.
  • Analyze may consume a lot of resources. Some users increase concurrency (for example, tidb_build_stats_concurrency and tidb_distsql_scan_concurrency) to speed up analyze. However, that may cost TiKV a lot of CPU/memory/IO (when scanning the table and sampling) and TiDB a lot of CPU/memory (when merging samples and building stats).
  • When a table has many columns, or some columns are large (such as text/blob/json columns), the samples may take up a lot of memory. When merging samples and building stats, TiDB may OOM, or the analyze job may be killed by the global memory control mechanism. We could consider skipping statistics collection for columns whose stats are barely used, such as JSON columns.
  • The execution of analyze is not fault-tolerant. If one analyze request to a region fails (perhaps because the region is unavailable), the whole analyze job fails and has to be rerun from the very beginning, which is unfriendly to users.

Here is the related issue in the tikv repo:
tikv/tikv#14231

Tasks

  • Use faster murmur3 hash function for FMSketch calculation
  • Reduce encoding cost
  • Avoid FMSketch calculation for single-column index
  • Sample-based NDV calculation
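To make the FMSketch and sample-based NDV tasks above concrete, here is a minimal Python sketch of both techniques. This is not TiDB's actual Go implementation: the hash is a SHA-1-derived 64-bit stand-in (murmur3 is not in the Python standard library), the doubling-mask FMSketch mirrors the standard approach of keeping only hashes whose masked low bits are zero, and the sample-based estimator shown is the classic GEE estimator, used here purely as an illustration. All names are hypothetical.

```python
import hashlib
from collections import Counter

def h64(value):
    # Stand-in 64-bit hash; TiDB uses murmur3, which is not in the
    # Python standard library, so we take 8 bytes of SHA-1 instead.
    digest = hashlib.sha1(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "little")

class FMSketch:
    """FM-style distinct counter: keep only hashes whose low bits under
    `mask` are all zero; double the mask whenever the set overflows."""
    def __init__(self, max_size):
        self.mask = 0
        self.max_size = max_size
        self.hashset = set()

    def insert_value(self, value):
        h = h64(value)
        if h & self.mask != 0:
            return  # filtered out at the current precision
        self.hashset.add(h)
        while len(self.hashset) > self.max_size:
            # Halve the retained fraction and re-filter existing hashes.
            self.mask = self.mask * 2 + 1
            self.hashset = {x for x in self.hashset if x & self.mask == 0}

    def ndv(self):
        # Each retained hash stands for (mask + 1) distinct values on average.
        return (self.mask + 1) * len(self.hashset)

def gee_ndv(sample, total_rows):
    """Sample-based NDV via the GEE estimator:
    sqrt(N/n) * f1 + sum of f_j for j >= 2, where f_j is the number of
    values that appear exactly j times in the sample of size n."""
    counts = Counter(sample)          # value -> occurrences in the sample
    freq = Counter(counts.values())   # j -> number of values seen j times
    n = len(sample)
    f1 = freq.get(1, 0)
    rest = sum(c for j, c in freq.items() if j >= 2)
    return (total_rows / n) ** 0.5 * f1 + rest
```

With a sample-based estimator like this, analyze no longer needs a full-table scan to approximate NDV, which is the motivation behind the last task; the trade-off is estimation error that grows as the sampling rate shrinks.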
