refactor: analyze table #18514
base: main
LGTM
NDV (Number of Distinct Values) Statistics Accuracy Comparison Report
Data Source and Methodology
Complete NDV Accuracy Comparison Table
Key Findings and Analysis
Significant Performance Differences Detected
1. tpcds_100 Shows Superior Accuracy Overall
2. tpcds_100_v2 Shows Mixed Performance
Critical Problem Areas in tpcds_100_v2
Areas Where tpcds_100_v2 Performs Better
Statistical Quality Impact
Conclusion
let prev_snapshot_id = snapshot.prev_snapshot_id.map(|(id, _)| id);
if Some(table_statistics.snapshot_id) == prev_snapshot_id {
    return Ok(PipelineBuildResult::create());
}
Shall we compare with the current snapshot id?
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This PR improves the ANALYZE TABLE process by directly merging pre-collected block-level HyperLogLog (HLL) data, instead of relying on query-based calculations.
Previously, HLL statistics were calculated dynamically during ANALYZE TABLE through queries.
Now, the block-level HLL data is directly used to merge statistics, reducing calculation time.
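To illustrate why merging block-level HLL data is cheap, here is a minimal, self-contained sketch of a HyperLogLog whose merge is a register-wise maximum. It is not Databend's implementation: the `Hll` type, the precision `P = 12`, and the use of the standard library's `DefaultHasher` are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Index bits; the sketch has 2^P registers. P = 12 is an assumption.
const P: u32 = 12;
const M: usize = 1 << P;

/// Hypothetical per-block HLL sketch (illustrative only).
#[derive(Clone, PartialEq, Debug)]
pub struct Hll {
    registers: Vec<u8>,
}

impl Hll {
    pub fn new() -> Self {
        Hll { registers: vec![0; M] }
    }

    /// Record one value: the top P hash bits pick a register, and the
    /// register keeps the largest "leading zeros + 1" seen in the rest.
    pub fn add<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let h = hasher.finish();
        let idx = (h >> (64 - P)) as usize;
        let rest = h << P; // remaining 64 - P bits, moved to the top
        let rank = (rest.leading_zeros().min(64 - P) + 1) as u8;
        if rank > self.registers[idx] {
            self.registers[idx] = rank;
        }
    }

    /// Merge another sketch into this one by taking the per-register max.
    /// No table data is read; only the two register arrays are touched.
    pub fn merge(&mut self, other: &Hll) {
        for (a, b) in self.registers.iter_mut().zip(&other.registers) {
            *a = (*a).max(*b);
        }
    }

    /// Estimate the number of distinct values (NDV), with the standard
    /// small-range correction via linear counting.
    pub fn estimate(&self) -> f64 {
        let m = M as f64;
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

Because each register only ever stores a maximum, merging the sketches of two blocks yields exactly the sketch that would have been built by scanning both blocks together, even when their values overlap. Under these assumptions, a table-level NDV can be produced by folding `merge` over the per-block sketches and calling `estimate` once at the end.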
Reduces the cost of ANALYZE TABLE by leveraging existing statistics, enhancing performance, especially for large tables.
Tests
Type of change