Open
Description
A levi.delta_doctor(delta_table) command could help users identify issues in their Delta table that cause slow query performance.
There are several known problems that can cause poor performance of Delta tables:
- too many small files
- large files
- file stats not being collected on the right columns/file stats missing for certain files
- tables that are over-partitioned
- tables that are not Z ORDERed
- tables that should have constraints, but do not
levi.delta_doctor(delta_table) could return a string containing warnings such as:
- SmallFileWarning: Your table contains 456 files with less than 1MB of data. Consider running OPTIMIZE to compact the small files.
- LargeFileWarning: Your table contains 32 files with more than 1.5GB of data. You should split up these files.
- FileStatsWarning: You are only collecting stats for col1 and col2 in some files.
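As a rough illustration of the two file-size checks, here is a minimal sketch, not a proposed implementation. It assumes the per-file sizes in bytes have already been read from the table's transaction log (for example via the add actions exposed by the `deltalake` package). The names `file_size_warnings`, `delta_doctor_report`, `SMALL_FILE_BYTES`, and `LARGE_FILE_BYTES` are hypothetical, and the thresholds simply mirror the example messages above.

```python
from typing import Iterable, List

# Hypothetical thresholds, taken from the example warnings in this issue.
SMALL_FILE_BYTES = 1 * 1024 * 1024        # 1 MB
LARGE_FILE_BYTES = int(1.5 * 1024 ** 3)   # 1.5 GB


def file_size_warnings(file_sizes: Iterable[int]) -> List[str]:
    """Return warning strings for files that are too small or too large.

    `file_sizes` is assumed to hold one size in bytes per data file
    currently referenced by the Delta table.
    """
    sizes = list(file_sizes)
    small = sum(1 for size in sizes if size < SMALL_FILE_BYTES)
    large = sum(1 for size in sizes if size > LARGE_FILE_BYTES)

    warnings = []
    if small:
        warnings.append(
            f"SmallFileWarning: Your table contains {small} files with less "
            "than 1MB of data. Consider running OPTIMIZE to compact the small files."
        )
    if large:
        warnings.append(
            f"LargeFileWarning: Your table contains {large} files with more "
            "than 1.5GB of data. You should split up these files."
        )
    return warnings


def delta_doctor_report(file_sizes: Iterable[int]) -> str:
    """Join all warnings into the single string the issue describes."""
    return "\n".join(file_size_warnings(file_sizes))
```

A call such as `delta_doctor_report([500_000, 2 * 1024 ** 3])` would return one SmallFileWarning line and one LargeFileWarning line. Checks for file stats, partitioning, Z ORDER, and constraints would need table metadata beyond file sizes and are not covered by this sketch.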
We should make it really easy for users to see if there are any obvious problems in their Delta table. Ideally, each warning would also come with an easy fix for the problem!