Search before asking
Description
For HMS external tables, Doris may estimate table row count by listing Hive files when HMS table parameters do not contain row count and enable_get_row_count_from_file_list is enabled.
Currently, this row-count estimation path may read Hive partition and file metadata without filling Doris' Hive external metadata cache. In a normal query planning flow, the scan planning phase still needs the same partition and file metadata later, so Doris can read the same HMS/file metadata twice in one query planning process.
This is inefficient for Hive tables with many partitions or files, especially when HMS access is expensive.
Expected behavior:
- Query planning should be able to reuse Hive metadata fetched during row-count estimation.
- Non-query metadata display paths, such as SHOW TABLE STATUS, SHOW STATS, or information_schema.tables, should still avoid filling heavy Hive metadata caches just for displaying cached row count.
Solution
Introduce separate row-count loading modes for query planning and for metadata display paths.
- ExternalTable.getRowCount() should load the row count in a query-planning mode that may fill the external metadata cache.
- ExternalTable.getCachedRowCount() and display-oriented paths should keep the lightweight behavior and avoid filling the heavy metadata caches.
- HMSExternalTable should choose between the cached and non-cached Hive metadata APIs according to that mode when estimating row count from the file list.
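The proposal above could look roughly like the following self-contained sketch. It is not Doris code: RowCountLoadMode, FakeMetastore, MetaCache, and estimateRowCount are hypothetical names invented for illustration, and the "file sizes divided by an assumed row width" estimate is only a stand-in for the real file-list estimation. The point it tries to show is that the query-planning mode goes through the cache, so a later scan-planning lookup does not hit the metastore again, while the display mode bypasses the cache and leaves it empty.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RowCountModeSketch {
    /** Hypothetical mode flag; the issue does not name the actual type. */
    enum RowCountLoadMode { QUERY_PLANNING, METADATA_DISPLAY }

    /** Stand-in for HMS / file system access that counts remote calls. */
    static class FakeMetastore {
        int remoteCalls = 0;

        List<Long> listFileSizes(String table) {
            remoteCalls++;
            return List.of(1024L, 2048L, 4096L);
        }
    }

    /** Stand-in for the Hive external metadata cache. */
    static class MetaCache {
        private final FakeMetastore metastore;
        private final Map<String, List<Long>> fileSizes = new HashMap<>();

        MetaCache(FakeMetastore metastore) {
            this.metastore = metastore;
        }

        /** Cached API: loads on a miss and keeps the result. */
        List<Long> getFileSizes(String table) {
            return fileSizes.computeIfAbsent(table, metastore::listFileSizes);
        }

        /** Non-cached API: always reads remotely and never fills the cache. */
        List<Long> getFileSizesNoCache(String table) {
            return metastore.listFileSizes(table);
        }

        boolean isCached(String table) {
            return fileSizes.containsKey(table);
        }
    }

    /** Picks the cached or non-cached API based on the caller's mode. */
    static long estimateRowCount(MetaCache cache, String table,
                                 RowCountLoadMode mode, long assumedBytesPerRow) {
        List<Long> sizes = (mode == RowCountLoadMode.QUERY_PLANNING)
                ? cache.getFileSizes(table)
                : cache.getFileSizesNoCache(table);
        long totalBytes = 0;
        for (long size : sizes) {
            totalBytes += size;
        }
        return totalBytes / assumedBytesPerRow;
    }

    public static void main(String[] args) {
        FakeMetastore metastore = new FakeMetastore();
        MetaCache cache = new MetaCache(metastore);

        // Display path: estimates without filling the cache.
        estimateRowCount(cache, "db.t", RowCountLoadMode.METADATA_DISPLAY, 100);
        System.out.println("cached after display: " + cache.isCached("db.t"));

        // Query planning: estimation fills the cache, scan planning reuses it.
        estimateRowCount(cache, "db.t", RowCountLoadMode.QUERY_PLANNING, 100);
        cache.getFileSizes("db.t"); // simulated scan-planning lookup
        System.out.println("remote calls: " + metastore.remoteCalls);
    }
}
```

In this sketch the mode is passed explicitly by the caller, which keeps the cache-filling decision visible at each call site; whether Doris should thread it as a parameter or expose separate methods is left open here.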
Are you willing to submit PR?
Code of Conduct