
[Enhancement] Avoid duplicate Hive metadata loading when estimating HMS external table row count #63694

@foxtail463


Search before asking

  • I have searched the issues and found no similar issues.

Description

For HMS external tables, when the HMS table parameters do not contain a row count and enable_get_row_count_from_file_list is enabled, Doris may estimate the table row count by listing the Hive files.
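The fallback order above can be sketched roughly as follows. This is a hypothetical illustration, not Doris' actual code: the class, the `HiveFile` record, the `"numRows"` parameter key, and the "total bytes / estimated row size" formula are all assumptions. The file listing is passed in as a `Supplier` so that it only runs when the fallback is actually taken. That listing is the expensive step this issue is about.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the row-count fallback order; names and formula are assumptions.
final class RowCountFallback {
    static final long UNKNOWN = -1;

    // Illustrative stand-in for one Hive data file returned by a listing.
    record HiveFile(long sizeBytes) {}

    static long estimateRowCount(Map<String, String> tableParams,
                                 boolean enableGetRowCountFromFileList,
                                 Supplier<List<HiveFile>> listFiles,
                                 long estimatedRowBytes) {
        // 1. Prefer the row count stored in the HMS table parameters
        //    ("numRows" is assumed here as the parameter key).
        String numRows = tableParams.get("numRows");
        if (numRows != null) {
            long parsed = Long.parseLong(numRows);
            if (parsed >= 0) {
                return parsed;
            }
        }
        // 2. Without it, fall back to the file list only when the option is enabled.
        if (!enableGetRowCountFromFileList || estimatedRowBytes <= 0) {
            return UNKNOWN;
        }
        // 3. Listing files is the costly HMS/file-system access.
        long totalBytes = 0;
        for (HiveFile file : listFiles.get()) {
            totalBytes += file.sizeBytes();
        }
        return totalBytes / estimatedRowBytes;
    }
}
```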

Currently, this row-count estimation path may read Hive partition and file metadata without filling Doris' Hive external metadata cache. In a normal query planning flow, the scan planning phase later needs the same partition and file metadata, so Doris can end up reading the same HMS/file metadata twice while planning a single query.
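A toy model of the duplicate read described above. Everything here is hypothetical: `FakeHms` stands in for HMS/file-system access and counts calls, and `FileCache` stands in for the Hive external metadata cache. Because the estimation step bypasses the cache, the scan-planning step misses it and lists the same files again.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the current behavior; class and method names are illustrative only.
final class DuplicateLoadDemo {
    // Stand-in for HMS / file-system listing; counts how often it is hit.
    static final class FakeHms {
        final AtomicInteger listCalls = new AtomicInteger();

        List<Long> listFileSizes(String table) {
            listCalls.incrementAndGet();
            return List.of(1000L, 500L);
        }
    }

    // Stand-in for the Hive external metadata cache, keyed by table name.
    static final class FileCache {
        private final Map<String, List<Long>> cache = new ConcurrentHashMap<>();
        private final FakeHms hms;

        FileCache(FakeHms hms) {
            this.hms = hms;
        }

        List<Long> getFileSizes(String table) {
            return cache.computeIfAbsent(table, hms::listFileSizes);
        }
    }

    // Returns the number of HMS listings performed while planning one query.
    static int planOneQuery() {
        FakeHms hms = new FakeHms();
        FileCache cache = new FileCache(hms);
        // Row-count estimation reads HMS directly and does not fill the cache...
        long totalBytes = hms.listFileSizes("db.tbl").stream().mapToLong(Long::longValue).sum();
        // ...so scan planning misses the cache and lists the same files again.
        List<Long> scanFiles = cache.getFileSizes("db.tbl");
        return hms.listCalls.get();
    }
}
```

In this model, `planOneQuery()` performs two listings for what is logically one piece of metadata.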

This is inefficient for Hive tables with many partitions or files, especially when HMS access is expensive.

Expected behavior:

  • Query planning should be able to reuse Hive metadata fetched during row-count estimation.
  • Non-query metadata display paths, such as SHOW TABLE STATUS, SHOW STATS, or information_schema.tables, should still avoid filling heavy Hive metadata caches just to display the cached row count.

Solution

Introduce separate row-count loading modes for query planning and for metadata display paths.

  • ExternalTable.getRowCount() should load the row count in a query-planning mode that may fill the external metadata cache.
  • ExternalTable.getCachedRowCount() and display-oriented paths should keep the lightweight behavior and avoid filling heavy metadata caches.
  • When estimating the row count from the file list, HMSExternalTable should choose between the cached and non-cached Hive metadata APIs according to the loading mode.
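A minimal sketch of the proposed mode split, under stated assumptions. `RowCountLoadMode`, `FakeHms`, `HiveTable`, and `filesForScan()` are hypothetical names, not existing Doris APIs. In `QUERY_PLANNING` mode the estimate goes through the cache, so scan planning reuses it. In `DISPLAY` mode it uses a direct, non-caching listing and leaves the cache untouched.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed split; every name here is illustrative.
final class RowCountModeSketch {
    enum RowCountLoadMode {
        QUERY_PLANNING, // e.g. getRowCount(): may fill the external metadata cache
        DISPLAY         // e.g. getCachedRowCount(), SHOW TABLE STATUS: must not fill it
    }

    // Stand-in for HMS / file-system listing; counts how often it is hit.
    static final class FakeHms {
        final AtomicInteger listCalls = new AtomicInteger();

        List<Long> listFileSizes(String table) {
            listCalls.incrementAndGet();
            return List.of(1000L, 500L);
        }
    }

    static final class HiveTable {
        final FakeHms hms;
        final String name;
        final long estimatedRowBytes;
        // Stand-in for the heavy Hive file metadata cache.
        final Map<String, List<Long>> fileCache = new ConcurrentHashMap<>();

        HiveTable(FakeHms hms, String name, long estimatedRowBytes) {
            this.hms = hms;
            this.name = name;
            this.estimatedRowBytes = estimatedRowBytes;
        }

        long estimateRowCount(RowCountLoadMode mode) {
            List<Long> files = switch (mode) {
                // Cached API: the listing is kept for later scan planning.
                case QUERY_PLANNING -> fileCache.computeIfAbsent(name, hms::listFileSizes);
                // Non-cached API: lightweight for display, cache stays empty.
                case DISPLAY -> hms.listFileSizes(name);
            };
            long totalBytes = files.stream().mapToLong(Long::longValue).sum();
            return totalBytes / estimatedRowBytes;
        }

        // Scan planning always goes through the cache.
        List<Long> filesForScan() {
            return fileCache.computeIfAbsent(name, hms::listFileSizes);
        }
    }
}
```

With `QUERY_PLANNING`, estimation plus scan planning costs one listing instead of two. With `DISPLAY`, the heavy cache is never populated. The trade-off is that each display call that reaches this path lists files again, which is presumably acceptable because display paths are expected to rely on the cached row count.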

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
