Skip to content

Conversation

@yuqi1129
Copy link
Contributor

@yuqi1129 yuqi1129 commented Dec 9, 2025

What changes were proposed in this pull request?

This pull request introduces major performance and functionality improvements to the Metalake and table metadata management system. The most significant change is the addition of a Caffeine-based cache for Metalake entities, which reduces redundant storage access and improves efficiency. Additionally, it adds a new method for querying tables by their fully qualified names and optimizes database schema indexing for faster lookups.

Metalake caching and management improvements

  • Introduced a static Caffeine cache (metalakeCache) in MetalakeManager to store BaseMetalake objects, with automatic expiry and scheduled cleanup. This cache is now used in metalakeInUse, loadMetalake, and is invalidated on alter, drop, enable, and disable operations to ensure consistency. [1] [2] [3] [4] [5] [6] [7] [8] [9]

Table metadata querying enhancements

  • Added a new method selectTableByFullQualifiedName to TableMetaMapper, its SQL provider, and the service layer, allowing retrieval of table metadata using metalake, catalog, schema, and table names directly, improving query flexibility and performance. [1] [2] [3] [4] [5]

Database schema and indexing optimizations

  • Added composite indexes (idx_name_da) on the metalake_name, catalog_name, schema_name, and table_name columns (with deleted_at) in both H2 and MySQL schema scripts, significantly improving query performance for lookups by name. [1] [2] [3] [4] [5] [6] [7] [8] [9]

Why are the changes needed?

To improve performance

Fix: #9429

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Existing test

Copilot AI review requested due to automatic review settings December 9, 2025 10:01
@yuqi1129
Copy link
Contributor Author

yuqi1129 commented Dec 9, 2025

Before

image

After

image

profiler

20251209-173554.html

@yuqi1129 yuqi1129 marked this pull request as draft December 9, 2025 10:05
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces performance optimizations for table metadata queries through database indexing and a new query method, plus adds Caffeine-based caching for Metalake entities. However, there are several critical issues that must be addressed before merging.

Key changes:

  • Added composite indexes on (name, deleted_at) columns across metalake, catalog, schema, and table metadata tables
  • Introduced selectTableByFullQualifiedName method to query tables using fully qualified names with JOIN operations
  • Added Caffeine cache for BaseMetalake objects in MetalakeManager with automatic expiry and cleanup

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
scripts/h2/schema-1.1.0-h2.sql Adds composite indexes on name and deleted_at columns; contains critical bug on line 66
scripts/h2/upgrade-1.0.0-to-1.1.0-h2.sql Migration script to add indexes for existing H2 databases
scripts/mysql/schema-1.1.0-mysql.sql Adds composite indexes on name and deleted_at columns; contains critical SQL syntax error on line 83
scripts/mysql/upgrade-1.0.0-to-1.1.0-mysql.sql Migration script to add indexes for existing MySQL databases
scripts/postgresql/schema-1.1.0-postgresql.sql Adds composite indexes on name and deleted_at columns
scripts/postgresql/upgrade-1.0.0-to-1.1.0-postgresql.sql Migration script to add indexes for existing PostgreSQL databases
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java Implements Caffeine cache for metalakes; has error handling and resource cleanup issues
core/src/main/java/org/apache/gravitino/storage/relational/mapper/TableMetaMapper.java Adds new mapper method for querying by fully qualified name
core/src/main/java/org/apache/gravitino/storage/relational/mapper/TableMetaSQLProviderFactory.java Adds factory method for the new query
core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableMetaBaseSQLProvider.java Implements JOIN-based query; contains critical SQL injection vulnerability
core/src/main/java/org/apache/gravitino/storage/relational/service/TableMetaService.java Updates table lookup to use new optimized query method
Comments suppressed due to low confidence (1)

core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java:90

  • The close() method does not clean up the metalakeCache or shut down the scheduled executor service used by the cache's scheduler. This can lead to resource leaks when the MetalakeManager is closed. Add metalakeCache.invalidateAll() and shutdown the scheduler's executor service in the close method.
  @Override
  public void close() {
    // do nothing
  }

@yuqi1129
Copy link
Contributor Author

yuqi1129 commented Dec 9, 2025

With 30000 tables under two catalogs

image

@yuqi1129 yuqi1129 self-assigned this Dec 23, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 33 out of 34 changed files in this pull request and generated 9 comments.

Copy link
Contributor

@mchades mchades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no more comments.

would you like to review again?

@yuqi1129 yuqi1129 merged commit e964783 into apache:main Jan 9, 2026
26 checks passed
@yuqi1129 yuqi1129 deleted the issue_9429 branch January 9, 2026 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize Lance REST describe interface.

3 participants