-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](catalog) update the table's last update time after related operations. #59387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run feut |
FE UT Coverage ReportIncrement line coverage |
e39a520 to
e786c4b
Compare
|
run buildall |
TPC-H: Total hot run time: 32000 ms |
TPC-DS: Total hot run time: 173032 ms |
ClickBench: Total hot run time: 27.34 s |
|
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR consolidates the tracking of external table update times by merging the separate schemaUpdateTime and eventUpdateTime fields into a single updateTime field in the ExternalTable base class. This change ensures that SQL cache features can accurately track table modifications across all external table types.
Key Changes:
- Unified time tracking by replacing two separate timestamp fields with a single
updateTimefield - Propagated
updateTimeparameter through all table modification operations (insert, truncate, schema changes, refresh) - Updated all metadata operation interfaces and implementations to accept and use
updateTime
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| ExternalTable.java | Merged schemaUpdateTime and eventUpdateTime into single updateTime field at base class level |
| HMSExternalTable.java | Removed HMS-specific eventUpdateTime field and override of getUpdateTime() method |
| HMSExternalDatabase.java | Updated to call setUpdateTime() instead of setEventUpdateTime() |
| ExternalMetadataOps.java | Added updateTime parameter to all schema modification method signatures |
| IcebergMetadataOps.java | Updated to propagate updateTime through all DDL operations and refresh calls |
| HiveMetadataOps.java | Updated truncate operations to accept and use updateTime |
| IcebergExternalCatalog.java | Added updateTime parameter to partition evolution operations |
| ExternalCatalog.java | Captures and propagates updateTime for all schema change operations |
| HiveInsertExecutor.java | Sets and logs updateTime after insert operations complete |
| ExecuteActionCommand.java | Passes updateTime when logging table refresh operations |
| Alter.java | Captures updateTime at start of external table alter operations |
| RefreshManager.java | Updated refresh operations to set updateTime on tables and log updates with timestamps |
| CatalogMgr.java | Updated partition add/drop operations to use setUpdateTime() |
| TruncateTableInfo.java | Added updateTime field to persist truncate operation timestamp |
| ExternalObjectLog.java | Added updateTime parameter to factory methods for refresh operations |
| CacheAnalyzer.java | Fixed spelling error: "alread" → "already" |
| test_external_table_update_time.groovy | New test validating update time changes after insert and truncate |
| iceberg_schema_change_ddl.groovy | Added assertions verifying update time changes after schema modifications |
| iceberg_branch_tag_operate.groovy | Added update time validation for branch/tag operations |
| HmsCatalogTest.java, HmsQueryCacheTest.java, RefreshTableTest.java | Updated tests to use updateTime instead of separated time fields |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // override this method if there is some other kinds of update time | ||
| // use getSchemaUpdateTime if just need the schema update time | ||
| @Override | ||
| // get the max value of `schemaUpdateTime` and `eventUpdateTime` as `updateTime` |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is now outdated. Since schemaUpdateTime and eventUpdateTime have been merged into a single updateTime field, this comment should be updated to simply describe that this method returns the table update time, which tracks when the table was last modified (e.g., by insert, alter, or refresh operations).
| // get the max value of `schemaUpdateTime` and `eventUpdateTime` as `updateTime` | |
| // Returns the table update time, tracking when the table was last modified | |
| // (for example, by insert, alter, or refresh operations). |
| sql "drop database if exists test_update_time_db force"; | ||
| sql "create database test_update_time_db"; | ||
| sql "use test_update_time_db"; | ||
| sql "create table test_update_time_tbl(k1 int)" | ||
| sql "insert into test_update_time_tbl values(1)" | ||
| def result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | ||
| def update_time1 = result[0][0]; | ||
| sleep(2000); | ||
| sql "insert into test_update_time_tbl values(2)" | ||
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | ||
| def update_time2 = result[0][0]; | ||
| logger.info("get update times " + update_time1 + " vs. " + update_time2) | ||
| assertTrue(update_time2 > update_time1); | ||
| sleep(2000); | ||
| sql "truncate table test_update_time_tbl"; | ||
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | ||
| def update_time3 = result[0][0]; | ||
| logger.info("get update times " + update_time2 + " vs. " + update_time3) | ||
| assertTrue(update_time3 > update_time2); |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test does not clean up the database and catalog it creates. Consider adding cleanup code after the assertions to drop the test database and catalog to prevent test environment pollution.
| sql "drop database if exists test_update_time_db force"; | |
| sql "create database test_update_time_db"; | |
| sql "use test_update_time_db"; | |
| sql "create table test_update_time_tbl(k1 int)" | |
| sql "insert into test_update_time_tbl values(1)" | |
| def result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | |
| def update_time1 = result[0][0]; | |
| sleep(2000); | |
| sql "insert into test_update_time_tbl values(2)" | |
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | |
| def update_time2 = result[0][0]; | |
| logger.info("get update times " + update_time1 + " vs. " + update_time2) | |
| assertTrue(update_time2 > update_time1); | |
| sleep(2000); | |
| sql "truncate table test_update_time_tbl"; | |
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | |
| def update_time3 = result[0][0]; | |
| logger.info("get update times " + update_time2 + " vs. " + update_time3) | |
| assertTrue(update_time3 > update_time2); | |
| try { | |
| sql "drop database if exists test_update_time_db force"; | |
| sql "create database test_update_time_db"; | |
| sql "use test_update_time_db"; | |
| sql "create table test_update_time_tbl(k1 int)" | |
| sql "insert into test_update_time_tbl values(1)" | |
| def result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | |
| def update_time1 = result[0][0]; | |
| sleep(2000); | |
| sql "insert into test_update_time_tbl values(2)" | |
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | |
| def update_time2 = result[0][0]; | |
| logger.info("get update times " + update_time1 + " vs. " + update_time2) | |
| assertTrue(update_time2 > update_time1); | |
| sleep(2000); | |
| sql "truncate table test_update_time_tbl"; | |
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | |
| def update_time3 = result[0][0]; | |
| logger.info("get update times " + update_time2 + " vs. " + update_time3) | |
| assertTrue(update_time3 > update_time2); | |
| } finally { | |
| // Cleanup test objects to avoid polluting the test environment | |
| try { | |
| sql "drop table if exists test_update_time_tbl"; | |
| } catch (Throwable t) { | |
| logger.warn("Failed to drop table test_update_time_tbl during cleanup", t) | |
| } | |
| try { | |
| sql "drop database if exists test_update_time_db force"; | |
| } catch (Throwable t) { | |
| logger.warn("Failed to drop database test_update_time_db during cleanup", t) | |
| } | |
| try { | |
| sql "switch internal"; | |
| sql """drop catalog if exists ${catalog_name};""" | |
| } catch (Throwable t) { | |
| logger.warn("Failed to drop catalog ${catalog_name} during cleanup", t) | |
| } | |
| } |
| sql "insert into test_update_time_tbl values(2)" | ||
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_update_time_db" and TABLE_NAME="test_update_time_tbl"""" | ||
| def update_time2 = result[0][0]; | ||
| logger.info("get update times " + update_time1 + " vs. " + update_time2) |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a trailing whitespace at the end of this line that should be removed.
| logger.info("get update times " + update_time1 + " vs. " + update_time2) | |
| logger.info("get update times " + update_time1 + " vs. " + update_time2) |
|
|
||
| sql """ alter table test_branch_tag_operate create branch b1 """ | ||
| sql """ alter table test_branch_tag_operate create branch if not exists b1 """ | ||
| def result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_db" and TABLE_NAME="test_branch_tag_operate"""" |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a trailing whitespace at the end of this line that should be removed.
| def result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_db" and TABLE_NAME="test_branch_tag_operate"""" | |
| def result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_db" and TABLE_NAME="test_branch_tag_operate"""" |
| sql """ insert into test_branch_tag_operate values (4) """ | ||
| sql """ insert into test_branch_tag_operate values (5) """ | ||
|
|
||
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_db" and TABLE_NAME="test_branch_tag_operate"""" |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a trailing whitespace at the end of this line that should be removed.
| exception "Cannot set b8 to unknown snapshot: 11223344" | ||
| } | ||
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_db" and TABLE_NAME="test_branch_tag_operate"""" |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a trailing whitespace at the end of this line that should be removed.
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_db" and TABLE_NAME="test_branch_tag_operate"""" | |
| result = sql """select UPDATE_TIME from information_schema.tables where TABLE_SCHEMA="test_db" and TABLE_NAME="test_branch_tag_operate"""" |
TPC-H: Total hot run time: 31419 ms |
TPC-DS: Total hot run time: 171473 ms |
ClickBench: Total hot run time: 27.01 s |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by anyone and no changes requested. |
|
PR approved by at least one committer and no changes requested. |
…ations. (apache#59387) We should update the external table's last update time after operations like schema change, insert or refresh. So the the SQL cache feature can get the right time to validate the cache. Also merge `schemaUpdateTime` and `eventUpdateTime` into one `updateTime`, and move it to `ExternalTable`. So that all kinds of external table can use this field.
What problem does this PR solve?
We should update the external table's last update time after operations like schema change, insert or refresh.
So the the SQL cache feature can get the right time to validate the cache.
Also merge
schemaUpdateTimeandeventUpdateTimeinto oneupdateTime, and move it toExternalTable.So that all kinds of external table can use this field.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)