
@suxiaogang223
Contributor

@suxiaogang223 suxiaogang223 commented Jan 9, 2026

What problem does this PR solve?

Description

Changes

This PR refactors the Iceberg metadata cache structure to improve code organization and adds comprehensive test cases for table cache behavior.

Main Changes

1. Refactored IcebergMetadataCache

  • Introduced IcebergTableCacheValue to encapsulate table-related metadata
  • Removed redundant snapshotListCache and snapshotCache
  • Merged snapshot information into IcebergTableCacheValue with lazy loading
  • Simplified cache structure from 3 separate caches to 2: tableCache and viewCache

Before:

private LoadingCache<IcebergMetadataCacheKey, List<Snapshot>> snapshotListCache;
private LoadingCache<IcebergMetadataCacheKey, Table> tableCache;
private LoadingCache<IcebergMetadataCacheKey, IcebergSnapshotCacheValue> snapshotCache;

After:

private LoadingCache<IcebergMetadataCacheKey, IcebergTableCacheValue> tableCache;
private LoadingCache<IcebergMetadataCacheKey, View> viewCache;
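The consolidation above can be illustrated with a minimal, self-contained sketch. It is not the Doris implementation: `ConcurrentHashMap.computeIfAbsent` stands in for the real `LoadingCache`, and `CacheKey`, `FakeTable`, `TableEntry` and `ConsolidatedCache` are hypothetical stand-ins for `IcebergMetadataCacheKey`, Iceberg's `Table` and `IcebergTableCacheValue`. It shows why bundling snapshot data into the table entry helps: three independently loaded caches can get out of sync when one expires or is invalidated without the others, whereas a single entry is loaded and dropped as a unit.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-ins for IcebergMetadataCacheKey, Iceberg's Table and IcebergTableCacheValue.
record CacheKey(String db, String table) {}
record FakeTable(String name, List<Long> snapshotIds) {}

// One value per table: the table plus everything derived from it.
record TableEntry(FakeTable table) {
    List<Long> snapshotIds() {
        return table.snapshotIds(); // derived from the table, not cached on its own
    }
}

final class ConsolidatedCache {
    // ConcurrentHashMap stands in for a LoadingCache; there is no TTL or size bound here.
    private final Map<CacheKey, TableEntry> tableCache = new ConcurrentHashMap<>();
    private final Function<CacheKey, FakeTable> loader;
    final AtomicInteger loads = new AtomicInteger(); // counts catalog round trips

    ConsolidatedCache(Function<CacheKey, FakeTable> loader) {
        this.loader = loader;
    }

    TableEntry get(CacheKey key) {
        return tableCache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return new TableEntry(loader.apply(k));
        });
    }

    // One removal drops the table and all snapshot data derived from it.
    void invalidate(CacheKey key) {
        tableCache.remove(key);
    }
}
```

After `invalidate(key)`, the next `get(key)` reloads the table, and the snapshot view it exposes is rebuilt from that same fresh table, so the two cannot disagree.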

2. Lazy Loading for Snapshot Cache

  • Snapshot cache is now loaded on-demand through IcebergTableCacheValue.getSnapshotCacheValue()
  • Reduces memory footprint for queries that do not need snapshot information
  • Snapshot information is mainly needed for MTMV (multi-table materialized view) scenarios
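A minimal sketch of the lazy-loading idea, assuming the snapshot part is memoized per cache entry. `LazyTableValue`, `SnapshotInfo` and the `Supplier`-based loader are hypothetical; the actual `IcebergTableCacheValue` may synchronize differently. This version uses double-checked locking on a `volatile` field, so the loader runs at most once per entry even with concurrent callers, and never runs for queries that only call `getTable()`.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the snapshot information an MTMV refresh would read.
record SnapshotInfo(long snapshotId, long schemaId) {}

// Sketch of a cache value whose snapshot part is computed on first use only.
final class LazyTableValue<T> {
    private final T table;                          // loaded eagerly with the entry
    private final Supplier<SnapshotInfo> snapshotLoader;
    private volatile SnapshotInfo snapshot;         // null until first requested

    LazyTableValue(T table, Supplier<SnapshotInfo> snapshotLoader) {
        this.table = table;
        this.snapshotLoader = snapshotLoader;
    }

    T getTable() {
        return table; // never triggers the snapshot loader
    }

    // Double-checked locking: the loader runs at most once, even with concurrent callers.
    SnapshotInfo getSnapshotCacheValue() {
        SnapshotInfo result = snapshot;
        if (result == null) {
            synchronized (this) {
                result = snapshot;
                if (result == null) {
                    result = snapshotLoader.get();
                    snapshot = result;
                }
            }
        }
        return result;
    }
}
```

Because the memoized value lives inside the table's cache entry, it is evicted together with the table rather than expiring on a separate schedule.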

3. Simplified Cache API

  • getIcebergTable(): Returns the Table object directly from IcebergTableCacheValue
  • getSnapshotCache(): Returns snapshot cache value with lazy loading
  • getSnapshotList(): Returns snapshot list from the Table object
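The three accessors can be sketched as thin views over the single table cache. Everything here is hypothetical (`TableRef`, `SnapshotStub`, `TableStub`, `MetadataCacheFacade`), lazy loading is left out for brevity, and the stub treats the last snapshot in the list as current; real Iceberg exposes `Table.snapshots()` as an `Iterable` and tracks the current snapshot explicitly through `Table.currentSnapshot()`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-ins for IcebergMetadataCacheKey, Snapshot and Table.
record TableRef(String db, String table) {}
record SnapshotStub(long snapshotId) {}
record TableStub(String name, List<SnapshotStub> snapshots) {
    // Simplification: newest snapshot, or null for a table with no commits yet.
    SnapshotStub currentSnapshot() {
        return snapshots.isEmpty() ? null : snapshots.get(snapshots.size() - 1);
    }
}

final class MetadataCacheFacade {
    private final Map<TableRef, TableStub> tableCache = new ConcurrentHashMap<>();
    private final Function<TableRef, TableStub> loader;

    MetadataCacheFacade(Function<TableRef, TableStub> loader) {
        this.loader = loader;
    }

    // getIcebergTable(): the table held in the single cache entry.
    TableStub getIcebergTable(TableRef ref) {
        return tableCache.computeIfAbsent(ref, loader);
    }

    // getSnapshotCache(): derived from the cached table (computed eagerly here for brevity).
    SnapshotStub getSnapshotCache(TableRef ref) {
        return getIcebergTable(ref).currentSnapshot();
    }

    // getSnapshotList(): read from the table itself; there is no separate list cache.
    List<SnapshotStub> getSnapshotList(TableRef ref) {
        return getIcebergTable(ref).snapshots();
    }
}
```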

4. Test Cases

  • Added the regression test test_iceberg_table_cache to verify cache behavior
  • Tests cover both cache-enabled and cache-disabled scenarios
  • Verified that external modifications (INSERT, DELETE, UPDATE, schema changes) are handled correctly

Benefits

  • Memory usage: reduced by eliminating duplicate caching of snapshot information
  • Code structure: cleaner, with a single IcebergTableCacheValue instead of multiple separate caches
  • Performance: snapshot information is loaded lazily, only when needed
  • Maintainability: simpler cache management logic

Test Results

  • Added regression test: test_iceberg_table_cache.groovy
  • Tests validate cache behavior with TTL and external modifications
  • Verified cache invalidation works correctly with REFRESH TABLE
  • Test scenarios include:
    • DML operations (INSERT, DELETE, UPDATE, INSERT OVERWRITE)
    • Schema changes (ADD/DROP/RENAME COLUMN, ALTER COLUMN TYPE)
    • Partition evolution (ADD/DROP/REPLACE PARTITION FIELD)
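The outcomes these tests check can be modeled with a small TTL cache. This is a hedged sketch, not Doris code: the class name, the `ttlNanos` parameter, the expire-after-write policy, treating a TTL of zero as "cache disabled", and the injectable clock are all assumptions for illustration, and `refresh(key)` plays the role of REFRESH TABLE. With caching enabled, a change committed outside Doris (for example through Spark) stays invisible until the entry expires or is refreshed; with caching disabled, every lookup reloads and sees it immediately.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical TTL cache used only to model the expected test outcomes.
final class TtlTableCache<K, V> {
    private record Entry<T>(T value, long loadedAtNanos) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long ttlNanos;       // assumption: 0 means "cache disabled"
    private final LongSupplier clock;  // injectable so expiry can be simulated

    TtlTableCache(Function<K, V> loader, long ttlNanos, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        if (ttlNanos == 0) {
            return loader.apply(key); // disabled: always read fresh metadata
        }
        // Keep the entry while it is younger than the TTL, otherwise reload it.
        Entry<V> e = entries.compute(key, (k, old) ->
                old != null && now - old.loadedAtNanos() < ttlNanos
                        ? old
                        : new Entry<V>(loader.apply(k), now));
        return e.value();
    }

    // Models REFRESH TABLE: drop the entry so the next get() reloads.
    void refresh(K key) {
        entries.remove(key);
    }
}
```

The Doris catalog properties that control the real TTL are not shown here; consult the IcebergExternalCatalog changes for the actual configuration.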

Related Files

Core Changes:

  • IcebergMetadataCache.java - Refactored cache structure
  • IcebergTableCacheValue.java - New class to encapsulate table metadata
  • IcebergExternalCatalog.java - Updated cache-related configurations

Tests:

  • test_iceberg_table_cache.groovy - Comprehensive cache behavior tests
  • Suite.groovy - Updated getSparkIcebergContainerName() implementation

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223 suxiaogang223 marked this pull request as draft January 9, 2026 08:14
@suxiaogang223
Contributor Author

run external

@suxiaogang223 suxiaogang223 marked this pull request as ready for review January 12, 2026 08:48
@suxiaogang223 suxiaogang223 changed the title from "Refact meta cache" to "[improvement](iceberg) Refactor Iceberg metadata cache structure and add table cache test cases" Jan 12, 2026
@suxiaogang223 suxiaogang223 changed the title from "[improvement](iceberg) Refactor Iceberg metadata cache structure and add table cache test cases" to "[enhance](iceberg) Refactor Iceberg metadata cache structure and add table cache test cases" Jan 12, 2026
@suxiaogang223
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/36) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 32189 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1574f3a0840831ea973580b5275b27182d07b2bf, data reload: false

------ Round 1 ----------------------------------
query	cold	hot 1	hot 2	best hot (ms)
q1	17636	4183	4086	4086
q2	2077	377	245	245
q3	10116	1275	724	724
q4	10208	855	309	309
q5	7538	2107	1841	1841
q6	187	168	135	135
q7	911	811	671	671
q8	9273	1414	1159	1159
q9	5100	4604	4559	4559
q10	6839	1852	1436	1436
q11	537	277	275	275
q12	735	756	590	590
q13	17802	3842	3078	3078
q14	297	298	274	274
q15	609	517	510	510
q16	667	684	647	647
q17	676	764	578	578
q18	6747	6583	6973	6583
q19	1355	996	669	669
q20	451	404	269	269
q21	3285	2654	2506	2506
q22	1170	1126	1045	1045
Total cold run time: 104216 ms
Total hot run time: 32189 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot 1	hot 2	best hot (ms)
q1	4395	4248	4257	4248
q2	324	411	333	333
q3	2299	2747	2449	2449
q4	1516	1815	1484	1484
q5	4601	4340	4422	4340
q6	224	177	129	129
q7	1958	1941	1816	1816
q8	2570	2413	2394	2394
q9	7135	7364	7264	7264
q10	2508	2691	2307	2307
q11	579	495	464	464
q12	701	723	561	561
q13	3349	3817	3040	3040
q14	268	289	266	266
q15	525	490	482	482
q16	616	674	616	616
q17	1099	1321	1362	1321
q18	7417	7489	7337	7337
q19	808	774	777	774
q20	1896	1974	1872	1872
q21	4485	4339	4196	4196
q22	1096	1020	995	995
Total cold run time: 50369 ms
Total hot run time: 48688 ms
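The per-query rows in these reports appear to follow a fixed layout: the first number is the cold run, the next two are hot runs, and the last is the faster of the two hot runs. "Total cold run time" then equals the sum of the first column and "Total hot run time" the sum of the best hot runs. This reading is inferred from the numbers, not documented by the report; the small check below recomputes the Round 1 totals from the rows to confirm it.

```java
// Recomputes the Round 1 summary lines from the per-query rows.
// Column meaning (cold, hot 1, hot 2) is inferred from the numbers, not documented by the report.
final class TpchReportCheck {
    // {cold, hot 1, hot 2} in ms for q1..q22, copied from Round 1.
    static final long[][] ROUND1 = {
        {17636, 4183, 4086}, {2077, 377, 245}, {10116, 1275, 724}, {10208, 855, 309},
        {7538, 2107, 1841}, {187, 168, 135}, {911, 811, 671}, {9273, 1414, 1159},
        {5100, 4604, 4559}, {6839, 1852, 1436}, {537, 277, 275}, {735, 756, 590},
        {17802, 3842, 3078}, {297, 298, 274}, {609, 517, 510}, {667, 684, 647},
        {676, 764, 578}, {6747, 6583, 6973}, {1355, 996, 669}, {451, 404, 269},
        {3285, 2654, 2506}, {1170, 1126, 1045},
    };

    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) sum += r[0];
        return sum;
    }

    // "Best hot" per query is the faster of the two hot runs.
    static long totalHot(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) sum += Math.min(r[1], r[2]);
        return sum;
    }

    public static void main(String[] args) {
        // prints "cold=104216 ms, hot=32189 ms"
        System.out.println("cold=" + totalCold(ROUND1) + " ms, hot=" + totalHot(ROUND1) + " ms");
    }
}
```

The same reading also reproduces the Round 2 totals (50369 ms cold, 48688 ms hot).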

@doris-robot

TPC-DS: Total hot run time: 173580 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1574f3a0840831ea973580b5275b27182d07b2bf, data reload: false

query	cold	hot 1	hot 2	best hot (ms)
query5	4444	598	436	436
query6	334	231	222	222
query7	4218	466	269	269
query8	330	258	246	246
query9	8770	2715	2657	2657
query10	508	354	336	336
query11	15270	15196	14803	14803
query12	173	112	111	111
query13	1261	477	368	368
query14	6214	2967	2704	2704
query14_1	2632	2581	2635	2581
query15	206	194	175	175
query16	1012	462	468	462
query17	1112	690	588	588
query18	2507	450	353	353
query19	229	228	198	198
query20	126	117	115	115
query21	224	145	125	125
query22	3994	3948	3896	3896
query23	16088	15712	15492	15492
query23_1	15825	15645	15676	15645
query24	7694	1600	1175	1175
query24_1	1185	1186	1213	1186
query25	580	521	451	451
query26	1240	281	165	165
query27	2754	481	311	311
query28	4515	2151	2142	2142
query29	796	569	476	476
query30	315	247	211	211
query31	799	639	560	560
query32	82	77	76	76
query33	554	346	300	300
query34	909	886	521	521
query35	718	784	681	681
query36	921	925	824	824
query37	133	95	79	79
query38	2740	2667	2641	2641
query39	797	756	729	729
query39_1	721	726	747	726
query40	229	136	118	118
query41	74	68	69	68
query42	106	103	102	102
query43	446	433	412	412
query44	1324	729	717	717
query45	192	188	181	181
query46	854	950	598	598
query47	1370	1456	1308	1308
query48	319	341	248	248
query49	638	419	353	353
query50	647	280	216	216
query51	3752	3799	3749	3749
query52	106	116	111	111
query53	304	327	274	274
query54	304	268	266	266
query55	81	77	72	72
query56	343	299	278	278
query57	1036	983	954	954
query58	267	246	254	246
query59	2187	2247	2036	2036
query60	316	321	298	298
query61	165	157	153	153
query62	420	352	312	312
query63	294	261	266	261
query64	4860	1316	994	994
query65	3835	3776	3749	3749
query66	1428	420	311	311
query67	15460	16105	15120	15120
query68	6029	992	713	713
query69	517	353	312	312
query70	1076	958	927	927
query71	418	294	276	276
query72	6008	3420	3527	3420
query73	774	721	300	300
query74	8871	8891	8623	8623
query75	2817	2831	2450	2450
query76	3456	1050	633	633
query77	534	377	274	274
query78	9874	9836	9158	9158
query79	1584	944	598	598
query80	704	572	463	463
query81	513	257	226	226
query82	449	147	110	110
query83	272	258	239	239
query84	257	118	101	101
query85	902	512	465	465
query86	433	302	318	302
query87	2872	2850	2763	2763
query88	3224	2230	2231	2230
query89	399	351	345	345
query90	2148	156	158	156
query91	175	169	141	141
query92	80	64	68	64
query93	1495	904	530	530
query94	637	319	297	297
query95	582	381	296	296
query96	580	452	214	214
query97	2371	2391	2331	2331
query98	239	200	202	200
query99	578	594	490	490
Total cold run time: 254239 ms
Total hot run time: 173580 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/36) 🎉
Increment coverage report
Complete coverage report
