@suxiaogang223
Contributor

@suxiaogang223 suxiaogang223 commented Dec 26, 2025

What problem does this PR solve?

Hudi Docker Environment

This directory contains the Docker Compose configuration for setting up a Hudi test environment with Spark, Hive Metastore, MinIO (S3-compatible storage), and PostgreSQL.

Components

  • Spark: Apache Spark 3.5.7 for processing Hudi tables
  • Hive Metastore: Starburst Hive Metastore for table metadata management
  • PostgreSQL: Database backend for Hive Metastore
  • MinIO: S3-compatible object storage for Hudi data files

Important Configuration Parameters

Container UID

  • Parameter: CONTAINER_UID in custom_settings.env
  • Default: doris--
  • Note: Must be set to a unique value to avoid conflicts with other Docker environments
  • Example: CONTAINER_UID="doris--bender--"
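
The container names used later in this README (for example doris--hudi-spark) look like the CONTAINER_UID prefix followed by hudi-SERVICE. A minimal sketch of that naming, assuming the pattern holds for every service. The helper name hudi_container_name is hypothetical and not part of the repository:

```shell
# Hypothetical helper: derive a container name from CONTAINER_UID.
# Assumption: names follow "<CONTAINER_UID>hudi-<service>", as the default
# names in this README (doris--hudi-spark, doris--hudi-minio, ...) suggest.
hudi_container_name() {
  service="$1"
  # Use the second argument if given, else $CONTAINER_UID, else the default.
  uid="${2:-${CONTAINER_UID:-doris--}}"
  printf '%s\n' "${uid}hudi-${service}"
}
```

Under this assumption, with CONTAINER_UID="doris--bender--" the call hudi_container_name spark prints doris--bender--hudi-spark, and the docker commands in this README would need that name in place of doris--hudi-spark.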

Port Configuration (hudi.env.tpl)

  • HIVE_METASTORE_PORT: Port for Hive Metastore Thrift service (default: 19083)
  • MINIO_API_PORT: MinIO S3 API port (default: 19100)
  • MINIO_CONSOLE_PORT: MinIO web console port (default: 19101)
  • SPARK_UI_PORT: Spark web UI port (default: 18080)
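
If these defaults are overridden, two services mapped to the same host port cannot both bind to it. A small sketch that checks a chosen set of ports for duplicates before starting the environment. The helper ports_are_unique is hypothetical and is not provided by the PR:

```shell
# Hypothetical helper: succeed only if every port argument is distinct.
ports_are_unique() {
  seen=" "
  for p in "$@"; do
    case "$seen" in
      *" $p "*) return 1 ;;   # port already listed once
    esac
    seen="$seen$p "
  done
  return 0
}
```

For example, ports_are_unique 19083 19100 19101 18080 succeeds for the defaults listed above. The check only compares the values passed in; it does not detect ports already used by other processes on the host.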

MinIO Credentials (hudi.env.tpl)

  • MINIO_ROOT_USER: MinIO access key (default: minio)
  • MINIO_ROOT_PASSWORD: MinIO secret key (default: minio123)
  • HUDI_BUCKET: S3 bucket name for Hudi data (default: datalake)

Version Compatibility

⚠️ Important: Hadoop library versions (such as hadoop-aws) must match the Hadoop version built into Spark

  • Spark Version: 3.5.7 (uses Hadoop 3.3.4) - default build for Hudi 1.0.2
  • Hadoop AWS Version: 3.3.4 (matching Spark's Hadoop)
  • Hudi Bundle Version: 1.0.2, Spark 3.5 bundle (the default build; it matches Spark 3.5.7 and the Hudi version used by Doris, which avoids versionCode compatibility issues)
  • AWS SDK v1 Version: 1.12.262 (required for Hadoop 3.3.4 S3A support, 1.12.x series)
  • PostgreSQL JDBC Version: 42.7.1 (compatible with Hive Metastore)
  • Hudi 1.0.x Compatibility: Supports Spark 3.5.x (default), 3.4.x, and 3.3.x
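
One way to confirm such a match is to compare the version embedded in JAR file names. The sketch below only parses file names. The helper names are hypothetical, and the /opt/spark/jars location and the hadoop-client-api JAR mentioned in the usage note are assumptions about the Spark image rather than something this PR states:

```shell
# Hypothetical helper: extract the version from a JAR file name such as
# "hadoop-aws-3.3.4.jar" (the text after the last hyphen, minus ".jar").
# Limitation: versions that themselves contain a hyphen are not handled.
jar_version() {
  name="${1##*/}"       # drop any leading directory
  name="${name%.jar}"   # drop the extension
  printf '%s\n' "${name##*-}"
}

# Succeeds only when both JAR names carry the same version string.
same_jar_version() {
  [ "$(jar_version "$1")" = "$(jar_version "$2")" ]
}
```

Inside the Spark container, something like same_jar_version hadoop-aws-3.3.4.jar "$(ls /opt/spark/jars/hadoop-client-api-*.jar)" could then flag a mismatch, provided the image ships exactly one such JAR.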

JAR Dependencies (hudi.env.tpl)

All JAR file versions and URLs are configurable:

  • HUDI_BUNDLE_VERSION / HUDI_BUNDLE_URL: Hudi Spark bundle
  • HADOOP_AWS_VERSION / HADOOP_AWS_URL: Hadoop S3A filesystem support
  • AWS_SDK_BUNDLE_VERSION / AWS_SDK_BUNDLE_URL: AWS Java SDK Bundle v1 (required for Hadoop 3.3.4 S3A support, 1.12.x series)
  • POSTGRESQL_JDBC_VERSION / POSTGRESQL_JDBC_URL: PostgreSQL JDBC driver

Note: hadoop-common is already included in Spark's built-in Hadoop distribution, so it's not configured here.
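
When overriding a version, the matching URL usually has to change with it. A sketch of how such a URL can be built from Maven coordinates, assuming the JARs come from Maven Central's standard repository layout. The actual URLs in hudi.env.tpl may point elsewhere, and maven_jar_url is a hypothetical helper:

```shell
# Hypothetical helper: build a Maven Central URL from group, artifact, version.
# Assumption: the standard layout <group path>/<artifact>/<version>/<artifact>-<version>.jar
maven_jar_url() {
  group_path=$(printf '%s' "$1" | tr '.' '/')   # org.apache.hadoop -> org/apache/hadoop
  artifact="$2"
  version="$3"
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$group_path" "$artifact" "$version" "$artifact" "$version"
}
```

For instance, maven_jar_url org.apache.hadoop hadoop-aws 3.3.4 prints a URL ending in hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar, which keeps the version and the URL in step.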

Starting the Environment

# Start Hudi environment
./docker/thirdparties/run-thirdparties-docker.sh -c hudi

# Stop Hudi environment
./docker/thirdparties/run-thirdparties-docker.sh -c hudi --stop

Adding Data

⚠️ Important: To keep data consistent across Docker restarts, add data only through SQL scripts. Data added through the spark-sql interactive shell is temporary and does not persist after a container restart.

Using SQL Scripts

Add new SQL files to the scripts/create_preinstalled_scripts/hudi/ directory:

  • Files are executed in alphabetical order (e.g., 01_config_and_database.sql, 02_create_user_activity_log_tables.sql, etc.)
  • Use descriptive names with numeric prefixes to control execution order
  • Use environment variable substitution: ${HIVE_METASTORE_URIS} and ${HUDI_BUCKET}
  • Data created through SQL scripts will persist after Docker restart
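
How init.sh performs the substitution is not shown in this description. The following is only a sketch of the idea using sed, limited to the two documented placeholders. The helper name render_sql is hypothetical:

```shell
# Hypothetical helper: replace the two documented placeholders read from stdin.
# Assumption: the values contain no "|", "&", or "\" characters, which would
# be interpreted by sed in the replacement text.
render_sql() {
  sed -e 's|\${HIVE_METASTORE_URIS}|'"${HIVE_METASTORE_URIS}"'|g' \
      -e 's|\${HUDI_BUCKET}|'"${HUDI_BUCKET}"'|g'
}
```

With HUDI_BUCKET=datalake, piping a script through render_sql turns s3a://${HUDI_BUCKET}/warehouse into s3a://datalake/warehouse, which is a quick way to preview a script before it is picked up by the container.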

Example: Create 08_create_custom_table.sql:

USE regression_hudi;

CREATE TABLE IF NOT EXISTS my_hudi_table (
  id BIGINT,
  name STRING,
  created_at TIMESTAMP
) USING hudi
TBLPROPERTIES (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'created_at',
  hoodie.datasource.hive_sync.enable = 'true',
  hoodie.datasource.hive_sync.metastore.uris = '${HIVE_METASTORE_URIS}',
  hoodie.datasource.hive_sync.mode = 'hms'
)
LOCATION 's3a://${HUDI_BUCKET}/warehouse/regression_hudi/my_hudi_table';

INSERT INTO my_hudi_table VALUES
  (1, 'Alice', TIMESTAMP '2024-01-01 10:00:00'),
  (2, 'Bob', TIMESTAMP '2024-01-02 11:00:00');

After adding SQL files, restart the container to execute them:

docker restart doris--hudi-spark
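
The scripts take some time to run after a restart. A sketch of a polling helper that waits until a command succeeds or a timeout expires. It assumes the SUCCESS marker mentioned under Troubleshooting (/opt/hudi-scripts/SUCCESS) is recreated once initialization finishes; that behavior and the helper name wait_for are assumptions, not confirmed by this PR:

```shell
# Hypothetical helper: retry a command once per second until it succeeds.
# Usage: wait_for TIMEOUT_SECONDS command [args...]
wait_for() {
  timeout="$1"; shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1            # gave up: command never succeeded in time
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

A call such as wait_for 300 docker exec doris--hudi-spark test -f /opt/hudi-scripts/SUCCESS would then block for up to five minutes. If the marker from a previous run is still present, the call returns immediately, so it is only a rough readiness signal.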

Creating Hudi Catalog in Doris

After starting the Hudi Docker environment, you can create a Hudi catalog in Doris to access Hudi tables:

-- Create Hudi catalog
CREATE CATALOG IF NOT EXISTS hudi_catalog PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://<externalEnvIp>:19083',
    's3.endpoint' = 'http://<externalEnvIp>:19100',
    's3.access_key' = 'minio',
    's3.secret_key' = 'minio123',
    's3.region' = 'us-east-1',
    'use_path_style' = 'true'
);

-- Switch to Hudi catalog
SWITCH hudi_catalog;

-- Use database
USE regression_hudi;

-- Show tables
SHOW TABLES;

-- Query Hudi table
SELECT * FROM user_activity_log_cow_partition LIMIT 10;

Configuration Parameters:

  • hive.metastore.uris: Hive Metastore Thrift service address (default port: 19083)
  • s3.endpoint: MinIO S3 API endpoint (default port: 19100)
  • s3.access_key: MinIO access key (default: minio)
  • s3.secret_key: MinIO secret key (default: minio123)
  • s3.region: S3 region (default: us-east-1)
  • use_path_style: Use path-style access for MinIO (required: true)

Replace <externalEnvIp> with your actual external environment IP address (e.g., 127.0.0.1 for localhost).
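
To avoid editing the statement by hand, the catalog DDL can be generated from the same variables that configure the environment. A minimal sketch, assuming the variables from hudi.env.tpl are exported in the current shell (the documented defaults are used otherwise). The helper name hudi_catalog_sql is hypothetical:

```shell
# Hypothetical helper: print the CREATE CATALOG statement for a given host.
# Falls back to the defaults documented in this README when a variable is unset.
hudi_catalog_sql() {
  host="$1"
  cat <<EOF
CREATE CATALOG IF NOT EXISTS hudi_catalog PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://${host}:${HIVE_METASTORE_PORT:-19083}',
    's3.endpoint' = 'http://${host}:${MINIO_API_PORT:-19100}',
    's3.access_key' = '${MINIO_ROOT_USER:-minio}',
    's3.secret_key' = '${MINIO_ROOT_PASSWORD:-minio123}',
    's3.region' = 'us-east-1',
    'use_path_style' = 'true'
);
EOF
}
```

The output of hudi_catalog_sql 127.0.0.1 can be pasted into a Doris session or piped to a MySQL-protocol client connected to the Doris FE; the client, host, and port to use depend on your deployment.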

Debugging with Spark SQL

⚠️ Note: The methods below are for debugging only. Data created through the spark-sql interactive shell does not persist after a Docker restart. To add persistent data, use SQL scripts as described in the "Adding Data" section.

1. Connect to Spark Container

docker exec -it doris--hudi-spark bash

2. Start Spark SQL Interactive Shell

/opt/spark/bin/spark-sql \
  --master 'local[*]' \
  --name hudi-debug \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
  --conf spark.sql.warehouse.dir=s3a://datalake/warehouse

3. Common Debugging Commands

-- Show databases
SHOW DATABASES;

-- Use database
USE regression_hudi;

-- Show tables
SHOW TABLES;

-- Describe table structure
DESCRIBE EXTENDED user_activity_log_cow_partition;

-- Query data
SELECT * FROM user_activity_log_cow_partition LIMIT 10;

-- Check Hudi table properties
SHOW TBLPROPERTIES user_activity_log_cow_partition;

-- View Spark configuration
SET -v;

-- Check Hudi-specific configurations
SET hoodie.datasource.write.hive_style_partitioning;

4. View Spark Web UI

Access the Spark Web UI at http://localhost:18080 (or the configured SPARK_UI_PORT).

5. Check Container Logs

# View Spark container logs
docker logs doris--hudi-spark --tail 100 -f

# View Hive Metastore logs
docker logs doris--hudi-metastore --tail 100 -f

# View MinIO logs
docker logs doris--hudi-minio --tail 100 -f
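
Full logs can be noisy. A small sketch of a filter that keeps only lines mentioning an error or exception; the helper name log_errors and the choice of keywords are assumptions, so adjust the pattern to the messages you are looking for:

```shell
# Hypothetical helper: keep only log lines that mention an error or exception.
# "|| true" keeps the exit status at 0 when nothing matches.
log_errors() {
  grep -iE 'error|exception' || true
}
```

Because docker logs forwards the container's stderr to its own stderr, redirect it before filtering, for example docker logs doris--hudi-spark --tail 500 2>&1 | log_errors.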

6. Verify S3 Data

# Access MinIO console
# URL: http://localhost:19101 (or configured MINIO_CONSOLE_PORT)
# Username: minio (or MINIO_ROOT_USER)
# Password: minio123 (or MINIO_ROOT_PASSWORD)

# Or use MinIO client
docker exec -it doris--hudi-minio-mc mc ls myminio/datalake/warehouse/regression_hudi/

Troubleshooting

Container Exits Immediately

  • Check logs: docker logs doris--hudi-spark
  • Verify SUCCESS file exists: docker exec doris--hudi-spark test -f /opt/hudi-scripts/SUCCESS
  • Ensure Hive Metastore is running: docker ps | grep metastore

ClassNotFoundException Errors

  • Verify JAR files are downloaded: docker exec doris--hudi-spark ls -lh /opt/hudi-cache/
  • Check JAR versions match Spark's Hadoop version (3.3.4)
  • Review hudi.env.tpl for correct version numbers

S3A Connection Issues

  • Verify MinIO is running: docker ps | grep minio
  • Check MinIO credentials in hudi.env.tpl
  • Test S3 connection: docker exec doris--hudi-minio-mc mc ls myminio/

Hive Metastore Connection Issues

  • Check Metastore is ready: docker logs doris--hudi-metastore | grep "Metastore is ready"
  • Verify PostgreSQL is running: docker ps | grep metastore-db
  • Test connection: docker exec doris--hudi-metastore-db pg_isready -U hive
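
The individual checks above can be wrapped so that several of them run in one pass with a uniform result line. A sketch, with run_check as a hypothetical helper name:

```shell
# Hypothetical helper: run a command quietly and print OK or FAIL with a label.
# Usage: run_check LABEL command [args...]
run_check() {
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK   %s\n' "$label"
  else
    printf 'FAIL %s\n' "$label"
    return 1
  fi
}
```

For example, run_check "spark init" docker exec doris--hudi-spark test -f /opt/hudi-scripts/SUCCESS and run_check "postgres" docker exec doris--hudi-metastore-db pg_isready -U hive reuse the commands listed in this section. Checks that involve a pipe, such as docker ps | grep minio, would need to be wrapped in sh -c.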

File Structure

hudi/
├── hudi.yaml.tpl          # Docker Compose template
├── hudi.env.tpl           # Environment variables template
├── scripts/
│   ├── init.sh            # Initialization script
│   ├── create_preinstalled_scripts/
│   │   └── hudi/          # SQL scripts (01_config_and_database.sql, 02_create_user_activity_log_tables.sql, ...)
│   └── SUCCESS            # Initialization marker (generated)
└── cache/                 # Downloaded JAR files (generated)

Notes

  • All generated files (.yaml, .env, cache/, SUCCESS) are ignored by Git
  • SQL scripts support environment variable substitution using ${VARIABLE_NAME} syntax
  • Hadoop version compatibility is critical: the Hadoop JARs must match Spark's built-in Hadoop version
  • The container keeps running after initialization to support health checks and debugging

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run external

@suxiaogang223
Contributor Author

run external

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run external

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32220 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b59e762d8b76c660d7307580d91ed961134fe695, data reload: false

------ Round 1 ----------------------------------
q1	17700	4343	4109	4109
q2	2150	354	245	245
q3	10132	1264	731	731
q4	10234	897	319	319
q5	7571	2009	1971	1971
q6	184	166	137	137
q7	949	801	660	660
q8	9276	1338	1139	1139
q9	5208	4808	4675	4675
q10	6901	1798	1450	1450
q11	496	313	276	276
q12	691	783	609	609
q13	17782	3831	3083	3083
q14	282	292	278	278
q15	581	506	502	502
q16	689	693	620	620
q17	713	713	614	614
q18	6709	6311	6959	6311
q19	1136	1085	639	639
q20	418	392	270	270
q21	3348	2747	2499	2499
q22	1142	1083	1102	1083
Total cold run time: 104292 ms
Total hot run time: 32220 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4383	4275	4244	4244
q2	337	415	326	326
q3	2266	2786	2449	2449
q4	1395	1805	1390	1390
q5	4621	4393	4298	4298
q6	213	175	128	128
q7	1984	1906	1793	1793
q8	2511	2347	2303	2303
q9	7093	7181	6987	6987
q10	2517	2757	2159	2159
q11	537	465	436	436
q12	681	729	578	578
q13	3333	3799	3131	3131
q14	291	301	271	271
q15	525	503	493	493
q16	611	640	614	614
q17	1100	1196	1300	1196
q18	7728	7408	7340	7340
q19	900	866	895	866
q20	1949	2040	1875	1875
q21	4660	4371	4099	4099
q22	1080	1054	988	988
Total cold run time: 50715 ms
Total hot run time: 47964 ms

@doris-robot

TPC-DS: Total hot run time: 172661 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b59e762d8b76c660d7307580d91ed961134fe695, data reload: false

query5	4410	590	436	436
query6	332	226	205	205
query7	4207	465	267	267
query8	362	259	236	236
query9	8745	2645	2622	2622
query10	509	369	322	322
query11	15213	15076	14851	14851
query12	174	115	114	114
query13	1291	509	389	389
query14	6233	2981	2698	2698
query14_1	2619	2599	2611	2599
query15	198	192	177	177
query16	997	475	443	443
query17	1114	712	588	588
query18	2482	450	347	347
query19	230	224	197	197
query20	129	123	115	115
query21	214	141	119	119
query22	3898	4094	4023	4023
query23	15880	15757	15446	15446
query23_1	15592	15735	15542	15542
query24	7380	1596	1195	1195
query24_1	1210	1183	1239	1183
query25	547	436	389	389
query26	1272	267	159	159
query27	2763	447	295	295
query28	4502	2194	2178	2178
query29	793	521	423	423
query30	311	244	210	210
query31	775	635	593	593
query32	76	66	69	66
query33	523	321	284	284
query34	902	873	531	531
query35	745	754	711	711
query36	840	853	818	818
query37	128	94	74	74
query38	2706	2675	2627	2627
query39	781	748	730	730
query39_1	727	714	727	714
query40	211	133	114	114
query41	65	62	63	62
query42	103	102	100	100
query43	445	457	427	427
query44	1317	761	758	758
query45	185	184	177	177
query46	867	969	610	610
query47	1363	1469	1318	1318
query48	312	328	257	257
query49	637	446	326	326
query50	635	279	219	219
query51	3829	3789	3723	3723
query52	104	111	102	102
query53	302	325	265	265
query54	280	263	257	257
query55	80	75	69	69
query56	284	291	294	291
query57	1056	1014	953	953
query58	260	243	252	243
query59	2179	2195	1992	1992
query60	315	323	298	298
query61	166	161	163	161
query62	393	369	306	306
query63	304	263	271	263
query64	5040	1322	1037	1037
query65	3831	3677	3782	3677
query66	1468	445	329	329
query67	14954	14959	15531	14959
query68	3720	1047	741	741
query69	498	363	317	317
query70	1072	946	825	825
query71	369	304	276	276
query72	6295	3490	3394	3394
query73	785	740	314	314
query74	8762	8775	8585	8585
query75	2851	2822	2458	2458
query76	3884	1087	652	652
query77	520	377	281	281
query78	9635	9829	9120	9120
query79	1256	973	605	605
query80	1514	578	481	481
query81	533	258	225	225
query82	409	145	110	110
query83	350	252	233	233
query84	253	115	108	108
query85	936	517	471	471
query86	394	319	319	319
query87	2826	2833	2761	2761
query88	3212	2280	2277	2277
query89	381	342	324	324
query90	1837	154	147	147
query91	168	167	143	143
query92	66	71	60	60
query93	1129	915	561	561
query94	635	334	279	279
query95	574	330	353	330
query96	580	470	214	214
query97	2311	2337	2254	2254
query98	226	198	201	198
query99	583	588	517	517
Total cold run time: 250187 ms
Total hot run time: 172661 ms

@doris-robot

ClickBench: Total hot run time: 27.14 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b59e762d8b76c660d7307580d91ed961134fe695, data reload: false

query1	0.05	0.04	0.05
query2	0.09	0.04	0.05
query3	0.25	0.08	0.08
query4	1.60	0.11	0.11
query5	0.28	0.27	0.26
query6	1.15	0.68	0.64
query7	0.03	0.02	0.03
query8	0.05	0.04	0.04
query9	0.56	0.49	0.49
query10	0.55	0.54	0.54
query11	0.15	0.12	0.11
query12	0.16	0.12	0.13
query13	0.61	0.60	0.59
query14	0.98	0.96	0.99
query15	0.80	0.78	0.81
query16	0.42	0.39	0.42
query17	1.06	1.03	1.06
query18	0.23	0.22	0.22
query19	1.90	1.86	1.84
query20	0.02	0.02	0.01
query21	15.43	0.29	0.14
query22	4.84	0.05	0.05
query23	15.87	0.28	0.10
query24	0.95	0.53	1.54
query25	0.09	0.08	0.05
query26	0.14	0.13	0.13
query27	0.06	0.08	0.06
query28	5.01	1.06	0.88
query29	12.58	3.93	3.16
query30	0.28	0.13	0.11
query31	2.82	0.62	0.38
query32	3.23	0.55	0.47
query33	3.00	2.98	3.02
query34	16.56	5.16	4.44
query35	4.50	4.54	4.51
query36	0.65	0.50	0.50
query37	0.12	0.07	0.06
query38	0.08	0.04	0.03
query39	0.04	0.02	0.03
query40	0.17	0.15	0.14
query41	0.08	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 97.53 s
Total hot run time: 27.14 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run external

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31872 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7322ac72d675f6188e8a4d81785c9abad680a493, data reload: false

------ Round 1 ----------------------------------
q1	17632	4258	4042	4042
q2	2074	356	236	236
q3	10142	1278	706	706
q4	10202	849	317	317
q5	7517	2056	1877	1877
q6	185	167	138	138
q7	940	766	677	677
q8	9265	1419	1120	1120
q9	4883	4572	4639	4572
q10	6836	1815	1473	1473
q11	509	314	295	295
q12	702	748	603	603
q13	17790	3826	3046	3046
q14	281	307	275	275
q15	585	513	498	498
q16	711	686	628	628
q17	683	835	489	489
q18	6536	6357	6769	6357
q19	1282	1098	617	617
q20	432	439	278	278
q21	3231	2592	2573	2573
q22	1201	1102	1055	1055
Total cold run time: 103619 ms
Total hot run time: 31872 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4396	4221	4226	4221
q2	327	440	347	347
q3	2342	2793	2454	2454
q4	1436	1822	1392	1392
q5	4560	4498	4340	4340
q6	223	178	132	132
q7	1990	1957	1811	1811
q8	2499	2416	2373	2373
q9	7151	7155	7341	7155
q10	2330	2731	2309	2309
q11	544	494	470	470
q12	746	783	617	617
q13	3471	3834	3087	3087
q14	272	277	265	265
q15	535	498	491	491
q16	608	651	607	607
q17	1089	1291	1363	1291
q18	7444	7218	7413	7218
q19	794	812	843	812
q20	1937	1946	1799	1799
q21	4548	4262	4157	4157
q22	1065	1041	980	980
Total cold run time: 50307 ms
Total hot run time: 48328 ms

@doris-robot

TPC-DS: Total hot run time: 172810 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7322ac72d675f6188e8a4d81785c9abad680a493, data reload: false

query5	4431	581	456	456
query6	328	233	204	204
query7	4212	464	269	269
query8	341	258	235	235
query9	8774	2604	2654	2604
query10	515	367	320	320
query11	15329	15120	14886	14886
query12	165	120	113	113
query13	1285	499	393	393
query14	6105	2927	2655	2655
query14_1	2591	2562	2589	2562
query15	201	203	170	170
query16	1004	460	436	436
query17	1081	644	548	548
query18	2482	422	346	346
query19	224	218	206	206
query20	121	117	116	116
query21	212	144	119	119
query22	4000	4091	3885	3885
query23	16061	15709	15564	15564
query23_1	15551	15550	15476	15476
query24	7463	1568	1176	1176
query24_1	1204	1182	1195	1182
query25	566	494	437	437
query26	1240	264	163	163
query27	2765	457	289	289
query28	4597	2123	2102	2102
query29	790	555	453	453
query30	312	241	224	224
query31	781	621	539	539
query32	76	73	69	69
query33	555	343	302	302
query34	900	877	524	524
query35	752	810	704	704
query36	859	870	803	803
query37	130	94	83	83
query38	2752	2660	2670	2660
query39	778	753	731	731
query39_1	743	733	720	720
query40	225	138	119	119
query41	78	71	68	68
query42	109	103	104	103
query43	455	470	412	412
query44	1320	764	737	737
query45	190	185	180	180
query46	840	959	597	597
query47	1458	1518	1381	1381
query48	307	328	244	244
query49	632	427	347	347
query50	652	301	208	208
query51	3772	3816	3733	3733
query52	108	109	98	98
query53	296	325	272	272
query54	294	281	264	264
query55	81	75	69	69
query56	300	304	305	304
query57	1061	1030	962	962
query58	281	263	264	263
query59	2143	2075	2126	2075
query60	330	320	345	320
query61	165	163	162	162
query62	396	361	337	337
query63	301	263	267	263
query64	4930	1308	946	946
query65	3645	3754	3732	3732
query66	1413	410	298	298
query67	15131	15075	15049	15049
query68	7718	982	702	702
query69	505	350	309	309
query70	1060	910	923	910
query71	370	298	264	264
query72	6067	3442	3481	3442
query73	762	719	307	307
query74	8790	8782	8670	8670
query75	2839	2812	2474	2474
query76	3854	1070	644	644
query77	535	354	280	280
query78	9687	9856	9102	9102
query79	1358	916	582	582
query80	616	557	469	469
query81	494	265	230	230
query82	218	148	109	109
query83	267	252	239	239
query84	255	123	95	95
query85	883	501	457	457
query86	387	323	324	323
query87	2852	2816	2705	2705
query88	3853	2203	2199	2199
query89	392	360	338	338
query90	2241	151	149	149
query91	176	175	148	148
query92	78	68	60	60
query93	1518	881	532	532
query94	578	308	291	291
query95	567	317	353	317
query96	559	465	201	201
query97	2303	2423	2297	2297
query98	216	202	192	192
query99	608	583	495	495
Total cold run time: 254953 ms
Total hot run time: 172810 ms

@doris-robot

ClickBench: Total hot run time: 27.14 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7322ac72d675f6188e8a4d81785c9abad680a493, data reload: false

query1	0.05	0.05	0.05
query2	0.10	0.05	0.05
query3	0.26	0.09	0.09
query4	1.61	0.12	0.11
query5	0.27	0.27	0.25
query6	1.14	0.68	0.65
query7	0.03	0.02	0.03
query8	0.05	0.04	0.04
query9	0.57	0.50	0.51
query10	0.56	0.55	0.54
query11	0.15	0.09	0.09
query12	0.15	0.11	0.11
query13	0.61	0.58	0.59
query14	0.96	0.95	0.95
query15	0.80	0.78	0.77
query16	0.40	0.42	0.43
query17	1.06	1.04	1.04
query18	0.23	0.21	0.22
query19	1.96	1.85	1.86
query20	0.03	0.02	0.01
query21	15.43	0.27	0.14
query22	5.32	0.06	0.04
query23	16.09	0.28	0.10
query24	0.95	0.37	0.61
query25	0.11	0.09	0.08
query26	0.14	0.13	0.13
query27	0.09	0.09	0.04
query28	4.08	1.06	0.88
query29	12.59	3.96	3.20
query30	0.28	0.14	0.12
query31	2.81	0.64	0.40
query32	3.23	0.58	0.46
query33	3.04	3.10	3.08
query34	16.76	5.12	4.48
query35	4.58	4.52	4.46
query36	0.65	0.48	0.51
query37	0.11	0.06	0.07
query38	0.08	0.04	0.04
query39	0.04	0.02	0.03
query40	0.17	0.14	0.14
query41	0.09	0.03	0.04
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 97.71 s
Total hot run time: 27.14 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31668 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a6ba48326b0f23f448071e83b62c324b51ab6acc, data reload: false

------ Round 1 ----------------------------------
q1	17661	4290	4068	4068
q2	2019	372	245	245
q3	10142	1282	711	711
q4	10234	887	323	323
q5	7530	2111	1891	1891
q6	198	186	146	146
q7	935	794	658	658
q8	9271	1487	1226	1226
q9	5070	4651	4647	4647
q10	6803	1808	1403	1403
q11	522	293	296	293
q12	701	716	565	565
q13	17823	3889	3117	3117
q14	283	288	267	267
q15	591	507	507	507
q16	683	672	619	619
q17	691	850	538	538
q18	6851	6320	6361	6320
q19	1119	985	630	630
q20	411	363	245	245
q21	3119	2430	2286	2286
q22	1034	988	963	963
Total cold run time: 103691 ms
Total hot run time: 31668 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4109	4049	4112	4049
q2	347	405	333	333
q3	2070	2624	2223	2223
q4	1297	1738	1331	1331
q5	4150	3988	4013	3988
q6	219	176	135	135
q7	1887	1817	1892	1817
q8	2632	2450	2421	2421
q9	7277	7120	7091	7091
q10	2551	2697	2398	2398
q11	578	476	468	468
q12	725	754	655	655
q13	3663	4008	3445	3445
q14	292	360	330	330
q15	561	518	601	518
q16	770	672	624	624
q17	1149	1349	1395	1349
q18	8195	7584	7571	7571
q19	892	829	921	829
q20	2082	2157	2006	2006
q21	4815	4523	4578	4523
q22	1106	1081	1054	1054
Total cold run time: 51367 ms
Total hot run time: 49158 ms

@doris-robot

TPC-DS: Total hot run time: 172068 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a6ba48326b0f23f448071e83b62c324b51ab6acc, data reload: false

query5	4402	604	445	445
query6	334	221	206	206
query7	4209	452	260	260
query8	325	263	228	228
query9	8739	2656	2662	2656
query10	504	373	312	312
query11	15348	15217	14687	14687
query12	176	114	118	114
query13	1255	514	372	372
query14	6032	2973	2721	2721
query14_1	2661	2569	2583	2569
query15	211	193	173	173
query16	1000	462	460	460
query17	1104	651	569	569
query18	2413	425	330	330
query19	221	221	189	189
query20	122	113	113	113
query21	213	143	128	128
query22	3838	4115	3845	3845
query23	15967	15611	15356	15356
query23_1	15355	15570	15262	15262
query24	7420	1512	1172	1172
query24_1	1195	1161	1165	1161
query25	549	451	415	415
query26	1237	270	160	160
query27	2765	471	298	298
query28	4563	2175	2153	2153
query29	852	558	462	462
query30	316	243	213	213
query31	796	619	552	552
query32	83	71	70	70
query33	546	349	293	293
query34	900	885	532	532
query35	760	795	701	701
query36	838	850	771	771
query37	131	97	81	81
query38	2768	2683	2640	2640
query39	797	766	734	734
query39_1	725	723	703	703
query40	222	136	119	119
query41	75	67	69	67
query42	103	102	107	102
query43	477	475	406	406
query44	1337	728	721	721
query45	185	188	175	175
query46	883	978	585	585
query47	1444	1421	1376	1376
query48	319	335	242	242
query49	621	416	334	334
query50	653	278	204	204
query51	3797	3793	3756	3756
query52	106	108	101	101
query53	294	337	272	272
query54	303	264	263	263
query55	82	78	72	72
query56	305	300	303	300
query57	1040	1005	894	894
query58	261	262	252	252
query59	2087	2158	2065	2065
query60	334	329	302	302
query61	189	185	211	185
query62	376	345	300	300
query63	296	259	267	259
query64	5063	1284	999	999
query65	3747	3674	3696	3674
query66	1462	422	301	301
query67	15116	15790	15096	15096
query68	8252	988	704	704
query69	509	338	307	307
query70	1074	914	954	914
query71	365	295	270	270
query72	6150	3462	3456	3456
query73	764	729	299	299
query74	8791	8694	8570	8570
query75	2842	2810	2463	2463
query76	3802	1061	649	649
query77	520	371	277	277
query78	9740	10080	9110	9110
query79	1233	892	588	588
query80	603	562	482	482
query81	509	262	222	222
query82	207	144	108	108
query83	260	249	241	241
query84	262	110	100	100
query85	933	516	462	462
query86	398	324	319	319
query87	2893	2878	2709	2709
query88	3142	2226	2190	2190
query89	392	337	332	332
query90	2197	153	143	143
query91	164	162	141	141
query92	77	69	58	58
query93	1567	885	535	535
query94	571	318	295	295
query95	574	321	297	297
query96	583	468	202	202
query97	2300	2359	2275	2275
query98	234	195	198	195
query99	569	566	512	512
Total cold run time: 254420 ms
Total hot run time: 172068 ms

@doris-robot

ClickBench: Total hot run time: 26.77 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a6ba48326b0f23f448071e83b62c324b51ab6acc, data reload: false

query1	0.05	0.04	0.04
query2	0.10	0.04	0.04
query3	0.25	0.09	0.09
query4	1.61	0.12	0.11
query5	0.27	0.28	0.26
query6	1.15	0.66	0.64
query7	0.03	0.02	0.03
query8	0.05	0.04	0.05
query9	0.57	0.51	0.51
query10	0.56	0.55	0.54
query11	0.15	0.10	0.09
query12	0.14	0.11	0.11
query13	0.60	0.59	0.58
query14	0.95	0.96	0.95
query15	0.79	0.77	0.76
query16	0.40	0.39	0.40
query17	1.05	1.07	1.02
query18	0.23	0.21	0.22
query19	1.89	1.81	1.90
query20	0.02	0.02	0.01
query21	15.47	0.24	0.13
query22	5.39	0.05	0.05
query23	16.07	0.28	0.10
query24	1.52	0.24	0.18
query25	0.12	0.06	0.06
query26	0.14	0.14	0.13
query27	0.08	0.07	0.08
query28	4.17	1.06	0.88
query29	12.57	3.97	3.22
query30	0.27	0.13	0.12
query31	2.85	0.66	0.41
query32	3.25	0.58	0.45
query33	2.98	3.02	3.03
query34	16.79	5.12	4.44
query35	4.50	4.46	4.55
query36	0.66	0.49	0.48
query37	0.10	0.07	0.06
query38	0.07	0.04	0.04
query39	0.04	0.03	0.03
query40	0.17	0.14	0.13
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 98.24 s
Total hot run time: 26.77 s

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31533 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9477770d023185264488663053f27e3d9234d502, data reload: false

------ Round 1 ----------------------------------
q1	17589	4259	5122	4259
q2	2051	359	244	244
q3	10131	1272	752	752
q4	10218	848	311	311
q5	7521	2096	1788	1788
q6	186	169	136	136
q7	922	801	658	658
q8	9280	1434	1146	1146
q9	4902	4628	4600	4600
q10	6733	1764	1367	1367
q11	527	304	308	304
q12	724	731	596	596
q13	17796	3789	3107	3107
q14	294	297	271	271
q15	595	515	509	509
q16	713	693	630	630
q17	678	739	569	569
q18	6735	6319	6200	6200
q19	1230	982	601	601
q20	416	363	260	260
q21	3007	2374	2288	2288
q22	1030	996	937	937
Total cold run time: 103278 ms
Total hot run time: 31533 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4142	4037	4026	4026
q2	336	398	314	314
q3	2051	2566	2204	2204
q4	1311	1734	1292	1292
q5	4112	3984	4026	3984
q6	210	169	128	128
q7	1854	1770	1674	1674
q8	2860	2543	2423	2423
q9	7154	7261	7183	7183
q10	2658	2701	2265	2265
q11	537	473	463	463
q12	708	760	675	675
q13	3737	4125	3417	3417
q14	279	298	271	271
q15	562	518	526	518
q16	687	668	620	620
q17	1120	1282	1329	1282
q18	8116	7785	7849	7785
q19	891	842	872	842
q20	2021	2149	1989	1989
q21	4868	4518	4305	4305
q22	1106	1093	1040	1040
Total cold run time: 51320 ms
Total hot run time: 48700 ms

@doris-robot

TPC-DS: Total hot run time: 172799 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9477770d023185264488663053f27e3d9234d502, data reload: false

query5	5047	585	457	457
query6	344	241	233	233
query7	4239	459	275	275
query8	355	255	246	246
query9	8775	2669	2694	2669
query10	573	366	315	315
query11	15122	15101	15793	15101
query12	209	128	132	128
query13	1296	517	403	403
query14	6906	3187	2802	2802
query14_1	2759	2730	2709	2709
query15	210	209	186	186
query16	996	512	480	480
query17	1263	759	596	596
query18	2772	441	348	348
query19	221	226	201	201
query20	137	112	116	112
query21	210	133	114	114
query22	3909	4281	3925	3925
query23	16009	15408	15351	15351
query23_1	15540	15386	15320	15320
query24	7257	1555	1171	1171
query24_1	1198	1167	1209	1167
query25	559	515	380	380
query26	1223	265	143	143
query27	2744	435	283	283
query28	4486	2134	2126	2126
query29	777	519	434	434
query30	310	238	202	202
query31	796	624	548	548
query32	68	63	61	61
query33	539	319	278	278
query34	897	869	516	516
query35	724	777	718	718
query36	883	850	799	799
query37	124	92	71	71
query38	2703	2790	2609	2609
query39	770	773	705	705
query39_1	701	711	704	704
query40	211	130	111	111
query41	62	58	58	58
query42	101	101	100	100
query43	424	467	398	398
query44	1305	732	726	726
query45	184	179	175	175
query46	828	974	588	588
query47	1425	1428	1297	1297
query48	349	339	248	248
query49	595	437	332	332
query50	630	269	201	201
query51	3765	3736	3717	3717
query52	106	104	94	94
query53	290	326	267	267
query54	274	251	273	251
query55	78	72	70	70
query56	290	303	290	290
query57	993	1015	943	943
query58	280	249	256	249
query59	2032	2164	2073	2073
query60	310	312	286	286
query61	164	157	160	157
query62	371	365	322	322
query63	294	265	260	260
query64	4960	1304	1016	1016
query65	3743	3680	3671	3671
query66	1409	408	304	304
query67	14849	14972	15012	14972
query68	8421	983	723	723
query69	499	348	320	320
query70	1064	946	971	946
query71	376	306	285	285
query72	6083	3380	3448	3380
query73	775	736	307	307
query74	8782	8773	8502	8502
query75	2885	2792	2451	2451
query76	4054	1061	648	648
query77	667	381	279	279
query78	9669	9911	9128	9128
query79	1535	849	590	590
query80	606	589	486	486
query81	494	262	227	227
query82	395	148	112	112
query83	261	256	237	237
query84	250	134	105	105
query85	928	541	471	471
query86	354	327	285	285
query87	2842	2840	2778	2778
query88	4117	2246	2221	2221
query89	382	354	325	325
query90	2195	152	151	151
query91	175	175	144	144
query92	80	68	63	63
query93	1428	932	536	536
query94	558	320	301	301
query95	582	332	371	332
query96	596	471	218	218
query97	2361	2363	2272	2272
query98	230	198	204	198
query99	595	579	527	527
Total cold run time: 257065 ms
Total hot run time: 172799 ms

@doris-robot

ClickBench: Total hot run time: 27.46 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9477770d023185264488663053f27e3d9234d502, data reload: false

query1	0.06	0.05	0.05
query2	0.09	0.05	0.05
query3	0.27	0.09	0.09
query4	1.61	0.11	0.11
query5	0.27	0.27	0.26
query6	1.15	0.66	0.65
query7	0.03	0.03	0.03
query8	0.06	0.05	0.04
query9	0.56	0.50	0.50
query10	0.55	0.53	0.55
query11	0.15	0.10	0.10
query12	0.15	0.11	0.11
query13	0.61	0.58	0.58
query14	0.96	0.93	0.94
query15	0.80	0.77	0.79
query16	0.40	0.40	0.39
query17	1.09	1.04	1.04
query18	0.23	0.22	0.21
query19	1.93	1.89	1.88
query20	0.01	0.01	0.02
query21	15.44	0.25	0.14
query22	4.99	0.06	0.05
query23	15.81	0.29	0.10
query24	1.69	0.83	0.87
query25	0.12	0.06	0.08
query26	0.13	0.14	0.13
query27	0.09	0.07	0.08
query28	5.78	1.08	0.88
query29	12.56	3.92	3.14
query30	0.27	0.14	0.11
query31	2.83	0.64	0.40
query32	3.25	0.57	0.45
query33	3.00	2.98	3.11
query34	16.91	5.19	4.50
query35	4.60	4.45	4.44
query36	0.66	0.50	0.49
query37	0.10	0.07	0.06
query38	0.07	0.04	0.04
query39	0.04	0.03	0.03
query40	0.17	0.14	0.14
query41	0.09	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 99.67 s
Total hot run time: 27.46 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run external

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 7, 2026
@github-actions
Contributor

github-actions bot commented Jan 7, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Jan 7, 2026

PR approved by anyone and no changes requested.

@morningman
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 31913 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 628a78b2f3d1c6286ab5a044285b4ab5b497b955, data reload: false

------ Round 1 ----------------------------------
q1	17572	4192	4038	4038
q2	2025	353	245	245
q3	10145	1252	717	717
q4	10228	931	328	328
q5	7571	2111	1903	1903
q6	187	175	140	140
q7	950	809	655	655
q8	9268	1411	1139	1139
q9	4791	4634	4567	4567
q10	6794	1807	1397	1397
q11	508	304	276	276
q12	674	747	568	568
q13	17782	3831	3107	3107
q14	294	300	268	268
q15	574	513	511	511
q16	688	658	636	636
q17	682	810	502	502
q18	6427	6405	6696	6405
q19	1146	1023	605	605
q20	416	369	284	284
q21	3249	2662	2575	2575
q22	1124	1162	1047	1047
Total cold run time: 103095 ms
Total hot run time: 31913 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4343	4375	4239	4239
q2	334	433	306	306
q3	2268	2772	2454	2454
q4	1399	1833	1387	1387
q5	4572	4324	4275	4275
q6	213	170	131	131
q7	2001	1912	2040	1912
q8	2582	2513	2323	2323
q9	7134	7229	7090	7090
q10	2540	2660	2259	2259
q11	524	471	457	457
q12	703	777	615	615
q13	3370	3766	3091	3091
q14	267	290	252	252
q15	527	481	478	478
q16	621	671	601	601
q17	1081	1197	1240	1197
q18	7275	7356	7090	7090
q19	824	767	798	767
q20	1890	1957	1829	1829
q21	4539	4286	4182	4182
q22	1093	1035	970	970
Total cold run time: 50100 ms
Total hot run time: 47905 ms
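
The two totals in each round can be re-derived from the rows. In Round 1 above, the cold total matches the sum of the first timing column, and the hot total matches the sum of the last column, which appears to be the smaller of the two hot runs. A minimal sketch for checking that arithmetic, assuming whitespace-separated rows with five fields; `sum_runs` is a hypothetical helper and is not part of the Doris benchmark tooling:

```bash
# Hypothetical helper: read benchmark rows from stdin and sum the cold
# column (2nd field) and the best-hot column (last field).
# Only rows that start with "q" and have exactly five fields are counted.
sum_runs() {
  awk '$1 ~ /^q/ && NF == 5 { cold += $2; hot += $NF }
       END { printf "cold=%d hot=%d\n", cold, hot }'
}

# Three rows copied from Round 1 above.
printf 'q1\t17572\t4192\t4038\t4038\nq2\t2025\t353\t245\t245\nq3\t10145\t1252\t717\t717\n' \
  | sum_runs
# prints: cold=29742 hot=5000
```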

@doris-robot

TPC-DS: Total hot run time: 172115 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 628a78b2f3d1c6286ab5a044285b4ab5b497b955, data reload: false

query5	4426	576	435	435
query6	339	242	255	242
query7	4249	465	260	260
query8	338	261	269	261
query9	8783	2627	2636	2627
query10	490	371	337	337
query11	15229	14938	14955	14938
query12	180	116	112	112
query13	1251	481	381	381
query14	6261	2961	2777	2777
query14_1	2666	2636	2674	2636
query15	205	195	174	174
query16	998	463	443	443
query17	1119	682	569	569
query18	2554	453	340	340
query19	228	227	198	198
query20	124	118	117	117
query21	218	145	121	121
query22	3865	3851	3708	3708
query23	15932	15537	15355	15355
query23_1	15407	15654	15451	15451
query24	7363	1547	1179	1179
query24_1	1156	1150	1178	1150
query25	551	461	418	418
query26	1222	266	161	161
query27	2738	453	291	291
query28	4469	2133	2111	2111
query29	841	542	471	471
query30	301	248	208	208
query31	772	638	555	555
query32	79	69	66	66
query33	552	339	303	303
query34	884	864	531	531
query35	741	793	748	748
query36	855	875	851	851
query37	137	99	81	81
query38	2650	2691	2642	2642
query39	758	743	729	729
query39_1	702	712	724	712
query40	219	128	112	112
query41	65	60	63	60
query42	104	105	99	99
query43	425	443	409	409
query44	1317	725	713	713
query45	186	182	176	176
query46	845	951	586	586
query47	1403	1350	1386	1350
query48	312	315	235	235
query49	623	412	319	319
query50	629	278	196	196
query51	3765	3849	3764	3764
query52	106	106	92	92
query53	291	331	281	281
query54	289	264	241	241
query55	76	74	71	71
query56	284	290	290	290
query57	1013	995	902	902
query58	264	254	244	244
query59	2196	2043	2044	2043
query60	313	312	298	298
query61	154	156	159	156
query62	407	341	321	321
query63	297	263	272	263
query64	5056	1319	994	994
query65	3738	3720	3714	3714
query66	1405	436	307	307
query67	14597	14854	14731	14731
query68	8331	983	696	696
query69	502	358	325	325
query70	1102	981	970	970
query71	385	307	276	276
query72	6175	3429	3451	3429
query73	758	720	299	299
query74	8721	8750	8540	8540
query75	2883	2805	2454	2454
query76	3909	1068	635	635
query77	687	369	281	281
query78	9707	9599	9132	9132
query79	1593	904	586	586
query80	622	587	469	469
query81	502	258	234	234
query82	499	140	112	112
query83	263	256	240	240
query84	252	123	101	101
query85	959	498	441	441
query86	412	331	320	320
query87	2871	2858	2695	2695
query88	4266	2229	2238	2229
query89	385	361	325	325
query90	2161	157	149	149
query91	174	170	137	137
query92	83	68	64	64
query93	1515	898	522	522
query94	587	312	304	304
query95	561	329	304	304
query96	561	481	211	211
query97	2356	2357	2282	2282
query98	220	198	192	192
query99	630	599	496	496
Total cold run time: 256052 ms
Total hot run time: 172115 ms

@doris-robot

ClickBench: Total hot run time: 27.14 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 628a78b2f3d1c6286ab5a044285b4ab5b497b955, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.04	0.05
query3	0.26	0.10	0.09
query4	1.61	0.12	0.11
query5	0.27	0.26	0.25
query6	1.15	0.66	0.66
query7	0.04	0.03	0.03
query8	0.05	0.04	0.04
query9	0.56	0.50	0.50
query10	0.56	0.55	0.55
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.61	0.59	0.59
query14	0.94	0.95	0.95
query15	0.79	0.77	0.78
query16	0.41	0.40	0.40
query17	1.06	1.08	1.08
query18	0.24	0.22	0.21
query19	1.84	1.71	1.76
query20	0.02	0.01	0.01
query21	15.44	0.29	0.15
query22	4.97	0.05	0.04
query23	15.73	0.28	0.09
query24	1.04	0.74	0.53
query25	0.09	0.06	0.05
query26	0.14	0.13	0.13
query27	0.08	0.05	0.08
query28	4.03	1.05	0.87
query29	12.58	3.95	3.18
query30	0.28	0.13	0.12
query31	2.81	0.68	0.40
query32	3.28	0.56	0.46
query33	2.98	3.02	3.02
query34	16.98	5.14	4.47
query35	4.54	4.48	4.50
query36	0.66	0.50	0.49
query37	0.10	0.07	0.07
query38	0.08	0.05	0.04
query39	0.04	0.02	0.03
query40	0.16	0.15	0.13
query41	0.08	0.04	0.03
query42	0.04	0.04	0.04
query43	0.04	0.04	0.03
Total cold run time: 97.01 s
Total hot run time: 27.14 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

@morningman morningman merged commit 4a0e4af into apache:master Jan 7, 2026
27 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 7, 2026
### What problem does this PR solve?

# Hudi Docker Environment

This directory contains the Docker Compose configuration for setting up
a Hudi test environment with Spark, Hive Metastore, MinIO (S3-compatible
storage), and PostgreSQL.

## Components

- **Spark**: Apache Spark 3.5.7 for processing Hudi tables
- **Hive Metastore**: Starburst Hive Metastore for table metadata
management
- **PostgreSQL**: Database backend for Hive Metastore
- **MinIO**: S3-compatible object storage for Hudi data files

## Important Configuration Parameters

### Container UID
- **Parameter**: `CONTAINER_UID` in `custom_settings.env`
- **Default**: `doris--`
- **Note**: Must be set to a unique value to avoid conflicts with other
Docker environments
- **Example**: `CONTAINER_UID="doris--bender--"`
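
The container names used elsewhere in this document (for example `doris--hudi-spark`) start with the default value `doris--`, which suggests that `CONTAINER_UID` is prepended to each service name. A minimal sketch of that naming, assuming the prefix is simply concatenated with the service name; `container_name` is a hypothetical helper, not part of the Doris scripts:

```bash
# Hypothetical helper: build a container name from CONTAINER_UID
# (falling back to the documented default "doris--") and a service name.
container_name() {
  printf '%s%s\n' "${CONTAINER_UID:-doris--}" "$1"
}

# With the default prefix.
container_name hudi-spark

# With a custom prefix set only for this call.
CONTAINER_UID="doris--bender--" container_name hudi-spark
```

If this assumption holds, commands such as `docker restart doris--hudi-spark` need the custom prefix in place of `doris--`.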

### Port Configuration (`hudi.env.tpl`)
- `HIVE_METASTORE_PORT`: Port for Hive Metastore Thrift service
(default: 19083)
- `MINIO_API_PORT`: MinIO S3 API port (default: 19100)
- `MINIO_CONSOLE_PORT`: MinIO web console port (default: 19101)
- `SPARK_UI_PORT`: Spark web UI port (default: 18080)

### MinIO Credentials (`hudi.env.tpl`)
- `MINIO_ROOT_USER`: MinIO access key (default: `minio`)
- `MINIO_ROOT_PASSWORD`: MinIO secret key (default: `minio123`)
- `HUDI_BUCKET`: S3 bucket name for Hudi data (default: `datalake`)

### Version Compatibility
⚠️ **Important**: The versions of all Hadoop JARs (such as `hadoop-aws`)
must match Spark's built-in Hadoop version
- **Spark Version**: 3.5.7 (uses Hadoop 3.3.4) - default build for Hudi
1.0.2
- **Hadoop AWS Version**: 3.3.4 (matching Spark's Hadoop)
- **Hudi Bundle Version**: 1.0.2, Spark 3.5 bundle (the default build;
matches Spark 3.5.7 and the Hudi version used by Doris, which avoids
versionCode compatibility issues)
- **AWS SDK v1 Version**: 1.12.262 (required for Hadoop 3.3.4 S3A
support, 1.12.x series)
- **PostgreSQL JDBC Version**: 42.7.1 (compatible with Hive Metastore)
- **Hudi 1.0.x Compatibility**: Supports Spark 3.5.x (default), 3.4.x,
and 3.3.x
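
The version match can be checked mechanically. The sketch below assumes Maven-style JAR names (`<artifact>-<version>.jar`) and that Spark's Hadoop version can be read from a `hadoop-client-api-*.jar` file in its `jars` directory. Both are assumptions about the image layout, and `jar_version` and `check_hadoop_match` are hypothetical helpers:

```bash
# Hypothetical helper: extract the version from a Maven-style JAR file name,
# e.g. hadoop-client-api-3.3.4.jar -> 3.3.4
jar_version() {
  basename "$1" .jar | sed 's/^.*-\([0-9][0-9.]*\)$/\1/'
}

# Hypothetical check: succeed only when both versions are identical.
# $1 = Spark's built-in Hadoop version, $2 = hadoop-aws version.
check_hadoop_match() {
  if [ "$1" = "$2" ]; then
    echo "OK: hadoop-aws $2 matches Spark's Hadoop $1"
  else
    echo "MISMATCH: Spark bundles Hadoop $1 but hadoop-aws is $2" >&2
    return 1
  fi
}

# File names are illustrative; inside the container the first one would
# come from listing Spark's jars directory.
spark_hadoop=$(jar_version hadoop-client-api-3.3.4.jar)
hadoop_aws=$(jar_version hadoop-aws-3.3.4.jar)
check_hadoop_match "$spark_hadoop" "$hadoop_aws"
```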

### JAR Dependencies (`hudi.env.tpl`)
All JAR file versions and URLs are configurable:
- `HUDI_BUNDLE_VERSION` / `HUDI_BUNDLE_URL`: Hudi Spark bundle
- `HADOOP_AWS_VERSION` / `HADOOP_AWS_URL`: Hadoop S3A filesystem support
- `AWS_SDK_BUNDLE_VERSION` / `AWS_SDK_BUNDLE_URL`: AWS Java SDK Bundle
v1 (required for Hadoop 3.3.4 S3A support, 1.12.x series)
- `POSTGRESQL_JDBC_VERSION` / `POSTGRESQL_JDBC_URL`: PostgreSQL JDBC
driver

**Note**: `hadoop-common` is already included in Spark's built-in Hadoop
distribution, so it's not configured here.
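
The actual `*_URL` values live in `hudi.env.tpl`, which is not shown here. As an illustration only, the sketch below derives download URLs from version numbers using the standard Maven repository layout. The `maven_jar_url` helper, the `repo1.maven.org` base URL, and the group/artifact coordinates are assumptions, not values taken from the template:

```bash
# Hypothetical helper: build a JAR URL in the standard Maven layout:
#   <base>/<group path>/<artifact>/<version>/<artifact>-<version>.jar
# $1 = groupId, $2 = artifactId, $3 = version
maven_jar_url() {
  group_path=$(printf '%s' "$1" | tr . /)   # org.apache.hadoop -> org/apache/hadoop
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$group_path" "$2" "$3" "$2" "$3"
}

# Versions from the "Version Compatibility" list.
HADOOP_AWS_VERSION=3.3.4
AWS_SDK_BUNDLE_VERSION=1.12.262

maven_jar_url org.apache.hadoop hadoop-aws "$HADOOP_AWS_VERSION"
maven_jar_url com.amazonaws aws-java-sdk-bundle "$AWS_SDK_BUNDLE_VERSION"
```

Deriving the URL from the version keeps the two settings from drifting apart when a version is bumped.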

## Starting the Environment

```bash
# Start Hudi environment
./docker/thirdparties/run-thirdparties-docker.sh -c hudi

# Stop Hudi environment
./docker/thirdparties/run-thirdparties-docker.sh -c hudi --stop
```

## Adding Data

⚠️ **Important**: To keep data consistent across Docker restarts,
**only use SQL scripts** to add data. Data added through the `spark-sql`
interactive shell is temporary and does not persist after a container
restart.

### Using SQL Scripts

Add new SQL files to the `scripts/create_preinstalled_scripts/hudi/`
directory:
- Files are executed in alphabetical order (e.g.,
`01_config_and_database.sql`, then
`02_create_user_activity_log_tables.sql`)
- Use descriptive names with numeric prefixes to control execution order
- Use environment variable substitution: `${HIVE_METASTORE_URIS}` and
`${HUDI_BUCKET}`
- **Data created through SQL scripts will persist after Docker restart**
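
The ordering and substitution rules above can be sketched as follows. How `init.sh` actually implements them is not shown here, so this is an illustration under assumptions: `render_sql` and `list_sql_scripts` are hypothetical helpers, and the metastore URI in the usage line is a placeholder:

```bash
# Hypothetical helper: replace the two documented placeholders in SQL
# read from stdin. $1 = bucket name, $2 = metastore URI.
# Values containing "|", "&" or "\" would need escaping for sed.
render_sql() {
  sed -e 's|[$]{HUDI_BUCKET}|'"$1"'|g' \
      -e 's|[$]{HIVE_METASTORE_URIS}|'"$2"'|g'
}

# Hypothetical helper: print the *.sql files of a directory in glob order,
# which is alphabetical, so numeric prefixes control the sequence.
list_sql_scripts() {
  for f in "$1"/*.sql; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Render one line of a script with the default bucket name.
printf '%s\n' "LOCATION 's3a://\${HUDI_BUCKET}/warehouse/regression_hudi/my_hudi_table';" \
  | render_sql datalake thrift://metastore.example:9083
```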

Example: Create `08_create_custom_table.sql`:
```sql
USE regression_hudi;

CREATE TABLE IF NOT EXISTS my_hudi_table (
  id BIGINT,
  name STRING,
  created_at TIMESTAMP
) USING hudi
TBLPROPERTIES (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'created_at',
  hoodie.datasource.hive_sync.enable = 'true',
  hoodie.datasource.hive_sync.metastore.uris = '${HIVE_METASTORE_URIS}',
  hoodie.datasource.hive_sync.mode = 'hms'
)
LOCATION 's3a://${HUDI_BUCKET}/warehouse/regression_hudi/my_hudi_table';

INSERT INTO my_hudi_table VALUES
  (1, 'Alice', TIMESTAMP '2024-01-01 10:00:00'),
  (2, 'Bob', TIMESTAMP '2024-01-02 11:00:00');
```

After adding SQL files, restart the container to execute them:
```bash
docker restart doris--hudi-spark
```
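
The scripts run while the container starts, so new tables may not be available the moment `docker restart` returns. One way to wait is to poll for the `SUCCESS` marker described under Troubleshooting. A minimal sketch, assuming the marker path `/opt/hudi-scripts/SUCCESS` from that section and that the marker is rewritten on every start (not confirmed here); `wait_until` is a hypothetical helper:

```bash
# Hypothetical helper: retry a command until it succeeds.
# $1 = maximum attempts, $2 = seconds to sleep after a failed attempt,
# remaining arguments = the command to run.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (requires the running environment, so it is left commented out):
# wait_until 60 5 docker exec doris--hudi-spark test -f /opt/hudi-scripts/SUCCESS
```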

## Creating Hudi Catalog in Doris

After starting the Hudi Docker environment, you can create a Hudi
catalog in Doris to access Hudi tables:

```sql
-- Create Hudi catalog
CREATE CATALOG IF NOT EXISTS hudi_catalog PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://<externalEnvIp>:19083',
    's3.endpoint' = 'http://<externalEnvIp>:19100',
    's3.access_key' = 'minio',
    's3.secret_key' = 'minio123',
    's3.region' = 'us-east-1',
    'use_path_style' = 'true'
);

-- Switch to Hudi catalog
SWITCH hudi_catalog;

-- Use database
USE regression_hudi;

-- Show tables
SHOW TABLES;

-- Query Hudi table
SELECT * FROM user_activity_log_cow_partition LIMIT 10;
```

**Configuration Parameters:**
- `hive.metastore.uris`: Hive Metastore Thrift service address (default
port: 19083)
- `s3.endpoint`: MinIO S3 API endpoint (default port: 19100)
- `s3.access_key`: MinIO access key (default: `minio`)
- `s3.secret_key`: MinIO secret key (default: `minio123`)
- `s3.region`: S3 region (default: `us-east-1`)
- `use_path_style`: Use path-style access for MinIO (required: `true`)

Replace `<externalEnvIp>` with your actual external environment IP
address (e.g., `127.0.0.1` for localhost).
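
To avoid editing the statement by hand for each environment, the host and ports can be filled in by a small generator. This is a sketch only: `hudi_catalog_sql` is a hypothetical helper, and the defaults are the ones documented above:

```bash
# Hypothetical helper: print the CREATE CATALOG statement for a given host.
# $1 = host or IP, $2 = metastore port (default 19083),
# $3 = MinIO API port (default 19100).
# Credentials come from MINIO_ROOT_USER / MINIO_ROOT_PASSWORD when set.
hudi_catalog_sql() {
  host=$1
  hms_port=${2:-19083}
  s3_port=${3:-19100}
  cat <<EOF
CREATE CATALOG IF NOT EXISTS hudi_catalog PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://${host}:${hms_port}',
    's3.endpoint' = 'http://${host}:${s3_port}',
    's3.access_key' = '${MINIO_ROOT_USER:-minio}',
    's3.secret_key' = '${MINIO_ROOT_PASSWORD:-minio123}',
    's3.region' = 'us-east-1',
    'use_path_style' = 'true'
);
EOF
}

# Print the statement for a local environment.
hudi_catalog_sql 127.0.0.1
```

The output can then be pasted into, or piped to, whichever SQL client is used to connect to Doris.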

## Debugging with Spark SQL

⚠️ **Note**: The methods below are for debugging only. Data created
through the `spark-sql` interactive shell will **not persist** after a
Docker restart. To add persistent data, use SQL scripts as described in
the "Adding Data" section.

### 1. Connect to Spark Container

```bash
docker exec -it doris--hudi-spark bash
```

### 2. Start Spark SQL Interactive Shell

```bash
/opt/spark/bin/spark-sql \
  --master 'local[*]' \
  --name hudi-debug \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
  --conf spark.sql.warehouse.dir=s3a://datalake/warehouse
```

### 3. Common Debugging Commands

```sql
-- Show databases
SHOW DATABASES;

-- Use database
USE regression_hudi;

-- Show tables
SHOW TABLES;

-- Describe table structure
DESCRIBE EXTENDED user_activity_log_cow_partition;

-- Query data
SELECT * FROM user_activity_log_cow_partition LIMIT 10;

-- Check Hudi table properties
SHOW TBLPROPERTIES user_activity_log_cow_partition;

-- View Spark configuration
SET -v;

-- Check Hudi-specific configurations
SET hoodie.datasource.write.hive_style_partitioning;
```

### 4. View Spark Web UI

Access the Spark Web UI at `http://localhost:18080` (or the configured
`SPARK_UI_PORT`).

### 5. Check Container Logs

```bash
# View Spark container logs
docker logs doris--hudi-spark --tail 100 -f

# View Hive Metastore logs
docker logs doris--hudi-metastore --tail 100 -f

# View MinIO logs
docker logs doris--hudi-minio --tail 100 -f
```

### 6. Verify S3 Data

```bash
# Access MinIO console
# URL: http://localhost:19101 (or configured MINIO_CONSOLE_PORT)
# Username: minio (or MINIO_ROOT_USER)
# Password: minio123 (or MINIO_ROOT_PASSWORD)

# Or use MinIO client
docker exec -it doris--hudi-minio-mc mc ls myminio/datalake/warehouse/regression_hudi/
```

## Troubleshooting

### Container Exits Immediately
- Check logs: `docker logs doris--hudi-spark`
- Verify the SUCCESS file exists: `docker exec doris--hudi-spark test -f /opt/hudi-scripts/SUCCESS`
- Ensure Hive Metastore is running: `docker ps | grep metastore`

### ClassNotFoundException Errors
- Verify JAR files are downloaded: `docker exec doris--hudi-spark ls -lh /opt/hudi-cache/`
- Check JAR versions match Spark's Hadoop version (3.3.4)
- Review `hudi.env.tpl` for correct version numbers

### S3A Connection Issues
- Verify MinIO is running: `docker ps | grep minio`
- Check MinIO credentials in `hudi.env.tpl`
- Test S3 connection: `docker exec doris--hudi-minio-mc mc ls myminio/`

### Hive Metastore Connection Issues
- Check Metastore is ready: `docker logs doris--hudi-metastore 2>&1 | grep "Metastore is ready"` (`2>&1` is needed because `docker logs` sends the container's stderr to stderr)
- Verify PostgreSQL is running: `docker ps | grep metastore-db`
- Test connection: `docker exec doris--hudi-metastore-db pg_isready -U hive`

## File Structure

```
hudi/
├── hudi.yaml.tpl          # Docker Compose template
├── hudi.env.tpl           # Environment variables template
├── scripts/
│   ├── init.sh            # Initialization script
│   ├── create_preinstalled_scripts/
│   │   └── hudi/          # SQL scripts (01_config_and_database.sql, 02_create_user_activity_log_tables.sql, ...)
│   └── SUCCESS            # Initialization marker (generated)
└── cache/                 # Downloaded JAR files (generated)
```

## Notes

- All generated files (`.yaml`, `.env`, `cache/`, `SUCCESS`) are ignored
by Git
- SQL scripts support environment variable substitution using
`${VARIABLE_NAME}` syntax
- Hadoop version compatibility is critical - must match Spark's built-in
version
- Container keeps running after initialization for healthcheck and
debugging
