Commit 06869cf

suxiaogang223 and zzzxl1993 authored and committed
[regression](hudi) Impl new Hudi Docker environment (apache#59401)
### What problem does this PR solve?

# Hudi Docker Environment

This directory contains the Docker Compose configuration for setting up a Hudi test environment with Spark, Hive Metastore, MinIO (S3-compatible storage), and PostgreSQL.

## Components

- **Spark**: Apache Spark 3.5.7 for processing Hudi tables
- **Hive Metastore**: Starburst Hive Metastore for table metadata management
- **PostgreSQL**: Database backend for the Hive Metastore
- **MinIO**: S3-compatible object storage for Hudi data files

## Important Configuration Parameters

### Container UID

- **Parameter**: `CONTAINER_UID` in `custom_settings.env`
- **Default**: `doris--`
- **Note**: Must be set to a unique value to avoid conflicts with other Docker environments
- **Example**: `CONTAINER_UID="doris--bender--"`

### Port Configuration (`hudi.env.tpl`)

- `HIVE_METASTORE_PORT`: Hive Metastore Thrift service port (default: 19083)
- `MINIO_API_PORT`: MinIO S3 API port (default: 19100)
- `MINIO_CONSOLE_PORT`: MinIO web console port (default: 19101)
- `SPARK_UI_PORT`: Spark web UI port (default: 18080)

### MinIO Credentials (`hudi.env.tpl`)

- `MINIO_ROOT_USER`: MinIO access key (default: `minio`)
- `MINIO_ROOT_PASSWORD`: MinIO secret key (default: `minio123`)
- `HUDI_BUCKET`: S3 bucket name for Hudi data (default: `datalake`)

### Version Compatibility

⚠️ **Important**: Hadoop-related JAR versions must match Spark's built-in Hadoop version.

- **Spark**: 3.5.7 (bundles Hadoop 3.3.4), the default build target for Hudi 1.0.2
- **Hadoop AWS**: 3.3.4 (matches Spark's Hadoop)
- **Hudi Bundle**: 1.0.2 Spark 3.5 bundle (default build; matches Spark 3.5.7 and Doris's Hudi version, avoiding versionCode compatibility issues)
- **AWS SDK v1**: 1.12.262 (the 1.12.x series is required for Hadoop 3.3.4 S3A support)
- **PostgreSQL JDBC**: 42.7.1 (compatible with the Hive Metastore)
- **Hudi 1.0.x compatibility**: Supports Spark 3.5.x (default), 3.4.x, and 3.3.x

### JAR Dependencies (`hudi.env.tpl`)

All JAR file versions and URLs are configurable:

- `HUDI_BUNDLE_VERSION` / `HUDI_BUNDLE_URL`: Hudi Spark bundle
- `HADOOP_AWS_VERSION` / `HADOOP_AWS_URL`: Hadoop S3A filesystem support
- `AWS_SDK_BUNDLE_VERSION` / `AWS_SDK_BUNDLE_URL`: AWS Java SDK Bundle v1 (required for Hadoop 3.3.4 S3A support, 1.12.x series)
- `POSTGRESQL_JDBC_VERSION` / `POSTGRESQL_JDBC_URL`: PostgreSQL JDBC driver

**Note**: `hadoop-common` is already included in Spark's built-in Hadoop distribution, so it is not configured here.

## Starting the Environment

```bash
# Start Hudi environment
./docker/thirdparties/run-thirdparties-docker.sh -c hudi

# Stop Hudi environment
./docker/thirdparties/run-thirdparties-docker.sh -c hudi --stop
```

## Adding Data

⚠️ **Important**: To ensure data consistency after Docker restarts, **only use SQL scripts** to add data. Data added through the `spark-sql` interactive shell is temporary and will not persist after a container restart.

### Using SQL Scripts

Add new SQL files in the `scripts/create_preinstalled_scripts/hudi/` directory:

- Files are executed in alphabetical order (e.g., `01_config_and_database.sql`, `02_create_user_activity_log_tables.sql`, etc.)
- Use descriptive names with numeric prefixes to control execution order
- Use environment variable substitution: `${HIVE_METASTORE_URIS}` and `${HUDI_BUCKET}`
- **Data created through SQL scripts will persist after a Docker restart**

Example: create `08_create_custom_table.sql`:

```sql
USE regression_hudi;

CREATE TABLE IF NOT EXISTS my_hudi_table (
    id BIGINT,
    name STRING,
    created_at TIMESTAMP
) USING hudi
TBLPROPERTIES (
    type = 'cow',
    primaryKey = 'id',
    preCombineField = 'created_at',
    hoodie.datasource.hive_sync.enable = 'true',
    hoodie.datasource.hive_sync.metastore.uris = '${HIVE_METASTORE_URIS}',
    hoodie.datasource.hive_sync.mode = 'hms'
)
LOCATION 's3a://${HUDI_BUCKET}/warehouse/regression_hudi/my_hudi_table';

INSERT INTO my_hudi_table VALUES
    (1, 'Alice', TIMESTAMP '2024-01-01 10:00:00'),
    (2, 'Bob', TIMESTAMP '2024-01-02 11:00:00');
```

After adding SQL files, restart the container to execute them:

```bash
docker restart doris--hudi-spark
```

## Creating Hudi Catalog in Doris

After starting the Hudi Docker environment, you can create a Hudi catalog in Doris to access Hudi tables:

```sql
-- Create Hudi catalog
CREATE CATALOG IF NOT EXISTS hudi_catalog PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://<externalEnvIp>:19083',
    's3.endpoint' = 'http://<externalEnvIp>:19100',
    's3.access_key' = 'minio',
    's3.secret_key' = 'minio123',
    's3.region' = 'us-east-1',
    'use_path_style' = 'true'
);

-- Switch to Hudi catalog
SWITCH hudi_catalog;

-- Use database
USE regression_hudi;

-- Show tables
SHOW TABLES;

-- Query Hudi table
SELECT * FROM user_activity_log_cow_partition LIMIT 10;
```

**Configuration Parameters:**

- `hive.metastore.uris`: Hive Metastore Thrift service address (default port: 19083)
- `s3.endpoint`: MinIO S3 API endpoint (default port: 19100)
- `s3.access_key`: MinIO access key (default: `minio`)
- `s3.secret_key`: MinIO secret key (default: `minio123`)
- `s3.region`: S3 region (default: `us-east-1`)
- `use_path_style`: Use path-style access for MinIO (required: `true`)

Replace `<externalEnvIp>` with your actual external environment IP address (e.g., `127.0.0.1` for localhost).

## Debugging with Spark SQL

⚠️ **Note**: The methods below are for debugging only. Data created through the `spark-sql` interactive shell will **not persist** after a Docker restart. To add persistent data, use SQL scripts as described in the "Adding Data" section.

### 1. Connect to Spark Container

```bash
docker exec -it doris--hudi-spark bash
```

### 2. Start Spark SQL Interactive Shell

```bash
/opt/spark/bin/spark-sql \
    --master local[*] \
    --name hudi-debug \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --conf spark.sql.catalogImplementation=hive \
    --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
    --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
    --conf spark.sql.warehouse.dir=s3a://datalake/warehouse
```

### 3. Common Debugging Commands

```sql
-- Show databases
SHOW DATABASES;

-- Use database
USE regression_hudi;

-- Show tables
SHOW TABLES;

-- Describe table structure
DESCRIBE EXTENDED user_activity_log_cow_partition;

-- Query data
SELECT * FROM user_activity_log_cow_partition LIMIT 10;

-- Check Hudi table properties
SHOW TBLPROPERTIES user_activity_log_cow_partition;

-- View Spark configuration
SET -v;

-- Check Hudi-specific configurations
SET hoodie.datasource.write.hive_style_partitioning;
```

### 4. View Spark Web UI

Access the Spark Web UI at `http://localhost:18080` (or the configured `SPARK_UI_PORT`).

### 5. Check Container Logs

```bash
# View Spark container logs
docker logs doris--hudi-spark --tail 100 -f

# View Hive Metastore logs
docker logs doris--hudi-metastore --tail 100 -f

# View MinIO logs
docker logs doris--hudi-minio --tail 100 -f
```

### 6. Verify S3 Data

```bash
# Access MinIO console
# URL: http://localhost:19101 (or configured MINIO_CONSOLE_PORT)
# Username: minio (or MINIO_ROOT_USER)
# Password: minio123 (or MINIO_ROOT_PASSWORD)

# Or use MinIO client
docker exec -it doris--hudi-minio-mc mc ls myminio/datalake/warehouse/regression_hudi/
```

## Troubleshooting

### Container Exits Immediately

- Check logs: `docker logs doris--hudi-spark`
- Verify the SUCCESS file exists: `docker exec doris--hudi-spark test -f /opt/hudi-scripts/SUCCESS`
- Ensure the Hive Metastore is running: `docker ps | grep metastore`

### ClassNotFoundException Errors

- Verify JAR files are downloaded: `docker exec doris--hudi-spark ls -lh /opt/hudi-cache/`
- Check that JAR versions match Spark's Hadoop version (3.3.4)
- Review `hudi.env.tpl` for correct version numbers

### S3A Connection Issues

- Verify MinIO is running: `docker ps | grep minio`
- Check MinIO credentials in `hudi.env.tpl`
- Test the S3 connection: `docker exec doris--hudi-minio-mc mc ls myminio/`

### Hive Metastore Connection Issues

- Check the Metastore is ready: `docker logs doris--hudi-metastore | grep "Metastore is ready"`
- Verify PostgreSQL is running: `docker ps | grep metastore-db`
- Test the connection: `docker exec doris--hudi-metastore-db pg_isready -U hive`

## File Structure

```
hudi/
├── hudi.yaml.tpl          # Docker Compose template
├── hudi.env.tpl           # Environment variables template
├── scripts/
│   ├── init.sh            # Initialization script
│   ├── create_preinstalled_scripts/
│   │   └── hudi/          # SQL scripts (01_config_and_database.sql, 02_create_user_activity_log_tables.sql, ...)
│   └── SUCCESS            # Initialization marker (generated)
└── cache/                 # Downloaded JAR files (generated)
```

## Notes

- All generated files (`.yaml`, `.env`, `cache/`, `SUCCESS`) are ignored by Git
- SQL scripts support environment variable substitution using `${VARIABLE_NAME}` syntax
- Hadoop version compatibility is critical: JAR versions must match Spark's built-in Hadoop version
- The container keeps running after initialization for healthchecks and debugging
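The variable-substitution convention described above can be sketched as follows. This is a hypothetical illustration, not the actual contents of `init.sh`: the demo paths, the `sed`-based substitution, and the sample Metastore URI are assumptions made for the example.

```bash
# Hypothetical sketch: render ${VARIABLE_NAME} references in SQL scripts
# before execution. NOT the actual init.sh; demo paths and the sed-based
# substitution are illustrative assumptions.
export HIVE_METASTORE_URIS="thrift://metastore:9083"   # assumed demo value
export HUDI_BUCKET="datalake"

demo_dir=$(mktemp -d)
cat > "$demo_dir/01_demo.sql" <<'EOF'
LOCATION 's3a://${HUDI_BUCKET}/warehouse/regression_hudi/demo';
EOF

# Shell globs sort alphabetically, matching the numeric-prefix ordering rule
for f in "$demo_dir"/*.sql; do
  sed -e "s|\${HUDI_BUCKET}|${HUDI_BUCKET}|g" \
      -e "s|\${HIVE_METASTORE_URIS}|${HIVE_METASTORE_URIS}|g" \
      "$f" > "${f%.sql}.rendered.sql"
done

cat "$demo_dir/01_demo.rendered.sql"
# prints: LOCATION 's3a://datalake/warehouse/regression_hudi/demo';
```

Because the heredoc is quoted, `${HUDI_BUCKET}` stays literal in the source script and is only expanded at render time, which mirrors how the preinstalled SQL scripts carry unexpanded placeholders in the repository.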
1 parent d9747df commit 06869cf

File tree

62 files changed: +6937, −899 lines


.gitignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -137,6 +137,9 @@ lru_cache_test
 docker/thirdparties/docker-compose/*/data
 docker/thirdparties/docker-compose/*/logs
 docker/thirdparties/docker-compose/*/*.yaml
+docker/thirdparties/docker-compose/*/*.env
+docker/thirdparties/docker-compose/*/cache/
+docker/thirdparties/docker-compose/*/scripts/SUCCESS
 docker/runtime/be/resource/apache-doris/

 # other
```
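The added ignore patterns can be sanity-checked with `git check-ignore` in a throwaway repository. This is an illustrative check, not part of the commit; the sample paths are assumptions.

```bash
# Illustrative check that the new .gitignore patterns match generated files.
repo=$(mktemp -d)
cd "$repo"
git init -q .
cat > .gitignore <<'EOF'
docker/thirdparties/docker-compose/*/*.env
docker/thirdparties/docker-compose/*/cache/
docker/thirdparties/docker-compose/*/scripts/SUCCESS
EOF

# check-ignore exits 0 when the path is ignored (the paths need not exist)
git check-ignore -v docker/thirdparties/docker-compose/hudi/hudi.env
git check-ignore -v docker/thirdparties/docker-compose/hudi/scripts/SUCCESS
```

The trailing slash on `cache/` makes the pattern match the directory, so everything inside it is ignored as well.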
Lines changed: 280 additions & 0 deletions
New file: the standard Apache license header followed by the README content reproduced in the commit message above.
docker/thirdparties/docker-compose/hudi/hadoop.env

Lines changed: 0 additions & 52 deletions
This file was deleted.
Lines changed: 50 additions & 0 deletions
All lines are additions (new file):

```
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

CONTAINER_UID=doris--
HUDI_NETWORK=${CONTAINER_UID}hudi-network

# Ports exposed to host
HIVE_METASTORE_PORT=19083
MINIO_API_PORT=19100
MINIO_CONSOLE_PORT=19101
SPARK_UI_PORT=18080

# MinIO credentials/buckets
MINIO_ROOT_USER=minio
MINIO_ROOT_PASSWORD=minio123
HUDI_BUCKET=datalake

# Hudi bundle
# Hudi 1.0.2 supports Spark 3.5.x (default), 3.4.x, and 3.3.x
# Using the Spark 3.5 bundle to match the Spark 3.5.7 image (default build)
HUDI_BUNDLE_VERSION=1.0.2
HUDI_BUNDLE_URL=https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.0.2/hudi-spark3.5-bundle_2.12-1.0.2.jar

# Hadoop AWS S3A filesystem (required for S3A support)
# Note: version must match Spark's built-in Hadoop version (3.3.4 for Spark 3.5.7)
HADOOP_AWS_VERSION=3.3.4
HADOOP_AWS_URL=https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar

# AWS Java SDK Bundle v1 (required for Hadoop 3.3.4 S3A support)
# Note: Hadoop 3.3.x uses AWS SDK v1; version 1.12.x is recommended
AWS_SDK_BUNDLE_VERSION=1.12.262
AWS_SDK_BUNDLE_URL=https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.262/aws-java-sdk-bundle-1.12.262.jar

# PostgreSQL JDBC driver (required for Hive Metastore connection)
POSTGRESQL_JDBC_VERSION=42.7.1
POSTGRESQL_JDBC_URL=https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.1/postgresql-42.7.1.jar
```