
Commit 3740dff

[filesystem] Support using Hadoop dependencies from the HADOOP_CLASSPATH environment variable
1 parent b2f5a6e commit 3740dff

4 files changed, +60 -0 lines changed


fluss-dist/src/main/resources/bin/config.sh

Lines changed: 11 additions & 0 deletions
@@ -28,6 +28,12 @@ constructFlussClassPath() {
         else
             FLUSS_CLASSPATH="$FLUSS_CLASSPATH":"$jarfile"
         fi
+
+        # Add Hadoop dependencies from environment variables HADOOP_CLASSPATH
+        if [ -n "${HADOOP_CLASSPATH}" ]; then
+            FLUSS_CLASSPATH="$FLUSS_CLASSPATH":"$HADOOP_CLASSPATH"
+        fi
+
     done < <(find "$FLUSS_LIB_DIR" ! -type d -name '*.jar' -print0 | sort -z)

     local FLUSS_SERVER_COUNT

@@ -131,6 +137,7 @@ KEY_ENV_SSH_OPTS="env.ssh.opts"
 KEY_ZK_HEAP_MB="zookeeper.heap.mb"

 KEY_REMOTE_DATA_DIR="remote.data.dir"
+KEY_ENV_HADOOP_CLASSPATH="env.hadoop.class-path"

 ########################################################################################################################
 # PATHS AND CONFIG

@@ -285,6 +292,10 @@ if [ -z "${REMOTE_DATA_DIR}" ]; then
     REMOTE_DATA_DIR=$(readFromConfig ${KEY_REMOTE_DATA_DIR} "" "${YAML_CONF}")
 fi

+if [ -z "${HADOOP_CLASSPATH}" ]; then
+    HADOOP_CLASSPATH=$(readFromConfig ${KEY_ENV_HADOOP_CLASSPATH} "" "${YAML_CONF}")
+fi
+
 # Arguments for the JVM. Used for Coordinator server and Tablet server JVMs.
 if [ -z "${JVM_ARGS}" ]; then
     JVM_ARGS=""

fluss-server/pom.xml

Lines changed: 6 additions & 0 deletions
@@ -131,6 +131,12 @@
                 <include>*:*</include>
             </includes>
         </artifactSet>
+        <relocations>
+            <relocation>
+                <pattern>org.apache.commons</pattern>
+                <shadedPattern>org.apache.fluss.shaded.org.apache.commons</shadedPattern>
+            </relocation>
+        </relocations>
     </configuration>
 </execution>
 </executions>
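
The relocation matters once `hadoop classpath` is on the Fluss classpath, since the cluster's Hadoop distribution ships its own `org.apache.commons` jars; shading the server's copy under `org.apache.fluss.shaded` keeps the two versions from clashing. A quick, hypothetical post-build sanity check (the jar path and name are illustrative):

```bash
# Count commons classes that ended up under the relocated package in the
# shaded server jar (path and jar name are illustrative).
unzip -l fluss-server/target/fluss-server-*.jar \
  | grep -c 'org/apache/fluss/shaded/org/apache/commons/'
```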

website/docs/maintenance/filesystems/hdfs.md

Lines changed: 14 additions & 0 deletions
@@ -36,6 +36,20 @@ remote.data.dir: hdfs://namenode:50010/path/to/remote/storage
 To allow for easy adoption, you can use the same configuration keys in Fluss' server.yaml as in Hadoop's `core-site.xml`.
 You can see the configuration keys in Hadoop's [`core-site.xml`](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml).

+#### Hadoop Environment Configuration
+
+To use the machine's Hadoop environment instead of Fluss' embedded Hadoop, follow these steps:
+
+**Step 1: Set Hadoop Classpath**
+```bash
+export HADOOP_CLASSPATH=`hadoop classpath`
+```
+
+**Step 2: Add the following to your configuration file**
+```yaml
+plugin.classloader.parent-first-patterns.default: java.,com.alibaba.fluss.,javax.annotation.,org.slf4j,org.apache.log4j,org.apache.logging,org.apache.commons.logging,ch.qos.logback,hdfs-site,core-site,org.apache.hadoop.,META-INF
+```
+
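
For reference, `hadoop classpath` prints the colon-separated list of the local installation's conf directory and jars, which is exactly what Step 1 exports; adding `org.apache.hadoop.` to the parent-first patterns then lets those Hadoop classes resolve through the parent classloader rather than the bundled plugin. A quick way to inspect the exported value (entries shown are illustrative):

```bash
# Inspect what Step 1 puts on the classpath; output below is illustrative.
export HADOOP_CLASSPATH=$(hadoop classpath)
echo "${HADOOP_CLASSPATH}" | tr ':' '\n' | head
#   /etc/hadoop/conf
#   /usr/lib/hadoop/lib/*
#   /usr/lib/hadoop/hadoop-common-3.3.6.jar
#   ...
```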

website/docs/maintenance/tiered-storage/lakehouse-storage.md

Lines changed: 29 additions & 0 deletions
@@ -47,6 +47,35 @@ datalake.paimon.metastore: filesystem
 datalake.paimon.warehouse: /tmp/paimon_data_warehouse
 ```

+#### Hadoop Environment Configuration
+
+To use the machine's Hadoop environment instead of Fluss' embedded Hadoop, follow these steps:
+
+**Step 1: Set Hadoop Classpath**
+```bash
+export HADOOP_CLASSPATH=`hadoop classpath`
+```
+
+**Step 2: Add the following to your configuration file**
+```yaml
+plugin.classloader.parent-first-patterns.default: java.,com.alibaba.fluss.,javax.annotation.,org.slf4j,org.apache.log4j,org.apache.logging,org.apache.commons.logging,ch.qos.logback,hdfs-site,core-site,org.apache.hadoop.,META-INF
+```
+
+#### Hive Catalog Configuration
+
+To use Hive as the metastore, follow these steps:
+
+**Step 1: Add Hive Connector Dependency**
+[Download](https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/hive/overview/#using-bundled-hive-jar) the Flink SQL Hive connector JAR. Place the downloaded JAR in Paimon's plugin directory:
+`$PAIMON_HOME/plugins/hive`.
+
+**Step 2: Add the following to your configuration file**
+```yaml
+datalake.paimon.metastore: hive
+# this is recommended in a Kerberos environment
+datalake.paimon.hive-conf-dir: '...'
+```
+
 ### Start The Datalake Tiering Service
 Then, you must start the datalake tiering service to compact Fluss's data to the lakehouse storage.
 To start the datalake tiering service, you must have a Flink cluster running since Fluss currently only supports Flink as a tiering service backend.
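
Step 1 of the Hive catalog setup, spelled out as shell commands; the jar name and download location are illustrative, not prescribed by the docs:

```bash
# Hypothetical walk-through of "Add Hive Connector Dependency": put the bundled
# Flink SQL Hive jar where Paimon's plugin loader can find it.
mkdir -p "$PAIMON_HOME/plugins/hive"
cp ~/Downloads/flink-sql-connector-hive-*.jar "$PAIMON_HOME/plugins/hive/"
ls -l "$PAIMON_HOME/plugins/hive"
```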
