
Commit f886bf7

[filesystem] Support using Hadoop dependencies from the HADOOP_CLASSPATH environment variable
1 parent 6adb2d4 commit f886bf7

4 files changed (+60, −0 lines)


fluss-dist/src/main/resources/bin/config.sh

Lines changed: 11 additions & 0 deletions

```diff
@@ -30,6 +30,12 @@ constructFlussClassPath() {
         else
             FLUSS_CLASSPATH="$FLUSS_CLASSPATH":"$jarfile"
         fi
+
+        # Add Hadoop dependencies from environment variables HADOOP_CLASSPATH
+        if [ -n "${HADOOP_CLASSPATH}" ]; then
+            FLUSS_CLASSPATH="$FLUSS_CLASSPATH":"$HADOOP_CLASSPATH"
+        fi
+
     done < <(find "$FLUSS_LIB_DIR" ! -type d -name '*.jar' -print0 | sort -z)
 
     local FLUSS_SERVER_COUNT
@@ -133,6 +139,7 @@ KEY_ENV_SSH_OPTS="env.ssh.opts"
 KEY_ZK_HEAP_MB="zookeeper.heap.mb"
 
 KEY_REMOTE_DATA_DIR="remote.data.dir"
+KEY_ENV_HADOOP_CLASSPATH="env.hadoop.class-path"
 
 ########################################################################################################################
 # PATHS AND CONFIG
@@ -287,6 +294,10 @@ if [ -z "${REMOTE_DATA_DIR}" ]; then
     REMOTE_DATA_DIR=$(readFromConfig ${KEY_REMOTE_DATA_DIR} "" "${YAML_CONF}")
 fi
 
+if [ -z "${HADOOP_CLASSPATH}" ]; then
+    HADOOP_CLASSPATH=$(readFromConfig ${KEY_ENV_HADOOP_CLASSPATH} "" "${YAML_CONF}")
+fi
+
 # Arguments for the JVM. Used for Coordinator server and Tablet server JVMs.
 if [ -z "${JVM_ARGS}" ]; then
     JVM_ARGS=""
```
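The resolution order the commit introduces (an exported `HADOOP_CLASSPATH` wins; otherwise the `env.hadoop.class-path` key in the YAML config supplies the value) can be sketched standalone. In the snippet below, `readFromConfig` is a cut-down stand-in for the real helper in config.sh, and all paths are hypothetical:

```shell
#!/usr/bin/env bash
# Simplified sketch of the HADOOP_CLASSPATH lookup order (not the actual
# Fluss config.sh): environment variable first, then the YAML config key.

unset HADOOP_CLASSPATH   # pretend it is not exported, to exercise the fallback

CONF=$(mktemp)
echo "env.hadoop.class-path: /opt/hadoop/share/hadoop/common/*" > "$CONF"

# Cut-down stand-in for config.sh's readFromConfig helper.
readFromConfig() {
    local key=$1 defaultValue=$2 configFile=$3
    local value
    value=$(grep "^[ ]*${key}[ ]*:" "$configFile" | sed "s/^[ ]*${key}[ ]*:[ ]*//")
    [ -z "$value" ] && value=$defaultValue
    echo "$value"
}

# Fallback added by the commit: only consult the YAML key when the
# environment variable is empty or unset.
if [ -z "${HADOOP_CLASSPATH}" ]; then
    HADOOP_CLASSPATH=$(readFromConfig "env.hadoop.class-path" "" "$CONF")
fi

FLUSS_CLASSPATH="/opt/fluss/lib/fluss-server.jar"
if [ -n "${HADOOP_CLASSPATH}" ]; then
    FLUSS_CLASSPATH="$FLUSS_CLASSPATH":"$HADOOP_CLASSPATH"
fi

echo "$FLUSS_CLASSPATH"
rm -f "$CONF"
```

Letting the environment variable take precedence means operators can override the Hadoop dependencies per shell session without editing server.yaml.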

fluss-server/pom.xml

Lines changed: 6 additions & 0 deletions

```diff
@@ -133,6 +133,12 @@
                 <include>*:*</include>
               </includes>
             </artifactSet>
+            <relocations>
+              <relocation>
+                <pattern>org.apache.commons</pattern>
+                <shadedPattern>org.apache.fluss.shaded.org.apache.commons</shadedPattern>
+              </relocation>
+            </relocations>
           </configuration>
         </execution>
       </executions>
```
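The relocation above is the maven-shade-plugin's usual guard against classpath clashes: `org.apache.commons` classes bundled into the server jar are moved under a Fluss-private package so they cannot collide with a commons version pulled in via HADOOP_CLASSPATH. The renaming rule amounts to a prefix rewrite, sketched here with a hypothetical class name (the real plugin also patches bytecode references, which plain string substitution does not capture):

```shell
# Prefix rewrite applied by the <relocation> rule (sketch only).
pattern="org.apache.commons"
shadedPattern="org.apache.fluss.shaded.org.apache.commons"

original="org.apache.commons.lang3.StringUtils"
relocated="${shadedPattern}${original#"$pattern"}"
echo "$relocated"   # org.apache.fluss.shaded.org.apache.commons.lang3.StringUtils
```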

website/docs/maintenance/filesystems/hdfs.md

Lines changed: 14 additions & 0 deletions

````diff
@@ -38,6 +38,20 @@ remote.data.dir: hdfs://namenode:50010/path/to/remote/storage
 To allow for easy adoption, you can use the same configuration keys in Fluss' server.yaml as in Hadoop's `core-site.xml`.
 You can see the configuration keys in Hadoop's [`core-site.xml`](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml).
 
+#### Hadoop Environment Configuration
+
+To use the machine's Hadoop environment instead of Fluss' embedded Hadoop, follow these steps:
+
+**Step 1: Set Hadoop Classpath**
+```bash
+export HADOOP_CLASSPATH=`hadoop classpath`
+```
+
+**Step 2: Add the following to your configuration file**
+```yaml
+plugin.classloader.parent-first-patterns.default: java.,com.alibaba.fluss.,javax.annotation.,org.slf4j,org.apache.log4j,org.apache.logging,org.apache.commons.logging,ch.qos.logback,hdfs-site,core-site,org.apache.hadoop.,META-INF
+```
+
 
````
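The `plugin.classloader.parent-first-patterns.default` value added in Step 2 is a comma-separated list of name prefixes; a class whose fully qualified name starts with any of them is loaded from the parent classloader rather than from the plugin's own jars. A minimal sketch of that matching rule, assuming prefix semantics like Flink's option of the same name (the real check lives in Fluss' plugin classloader):

```shell
# Sketch of the prefix matching assumed for parent-first classloading
# patterns (shortened pattern list for readability).
patterns="java.,org.apache.hadoop.,org.slf4j"

is_parent_first() {
    local class=$1 p parts
    IFS=',' read -ra parts <<< "$patterns"
    for p in "${parts[@]}"; do
        # A class is parent-first if its name starts with any pattern.
        case "$class" in "$p"*) return 0 ;; esac
    done
    return 1
}

is_parent_first "org.apache.hadoop.fs.FileSystem" && echo "parent-first"
is_parent_first "com.example.MyPlugin" || echo "child-first"
```

Listing `org.apache.hadoop.` here is what makes the plugin see the same Hadoop classes as the ones pulled in from HADOOP_CLASSPATH.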

website/docs/maintenance/tiered-storage/lakehouse-storage.md

Lines changed: 29 additions & 0 deletions

````diff
@@ -63,6 +63,35 @@ datalake.paimon.warehouse: hdfs:///path/to/warehouse
 While Fluss includes the core Paimon library, additional jars may still need to be manually added to `${FLUSS_HOME}/plugins/paimon/` according to your needs.
 For example, for OSS filesystem support, you need to put `paimon-oss-<paimon_version>.jar` into directory `${FLUSS_HOME}/plugins/paimon/`.
 
+#### Hadoop Environment Configuration
+
+To use the machine's Hadoop environment instead of Fluss' embedded Hadoop, follow these steps:
+
+**Step 1: Set Hadoop Classpath**
+```bash
+export HADOOP_CLASSPATH=`hadoop classpath`
+```
+
+**Step 2: Add the following to your configuration file**
+```yaml
+plugin.classloader.parent-first-patterns.default: java.,com.alibaba.fluss.,javax.annotation.,org.slf4j,org.apache.log4j,org.apache.logging,org.apache.commons.logging,ch.qos.logback,hdfs-site,core-site,org.apache.hadoop.,META-INF
+```
+
+#### Hive Catalog Configuration
+
+To use Hive as the metastore, follow these steps:
+
+**Step 1: Add Hive Connector Dependency**
+[Download](https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/hive/overview/#using-bundled-hive-jar) the Flink SQL Hive connector JAR and place it in Paimon's plugin directory:
+`$PAIMON_HOME/plugins/hive`.
+
+**Step 2: Add the following to your configuration file**
+```yaml
+datalake.paimon.metastore: hive
+# this is recommended in a Kerberos environment
+datalake.paimon.hive-conf-dir: '...'
+```
+
 ### Start The Datalake Tiering Service
 Then, you must start the datalake tiering service to tier Fluss's data to the lakehouse storage.
 #### Prerequisites
````
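Before starting the tiering service it is easy to sanity-check Step 1 of the Hive setup. A throwaway sketch (directory layout and jar name are hypothetical; substitute your real `$PAIMON_HOME` and the connector version you downloaded — a temp dir keeps the sketch self-contained):

```shell
# Sanity check: is a Hive connector jar actually under Paimon's plugin dir?
PAIMON_HOME=$(mktemp -d)
mkdir -p "$PAIMON_HOME/plugins/hive"
# Stand-in for the downloaded Flink SQL Hive connector jar.
touch "$PAIMON_HOME/plugins/hive/flink-sql-connector-hive-3.1.3_2.12.jar"

found=false
if ls "$PAIMON_HOME"/plugins/hive/*hive*.jar >/dev/null 2>&1; then
    found=true
    echo "hive connector present"
else
    echo "missing: put the Flink SQL Hive connector jar under \$PAIMON_HOME/plugins/hive" >&2
fi
rm -rf "$PAIMON_HOME"
```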
