Commit 4f33129

some improvement for doc
1 parent baf331c commit 4f33129

File tree

2 files changed: +14 lines, -7 lines

website/docs/maintenance/filesystems/hdfs.md

Lines changed: 6 additions & 1 deletion
@@ -40,7 +40,12 @@ You can see the configuration keys in Hadoop's [`core-site.xml`](https://hadoop.
 
 #### Hadoop Environment Configuration
 
-To use the machine hadoop environment, instead of Fluss' embedded Hadoop, follow these steps:
+Fluss includes bundled Hadoop libraries (version 3.3.4) for deploying Fluss on machines without Hadoop installed.
+For most use cases, these work well. However, you should configure your machine's native Hadoop environment if:
+1. Your HDFS uses Kerberos security.
+2. You need to avoid version conflicts between Fluss's bundled Hadoop libraries and your HDFS cluster.
+
+The configuration steps are as follows:
 
 **Step 1: Set Hadoop Classpath**
 ```bash
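The `bash` block is truncated in this view; judging from the identical step in lakehouse-storage.md below, Step 1 presumably contains:

```bash
# Put the machine's native Hadoop libraries on the classpath
export HADOOP_CLASSPATH=`hadoop classpath`
```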

website/docs/maintenance/tiered-storage/lakehouse-storage.md

Lines changed: 8 additions & 6 deletions
@@ -52,27 +52,29 @@ datalake.paimon.warehouse: /tmp/paimon_data_warehouse
 
 Fluss processes Paimon configurations by removing the `datalake.paimon.` prefix and then uses the remaining configuration to create the Paimon catalog. Check out the [Paimon documentation](https://paimon.apache.org/docs/1.1/maintenance/configurations/) for more details on the available configurations.
 
-For example, to configure the use of a Hive catalog, you need to [download](https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/hive/overview/#using-bundled-hive-jar) the Flink SQL Hive Client JAR, place the downloaded JAR in Paimon's plugin directory at $FLUSS_HOME/plugins/paimon, and then add the following configuration:
+For example, to use a Hive catalog, configure it as follows:
 ```yaml
 datalake.format: paimon
 datalake.paimon.metastore: hive
 datalake.paimon.uri: thrift://<hive-metastore-host-name>:<port>
 datalake.paimon.warehouse: hdfs:///path/to/warehouse
 ```
+
 #### Add other jars required by datalake
 While Fluss includes the core Paimon library, additional jars may still need to be manually added to `${FLUSS_HOME}/plugins/paimon/` according to your needs.
-For example, for OSS filesystem support, you need to put `paimon-oss-<paimon_version>.jar` into directory `${FLUSS_HOME}/plugins/paimon/`.
-
-#### Hadoop Environment Configuration
+For example:
+- If you are using a Paimon filesystem catalog with the OSS filesystem, put `paimon-oss-<paimon_version>.jar` into the directory `${FLUSS_HOME}/plugins/paimon/`.
+- If you are using a Hive catalog, put [the Flink SQL Hive connector jar](https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/hive/overview/#using-bundled-hive-jar) into the directory `${FLUSS_HOME}/plugins/paimon/`.
 
-To use the machine hadoop environment, instead of Fluss' embedded Hadoop, follow these steps:
+#### Hadoop Environment Configuration (required for Kerberos-secured HDFS)
+In all other scenarios, you can skip this section.
 
 **Step 1: Set Hadoop Classpath**
 ```bash
 export HADOOP_CLASSPATH=`hadoop classpath`
 ```
-**Step 2: Add the following to your configuration file**
+**Step 2: Add the following to your `server.yaml` file**
 ```yaml
 plugin.classloader.parent-first-patterns.default: java.,com.alibaba.fluss.,javax.annotation.,org.slf4j,org.apache.log4j,org.apache.logging,org.apache.commons.logging,ch.qos.logback,hdfs-site,core-site,org.apache.hadoop.,META-INF
 ```