docs/modules/hdfs/pages/getting_started/first_steps.adoc (+17 -8)
@@ -1,6 +1,8 @@
 = First steps
+:description: Deploy and verify an HDFS cluster with Stackable by setting up Zookeeper and HDFS components, then test file operations using WebHDFS API.

-Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, you will now deploy an HDFS cluster and its dependencies. Afterward, you can <<_verify_that_it_works, verify that it works>> by creating, verifying and deleting a test file in HDFS.
+Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, you will now deploy an HDFS cluster and its dependencies.
+Afterward, you can <<_verify_that_it_works, verify that it works>> by creating, verifying and deleting a test file in HDFS.

 == Setup
@@ -11,7 +13,8 @@ To deploy a Zookeeper cluster create one file called `zk.yaml`:
 [source,yaml]
 include::example$getting_started/zk.yaml[]

-We also need to define a ZNode that will be used by the HDFS cluster to reference Zookeeper. Create another file called `znode.yaml`:
+We also need to define a ZNode that will be used by the HDFS cluster to reference Zookeeper.
+Create another file called `znode.yaml`:
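Like `zk.yaml` above, the actual `znode.yaml` manifest is pulled in from the example files and is not shown in this diff. As a rough orientation only, a minimal ZNode manifest typically looks like the sketch below; the resource and cluster names are placeholders, not the shipped example:

[source,yaml]
----
# Minimal ZookeeperZnode sketch; names are placeholders.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hdfs-znode
spec:
  clusterRef:
    name: simple-zk   # must match the ZookeeperCluster created from zk.yaml
----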
@@ ... @@
-An HDFS cluster has three components: the `namenode`, the `datanode` and the `journalnode`. Create a file named `hdfs.yaml` defining 2 `namenodes` and one `datanode` and `journalnode` each:
+An HDFS cluster has three components: the `namenode`, the `datanode` and the `journalnode`.
+Create a file named `hdfs.yaml` defining 2 `namenodes` and one `datanode` and `journalnode` each:

@@ ... @@
-- `metadata.name` contains the name of the HDFS cluster
-- the HDFS version in the Docker image provided by Stackable must be set in `spec.image.productVersion`
+* `metadata.name` contains the name of the HDFS cluster
+* the HDFS version in the Docker image provided by Stackable must be set in `spec.image.productVersion`

-NOTE: Please note that the version you need to specify for `spec.image.productVersion` is the desired version of Apache HDFS. You can optionally specify the `spec.image.stackableVersion` to a certain release like `23.11.0` but it is recommended to leave it out and use the default provided by the operator. For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhadoop%2Ftags[image registry].
+NOTE: Please note that the version you need to specify for `spec.image.productVersion` is the desired version of Apache HDFS.
+You can optionally specify the `spec.image.stackableVersion` to a certain release like `24.7.0` but it is recommended to leave it out and use the default provided by the operator.
+For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhadoop%2Ftags[image registry].
 It should generally be safe to simply use the latest image version that is available.

 Create the actual HDFS cluster by applying the file:
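The `hdfs.yaml` the page refers to is included from the example files and is not part of this diff. Purely as an illustration of the structure the bullets above describe, a minimal HdfsCluster manifest with 2 `namenodes` and one `datanode` and `journalnode` each could look roughly like this; the names, version and ZNode ConfigMap reference are placeholders:

[source,yaml]
----
# Sketch only: names, productVersion and the zookeeperConfigMapName are placeholders.
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  image:
    productVersion: "3.3.4"
  clusterConfig:
    zookeeperConfigMapName: simple-hdfs-znode   # discovery ConfigMap of the ZNode defined earlier
  nameNodes:
    roleGroups:
      default:
        replicas: 2
  dataNodes:
    roleGroups:
      default:
        replicas: 1
  journalNodes:
    roleGroups:
      default:
        replicas: 1
----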
@@ ... @@
-To test the cluster you can create a new file, check its status and then delete it. We will execute these actions from within a helper pod. Create a file called `webhdfs.yaml`:
+To test the cluster operation, create a new file, check its status and then delete it.
+You can execute these actions from within a helper Pod.
+Create a file called `webhdfs.yaml`:

 [source,yaml]
 ----
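The body of `webhdfs.yaml` is cut off by the hunk boundary. Conceptually it only needs to be a long-running Pod with `curl` available; a minimal sketch under that assumption (image and names are placeholders, not the shipped example) is:

[source,yaml]
----
# Placeholder helper Pod: any image that ships curl will do.
apiVersion: v1
kind: Pod
metadata:
  name: webhdfs
spec:
  containers:
    - name: webhdfs
      image: curlimages/curl   # placeholder image providing curl
      command: ["sh"]          # keep an interactive shell open so the Pod stays running
      stdin: true
      tty: true
----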
@@ -75,7 +83,8 @@ To begin with the cluster should be empty: this can be verified by listing all
-Creating a file in HDFS using the https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File[Webhdfs API] requires a two-step `PUT` (the reason for having a two-step create/append is to prevent clients from sending out data before the redirect). First, create a file with some text in it called `testdata.txt` and copy it to the `tmp` directory on the helper pod:
+Creating a file in HDFS using the https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File[Webhdfs API] requires a two-step `PUT` (the reason for having a two-step create/append is to prevent clients from sending out data before the redirect).
+First, create a file with some text in it called `testdata.txt` and copy it to the `tmp` directory on the helper pod:
docs/modules/hdfs/pages/getting_started/index.adoc (+3 -1)

@@ -1,6 +1,8 @@
 = Getting started
+:description: Start with HDFS using the Stackable Operator. Install the Operator, set up your HDFS cluster, and verify its operation with this guide.

-This guide will get you started with HDFS using the Stackable Operator. It will guide you through the installation of the Operator and its dependencies, setting up your first HDFS cluster and verifying its operation.
+This guide will get you started with HDFS using the Stackable Operator.
+It will guide you through the installation of the Operator and its dependencies, setting up your first HDFS cluster and verifying its operation.
docs/modules/hdfs/pages/index.adoc (+1 -1)

@@ -1,5 +1,5 @@
 = Stackable Operator for Apache HDFS
-:description: The Stackable Operator for Apache HDFS is a Kubernetes operator that can manage Apache HDFS clusters. Learn about its features, resources, dependencies and demos, and see the list of supported HDFS versions.
+:description: Manage Apache HDFS with the Stackable Operator for Kubernetes. Set up clusters, configure roles, and explore demos and supported versions.
docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc (+20 -14)

@@ -1,21 +1,22 @@
-
 = Configuration & Environment Overrides
+:description: Override HDFS config properties and environment variables per role or role group. Manage settings like DNS cache and environment variables efficiently.

 The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

-IMPORTANT: Overriding certain properties can lead to faulty clusters. In general this means, do not change ports, hostnames or properties related to data dirs, high-availability or security.
+IMPORTANT: Overriding certain properties can lead to faulty clusters.
+In general this means, do not change ports, hostnames or properties related to data dirs, high-availability or security.

 == Configuration Properties

 For a role or role group, at the same level of `config`, you can specify `configOverrides` for the following files:

-- `hdfs-site.xml`
-- `core-site.xml`
-- `hadoop-policy.xml`
-- `ssl-server.xml`
-- `ssl-client.xml`
-- `security.properties`
+* `hdfs-site.xml`
+* `core-site.xml`
+* `hadoop-policy.xml`
+* `ssl-server.xml`
+* `ssl-client.xml`
+* `security.properties`

 For example, if you want to set additional properties on the namenode servers, adapt the `nameNodes` section of the cluster resource like so:
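The example the page refers to sits outside this hunk. As a rough sketch of what such an override section can look like under the Stackable role-group layout, with a purely illustrative property and value:

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      configOverrides:
        hdfs-site.xml:
          dfs.namenode.handler.count: "40"   # illustrative property and value, not a recommendation
      replicas: 2
----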
@@ -51,13 +52,17 @@ nameNodes:
 All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.

-For a full list of configuration options we refer to the Apache Hdfs documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml]
+For a full list of configuration options we refer to the Apache Hdfs documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml].

 === The security.properties file

-The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
+The `security.properties` file is used to configure JVM security properties.
+It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.

-The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 3.3.4 HDFS performs poorly if the positive cache is disabled. To cache resolved host names, and thus speeding up Hbase queries you can configure the TTL of entries in the positive cache like this:
+The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved.
+Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them.
+As of version 3.3.4 HDFS performs poorly if the positive cache is disabled.
+To cache resolved host names, and thus speeding up Hbase queries you can configure the TTL of entries in the positive cache like this:

 [source,yaml]
 ----
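The YAML block the hunk ends on is truncated here. The override in question sets the JVM DNS cache TTLs through `security.properties`; a minimal sketch with illustrative values is:

[source,yaml]
----
nameNodes:
  configOverrides:
    security.properties:
      networkaddress.cache.ttl: "30"            # seconds to cache successful lookups (illustrative value)
      networkaddress.cache.negative.ttl: "0"    # do not cache failed lookups (illustrative value)
----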
@@ -80,12 +85,13 @@ The JVM manages it's own cache of successfully resolved host names as well as a
 NOTE: The operator configures DNS caching by default as shown in the example above.

-For details on the JVM security see https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html
+For details on the JVM security consult the {java-security-overview}[Java Security overview documentation].

 == Environment Variables

-In a similar fashion, environment variables can be (over)written. For example per role group:
+In a similar fashion, environment variables can be (over)written.
+For example per role group:
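The role-group example itself lies outside the hunk; a minimal sketch of such an `envOverrides` block, with a placeholder variable name and value, could be:

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"   # placeholder variable and value
      replicas: 2
----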
docs/modules/hdfs/pages/usage-guide/fuse.adoc (+3 -2)

@@ -1,14 +1,15 @@
 = FUSE
+:description: Use HDFS FUSE driver to mount HDFS filesystems into Linux environments via a Kubernetes Pod with necessary privileges and configurations.

 Our images of Apache Hadoop do contain the necessary binaries and libraries to use the HDFS FUSE driver.

 FUSE is short for _Filesystem in Userspace_ and allows a user to export a filesystem into the Linux kernel, which can then be mounted.
 HDFS contains a native FUSE driver/application, which means that an existing HDFS filesystem can be mounted into a Linux environment.

 To use the FUSE driver you can either copy the required files out of the image and run it on a host outside of Kubernetes or you can run it in a Pod.
-This pod, however, will need some extra capabilities.
+This Pod, however, will need some extra capabilities.

-This is an example pod that will work _as long as the host system that is running the kubelet does support FUSE_:
+This is an example Pod that will work _as long as the host system that is running the kubelet does support FUSE_:
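The full example Pod follows in the file but is outside this hunk. The "extra capabilities" essentially come down to running the container with enough privileges to mount a FUSE filesystem and pointing it at the cluster's discovery ConfigMap. A rough sketch under those assumptions; the image tag, mount path, ConfigMap name and keep-alive command are placeholders and the actual `fuse_dfs` invocation is omitted:

[source,yaml]
----
# Rough sketch only: image tag, mount path, ConfigMap name and command are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-fuse
spec:
  containers:
    - name: hdfs-fuse
      image: docker.stackable.tech/stackable/hadoop:3.3.4-stackable24.7.0   # placeholder tag
      command: ["tail", "-f", "/dev/null"]   # keep the Pod running; run fuse_dfs manually via kubectl exec
      securityContext:
        privileged: true   # needed so the container may mount a FUSE filesystem
      env:
        - name: HADOOP_CONF_DIR
          value: /stackable/conf/hdfs
      volumeMounts:
        - name: hdfs-config
          mountPath: /stackable/conf/hdfs
  volumes:
    - name: hdfs-config
      configMap:
        name: simple-hdfs   # discovery ConfigMap created by the operator (name assumed)
----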
docs/modules/hdfs/pages/usage-guide/index.adoc

@@ ... @@
+:description: Learn to configure and use the Stackable Operator for Apache HDFS. Ensure basic setup knowledge from the Getting Started guide before proceeding.
 :page-aliases: ROOT:usage.adoc

-This Section will help you to use and configure the Stackable Operator for Apache HDFS in various ways. You should already be familiar with how to set up a basic instance. Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies (for example ZooKeeper).
+This Section will help you to use and configure the Stackable Operator for Apache HDFS in various ways.
+You should already be familiar with how to set up a basic instance.
+Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies (for example ZooKeeper).
docs/modules/hdfs/pages/usage-guide/listenerclass.adoc (+3 -1)

@@ -1,6 +1,8 @@
 = Service exposition with ListenerClasses
+:description: Configure HDFS service exposure using ListenerClasses to control internal and external access for DataNodes and NameNodes.

-The operator deploys a xref:listener-operator:listener.adoc[Listener] for each DataNode and NameNode pod. They both default to only being accessible from within the Kubernetes cluster, but this can be changed by setting `.spec.{data,name}Nodes.config.listenerClass`.
+The operator deploys a xref:listener-operator:listener.adoc[Listener] for each DataNode and NameNode pod.
+They both default to only being accessible from within the Kubernetes cluster, but this can be changed by setting `.spec.{data,name}Nodes.config.listenerClass`.

 Note that JournalNodes are not accessible from outside the Kubernetes cluster.
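A minimal sketch of the setting mentioned above, assuming the usual Stackable ListenerClass names (`cluster-internal` by default, plus `external-unstable` and `external-stable`); the values below are only examples, not a recommendation:

[source,yaml]
----
spec:
  nameNodes:
    config:
      listenerClass: external-stable     # example: expose NameNodes via a stable external address
  dataNodes:
    config:
      listenerClass: external-unstable   # example: expose DataNodes via node ports
----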
docs/modules/hdfs/pages/usage-guide/logging-log-aggregation.adoc (+2 -2)

@@ -1,7 +1,7 @@
 = Logging & log aggregation
+:description: The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent.

-The logs can be forwarded to a Vector log aggregator by providing a discovery
-ConfigMap for the aggregator and by enabling the log agent:
+The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:
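The example block itself sits below this hunk. As a rough sketch of the two pieces named in the sentence — the aggregator discovery ConfigMap reference and the per-role log agent flag — assuming the usual Stackable field names and with a placeholder ConfigMap name:

[source,yaml]
----
spec:
  clusterConfig:
    vectorAggregatorConfigMapName: vector-aggregator-discovery   # placeholder ConfigMap name
  nameNodes:
    config:
      logging:
        enableVectorAgent: true
  dataNodes:
    config:
      logging:
        enableVectorAgent: true
  journalNodes:
    config:
      logging:
        enableVectorAgent: true
----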
docs/modules/hdfs/pages/usage-guide/monitoring.adoc

@@ ... @@
+:description: The HDFS cluster can be monitored with Prometheus from inside or outside the K8S cluster.

 The cluster can be monitored with Prometheus from inside or outside the K8S cluster.

-All services (with the exception of the Zookeeper daemon on the node names) run with the JMX exporter agent enabled and expose metrics on the `metrics` port. This port is available from the container level up to the NodePort services.
+All services (with the exception of the Zookeeper daemon on the node names) run with the JMX exporter agent enabled and expose metrics on the `metrics` port.
+This port is available from the container level up to the NodePort services.

-The metrics endpoints are also used as liveliness probes by K8S.
+The metrics endpoints are also used as liveliness probes by Kubernetes.

 See xref:operators:monitoring.adoc[] for more details.
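Not part of the page itself, but as an illustration of how the `metrics` port mentioned above can be picked up, here is a rough Prometheus scrape-config sketch using pod discovery; the job name is illustrative and the setup otherwise follows whatever xref:operators:monitoring.adoc[] prescribes:

[source,yaml]
----
scrape_configs:
  - job_name: hdfs   # illustrative job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only container ports named "metrics", as exposed by the HDFS role pods
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: metrics
----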