6 changes: 3 additions & 3 deletions docs/modules/hive/pages/getting_started/first_steps.adoc
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
= First steps
:description: Deploy and verify a Hive metastore cluster with PostgreSQL and MinIO. Follow our setup guide and ensure all pods are ready for operation.
:description: Deploy and verify a Hive metastore cluster with PostgreSQL and MinIO. Follow the setup guide and ensure all pods are ready for operation.

After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you will now deploy a Hive metastore cluster and it's dependencies.
Afterwards you can <<_verify_that_it_works, verify that it works>>.
After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, deploy a Hive metastore cluster and its dependencies.
Afterward you can <<_verify_that_it_works, verify that it works>>.

== Setup

6 changes: 3 additions & 3 deletions docs/modules/hive/pages/getting_started/index.adoc
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
= Getting started
:description: Learn to set up Apache Hive with the Stackable Operator. Includes installation, dependencies, and creating a Hive metastore on Kubernetes.

This guide will get you started with Apache Hive using the Stackable Operator.
It will guide you through the installation of the operator, its dependencies and setting up your first Hive metastore instance.
This guide gets you started with Apache Hive using the Stackable Operator.
It guides you through the installation of the operator, its dependencies and setting up your first Hive metastore instance.

== Prerequisites

You will need:
You need:

* a Kubernetes cluster
* kubectl
13 changes: 6 additions & 7 deletions docs/modules/hive/pages/getting_started/installation.adoc
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
= Installation
:description: Install Stackable Operator for Apache Hive with MinIO and PostgreSQL using stackablectl or Helm. Follow our guide for easy setup and configuration.
:description: Install Stackable Operator for Apache Hive with MinIO and PostgreSQL using stackablectl or Helm. Follow the guide for easy setup and configuration.

On this page you will install the Stackable Operator for Apache Hive and all required dependencies.
On this page you install the Stackable operator for Apache Hive and all required dependencies.
For the installation of the dependencies and operators you can use Helm or `stackablectl`.

The `stackablectl` command line tool is the recommended way to interact with operators and dependencies.
@@ -10,7 +10,7 @@ Follow the xref:management:stackablectl:installation.adoc[installation steps] fo
== Dependencies

First you need to install MinIO and PostgreSQL instances for the Hive metastore.
PostgreSQL is required as a database for Hive's metadata, and MinIO will be used as a data store, which the Hive metastore also needs access to.
PostgreSQL is required as a database for Hive's metadata, and MinIO is used as a data store, which the Hive metastore also needs access to.

There are two ways to install the dependencies:

@@ -66,7 +66,7 @@ Now call `stackablectl` and reference those two files:
include::example$getting_started/getting_started.sh[tag=stackablectl-install-minio-postgres-stack]
----

This will install MinIO and PostgreSQL as defined in the Stacks, as well as the Operators.
This installs MinIO and PostgreSQL as defined in the Stacks, as well as the operators.
You can now skip the <<Stackable Operators>> step that follows next.

TIP: Consult the xref:management:stackablectl:quickstart.adoc[Quickstart] to learn more about how to use `stackablectl`.
@@ -107,7 +107,7 @@ Run the following command to install all operators necessary for Apache Hive:
include::example$getting_started/getting_started.sh[tag=stackablectl-install-operators]
----

The tool will show
The tool prints

[source]
----
@@ -132,8 +132,7 @@ Then install the Stackable operators:
include::example$getting_started/getting_started.sh[tag=helm-install-operators]
----

Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the Apache Hive service (as well as the CRDs for the required operators).
You are now ready to deploy the Apache Hive metastore in Kubernetes.
Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the Apache Hive service (as well as the CRDs for the required operators).

== What's next

2 changes: 1 addition & 1 deletion docs/modules/hive/pages/index.adoc
Original file line number Diff line number Diff line change
@@ -19,7 +19,7 @@ This operator does not support deploying Hive itself, but xref:trino:index.adoc[

== Getting started

Follow the xref:getting_started/index.adoc[Getting started guide] which will guide you through installing the Stackable Hive operator and its dependencies.
Follow the xref:getting_started/index.adoc[Getting started guide] which guides you through the installation of the Stackable Hive operator and its dependencies.
It walks you through setting up a Hive metastore and connecting it to a demo PostgreSQL database and a MinIO instance to store data in.

Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your Hive metastore configuration to your needs, or have a look at the <<demos, demos>> for some example setups with either xref:trino:index.adoc[Trino] or xref:spark-k8s:index.adoc[Spark].
4 changes: 2 additions & 2 deletions docs/modules/hive/pages/reference/commandline-parameters.adoc
Original file line number Diff line number Diff line change
@@ -23,8 +23,8 @@ stackable-hive-operator run --product-config /foo/bar/properties.yaml

*Multiple values:* false

If provided the operator will **only** watch for resources in the provided namespace.
If not provided it will watch in **all** namespaces.
If provided, the operator **only** watches for resources in the provided namespace.
If not provided, it watches in **all** namespaces.

.Example: Only watch the `test` namespace
[source,bash]
Original file line number Diff line number Diff line change
@@ -36,7 +36,7 @@ docker run \

*Multiple values:* false

The operator will **only** watch for resources in the provided namespace `test`:
The operator **only** watches for resources in the provided namespace `test`:

[source]
----
Original file line number Diff line number Diff line change
@@ -40,15 +40,19 @@ metastore:
replicas: 1
----

All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.
All override property values must be strings.
The properties are formatted and escaped correctly into the XML file.

For a full list of configuration options we refer to the Hive https://cwiki.apache.org/confluence/display/hive/configuration+properties[Configuration Reference].

== The security.properties file

The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
The `security.properties` file is used to configure JVM security properties.
Users seldom need to tweak any of these, but one use case stands out that users need to be aware of: the JVM DNS cache.

The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 3.1.3 Apache Hive performs poorly if the positive cache is disabled. To cache resolved host names, you can configure the TTL of entries in the positive cache like this:
The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensitive to the contents of these caches and their performance is heavily affected by them.
As of version 3.1.3 Apache Hive performs poorly if the positive cache is disabled.
To cache resolved host names, you can configure the TTL of entries in the positive cache like this:

[source,yaml]
----
@@ -64,9 +68,10 @@ NOTE: The operator configures DNS caching by default as shown in the example abo
For details on the JVM security see https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html


== Environment Variables
== Environment variables

In a similar fashion, environment variables can be (over)written. For example per role group:
In a similar fashion, environment variables can be (over)written.
For example per role group:

[source,yaml]
----
@@ -91,3 +96,8 @@ metastore:
config: {}
replicas: 1
----

== Pod overrides

The Hive operator also supports Pod overrides, allowing you to override any property that you can set on a Kubernetes Pod.
Read the xref:concepts:overrides.adoc#pod-overrides[Pod overrides documentation] to learn more about this feature.
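
As a minimal sketch, assuming the `podOverrides` field described in the linked overrides documentation is set at role group level, an override that adds a label to the metastore Pods might look like this. The label key and value are hypothetical placeholders.

[source,yaml]
----
metastore:
  roleGroups:
    default:
      podOverrides: # assumed field name, see the Pod overrides documentation
        metadata:
          labels:
            example.com/team: data-platform # hypothetical label
      config: {}
      replicas: 1
----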
2 changes: 1 addition & 1 deletion docs/modules/hive/pages/usage-guide/database-driver.adoc
Original file line number Diff line number Diff line change
@@ -88,7 +88,7 @@ spec:
mountPath: /stackable/externals
----

This will make the driver available at `/stackable/external-drivers/mysql-connector-j-8.0.31.jar` when the volume `external-drivers` is mounted at `/stackable/external-drivers`.
This makes the driver available at `/stackable/external-drivers/mysql-connector-j-8.0.31.jar` when the volume `external-drivers` is mounted at `/stackable/external-drivers`.

Once the above has completed successfully, you can confirm that the driver is in the expected location by running another job:

6 changes: 3 additions & 3 deletions docs/modules/hive/pages/usage-guide/derby-example.adoc
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
= Derby example
:description: Deploy a single-node Apache Hive Metastore with Derby or PostgreSQL. Includes setup for S3 integration and tips for database configuration.

Please note that the version you need to specify is not only the version of Apache Hive which you want to roll out, but has to be amended with a Stackable version as shown.
The version you need to specify is not only the version of Apache Hive which you want to roll out, but has to be amended with a Stackable version as shown.
This Stackable version is the version of the underlying container image which is used to execute the processes.
For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhive%2Ftags[image registry].
For a list of available versions check the https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhive%2Ftags[image registry].
It should generally be safe to simply use the latest image version that is available.

.Create a single node Apache Hive Metastore cluster using Derby:
@@ -123,7 +123,7 @@ This is called `scram-sha-256` and has been the default as of PostgreSQL 14.
Unfortunately, Hive up until the latest 3.3.x version ships with JDBC drivers that do https://wiki.postgresql.org/wiki/List_of_drivers[_not_ support] this method.
You might see an error message like this:
`The authentication type 10 is not supported.`
If this is the case please either use an older PostgreSQL version or change its https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-PASSWORD-ENCRYPTION[`password_encryption`] setting to `md5`.
If this is the case, either use an older PostgreSQL version or change its https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-PASSWORD-ENCRYPTION[`password_encryption`] setting to `md5`.

This installs PostgreSQL in version 10 to work around the issue mentioned above:
[source,bash]
2 changes: 1 addition & 1 deletion docs/modules/hive/pages/usage-guide/index.adoc
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
= Usage guide
:page-aliases: usage.adoc

This Section will help you to use and configure the Stackable Operator for Apache Hive in various ways.
This section helps you use and configure the Stackable operator for Apache Hive in various ways.
You should already be familiar with how to set up a basic instance.
Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies.
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@ You can configure the graceful shutdown as described in xref:concepts:operations

As a default, Hive metastores have `5 minutes` to shut down gracefully.

The Hive metastore process will receive a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
After the graceful shutdown timeout runs out, and the process still didn't exit, Kubernetes will issue a `SIGKILL` signal.
The Hive metastore process receives a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
After the graceful shutdown timeout runs out, and the process is still running, Kubernetes issues a `SIGKILL` signal.

However, there is no acknowledgement message in the log indicating a graceful shutdown.
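
As an illustration, assuming the `gracefulShutdownTimeout` setting described on the linked concepts page, the timeout could be raised for the metastore role roughly like this. The value `10m` is only an example.

[source,yaml]
----
spec:
  metastore:
    config:
      gracefulShutdownTimeout: 10m # assumed setting; the default is 5 minutes
----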
2 changes: 1 addition & 1 deletion docs/modules/hive/pages/usage-guide/operations/index.adoc
Original file line number Diff line number Diff line change
@@ -2,4 +2,4 @@

This section of the documentation is intended for the operations teams that maintain a Stackable Data Platform installation.

Please read the xref:concepts:operations/index.adoc[Concepts page on Operations] that contains the necessary details to operate the platform in a production environment.
Read the xref:concepts:operations/index.adoc[Concepts page on Operations] that contains the necessary details to operate the platform in a production environment.
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@

You can configure the permitted Pod disruptions for Hive nodes as described in xref:concepts:operations/pod_disruptions.adoc[].

Unless you configure something else or disable our PodDisruptionBudgets (PDBs), we write the following PDBs:
Unless you configure something else or disable the default PodDisruptionBudgets (PDBs), the operator writes the following PDBs:

== Metastores
We only allow a single metastore to be offline at any given time, regardless of the number of replicas or `roleGroups`.
Only a single metastore is allowed to be offline at any given time, regardless of the number of replicas or `roleGroups`.
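
A minimal sketch of adjusting this behaviour, assuming the `roleConfig.podDisruptionBudget` settings described on the linked concepts page (the field names are not confirmed by this page):

[source,yaml]
----
spec:
  metastore:
    roleConfig:
      podDisruptionBudget:
        enabled: true # assumed field; set to false to disable the default PDB
        maxUnavailable: 1 # assumed field; matches the single-metastore default
----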
8 changes: 5 additions & 3 deletions docs/modules/hive/pages/usage-guide/security.adoc
Original file line number Diff line number Diff line change
@@ -14,15 +14,17 @@ Additionally, you need a service-user which the secret-operator uses to create p
The next step is to enter all the necessary information into a SecretClass, as described in xref:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation]. The following guide assumes you have named your SecretClass `kerberos`.

=== 3. Configure HDFS to use SecretClass
The next step is to configure your HdfsCluster to use the newly created SecretClass. Please follow the xref:hdfs:usage-guide/security.adoc[HDFS security guide] to set up and test this.
Please make sure to use the SecretClass named `kerberos`. It is also necessary to configure 2 additional things in HDFS:
The next step is to configure your HdfsCluster to use the newly created SecretClass.
Follow the xref:hdfs:usage-guide/security.adoc[HDFS security guide] to set up and test this.
Make sure to use the SecretClass named `kerberos`.
It is also necessary to configure 2 additional things in HDFS:

* Define group mappings for users with `hadoop.user.group.static.mapping.overrides`
* Tell HDFS that Hive is allowed to impersonate other users, i.e. Hive does not need any _direct_ access permissions for itself, but should be able to impersonate Hive users when accessing HDFS. This can be done by e.g. setting `hadoop.proxyuser.hive.users=*` and `hadoop.proxyuser.hive.hosts=*` to allow the user `hive` to impersonate all other users.

An example of the above can be found in this https://github.com/stackabletech/hive-operator/blob/main/tests/templates/kuttl/kerberos-hdfs/30-install-hdfs.yaml.j2[integration test].
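
As a rough sketch, the two settings could be passed to HDFS as `core-site.xml` overrides on the HdfsCluster. The role name, the use of `configOverrides` and the group mapping value are assumptions for illustration; the linked integration test is the authoritative example.

[source,yaml]
----
spec:
  nameNodes:
    configOverrides:
      core-site.xml:
        # hypothetical mapping: user `hive` belongs to group `supergroup`
        hadoop.user.group.static.mapping.overrides: "hive=supergroup;"
        # allow the user `hive` to impersonate all users from all hosts
        hadoop.proxyuser.hive.users: "*"
        hadoop.proxyuser.hive.hosts: "*"
----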

NOTE: This is only relevant if HDFS is used with the Hive metastore (many installations will use the metastore with an S3 backend instead of HDFS).
NOTE: This is only relevant if HDFS is used with the Hive metastore (many installations use the metastore with an S3 backend instead of HDFS).

=== 4. Configure Hive to use SecretClass
The last step is to configure the same SecretClass for Hive, which is done similarly to HDFS.