From f1d0beb4c39877d8118004a7158d2b1bfb59fbc8 Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Fri, 16 Feb 2024 16:02:45 +0100 Subject: [PATCH 01/10] doc(kerberos): first draft --- .../spark-k8s/pages/usage-guide/kerberos.adoc | 117 ++++++++++++++++++ 1 file changed, 117 insertions(+) create mode 100644 docs/modules/spark-k8s/pages/usage-guide/kerberos.adoc diff --git a/docs/modules/spark-k8s/pages/usage-guide/kerberos.adoc b/docs/modules/spark-k8s/pages/usage-guide/kerberos.adoc new file mode 100644 index 00000000..bc2b3fd5 --- /dev/null +++ b/docs/modules/spark-k8s/pages/usage-guide/kerberos.adoc @@ -0,0 +1,117 @@ += Kerberos + +Kerberos is a network authentication protocol that works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner. It is used in Spark to authenticate users and to secure communication between Spark components. + +In this guide we show how to configure Spark applications to use a Kerberos. The Stackable Secret Operator is used to generate the keytab files. In production environments, the users might have different means to provision the keytab files. + +== Prerequisites + +It is assumed that you have a KDC server running in your cluster and that the Stackable Secret Operator is configured to provision the keytab files. + +This guide makes use of a SecretClass named `kerberos-default`. It is assumed that this class exists and is configured with a `kerberosBackend` as described in xref:secret-operator:secretclass.adoc#backend-kerberoskeytab[]. + +== Steps + +There are three steps to configure a Spark application to use Kerberos: +1. Provision the Spark driver end executor pods with the keytab and `krb5.conf` files. +2. Provision the Spark `job` pod with the keytab and `krb5.conf` files. +3. Instruct the Spark application to use Kerberos. 
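+
+NOTE: The `kerberos-default` SecretClass itself is not part of this guide. Purely as an illustrative sketch (the KDC hostname, admin principal and Secret names below are assumptions, not values this guide provisions), such a class could look like:
+
+[source,yaml]
+----
+apiVersion: secrets.stackable.tech/v1alpha1
+kind: SecretClass
+metadata:
+  name: kerberos-default
+spec:
+  backend:
+    kerberosKeytab:
+      realmName: CLUSTER.LOCAL
+      kdc: krb5-kdc.default.svc.cluster.local
+      admin:
+        mit:
+          kadminServer: krb5-kdc.default.svc.cluster.local
+      adminKeytabSecret:
+        namespace: default
+        name: secret-operator-keytab
+      adminPrincipal: stackable-secret-operator
+----
+
+The exact backend configuration depends on your KDC setup; refer to the secret-operator documentation linked above.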
+ +=== Driver and Executor pods + +Install the keytab and the `krb5.conf` files in the Spark pods. The keytab file contains the credentials of the user that is used to authenticate with the Kerberos server. The `krb5.conf` file contains the configuration settings for the Kerberos client. + +In the example below, the Stackable Secret Operator is used to provision the keytab via a volume claim. The `krb5.conf` file is mounted as a ConfigMap. + +[source,yaml] +---- +... +driver: + config: + volumeMounts: + - name: kerberos + mountPath: /stackable/kerberos <1> +executor: + config: + volumeMounts: + - name: kerberos + mountPath: /stackable/kerberos <2> +volumes: + - name: kerberos-config + configMap: + name: krb5-kdc <3> + - name: kerberos + ephemeral: + volumeClaimTemplate: + metadata: + annotations: + secrets.stackable.tech/class: kerberos-default <4> + secrets.stackable.tech/scope: service=spark-teragen <5> + secrets.stackable.tech/kerberos.service.names: testuser <6> + spec: + storageClassName: secrets.stackable.tech + accessModes: + - ReadWriteOnce + resources: + requests: + storage: "1" +---- + +<1> Mount the `kerberos` volume in the driver pod. +<2> Mount the `kerberos` volume in the executor pods. +<3> Mount the `krb5.conf` file as a ConfigMap. +<4> Name of the Secret class used to provision the keytab. +<5> Scope of the Secret. +<6> Name of the user for which the keytab is provisioned. + + +=== Job pod + +Install the keytab and the `krb5.conf` files in the Spark `job` pod. This must be currently done via pod overrides. This is because the Spark application volumes are not currently visible to the `job` pod. We hope to address this limitation in a future release. 
+ +[source,yaml] +---- +job: + podOverrides: + spec: + volumes: + - name: kerberos-config + configMap: + name: krb5-kdc + - name: kerberos + ephemeral: + volumeClaimTemplate: + metadata: + annotations: + secrets.stackable.tech/class: kerberos-default + secrets.stackable.tech/scope: service=spark-teragen + secrets.stackable.tech/kerberos.service.names: testuser + spec: + storageClassName: secrets.stackable.tech + accessModes: + - ReadWriteOnce + resources: + requests: + storage: "1" + containers: + - name: spark-submit + volumeMounts: + - name: kerberos + mountPath: /stackable/kerberos +---- + +=== Spark application + +Instruct the Spark application to use Kerberos by setting the `spark.kerberos.keytab` and `spark.kerberos.principal` properties in the `SparkApplication` CRD. + +Finally instruct Spark to use the keytab and `krb5.conf` files provisioned in the previous steps. + +[source,yaml] +---- +sparkConf: + "spark.kerberos.keytab": "/stackable/kerberos/keytab" + "spark.kerberos.principal": "testuser/spark-teragen.default.svc.cluster.local@CLUSTER.LOCAL" + "spark.driver.extraJavaOptions": "-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf" + "spark.executor.extraJavaOptions": "-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf" +---- + From 10ca2eea1db671a0c0d8279ce8ab889eb70f02eb Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Fri, 16 Feb 2024 16:06:23 +0100 Subject: [PATCH 02/10] fix: add logging and kerberos to nav.adoc --- docs/modules/spark-k8s/partials/nav.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/modules/spark-k8s/partials/nav.adoc b/docs/modules/spark-k8s/partials/nav.adoc index ce2ac493..09ce5d0b 100644 --- a/docs/modules/spark-k8s/partials/nav.adoc +++ b/docs/modules/spark-k8s/partials/nav.adoc @@ -6,6 +6,8 @@ ** xref:spark-k8s:usage-guide/job-dependencies.adoc[] ** xref:spark-k8s:usage-guide/resources.adoc[] ** xref:spark-k8s:usage-guide/s3.adoc[] +** 
xref:spark-k8s:usage-guide/kerberos.adoc[]
+** xref:spark-k8s:usage-guide/logging.adoc[]
 ** xref:spark-k8s:usage-guide/history-server.adoc[]
 ** xref:spark-k8s:usage-guide/examples.adoc[]
 ** xref:spark-k8s:usage-guide/operations/index.adoc[]
@@ -17,4 +19,4 @@
 *** {crd-docs}/spark.stackable.tech/sparkapplication/v1alpha1/[SparkApplication {external-link-icon}^]
 *** {crd-docs}/spark.stackable.tech/sparkhistoryserver/v1alpha1/[SparkHistoryServer {external-link-icon}^]
 ** xref:spark-k8s:reference/commandline-parameters.adoc[]
-** xref:spark-k8s:reference/environment-variables.adoc[]
\ No newline at end of file
+** xref:spark-k8s:reference/environment-variables.adoc[]

From 77dc351ea7badd668dcd75f269c5a74d980694f0 Mon Sep 17 00:00:00 2001
From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
Date: Fri, 16 Feb 2024 16:10:48 +0100
Subject: [PATCH 03/10] fix: rename kerberos.adoc to security.adoc

---
 .../pages/usage-guide/{kerberos.adoc => security.adoc}     | 3 ++-
 docs/modules/spark-k8s/partials/nav.adoc                   | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
 rename docs/modules/spark-k8s/pages/usage-guide/{kerberos.adoc => security.adoc} (97%)

diff --git a/docs/modules/spark-k8s/pages/usage-guide/kerberos.adoc b/docs/modules/spark-k8s/pages/usage-guide/security.adoc
similarity index 97%
rename from docs/modules/spark-k8s/pages/usage-guide/kerberos.adoc
rename to docs/modules/spark-k8s/pages/usage-guide/security.adoc
index bc2b3fd5..9b4af60f 100644
--- a/docs/modules/spark-k8s/pages/usage-guide/kerberos.adoc
+++ b/docs/modules/spark-k8s/pages/usage-guide/security.adoc
@@ -13,7 +13,8 @@ This guide makes use of a SecretClass named `kerberos-default`. It is assumed th
 == Steps
 
 There are three steps to configure a Spark application to use Kerberos:
-1. Provision the Spark driver end executor pods with the keytab and `krb5.conf` files.
+
+1. Provision the Spark `driver` and `executor` pods with the keytab and `krb5.conf` files.
 2. 
Provision the Spark `job` pod with the keytab and `krb5.conf` files. 3. Instruct the Spark application to use Kerberos. diff --git a/docs/modules/spark-k8s/partials/nav.adoc b/docs/modules/spark-k8s/partials/nav.adoc index 09ce5d0b..2fb175f1 100644 --- a/docs/modules/spark-k8s/partials/nav.adoc +++ b/docs/modules/spark-k8s/partials/nav.adoc @@ -6,7 +6,7 @@ ** xref:spark-k8s:usage-guide/job-dependencies.adoc[] ** xref:spark-k8s:usage-guide/resources.adoc[] ** xref:spark-k8s:usage-guide/s3.adoc[] -** xref:spark-k8s:usage-guide/kerberos.adoc[] +** xref:spark-k8s:usage-guide/security.adoc[] ** xref:spark-k8s:usage-guide/logging.adoc[] ** xref:spark-k8s:usage-guide/history-server.adoc[] ** xref:spark-k8s:usage-guide/examples.adoc[] From 6b6cefc416d0057ab8c7c50d9975382331e89cb8 Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Fri, 16 Feb 2024 16:27:57 +0100 Subject: [PATCH 04/10] more security docs --- .../spark-k8s/pages/usage-guide/security.adoc | 38 +++++++++++++------ 1 file changed, 27 insertions(+), 11 deletions(-) diff --git a/docs/modules/spark-k8s/pages/usage-guide/security.adoc b/docs/modules/spark-k8s/pages/usage-guide/security.adoc index 9b4af60f..e1a408fa 100644 --- a/docs/modules/spark-k8s/pages/usage-guide/security.adoc +++ b/docs/modules/spark-k8s/pages/usage-guide/security.adoc @@ -1,14 +1,21 @@ -= Kerberos += Security + +== Authentication + +Currently the only supported authentication mechanism is Kerberos, which is disabled by default. Kerberos is a network authentication protocol that works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner. It is used in Spark to authenticate users and to secure communication between Spark components. In this guide we show how to configure Spark applications to use a Kerberos. The Stackable Secret Operator is used to generate the keytab files. 
In production environments, the users might have different means to provision the keytab files. + == Prerequisites -It is assumed that you have a KDC server running in your cluster and that the Stackable Secret Operator is configured to provision the keytab files. +It is assumed that you have a KDC server running in your cluster and that the Stackable Secret Operator is configured to provision the keytab files as described in xref:home:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation]. + +If the Spark application processes data from a kerberized Hadoop cluster, follow the xref:hdfs-operator:usage-guide:security.adoc[HDFS operator guide] to configure HDFS with Kerberos. -This guide makes use of a SecretClass named `kerberos-default`. It is assumed that this class exists and is configured with a `kerberosBackend` as described in xref:secret-operator:secretclass.adoc#backend-kerberoskeytab[]. +This guide makes use of a SecretClass named `kerberos-default`. It is assumed that this class exists and is configured with a `kerberosBackend`. == Steps @@ -57,9 +64,8 @@ volumes: requests: storage: "1" ---- - -<1> Mount the `kerberos` volume in the driver pod. -<2> Mount the `kerberos` volume in the executor pods. +<1> Mount the keytab volume in the driver pod. +<2> Mount the keytab volume in the executor pods. <3> Mount the `krb5.conf` file as a ConfigMap. <4> Name of the Secret class used to provision the keytab. <5> Scope of the Secret. 
@@ -78,15 +84,15 @@ job: volumes: - name: kerberos-config configMap: - name: krb5-kdc + name: krb5-kdc <1> - name: kerberos ephemeral: volumeClaimTemplate: metadata: annotations: - secrets.stackable.tech/class: kerberos-default - secrets.stackable.tech/scope: service=spark-teragen - secrets.stackable.tech/kerberos.service.names: testuser + secrets.stackable.tech/class: kerberos-default <2> + secrets.stackable.tech/scope: service=spark-teragen <3> + secrets.stackable.tech/kerberos.service.names: testuser <4> spec: storageClassName: secrets.stackable.tech accessModes: @@ -97,9 +103,16 @@ job: containers: - name: spark-submit volumeMounts: - - name: kerberos + - name: kerberos <5> mountPath: /stackable/kerberos ---- +<1> Mount the `krb5.conf` file as a ConfigMap. +<2> Name of the Secret class used to provision the keytab. +<3> Scope of the Secret. +<4> Name of the user for which the keytab is provisioned. +<5> Mount the keytab volume in the job pod. + + === Spark application @@ -116,3 +129,6 @@ sparkConf: "spark.executor.extraJavaOptions": "-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf" ---- +=== Hadoop + +TODO: where is the kerberized HDFS discovery config map coming from ? 
From 44aa08af43592d62a52706e9ecd227ad5ee8dc26 Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Fri, 16 Feb 2024 16:31:56 +0100 Subject: [PATCH 05/10] fix typos and grammar --- docs/modules/spark-k8s/pages/usage-guide/security.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/spark-k8s/pages/usage-guide/security.adoc b/docs/modules/spark-k8s/pages/usage-guide/security.adoc index e1a408fa..bb623556 100644 --- a/docs/modules/spark-k8s/pages/usage-guide/security.adoc +++ b/docs/modules/spark-k8s/pages/usage-guide/security.adoc @@ -6,7 +6,7 @@ Currently the only supported authentication mechanism is Kerberos, which is disa Kerberos is a network authentication protocol that works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner. It is used in Spark to authenticate users and to secure communication between Spark components. -In this guide we show how to configure Spark applications to use a Kerberos. The Stackable Secret Operator is used to generate the keytab files. In production environments, the users might have different means to provision the keytab files. +In this guide we show how to configure Spark applications to use Kerberos. The Stackable Secret Operator is used to generate the keytab files. In production environments, users might have different means to provision the keytab files. 
== Prerequisites From d340f8459518e5c685f533e1f0505f0013d5f21c Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Fri, 16 Feb 2024 17:23:55 +0100 Subject: [PATCH 06/10] fix: language tool lints --- docs/modules/spark-k8s/pages/usage-guide/security.adoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/modules/spark-k8s/pages/usage-guide/security.adoc b/docs/modules/spark-k8s/pages/usage-guide/security.adoc index bb623556..3b8e4e4c 100644 --- a/docs/modules/spark-k8s/pages/usage-guide/security.adoc +++ b/docs/modules/spark-k8s/pages/usage-guide/security.adoc @@ -2,9 +2,9 @@ == Authentication -Currently the only supported authentication mechanism is Kerberos, which is disabled by default. +Currently, the only supported authentication mechanism is Kerberos, which is disabled by default. -Kerberos is a network authentication protocol that works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner. It is used in Spark to authenticate users and to secure communication between Spark components. +Kerberos is a network authentication protocol that works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another securely. It is used in Spark to authenticate users and to secure communication between Spark components. In this guide we show how to configure Spark applications to use Kerberos. The Stackable Secret Operator is used to generate the keytab files. In production environments, users might have different means to provision the keytab files. @@ -118,7 +118,7 @@ job: Instruct the Spark application to use Kerberos by setting the `spark.kerberos.keytab` and `spark.kerberos.principal` properties in the `SparkApplication` CRD. -Finally instruct Spark to use the keytab and `krb5.conf` files provisioned in the previous steps. 
+Finally, instruct Spark to use the keytab and `krb5.conf` files provisioned in the previous steps. [source,yaml] ---- From d9405d8d0962de091d7be795dbd6f66ef914a3d9 Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Tue, 20 Feb 2024 12:47:01 +0100 Subject: [PATCH 07/10] more on hdfs --- .../spark-k8s/pages/usage-guide/security.adoc | 39 ++++++++++++++++--- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/docs/modules/spark-k8s/pages/usage-guide/security.adoc b/docs/modules/spark-k8s/pages/usage-guide/security.adoc index 3b8e4e4c..a68f5982 100644 --- a/docs/modules/spark-k8s/pages/usage-guide/security.adoc +++ b/docs/modules/spark-k8s/pages/usage-guide/security.adoc @@ -123,12 +123,41 @@ Finally, instruct Spark to use the keytab and `krb5.conf` files provisioned in t [source,yaml] ---- sparkConf: - "spark.kerberos.keytab": "/stackable/kerberos/keytab" - "spark.kerberos.principal": "testuser/spark-teragen.default.svc.cluster.local@CLUSTER.LOCAL" - "spark.driver.extraJavaOptions": "-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf" - "spark.executor.extraJavaOptions": "-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf" + "spark.kerberos.keytab": "/stackable/kerberos/keytab" <1> + "spark.kerberos.principal": "testuser/spark-teragen.default.svc.cluster.local@CLUSTER.LOCAL" <2> + "spark.driver.extraJavaOptions": "-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf" <3> + "spark.executor.extraJavaOptions": "-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf" <4> ---- +<1> Location of the keytab file. +<2> Principal name. This needs to have the format `.default.svc.cluster.local@` where `SERVICE_NAME` matches the volume claim annotation `secrets.stackable.tech/kerberos.service.names` and `REALM` must be `CLUSTER.LOCAL`. +<3> Location of the Kerberos configuration for the application driver. +<4> Location of the Kerberos configuration for the application executors. 
=== Hadoop -TODO: where is the kerberized HDFS discovery config map coming from ? +When reading and writing data from a kerberized Hadoop cluster, a the HDFS discovery map must mounted the `SparkApplication` pods as follows: + +For the driver and executor pods: + +[source,yaml] +---- +... +driver: + config: + volumeMounts: + - name: hdfs-config + mountPath: /etc/hadoop/conf <1> +executor: + config: + volumeMounts: + - name: hdfs-config + mountPath: /etc/hadoop/conf <2> +volumes: + - name: hdfs-config + configMap: + name: hdfs-discovery-cm <3> +---- +<1> Location of the HDFS configuration for the driver. +<2> Location of the HDFS configuration for the executors. +<3> Name of the HDFS discovery ConfigMap as published by the HDFS operator. + From 818992dbeb4674575125d45e08d5c28542ec9416 Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Mon, 19 Aug 2024 16:52:04 +0200 Subject: [PATCH 08/10] update --- .../spark-k8s/pages/usage-guide/security.adoc | 132 +++++------------- 1 file changed, 33 insertions(+), 99 deletions(-) diff --git a/docs/modules/spark-k8s/pages/usage-guide/security.adoc b/docs/modules/spark-k8s/pages/usage-guide/security.adoc index a68f5982..06e087d6 100644 --- a/docs/modules/spark-k8s/pages/usage-guide/security.adoc +++ b/docs/modules/spark-k8s/pages/usage-guide/security.adoc @@ -6,16 +6,16 @@ Currently, the only supported authentication mechanism is Kerberos, which is dis Kerberos is a network authentication protocol that works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another securely. It is used in Spark to authenticate users and to secure communication between Spark components. -In this guide we show how to configure Spark applications to use Kerberos. The Stackable Secret Operator is used to generate the keytab files. In production environments, users might have different means to provision the keytab files. 
+In this guide we show how to configure Spark applications to use Kerberos while accessing data in HDFS cluster. The Stackable Secret Operator is used to generate the keytab files. In production environments, users might have different means to provision the keytab files. == Prerequisites It is assumed that you have a KDC server running in your cluster and that the Stackable Secret Operator is configured to provision the keytab files as described in xref:home:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation]. -If the Spark application processes data from a kerberized Hadoop cluster, follow the xref:hdfs-operator:usage-guide:security.adoc[HDFS operator guide] to configure HDFS with Kerberos. +For details on HDFS and Kerberos, see the xref:hdfs-operator:usage-guide:security.adoc[HDFS operator guide]. -This guide makes use of a SecretClass named `kerberos-default`. It is assumed that this class exists and is configured with a `kerberosBackend`. +This guide makes use of a SecretClass named `kerberos`. It is assumed that this class exists and is configured with a `kerberosBackend`. == Steps @@ -29,33 +29,42 @@ There are three steps to configure a Spark application to use Kerberos: Install the keytab and the `krb5.conf` files in the Spark pods. The keytab file contains the credentials of the user that is used to authenticate with the Kerberos server. The `krb5.conf` file contains the configuration settings for the Kerberos client. -In the example below, the Stackable Secret Operator is used to provision the keytab via a volume claim. The `krb5.conf` file is mounted as a ConfigMap. +In the example below, the Stackable Secret Operator is used to provision the keytab via a volume claim. For brevity the configuration shared by the job, driver and executor pods is only specified once and then referenced in all other places where needed. [source,yaml] ---- ... 
-driver: - config: +job: + config: &config volumeMounts: - name: kerberos mountPath: /stackable/kerberos <1> -executor: - config: - volumeMounts: - name: kerberos - mountPath: /stackable/kerberos <2> + mountPath: /etc/krb5.conf <2> + subPath: krb5.conf + - name: hdfs-config + mountPath: /stackable/config/hdfs <3> + envOverrides: + HADOOP_CONF_DIR: /stackable/config/hdfs + +driver: + config: *config + +executor: + config: *config + volumes: - - name: kerberos-config + - name: hdfs-config <4> configMap: - name: krb5-kdc <3> + name: hdfs - name: kerberos ephemeral: volumeClaimTemplate: metadata: annotations: - secrets.stackable.tech/class: kerberos-default <4> - secrets.stackable.tech/scope: service=spark-teragen <5> - secrets.stackable.tech/kerberos.service.names: testuser <6> + secrets.stackable.tech/class: kerberos <5> + secrets.stackable.tech/scope: service=spark <6> + secrets.stackable.tech/kerberos.service.names: spark <7> spec: storageClassName: secrets.stackable.tech accessModes: @@ -64,100 +73,25 @@ volumes: requests: storage: "1" ---- -<1> Mount the keytab volume in the driver pod. -<2> Mount the keytab volume in the executor pods. -<3> Mount the `krb5.conf` file as a ConfigMap. -<4> Name of the Secret class used to provision the keytab. -<5> Scope of the Secret. -<6> Name of the user for which the keytab is provisioned. - - -=== Job pod - -Install the keytab and the `krb5.conf` files in the Spark `job` pod. This must be currently done via pod overrides. This is because the Spark application volumes are not currently visible to the `job` pod. We hope to address this limitation in a future release. 
-
-[source,yaml]
-----
-job:
-  podOverrides:
-    spec:
-      volumes:
-        - name: kerberos-config
-          configMap:
-            name: krb5-kdc <1>
-        - name: kerberos
-          ephemeral:
-            volumeClaimTemplate:
-              metadata:
-                annotations:
-                  secrets.stackable.tech/class: kerberos-default <2>
-                  secrets.stackable.tech/scope: service=spark-teragen <3>
-                  secrets.stackable.tech/kerberos.service.names: testuser <4>
-              spec:
-                storageClassName: secrets.stackable.tech
-                accessModes:
-                  - ReadWriteOnce
-                resources:
-                  requests:
-                    storage: "1"
-      containers:
-        - name: spark-submit
-          volumeMounts:
-            - name: kerberos <5>
-              mountPath: /stackable/kerberos
-----
-<1> Mount the `krb5.conf` file as a ConfigMap.
-<2> Name of the Secret class used to provision the keytab.
-<3> Scope of the Secret.
-<4> Name of the user for which the keytab is provisioned.
-<5> Mount the keytab volume in the job pod.
-
+<1> Mount the keytab from the kerberos volume.
+<2> Mount the `krb5.conf` file from the kerberos volume.
+<3> Mount the Hadoop configuration files from the `hdfs-config` volume.
+<4> Hadoop configuration files as published by the HDFS operator.
+<5> Name of the Secret class used to provision the keytab.
+<6> Scope of the Secret.
+<7> Name of the user for which the keytab is provisioned.
 
 === Spark application
 
 Instruct the Spark application to use Kerberos by setting the `spark.kerberos.keytab` and `spark.kerberos.principal` properties in the `SparkApplication` CRD.
 
-Finally, instruct Spark to use the keytab and `krb5.conf` files provisioned in the previous steps.
-
 [source,yaml]
 ----
 sparkConf:
-  "spark.kerberos.keytab": "/stackable/kerberos/keytab" <1>
-  "spark.kerberos.principal": "testuser/spark-teragen.default.svc.cluster.local@CLUSTER.LOCAL" <2>
-  "spark.driver.extraJavaOptions": "-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf" <3>
-  "spark.executor.extraJavaOptions": "-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf" <4>
+  "spark.kerberos.keytab": "/stackable/kerberos/keytab" <1>
+  "spark.kerberos.principal": "spark/spark.default.svc.cluster.local@CLUSTER.LOCAL" <2>
 ----
 <1> Location of the keytab file.
-<2> Principal name. This needs to have the format `.default.svc.cluster.local@` where `SERVICE_NAME` matches the volume claim annotation `secrets.stackable.tech/kerberos.service.names` and `REALM` must be `CLUSTER.LOCAL`.
-<3> Location of the Kerberos configuration for the application driver.
-<4> Location of the Kerberos configuration for the application executors.
-
-=== Hadoop
-
-When reading and writing data from a kerberized Hadoop cluster, a the HDFS discovery map must mounted the `SparkApplication` pods as follows:
-
-For the driver and executor pods:
-
-[source,yaml]
-----
-...
-driver:
-  config:
-    volumeMounts:
-      - name: hdfs-config
-        mountPath: /etc/hadoop/conf <1>
-executor:
-  config:
-    volumeMounts:
-      - name: hdfs-config
-        mountPath: /etc/hadoop/conf <2>
-volumes:
-  - name: hdfs-config
-    configMap:
-      name: hdfs-discovery-cm <3>
-----
-<1> Location of the HDFS configuration for the driver.
-<2> Location of the HDFS configuration for the executors.
-<3> Name of the HDFS discovery ConfigMap as published by the HDFS operator.
+<2> Principal name. This needs to have the format `<SERVICE_NAME>.default.svc.cluster.local@<REALM>` where `SERVICE_NAME` matches the volume claim annotation `secrets.stackable.tech/kerberos.service.names` and `REALM` must be `CLUSTER.LOCAL` unless a different realm was used explicitly. In that case, the `KERBEROS_REALM` environment variable must also be set accordingly.
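+
+Putting it all together, a trimmed `SparkApplication` skeleton might look like the sketch below. The image version, application file and metadata are illustrative placeholders, not part of this guide; the `job`, `driver`, `executor` and `volumes` sections are the ones shown earlier.
+
+[source,yaml]
+----
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkApplication
+metadata:
+  name: spark-kerberos-example
+  namespace: default
+spec:
+  sparkImage:
+    productVersion: 3.5.1
+  mode: cluster
+  mainApplicationFile: local:///stackable/spark/jobs/app.py
+  sparkConf:
+    "spark.kerberos.keytab": "/stackable/kerberos/keytab"
+    "spark.kerberos.principal": "spark/spark.default.svc.cluster.local@CLUSTER.LOCAL"
+  job:
+    config: # ... shared config as shown earlier
+  driver:
+    config: # ... shared config as shown earlier
+  executor:
+    config: # ... shared config as shown earlier
+  volumes: # ... kerberos and hdfs-config volumes as shown earlier
+----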
From 44b1714a529635a8569a716ac7520d5d26a9b089 Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Tue, 20 Aug 2024 02:48:08 -0400 Subject: [PATCH 09/10] Update docs/modules/spark-k8s/pages/usage-guide/security.adoc Co-authored-by: Andrew Kenworthy <1712947+adwk67@users.noreply.github.com> --- docs/modules/spark-k8s/pages/usage-guide/security.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/spark-k8s/pages/usage-guide/security.adoc b/docs/modules/spark-k8s/pages/usage-guide/security.adoc index 06e087d6..34012517 100644 --- a/docs/modules/spark-k8s/pages/usage-guide/security.adoc +++ b/docs/modules/spark-k8s/pages/usage-guide/security.adoc @@ -6,7 +6,7 @@ Currently, the only supported authentication mechanism is Kerberos, which is dis Kerberos is a network authentication protocol that works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another securely. It is used in Spark to authenticate users and to secure communication between Spark components. -In this guide we show how to configure Spark applications to use Kerberos while accessing data in HDFS cluster. The Stackable Secret Operator is used to generate the keytab files. In production environments, users might have different means to provision the keytab files. +In this guide we show how to configure Spark applications to use Kerberos while accessing data in an HDFS cluster. The Stackable Secret Operator is used to generate the keytab files. In production environments, users might have different means to provision the keytab files. 
== Prerequisites From cbb436584b2049459ef67e90e7d369e9947ff62a Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Tue, 20 Aug 2024 02:48:19 -0400 Subject: [PATCH 10/10] Update docs/modules/spark-k8s/pages/usage-guide/security.adoc Co-authored-by: Andrew Kenworthy <1712947+adwk67@users.noreply.github.com> --- docs/modules/spark-k8s/pages/usage-guide/security.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/spark-k8s/pages/usage-guide/security.adoc b/docs/modules/spark-k8s/pages/usage-guide/security.adoc index 34012517..26ec0193 100644 --- a/docs/modules/spark-k8s/pages/usage-guide/security.adoc +++ b/docs/modules/spark-k8s/pages/usage-guide/security.adoc @@ -11,7 +11,7 @@ In this guide we show how to configure Spark applications to use Kerberos while == Prerequisites -It is assumed that you have a KDC server running in your cluster and that the Stackable Secret Operator is configured to provision the keytab files as described in xref:home:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation]. +It is assumed that you have a KDC server running in your cluster and that the Stackable Secret Operator is configured to provision the keytab files as described in the xref:home:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation]. For details on HDFS and Kerberos, see the xref:hdfs-operator:usage-guide:security.adoc[HDFS operator guide].