Support Azure Data Lake Storage Gen2 (ADLS) #539

@siegfriedweber

Required changes in the cluster spec

ADLS takes the place of HDFS as the storage backend. Instead of the ConfigMap provided by an HDFS cluster, a custom "HDFS" ConfigMap that points to ADLS can be referenced:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: adls
data:
  core-site.xml: |-
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>abfs://<container-name>@<storage-account>.dfs.core.windows.net/</value>
      </property>
      <property>
        <name>fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net</name>
        <value>SAS</value>
      </property>
      <property>
        <name>fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net</name>
        <value>${env.SAS_TOKEN}</value>
      </property>
      <!-- further properties, e.g.: -->
      <property>
        <name>hadoop.rpc.protection</name>
        <value>privacy</value>
      </property>
      <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
      </property>
    </configuration>
  hdfs-site.xml: |-
    <?xml version="1.0"?>
    <configuration>
    </configuration>
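
The core-site.xml above reads the SAS token from the SAS_TOKEN environment variable via Hadoop's ${env.…} substitution, so that variable has to be set in the containers that load this configuration. A minimal sketch of one way to supply it, assuming the token is kept in a Kubernetes Secret. The Secret name adls-sas-token and the key sasToken are hypothetical, and how the env entry gets merged into the pods (for example through pod overrides) depends on the operator and is not specified in this issue:

```yaml
---
# Hypothetical Secret holding the SAS token; name and key are illustrative only.
apiVersion: v1
kind: Secret
metadata:
  name: adls-sas-token
stringData:
  sasToken: <sas-token>
---
# Fragment of a container spec, not a complete manifest.
# It exposes the Secret value as the SAS_TOKEN variable that
# ${env.SAS_TOKEN} in core-site.xml resolves to.
env:
  - name: SAS_TOKEN
    valueFrom:
      secretKeyRef:
        name: adls-sas-token
        key: sasToken
```

Keeping the token in a Secret rather than in the ConfigMap avoids storing the credential in plain text next to the rest of the Hadoop configuration.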

Required changes in the image

  • The fixed SAS token provider should be added to the hadoop-azure library. HADOOP-18516 provides this and can be back-ported.
  • The Azure storage JARs must be copied from the Hadoop image.

These changes were already made as a test in stackabletech/docker-images@7314d4f.

Status

Done