Description
Hello,
Thanks for the extensive comparison between setups in relation to ADLS and ADB, really helpful. When setting up an environment using Pattern 4 - Cluster scoped Service principal, a bit of configuration is missing, which produces errors in multi-node processing with RDDs, e.g. when reading files directly from ADLS using sc.binaryFiles(). The worker nodes do not seem to be able to access the ADLS Gen2 token, causing errors on a multi-node setup through abfss://, whereas with a Spark DataFrame or a single-node (driver-only) cluster there are no issues.
It turns out that working with RDDs requires additional cluster configuration related to the Hadoop config, see https://www.data-engineering.wiki/docs/spark/accessing-adls-gen-2-with-rdd/
If this bit, or another method that achieves the same, could be added in relation to working with RDDs, that'd be great.
spark.hadoop.fs.azure.account.auth.type OAuth
spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id <service-principal-application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret {{secrets/<your scope name>/<secret name>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<tenant id>/oauth2/token
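The config entries above can also be assembled programmatically, e.g. when provisioning clusters via the API. A minimal sketch, assuming hypothetical placeholder values for the service principal ID, secret scope/name, and tenant ID (the function name is illustrative, not part of any library):

```python
# Sketch: build the cluster-scoped spark.hadoop.* entries listed above as a
# dict, so they can be passed as the Spark conf of a cluster definition.
# All argument values below are hypothetical placeholders.

def build_adls_oauth_conf(client_id, secret_scope, secret_name, tenant_id):
    """Return the spark.hadoop.* settings that let executors (not just the
    driver) authenticate to ADLS Gen2, which RDD APIs require."""
    return {
        "spark.hadoop.fs.azure.account.auth.type": "OAuth",
        "spark.hadoop.fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "spark.hadoop.fs.azure.account.oauth2.client.id": client_id,
        # Databricks resolves this {{secrets/...}} reference at cluster start.
        "spark.hadoop.fs.azure.account.oauth2.client.secret":
            f"{{{{secrets/{secret_scope}/{secret_name}}}}}",
        "spark.hadoop.fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

conf = build_adls_oauth_conf("app-id", "my-scope", "sp-secret", "my-tenant")
# With these set as cluster-level Spark config, an RDD read such as
# sc.binaryFiles("abfss://<container>@<account>.dfs.core.windows.net/<path>")
# should then work on worker nodes as well as on the driver.
```

Setting these at the cluster level (rather than via spark.conf.set in a notebook) is what makes the token available to the executors.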