Commit f5d33a0 (2 parents: de2d810 + 23755f8)

Merge pull request #244363 from WilliamDAssafMSFT/20230707-apache-spark

20230707 apache spark auth, API reference updates

File tree

1 file changed (+45 −27)


articles/synapse-analytics/spark/apache-spark-secure-credentials-with-tokenlibrary.md

@@ -2,7 +2,7 @@
 title: Secure access credentials with Linked Services in Apache Spark for Azure Synapse Analytics
 description: This article provides concepts on how to securely integrate Apache Spark for Azure Synapse Analytics with other services using linked services and token library
 services: synapse-analytics
-author: mlee3gsd
+author: vijaysr
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice: spark
@@ -13,12 +13,11 @@ ms.reviewer: shravan
 zone_pivot_groups: programming-languages-spark-all-minus-sql-r
 ---
 
-
 # Secure credentials with linked services using the mssparkutils
 
 Accessing data from external sources is a common pattern. Unless the external data source allows anonymous access, chances are you need to secure your connection with a credential, secret, or connection string.
 
-Synapse uses Azure Active Directory (Azure AD) passthrough by default for authentication between resources. If you need to connect to a resource using other credentials, use the mssparkutils directly. The mssparkutils simplifies the process of retrieving SAS tokens, Azure AD tokens, connection strings, and secrets stored in a linked service or from an Azure Key Vault.
+Azure Synapse Analytics uses Azure Active Directory (Azure AD) passthrough by default for authentication between resources. If you need to connect to a resource using other credentials, use the mssparkutils directly. The mssparkutils simplifies the process of retrieving SAS tokens, Azure AD tokens, connection strings, and secrets stored in a linked service or from an Azure Key Vault.
 
 Azure AD passthrough uses permissions assigned to you as a user in Azure AD, rather than permissions assigned to Synapse or a separate service principal. For example, to use Azure AD passthrough to access a blob in a storage account, go to that storage account and assign the Storage Blob Data Contributor role to yourself.

@@ -75,7 +74,7 @@ Get result:
 
 #### ADLS Gen2 Primary Storage
 
-Accessing files from the primary Azure Data Lake Storage uses Azure Active Directory passthrough for authentication by default and doesn't require the explicit use of the mssparkutils. The identity used in the passthrough authentication differs based on a few factors. By default, interactive notebooks are executed using the user's identity, but they can be changed to the workspace MSI. Batch jobs and non-interactive executions of the notebook use the Workspace MSI identity.
+Accessing files from the primary Azure Data Lake Storage uses Azure Active Directory passthrough for authentication by default and doesn't require the explicit use of the mssparkutils. The identity used in the passthrough authentication differs based on a few factors. By default, interactive notebooks are executed using the user's identity, but it can be changed to the workspace managed service identity (MSI). Batch jobs and non-interactive executions of the notebook use the workspace MSI.
 
 ::: zone pivot = "programming-language-scala"

@@ -97,7 +96,7 @@ display(df.limit(10))
 
 #### ADLS Gen2 storage with linked services
 
-Synapse provides an integrated linked services experience when connecting to Azure Data Lake Storage Gen2. Linked Services can be configured to authenticate using an **Account Key**, **Service Principal**, **Managed Identity**, or **Credential**.
+Azure Synapse Analytics provides an integrated linked services experience when connecting to Azure Data Lake Storage Gen2. Linked services can be configured to authenticate using an **Account Key**, **Service Principal**, **Managed Identity**, or **Credential**.
 
 When the linked service authentication method is set to **Account Key**, the linked service will authenticate using the provided storage account key, request a SAS key, and automatically apply it to the storage request using the **LinkedServiceBasedSASProvider**.

@@ -108,9 +107,9 @@ Synapse allows users to set the linked service for a particular storage account.
 ```scala
 val sc = spark.sparkContext
 val source_full_storage_account_name = "teststorage.dfs.core.windows.net"
-spark.conf.set(f"spark.storage.synapse.{source_full_storage_account_name}.linkedServiceName", "<LINKED SERVICE NAME>")
-sc.hadoopConfiguration.set(f"fs.azure.account.auth.type.{source_full_storage_account_name}", "SAS")
-sc.hadoopConfiguration.set(f"fs.azure.sas.token.provider.type.{source_full_storage_account_name}", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")
+spark.conf.set(s"spark.storage.synapse.$source_full_storage_account_name.linkedServiceName", "<LINKED SERVICE NAME>")
+sc.hadoopConfiguration.set(s"fs.azure.account.auth.type.$source_full_storage_account_name", "SAS")
+sc.hadoopConfiguration.set(s"fs.azure.sas.token.provider.type.$source_full_storage_account_name", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")
 
 val df = spark.read.csv("abfss://<CONTAINER>@<ACCOUNT>.dfs.core.windows.net/<FILE PATH>")

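For PySpark users, the same per-account configuration can be sketched in Python. The helper below is hypothetical (it is not part of mssparkutils); it only assembles the configuration keys shown in the Scala example above, which you would then apply to the Spark and Hadoop configuration on a Synapse Spark pool.

```python
# Hypothetical helper (not part of mssparkutils): builds the same per-account
# settings as the Scala example above for the LinkedServiceBasedSASProvider.
def sas_linked_service_conf(account_fqdn: str, linked_service: str) -> dict:
    provider = "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider"
    return {
        f"spark.storage.synapse.{account_fqdn}.linkedServiceName": linked_service,
        f"fs.azure.account.auth.type.{account_fqdn}": "SAS",
        f"fs.azure.sas.token.provider.type.{account_fqdn}": provider,
    }

conf = sas_linked_service_conf("teststorage.dfs.core.windows.net", "<LINKED SERVICE NAME>")
for key, value in conf.items():
    print(f"{key} = {value}")
```

On a Synapse Spark pool you would apply the `spark.storage.synapse.*` entry with `spark.conf.set(...)` and the `fs.azure.*` entries on the Hadoop configuration, mirroring the Scala snippet.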
@@ -145,8 +144,8 @@ When the linked service authentication method is set to **Managed Identity** or
 ```scala
 val sc = spark.sparkContext
 val source_full_storage_account_name = "teststorage.dfs.core.windows.net"
-spark.conf.set(f"spark.storage.synapse.{source_full_storage_account_name}.linkedServiceName", "<LINKED SERVICE NAME>")
-sc.hadoopConfiguration.set(f"fs.azure.account.oauth.provider.type.{source_full_storage_account_name}", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")
+spark.conf.set(s"spark.storage.synapse.$source_full_storage_account_name.linkedServiceName", "<LINKED SERVICE NAME>")
+sc.hadoopConfiguration.set(s"fs.azure.account.oauth.provider.type.$source_full_storage_account_name", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")
 val df = spark.read.csv("abfss://<CONTAINER>@<ACCOUNT>.dfs.core.windows.net/<FILE PATH>")
 
 display(df.limit(10))
@@ -297,7 +296,7 @@ import json
 json.loads(mssparkutils.credentials.getPropertiesAll("<LINKED SERVICE NAME>"))
 ```
 The output will look like:
-````
+```
 {
     'AuthType': 'Key',
     'AuthKey': '[REDACTED]',
@@ -306,13 +305,13 @@ The output will look like
     'Endpoint': 'https://storageaccount.blob.core.windows.net/',
     'Database': None
 }
-````
+```
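The output shown above is the parsed Python dictionary; the raw string returned by `getPropertiesAll` is JSON, which is why it is passed through `json.loads`. A minimal offline sketch of that parsing step, using a hypothetical payload in place of the real call:

```python
import json

# Hypothetical JSON payload standing in for the string returned by
# mssparkutils.credentials.getPropertiesAll (real fields vary by
# linked service type).
raw = (
    '{"AuthType": "Key", "AuthKey": "[REDACTED]", '
    '"Endpoint": "https://storageaccount.blob.core.windows.net/", '
    '"Database": null}'
)

props = json.loads(raw)
print(props["AuthType"])   # "Key"
print(props["Database"])   # None (JSON null maps to Python None)
```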
 
 #### GetSecret()
 
 To retrieve a secret stored in Azure Key Vault, we recommend that you create a linked service to Azure Key Vault within the Synapse workspace. The Synapse workspace managed service identity needs to be granted the **Get** secrets permission on the Azure Key Vault. The linked service uses the managed service identity to connect to Azure Key Vault to retrieve the secret. Otherwise, connecting directly to Azure Key Vault uses the user's Azure Active Directory (Azure AD) credential, and the user needs to be granted the Get secret permission in Azure Key Vault.
 
-In national clouds, please provide the fully qualified domain name of the keyvault.
+In government clouds, provide the fully qualified domain name of the key vault.
 
 `mssparkutils.credentials.getSecret("<AZURE KEY VAULT NAME>", "<SECRET KEY>" [, <LINKED SERVICE NAME>])`
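For example, in the Azure US Government cloud the Key Vault DNS suffix is `vault.usgovcloudapi.net` rather than the public cloud's `vault.azure.net` (verify the suffix for your cloud). A small sketch of building the fully qualified name:

```python
# Sketch: build the fully qualified Key Vault domain name for clouds whose
# DNS suffix differs from the public cloud's vault.azure.net.
def keyvault_fqdn(vault_name: str, dns_suffix: str = "vault.azure.net") -> str:
    return f"{vault_name}.{dns_suffix}"

fqdn = keyvault_fqdn("myvault", dns_suffix="vault.usgovcloudapi.net")
print(fqdn)  # myvault.vault.usgovcloudapi.net
# On a Synapse Spark pool you would then call, for example:
# mssparkutils.credentials.getSecret(fqdn, "<SECRET KEY>", "<LINKED SERVICE NAME>")
```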

@@ -347,29 +346,48 @@ Console.WriteLine(connectionString);
 
 ::: zone-end
 
-#### Linked service connections supported from the Spark runtime (notebook or batch jobs)
+#### Linked service connections supported from the Spark runtime
+
+While Azure Synapse Analytics supports a variety of linked service connections (from pipelines and other Azure products), not all of them are supported from the Spark runtime. Here is the list of supported linked services:
 
-The Azure Synapse Analytics supports a variety of linked service connections (from pipelines and other places), but not all of them are supported from the Spark runtime. Here is the list of supported linked services.
 - Azure Blob Storage
-- Azure Storage
-- Azure SQL Data Warehouse
-- Azure SQL
-- Azure Database for MySQL
-- Azure Database for PostgreSQL
+- Azure Cognitive Services
 - Azure Cosmos DB
-- Azure Data Lake Storage Gen1
-- Azure Key Vault
 - Azure Data Explorer
-- Azure Cognitive Services
+- Azure Database for MySQL
+- Azure Database for PostgreSQL
+- Azure Data Lake Store (Gen1)
+- Azure Key Vault
 - Azure Machine Learning
 - Azure Purview
+- Azure SQL Database
+- Azure SQL Data Warehouse (Dedicated and Serverless)
+- Azure Storage
 
-#### The following methods of accessing the linked services are not supported from the Spark runtime
+#### mssparkutils.credentials.getToken()
+
+When you need an OAuth bearer token to access services directly, you can use the `getToken` method. The following resources are supported:
+
+| Service name                                        | String literal to use in the API call |
+|-----------------------------------------------------|---------------------------------------|
+| Azure Storage                                       | `Storage`                             |
+| Azure Key Vault                                     | `Vault`                               |
+| Azure Management                                    | `AzureManagement`                     |
+| Azure SQL Data Warehouse (Dedicated and Serverless) | `DW`                                  |
+| Azure Synapse                                       | `Synapse`                             |
+| Azure Data Lake Store                               | `DataLakeStore`                       |
+| Azure Data Factory                                  | `ADF`                                 |
+| Azure Data Explorer                                 | `AzureDataExplorer`                   |
+| Azure Database for MySQL                            | `AzureOSSDB`                          |
+| Azure Database for MariaDB                          | `AzureOSSDB`                          |
+| Azure Database for PostgreSQL                       | `AzureOSSDB`                          |
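The audience strings in the table can be captured as a simple lookup. The dictionary below is illustrative only: the string literals come from the table above, while the structure itself is a hypothetical convenience, not part of mssparkutils.

```python
# String literals from the table above, keyed by service (illustrative only).
TOKEN_AUDIENCES = {
    "Azure Storage": "Storage",
    "Azure Key Vault": "Vault",
    "Azure Management": "AzureManagement",
    "Azure SQL Data Warehouse (Dedicated and Serverless)": "DW",
    "Azure Synapse": "Synapse",
    "Azure Data Lake Store": "DataLakeStore",
    "Azure Data Factory": "ADF",
    "Azure Data Explorer": "AzureDataExplorer",
    "Azure Database for MySQL": "AzureOSSDB",
    "Azure Database for MariaDB": "AzureOSSDB",
    "Azure Database for PostgreSQL": "AzureOSSDB",
}

# On a Synapse Spark pool you would pass the literal directly, for example:
# token = mssparkutils.credentials.getToken(TOKEN_AUDIENCES["Azure Storage"])
print(TOKEN_AUDIENCES["Azure Storage"])  # Storage
```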
+
+#### Unsupported linked service access from the Spark runtime
+
+The following methods of accessing the linked services are not supported from the Spark runtime:
 
 - Passing arguments to a parameterized linked service
-- Connections that use User assigned managed identities (UAMI)
-
-From a notebook or a spark job, when the request to get token/secret using Linked Service fails, if the error message indicates BadRequest, then this indicates the user error. The error message currently doesn't provide all the details of the failure. Please reach out to our support to debug the issue.
+- Connections that use user-assigned managed identities (UAMI)
+
+While running a notebook or a Spark job, a request to get a token or secret using a linked service may fail with an error message that indicates `BadRequest`. This is often caused by a configuration issue with the linked service. If you see this error message, check the configuration of your linked service. If the problem persists, contact Microsoft Azure Support through the Azure portal (https://portal.azure.com).
 
 ## Next steps