---
title: Secure access credentials with Linked Services in Apache Spark for Azure Synapse Analytics
description: This article provides concepts on how to securely integrate Apache Spark for Azure Synapse Analytics with other services using linked services and token library
---
Microsoft Entra passthrough uses permissions assigned to you as a user in Microsoft Entra ID, rather than permissions assigned to Synapse or a separate service principal. For example, if you want to use Microsoft Entra passthrough to access a blob in a storage account, go to that storage account and assign the Storage Blob Data Contributor role to yourself.
When retrieving secrets from Azure Key Vault, we recommend creating a linked service to your Azure Key Vault. Ensure that the Synapse workspace managed service identity (MSI) has Secret Get privileges on your Azure Key Vault. Synapse authenticates to Azure Key Vault using the Synapse workspace managed service identity. If you connect directly to Azure Key Vault without a linked service, you authenticate with your user Microsoft Entra credential.
For more information, see [linked services](../../data-factory/concepts-linked-services.md?context=/azure/synapse-analytics/context/context).
```
putSecretWithLS(linkedService: String, secretName: String, secretValue: String): puts AKV secret for a given linked service, secretName
```

## <a id="accessing-azure-data-lake-storage-gen2"></a> Access Azure Data Lake Storage Gen2
#### ADLS Gen2 Primary Storage
Azure Synapse Analytics provides an integrated linked services experience when connecting to Azure Data Lake Storage Gen2. Linked services can be configured to authenticate using an **Account Key**, **Service Principal**, **Managed Identity**, or **Credential**.
When the linked service authentication method is set to **Account Key**, the linked service authenticates using the provided storage account key, requests a SAS key, and automatically applies it to the storage request using the **LinkedServiceBasedSASProvider**.
Synapse allows users to set the linked service for a particular storage account. This makes it possible to read and write data from **multiple storage accounts** in a single Spark application or query. Once you set `spark.storage.synapse.{source_full_storage_account_name}.linkedServiceName` for each storage account that will be used, Synapse determines which linked service to use for a particular read or write operation. However, if your Spark job deals with only a single storage account, you can omit the storage account name and use `spark.storage.synapse.linkedServiceName`.
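To make the key format concrete, here's a minimal Python sketch of building the per-account config key described above; the helper function and the `contoso` account name are hypothetical, not part of the Synapse API:

```python
def linked_service_conf_key(storage_account_fqdn=None):
    """Build the Spark config key that binds a storage account to a linked service."""
    if storage_account_fqdn:
        # Per-account key, used when a job reads/writes multiple storage accounts.
        return f"spark.storage.synapse.{storage_account_fqdn}.linkedServiceName"
    # Account-agnostic key, sufficient when the job uses a single storage account.
    return "spark.storage.synapse.linkedServiceName"

# In a notebook, usage would look like (linked service name is a placeholder):
#   spark.conf.set(linked_service_conf_key("contoso.dfs.core.windows.net"), "MyLinkedService")
print(linked_service_conf_key("contoso.dfs.core.windows.net"))
# → spark.storage.synapse.contoso.dfs.core.windows.net.linkedServiceName
```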
> [!NOTE]
> It is not possible to change the authentication method of the default ABFS storage container.
::: zone pivot = "programming-language-scala"
::: zone-end
When the linked service authentication method is set to **Managed Identity** or **Service Principal**, the linked service uses the Managed Identity or Service Principal token with the **LinkedServiceBasedTokenProvider**.
::: zone pivot = "programming-language-scala"
::: zone-end
### <a id="setting-authentication-settings-through-spark-configuration"></a> Set authentication settings through Spark configuration
Authentication settings can also be specified through Spark configurations, instead of running Spark statements. All Spark configurations should be prefixed with `spark.`, and all Hadoop configurations should be prefixed with `spark.hadoop.`.
|Spark config name|Config value|
|------------------|-----------|
|`spark.storage.synapse.teststorage.dfs.core.windows.net.linkedServiceName`|LINKED SERVICE NAME|
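As an illustration of the prefix rule above, this small hypothetical helper (not a Synapse API) shows how a Hadoop setting such as `fs.azure.account.auth.type` would be written when supplied at the Spark-config level:

```python
def as_spark_config(name, is_hadoop_config):
    """Apply the prefix rule: Hadoop settings get 'spark.hadoop.', Spark settings pass through."""
    return f"spark.hadoop.{name}" if is_hadoop_config else name

print(as_spark_config("fs.azure.account.auth.type", True))
# → spark.hadoop.fs.azure.account.auth.type
print(as_spark_config("spark.storage.synapse.linkedServiceName", False))
# → spark.storage.synapse.linkedServiceName
```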
Connect to ADLS Gen2 storage directly by using a SAS key. Use the `ConfBasedSASProvider` and provide the SAS key in the `spark.storage.synapse.sas` configuration setting. SAS tokens can be set at the container level, the account level, or globally. We don't recommend setting SAS keys at the global level, because the job won't be able to read or write from more than one storage account.
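The scope precedence (container beats account, account beats global) can be sketched as a most-specific-wins lookup. Note the container-level and account-level key formats shown here are illustrative assumptions, not documented names:

```python
def resolve_sas_token(conf, container, account):
    """Return the most specific SAS token configured: container, then account, then global."""
    candidates = (
        f"spark.storage.synapse.{container}.{account}.sas",  # container scope (assumed key format)
        f"spark.storage.synapse.{account}.sas",              # account scope (assumed key format)
        "spark.storage.synapse.sas",                         # global scope
    )
    for key in candidates:
        if key in conf:
            return conf[key]
    return None

conf = {
    "spark.storage.synapse.sas": "global-token",
    "spark.storage.synapse.contoso.dfs.core.windows.net.sas": "account-token",
}
print(resolve_sas_token(conf, "data", "contoso.dfs.core.windows.net"))
# → account-token
```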
::: zone-end
#### Use MSAL to acquire tokens (using custom app credentials)
When the ABFS storage driver is [configured](https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html) to use MSAL directly for authentication, the provider doesn't cache tokens, which can result in reliability issues. We recommend using the `ClientCredsTokenProvider` that is part of Synapse Spark instead.
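The reliability issue comes down to hitting the token endpoint on every request instead of reusing a still-valid token. Conceptually, a caching provider behaves along these lines; this is a generic sketch, not the actual `ClientCredsTokenProvider` implementation:

```python
import time

class CachingTokenProvider:
    """Generic sketch of token caching; not the actual ClientCredsTokenProvider implementation."""

    def __init__(self, fetch_token, refresh_margin_s=300):
        self._fetch = fetch_token        # callable returning (token, expires_at_epoch_seconds)
        self._margin = refresh_margin_s  # refresh this long before expiry
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        # Hit the token endpoint only when the cached token is missing or near expiry.
        if self._token is None or time.time() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token

# Demo with a fake fetcher that counts round-trips to the token endpoint.
calls = []
def fake_fetch():
    calls.append(1)
    return (f"token-{len(calls)}", time.time() + 3600)

provider = CachingTokenProvider(fake_fetch)
provider.get_token()
provider.get_token()  # served from cache; no second round-trip
print(len(calls))
# → 1
```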
### ADLS Gen2 storage with SAS token (from Azure Key Vault)
Connect to ADLS Gen2 storage using a SAS token stored in an Azure Key Vault secret.

To connect to other linked services, you can make a direct call to the TokenLibrary.
#### getConnectionString()
To retrieve the connection string, use the `getConnectionString` function and pass in the **linked service name**.
::: zone pivot = "programming-language-scala"
To retrieve a secret stored in Azure Key Vault, we recommend that you create a linked service to Azure Key Vault within the Synapse workspace. The Synapse workspace managed service identity needs to be granted **GET** Secrets permission on the Azure Key Vault. The linked service uses the managed service identity to connect to the Azure Key Vault service and retrieve the secret. Otherwise, connecting directly to Azure Key Vault uses the user's Microsoft Entra credential; in this case, the user needs to be granted the Get Secret permission in Azure Key Vault.
In government clouds, provide the fully qualified domain name of the key vault.
`mssparkutils.credentials.getSecret("<AZURE KEY VAULT NAME>", "<SECRET KEY>" [, <LINKED SERVICE NAME>])`
To retrieve a secret from Azure Key Vault, use the `mssparkutils.credentials.getSecret()` function.
#### Linked service connections supported from the Spark runtime
While Azure Synapse Analytics supports various linked service connections (from pipelines and other Azure products), not all of them are supported from the Spark runtime. Here is the list of supported linked services:
- Azure Blob Storage
- Azure AI services
#### mssparkutils.credentials.getToken()
When you need an OAuth bearer token to access services directly, you can use the `getToken` method. The following resources are supported:
| Service Name | String literal to be used in API call |
|--|--|
| Azure Storage | `Storage` |
| Azure Key Vault | `Vault` |
| Azure Management | `AzureManagement` |
| Azure SQL Data Warehouse (Dedicated and Serverless) | `DW` |
| Azure Synapse | `Synapse` |
| Azure Data Lake Store | `DataLakeStore` |
| Azure Data Factory | `ADF` |
| Azure Data Explorer | `AzureDataExplorer` |
| Azure Database for MySQL | `AzureOSSDB` |
| Azure Database for MariaDB | `AzureOSSDB` |
| Azure Database for PostgreSQL | `AzureOSSDB` |
#### Unsupported linked service access from the Spark runtime
The following methods of accessing the linked services are not supported from the Spark runtime:
- Passing arguments to a parameterized linked service
- Connections with user-assigned managed identities (UAMI)
- Getting a bearer token to a Key Vault resource when your notebook or Spark job definition runs as a managed identity
  - As an alternative, instead of getting an access token, you can create a linked service to Key Vault and get the secret from your notebook or batch job
- For Azure Cosmos DB connections, only key-based access is supported. Token-based access isn't supported.
While running a notebook or a Spark job, requests to get a token or secret using a linked service might fail with an error message that indicates 'BadRequest'. This error is often caused by a configuration issue with the linked service. If you see this error message, check the configuration of your linked service. If you have questions, contact Microsoft Azure Support through the [Azure portal](https://portal.azure.com).
## Related content
- [Write to dedicated SQL pool](synapse-spark-sql-pool-import-export.md)
- [Apache Spark in Azure Synapse Analytics](apache-spark-overview.md)
- [Introduction to Microsoft Spark Utilities](microsoft-spark-utilities.md)