articles/data-factory/compute-linked-services.md
ms.topic: conceptual
author: nabhishek
ms.author: abnarain
manager: anandsub
ms.date: 05/08/2019
---

# Compute environments supported by Azure Data Factory
The following table provides a list of compute environments supported by Data Factory.
| [Azure Function](#azure-function-linked-service) | [Azure Function activity](control-flow-azure-function-activity.md) |

## HDInsight compute environment

Refer to the following table for details about the storage linked service types supported for configuration in the on-demand and BYOC (bring your own compute) environments.

| In Compute Linked Service | Property Name | Description | Blob | ADLS Gen2 | Azure SQL DB | ADLS Gen1 |
|:--- |:--- |:--- |:--- |:--- |:--- |:--- |
| On-demand | linkedServiceName | Azure Storage linked service to be used by the on-demand cluster for storing and processing data. | Yes | Yes | No | No |
| | additionalLinkedServiceNames | Specifies additional storage accounts for the HDInsight linked service so that the Data Factory service can register them on your behalf. | Yes | No | No | No |
| | hcatalogLinkedServiceName | The name of the Azure SQL linked service that points to the HCatalog database. The on-demand HDInsight cluster is created by using the Azure SQL database as the metastore. | No | No | Yes | No |
| BYOC | linkedServiceName | The Azure Storage linked service reference. | Yes | Yes | No | No |
| | additionalLinkedServiceNames | Specifies additional storage accounts for the HDInsight linked service so that the Data Factory service can register them on your behalf. | No | No | No | No |
| | hcatalogLinkedServiceName | A reference to the Azure SQL linked service that points to the HCatalog database. | No | No | No | No |

### Azure HDInsight on-demand linked service
In this type of configuration, the computing environment is fully managed by the Azure Data Factory service. It is automatically created by the Data Factory service before a job is submitted to process data and removed when the job is completed. You can create a linked service for the on-demand compute environment, configure it, and control granular settings for job execution, cluster management, and bootstrapping actions.
> [!NOTE]
> The on-demand configuration is currently supported only for Azure HDInsight clusters. Azure Databricks also supports on-demand jobs using job clusters. For more information, see [Azure Databricks linked service](#azure-databricks-linked-service).
-
## Azure HDInsight on-demand linked service
44
-
45
-
The Azure Data Factory service can automatically create an on-demand HDInsight cluster to process data. The cluster is created in the same region as the storage account (linkedServiceName property in the JSON) associated with the cluster. The storage account must be a general-purpose standard Azure Storage account.
56
+
The Azure Data Factory service can automatically create an on-demand HDInsight cluster to process data. The cluster is created in the same region as the storage account (linkedServiceName property in the JSON) associated with the cluster. The storage account `must` be a general-purpose standard Azure Storage account.
46
57
47
58
Note the following **important** points about the on-demand HDInsight linked service:
> [!IMPORTANT]
> It typically takes **20 minutes** or more to provision an Azure HDInsight cluster on demand.

#### Example

The following JSON defines a Linux-based on-demand HDInsight linked service. The Data Factory service automatically creates a **Linux-based** HDInsight cluster to process the required activity.
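A minimal sketch of such a definition is shown here; the placeholder values are illustrative, and the exact set of required properties is described elsewhere in this article:

```json
{
    "name": "HDInsightOnDemandLinkedService",
    "properties": {
        "type": "HDInsightOnDemand",
        "typeProperties": {
            "clusterType": "hadoop",
            "clusterSize": 1,
            "timeToLive": "00:15:00",
            "osType": "Linux",
            "hostSubscriptionId": "<subscription ID>",
            "clusterResourceGroup": "<resource group name>",
            "servicePrincipalId": "<service principal ID>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenant": "<tenant ID>",
            "linkedServiceName": {
                "referenceName": "<name of Azure Storage linked service>",
                "type": "LinkedServiceReference"
            }
        }
    }
}
```

The timeToLive setting controls how long the on-demand cluster stays alive after a job completes before it is deleted.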

> As more activities run, you see many containers in your Azure blob storage. If you do not need them for troubleshooting the jobs, you may want to delete them to reduce the storage cost. The names of these containers follow a pattern: `adf**yourdatafactoryname**-**linkedservicename**-datetimestamp`. Use tools such as [Microsoft Storage Explorer](https://storageexplorer.com/) to delete containers in your Azure blob storage.
> [!IMPORTANT]
> Currently, the HDInsight linked service does not support HBase, Interactive Query (Hive LLAP), or Storm.

* additionalLinkedServiceNames JSON example

```json
"additionalLinkedServiceNames": [{
    "referenceName": "<name of the additional Azure Storage linked service>",
    "type": "LinkedServiceReference"
}]
```

#### Service principal authentication

The on-demand HDInsight linked service requires service principal authentication to create HDInsight clusters on your behalf. To use service principal authentication, register an application entity in Azure Active Directory (Azure AD) and grant it the **Contributor** role on the subscription or the resource group in which the HDInsight cluster is created. For detailed steps, see [Use portal to create an Azure Active Directory application and service principal that can access resources](https://docs.microsoft.com/azure/azure-resource-manager/resource-group-create-service-principal-portal). Make note of the following values, which you use to define the linked service:

Use service principal authentication by specifying the following properties:

| Property | Description | Required |
|:--- |:--- |:--- |
| **servicePrincipalKey** | Specify the application's key. | Yes |
| **tenant** | Specify the tenant information (domain name or tenant ID) under which your application resides. You can retrieve it by hovering the mouse pointer over the upper-right corner of the Azure portal. | Yes |
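In the linked service JSON, these properties appear under typeProperties. The following fragment is a sketch with placeholder values:

```json
"typeProperties": {
    "servicePrincipalId": "<application (client) ID>",
    "servicePrincipalKey": {
        "type": "SecureString",
        "value": "<application key>"
    },
    "tenant": "<tenant ID or domain name>"
}
```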

#### Advanced Properties

You can also specify the following properties for the granular configuration of the on-demand HDInsight cluster.

| Property | Description | Required |
|:--- |:--- |:--- |
| stormConfiguration | Specifies the Storm configuration parameters (storm-site.xml) for the HDInsight cluster. | No |
| yarnConfiguration | Specifies the Yarn configuration parameters (yarn-site.xml) for the HDInsight cluster. | No |

* Example – On-demand HDInsight cluster configuration with advanced properties

```json
{
}
```
222
233
223
-
### Node sizes
234
+
####Node sizes
224
235
You can specify the sizes of head, data, and ZooKeeper nodes by using the following properties:

| Property | Description | Required |
|:--- |:--- |:--- |
| headNodeSize | Specifies the size of the head node. The default value is: Standard_D3. | No |
| dataNodeSize | Specifies the size of the data node. The default value is: Standard_D3. | No |
| zookeeperNodeSize | Specifies the size of the ZooKeeper node. The default value is: Standard_D3. | No |

* Specifying node sizes

See the [Sizes of Virtual Machines](../virtual-machines/linux/sizes.md) article for the string values you need to specify for the properties mentioned in the previous section. The values need to conform to the **CMDLETs & APIS** referenced in the article. As you can see in the article, the data node of Large (default) size has 7 GB of memory, which may not be sufficient for your scenario.
If you want to create D4-sized head nodes and worker nodes, specify **Standard_D4** as the value for the headNodeSize and dataNodeSize properties.
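As a fragment sketch in the linked service JSON (other required typeProperties omitted):

```json
"typeProperties": {
    "headNodeSize": "Standard_D4",
    "dataNodeSize": "Standard_D4"
}
```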
If you specify an incorrect value for these properties, you may receive the following **error:** Failed to create cluster. Exception: Unable to complete the cluster create operation. Operation failed with code '400'. Cluster left behind state: 'Error'. Message: 'PreClusterCreationValidationFailure'. When you receive this error, ensure that you are using the **CMDLET & APIS** name from the table in the [Sizes of Virtual Machines](../virtual-machines/linux/sizes.md) article.

### Bring your own compute environment

In this type of configuration, users can register an existing computing environment as a linked service in Data Factory. The computing environment is managed by the user, and the Data Factory service uses it to execute the activities.
246
257
247
258
This type of configuration is supported for the following compute environments:

## Azure Machine Learning linked service

You create an Azure Machine Learning linked service to connect an Azure Machine Learning scoring endpoint to a data factory. The linked service supports the following properties:

| Property | Description | Required |
|:--- |:--- |:--- |
| servicePrincipalId | Specify the application's client ID. | No |
| servicePrincipalKey | Specify the application's key. | No |
| tenant | Specify the tenant information (domain name or tenant ID) under which your application resides. You can retrieve it by hovering the mouse pointer over the upper-right corner of the Azure portal. | Required if updateResourceEndpoint is specified |
| connectVia | The Integration Runtime to be used to dispatch the activities to this linked service. You can use the Azure Integration Runtime or a Self-hosted Integration Runtime. If not specified, it uses the default Azure Integration Runtime. | No |
## Azure Data Lake Analytics linked service
You create an **Azure Data Lake Analytics** linked service to link an Azure Data Lake Analytics compute service to an Azure data factory. The Data Lake Analytics U-SQL activity in the pipeline refers to this linked service.
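A minimal sketch of such a linked service definition follows; the property names assume the AzureDataLakeAnalytics linked service type, the placeholder values are illustrative, and the exact required set may differ:

```json
{
    "name": "AzureDataLakeAnalyticsLinkedService",
    "properties": {
        "type": "AzureDataLakeAnalytics",
        "typeProperties": {
            "accountName": "<Data Lake Analytics account name>",
            "servicePrincipalId": "<service principal ID>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenant": "<tenant ID>",
            "subscriptionId": "<subscription ID>",
            "resourceGroupName": "<resource group name>"
        }
    }
}
```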