
Commit b77cd26

Added supported linked service types
1 parent 7c5faff commit b77cd26


articles/data-factory/compute-linked-services.md

Lines changed: 27 additions & 16 deletions
ms.topic: conceptual
author: nabhishek
ms.author: abnarain
manager: anandsub
ms.date: 05/08/2019
---

# Compute environments supported by Azure Data Factory
The following table provides a list of compute environments supported by Data Factory.

| Compute environment | Activities |
| ------------------- | ---------- |
| [Azure Function](#azure-function-linked-service) | [Azure Function activity](control-flow-azure-function-activity.md) |

## HDInsight compute environment
Refer to the following table for details about the supported storage linked service types for configuration in the on-demand and BYOC (bring your own compute) environments.

| In Compute Linked Service | Property Name | Description | Blob | ADLS Gen2 | Azure SQL DB | ADLS Gen1 |
| ------------------------- | ---------------------------- | ------------------------------------------------------------ | ---- | --------- | ------------ | ---------- |
| On-demand | linkedServiceName | Azure Storage linked service to be used by the on-demand cluster for storing and processing data. | Yes | Yes | No | No |
| | additionalLinkedServiceNames | Specifies additional storage accounts for the HDInsight linked service so that the Data Factory service can register them on your behalf. | Yes | No | No | No |
| | hcatalogLinkedServiceName | The name of the Azure SQL linked service that points to the HCatalog database. The on-demand HDInsight cluster is created by using the Azure SQL database as the metastore. | No | No | Yes | No |
| BYOC | linkedServiceName | The Azure Storage linked service reference. | Yes | Yes | No | No |
| | additionalLinkedServiceNames | Specifies additional storage accounts for the HDInsight linked service so that the Data Factory service can register them on your behalf. | No | No | No | No |
| | hcatalogLinkedServiceName | A reference to the Azure SQL linked service that points to the HCatalog database. | No | No | No | No |
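For example, in an on-demand linked service definition these properties appear as linked service references. A minimal sketch (the referenced linked service names are placeholders, not names defined in this article):

```json
"typeProperties": {
    "linkedServiceName": {
        "referenceName": "HDIStorageLinkedService",
        "type": "LinkedServiceReference"
    },
    "additionalLinkedServiceNames": [{
        "referenceName": "ExtraStorageLinkedService",
        "type": "LinkedServiceReference"
    }],
    "hcatalogLinkedServiceName": {
        "referenceName": "HiveMetastoreSqlLinkedService",
        "type": "LinkedServiceReference"
    }
}
```
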
### Azure HDInsight on-demand linked service

In this type of configuration, the computing environment is fully managed by the Azure Data Factory service. It is automatically created by the Data Factory service before a job is submitted to process data, and it is removed when the job is completed. You can create a linked service for the on-demand compute environment, configure it, and control granular settings for job execution, cluster management, and bootstrapping actions.

> [!NOTE]
> The on-demand configuration is currently supported only for Azure HDInsight clusters. Azure Databricks also supports on-demand jobs using job clusters. For more information, see [Azure Databricks linked service](#azure-databricks-linked-service).

The Azure Data Factory service can automatically create an on-demand HDInsight cluster to process data. The cluster is created in the same region as the storage account (the linkedServiceName property in the JSON) associated with the cluster. The storage account **must** be a general-purpose standard Azure Storage account.
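
As an illustration, such a storage account might be registered with a linked service like the following sketch (the name and connection string values are placeholders):

```json
{
    "name": "HDIStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage account name>;AccountKey=<storage account key>"
        }
    }
}
```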

Note the following **important** points about the on-demand HDInsight linked service:

> [!IMPORTANT]
> It typically takes **20 minutes** or more to provision an Azure HDInsight cluster on demand.

#### Example
The following JSON defines a Linux-based on-demand HDInsight linked service. The Data Factory service automatically creates a **Linux-based** HDInsight cluster to process the required activity.
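
A minimal sketch of such a definition, assuming service principal authentication (all IDs, names, and key values are placeholders; see the property tables that follow for details):

```json
{
    "name": "HDInsightOnDemandLinkedService",
    "properties": {
        "type": "HDInsightOnDemand",
        "typeProperties": {
            "clusterType": "hadoop",
            "clusterSize": 4,
            "timeToLive": "00:15:00",
            "hostSubscriptionId": "<Azure subscription ID>",
            "clusterResourceGroup": "<resource group name>",
            "tenant": "<tenant ID>",
            "servicePrincipalId": "<service principal ID>",
            "servicePrincipalKey": {
                "value": "<service principal key>",
                "type": "SecureString"
            },
            "linkedServiceName": {
                "referenceName": "HDIStorageLinkedService",
                "type": "LinkedServiceReference"
            }
        }
    }
}
```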

>
> As more activities run, you see many containers in your Azure blob storage. If you do not need them for troubleshooting the jobs, you may want to delete them to reduce the storage cost. The names of these containers follow a pattern: `adf**yourdatafactoryname**-**linkedservicename**-datetimestamp`. Use tools such as [Microsoft Storage Explorer](https://storageexplorer.com/) to delete containers in your Azure blob storage.

#### Properties
| Property | Description | Required |
| ---------------------------- | ---------------------------------------- | -------- |

> [!IMPORTANT]
> Currently, HDInsight linked services do not support HBase, Interactive Query (Hive LLAP), or Storm.

* additionalLinkedServiceNames JSON example

```json
"additionalLinkedServiceNames": [{
    "referenceName": "<name of the additional storage linked service>",
    "type": "LinkedServiceReference"
}]
```

#### Service principal authentication
The On-Demand HDInsight linked service requires service principal authentication to create HDInsight clusters on your behalf. To use service principal authentication, register an application entity in Azure Active Directory (Azure AD) and grant it the **Contributor** role of the subscription or the resource group in which the HDInsight cluster is created. For detailed steps, see [Use portal to create an Azure Active Directory application and service principal that can access resources](https://docs.microsoft.com/azure/azure-resource-manager/resource-group-create-service-principal-portal). Make note of the following values, which you use to define the linked service:

Use service principal authentication by specifying the following properties:

| Property | Description | Required |
| ---------------------------- | ---------------------------------------- | -------- |
| **servicePrincipalKey** | Specify the application's key. | Yes |
| **tenant** | Specify the tenant information (domain name or tenant ID) under which your application resides. You can retrieve it by hovering the mouse in the upper-right corner of the Azure portal. | Yes |
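
Put together, these values sit in the typeProperties of the on-demand linked service. A sketch; the Key Vault variant shown for servicePrincipalKey is an assumption that you have a Key Vault linked service (here hypothetically named KeyVaultLinkedService):

```json
"servicePrincipalId": "<application (client) ID>",
"servicePrincipalKey": {
    "type": "AzureKeyVaultSecret",
    "store": {
        "referenceName": "KeyVaultLinkedService",
        "type": "LinkedServiceReference"
    },
    "secretName": "<name of the secret that holds the key>"
},
"tenant": "<tenant ID or domain name>"
```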

#### Advanced Properties
You can also specify the following properties for the granular configuration of the on-demand HDInsight cluster.

| Property | Description | Required |
| ---------------------------- | ---------------------------------------- | -------- |
| stormConfiguration | Specifies the Storm configuration parameters (storm-site.xml) for the HDInsight cluster. | No |
| yarnConfiguration | Specifies the Yarn configuration parameters (yarn-site.xml) for the HDInsight cluster. | No |

* Example – On-demand HDInsight cluster configuration with advanced properties
```json
{
}
```
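
As a sketch of the general shape, the advanced properties sit alongside the basic ones in typeProperties; the configuration keys and values below are illustrative assumptions, not settings recommended by this article:

```json
"typeProperties": {
    "clusterSize": 4,
    "timeToLive": "00:15:00",
    "coreConfiguration": {
        "templeton.mapper.memory.mb": "5000"
    },
    "hiveConfiguration": {
        "templeton.mapper.memory.mb": "5000"
    },
    "yarnConfiguration": {
        "yarn.nodemanager.resource.memory-mb": "8192"
    }
}
```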

#### Node sizes
You can specify the sizes of head, data, and zookeeper nodes using the following properties:

| Property | Description | Required |
| ---------------------------- | ---------------------------------------- | -------- |
| headNodeSize | Specifies the size of the head node. The default value is: Standard_D3. | No |
| dataNodeSize | Specifies the size of the data node. The default value is: Standard_D3. | No |
| zookeeperNodeSize | Specifies the size of the Zoo Keeper node. The default value is: Standard_D3. | No |

* Specifying node sizes

See the [Sizes of Virtual Machines](../virtual-machines/linux/sizes.md) article for the string values you need to specify for the properties mentioned in the previous section. The values need to conform to the **CMDLETs & APIS** referenced in the article. As you can see in the article, the data node of Large (default) size has 7 GB of memory, which may not be good enough for your scenario.

If you want to create D4 sized head nodes and worker nodes, specify **Standard_D4** as the value for headNodeSize and dataNodeSize properties.
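
For instance, a sketch of the relevant typeProperties fragment:

```json
"headNodeSize": "Standard_D4",
"dataNodeSize": "Standard_D4"
```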
If you specify a wrong value for these properties, you may receive the following **error:** Failed to create cluster. Exception: Unable to complete the cluster create operation. Operation failed with code '400'. Cluster left behind state: 'Error'. Message: 'PreClusterCreationValidationFailure'. When you receive this error, ensure that you are using the **CMDLET & APIS** name from the table in the [Sizes of Virtual Machines](../virtual-machines/linux/sizes.md) article.
### Bring your own compute environment
In this type of configuration, users can register an existing computing environment as a linked service in Data Factory. The computing environment is managed by the user, and the Data Factory service uses it to execute the activities.
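
For example, an existing HDInsight cluster can be registered with a linked service of type HDInsight. A minimal sketch (the cluster URI, credentials, and storage reference are placeholders):

```json
{
    "name": "HDInsightLinkedService",
    "properties": {
        "type": "HDInsight",
        "typeProperties": {
            "clusterUri": "https://<cluster name>.azurehdinsight.net",
            "userName": "<cluster login user name>",
            "password": {
                "value": "<cluster login password>",
                "type": "SecureString"
            },
            "linkedServiceName": {
                "referenceName": "HDIStorageLinkedService",
                "type": "LinkedServiceReference"
            }
        }
    }
}
```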
This type of configuration is supported for the following compute environments:

## Azure Machine Learning linked service

You create an Azure Machine Learning linked service to connect an Azure Machine Learning…

| Property | Description | Required |
| ---------------------------- | ---------------------------------------- | -------- |
| servicePrincipalId | Specify the application's client ID. | No |
| servicePrincipalKey | Specify the application's key. | No |
| tenant | Specify the tenant information (domain name or tenant ID) under which your application resides. You can retrieve it by hovering the mouse in the upper-right corner of the Azure portal. | Required if updateResourceEndpoint is specified |
| connectVia | The Integration Runtime to be used to dispatch the activities to this linked service. You can use the Azure Integration Runtime or a Self-hosted Integration Runtime. If not specified, it uses the default Azure Integration Runtime. | No |

## Azure Data Lake Analytics linked service
You create an **Azure Data Lake Analytics** linked service to link an Azure Data Lake Analytics compute service to an Azure data factory. The Data Lake Analytics U-SQL activity in the pipeline refers to this linked service.
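
A minimal sketch of such a definition, assuming service principal authentication (account, subscription, and tenant values are placeholders):

```json
{
    "name": "AzureDataLakeAnalyticsLinkedService",
    "properties": {
        "type": "AzureDataLakeAnalytics",
        "typeProperties": {
            "accountName": "<Data Lake Analytics account name>",
            "dataLakeAnalyticsUri": "azuredatalakeanalytics.net",
            "subscriptionId": "<Azure subscription ID>",
            "resourceGroupName": "<resource group name>",
            "tenant": "<tenant ID>",
            "servicePrincipalId": "<service principal ID>",
            "servicePrincipalKey": {
                "value": "<service principal key>",
                "type": "SecureString"
            }
        }
    }
}
```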
