
Commit 2e6ac97

2 parents df106fc + 79857ec

4 files changed: +28 −27 lines changed

articles/hdinsight/hdinsight-hadoop-create-linux-clusters-adf.md

Lines changed: 27 additions & 26 deletions
@@ -14,7 +14,7 @@ ms.date: 03/18/2020
 
 [!INCLUDE [selector](../../includes/hdinsight-create-linux-cluster-selector.md)]
 
-In this tutorial, you learn how to create a [Apache Hadoop](https://hadoop.apache.org/) cluster, on demand, in Azure HDInsight using Azure Data Factory. You then use data pipelines in Azure Data Factory to run Hive jobs and delete the cluster. By the end of this tutorial, you learn how to operationalize a big data job run where cluster creation, job run, and cluster deletion are performed on a schedule.
+In this tutorial, you learn how to create an [Apache Hadoop](./hadoop/apache-hadoop-introduction.md) cluster, on demand, in Azure HDInsight using Azure Data Factory. You then use data pipelines in Azure Data Factory to run Hive jobs and delete the cluster. By the end of this tutorial, you learn how to operationalize a big data job run where cluster creation, job run, and cluster deletion are performed on a schedule.
 
 This tutorial covers the following tasks:
 
@@ -38,7 +38,7 @@ If you don't have an Azure subscription, [create a free account](https://azure.m
 
 ## Create preliminary Azure objects
 
-In this section, you create various objects that will be used for the HDInsight cluster you create on-demand. The created storage account will contain the sample [HiveQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual) script, `partitionweblogs.hql`, that you use to simulate a sample [Apache Hive](https://hive.apache.org/) job that runs on the cluster.
+In this section, you create various objects that will be used for the HDInsight cluster you create on-demand. The created storage account will contain the sample [HiveQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual) script, `partitionweblogs.hql`, that you use to simulate a sample Apache Hive job that runs on the cluster.
 
 This section uses an Azure PowerShell script to create the storage account and copy over the required files within the storage account. The Azure PowerShell sample script in this section performs the following tasks:
 
@@ -48,7 +48,7 @@ This section uses an Azure PowerShell script to create the storage account and c
 4. Creates a Blob container in the storage account
 5. Copies the sample HiveQL script (**partitionweblogs.hql**) to the Blob container. The script is available at [https://hditutorialdata.blob.core.windows.net/adfhiveactivity/script/partitionweblogs.hql](https://hditutorialdata.blob.core.windows.net/adfhiveactivity/script/partitionweblogs.hql). The sample script is already available in another public Blob container. The PowerShell script below makes a copy of these files into the Azure Storage account it creates.
 
-**To create a storage account and copy the files using Azure PowerShell:**
+### Create storage account and copy files
 
 > [!IMPORTANT]
 > Specify names for the Azure resource group and the Azure storage account that will be created by the script.
@@ -147,7 +147,7 @@ write-host "Storage Account Key: $destStorageAccountKey"
 Write-host "`nScript completed" -ForegroundColor Green
 ```
 
-**To verify the storage account creation**
+### Verify storage account
 
 1. Sign in to the [Azure portal](https://portal.azure.com).
 1. From the left, navigate to **All services** > **General** > **Resource groups**.
@@ -163,18 +163,18 @@ Write-host "`nScript completed" -ForegroundColor Green
 
 In Azure Data Factory, a data factory can have one or more data pipelines. A data pipeline has one or more activities. There are two types of activities:
 
-- [Data Movement Activities](../data-factory/copy-activity-overview.md) - You use data movement activities to move data from a source data store to a destination data store.
-- [Data Transformation Activities](../data-factory/transform-data.md). You use data transformation activities to transform/process data. HDInsight Hive Activity is one of the transformation activities supported by Data Factory. You use the Hive transformation activity in this tutorial.
+* [Data Movement Activities](../data-factory/copy-activity-overview.md). You use data movement activities to move data from a source data store to a destination data store.
+* [Data Transformation Activities](../data-factory/transform-data.md). You use data transformation activities to transform/process data. HDInsight Hive Activity is one of the transformation activities supported by Data Factory. You use the Hive transformation activity in this tutorial.
 
 In this article, you configure the Hive activity to create an on-demand HDInsight Hadoop cluster. When the activity runs to process data, here is what happens:
 
-1. An HDInsight Hadoop cluster is automatically created for you just-in-time to process the slice.
+1. An HDInsight Hadoop cluster is automatically created for you just-in-time to process the slice.
 
 2. The input data is processed by running a HiveQL script on the cluster. In this tutorial, the HiveQL script associated with the Hive activity performs the following actions:
 
-    - Uses the existing table (*hivesampletable*) to create another table **HiveSampleOut**.
-    - Populates the **HiveSampleOut** table with only specific columns from the original *hivesampletable*.
-
+    * Uses the existing table (*hivesampletable*) to create another table **HiveSampleOut**.
+    * Populates the **HiveSampleOut** table with only specific columns from the original *hivesampletable*.
+
 3. The HDInsight Hadoop cluster is deleted after the processing is complete and the cluster is idle for the configured amount of time (timeToLive setting). If the next data slice is available for processing within this timeToLive idle time, the same cluster is used to process the slice.
 
 ## Create a data factory
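The timeToLive reuse behavior in step 3 above can be sketched in a few lines of Python. This is an illustrative model only; `cluster_action` and its arguments are hypothetical names, not part of the Data Factory API:

```python
from datetime import datetime, timedelta

def cluster_action(last_job_finished, next_slice_arrives, time_to_live):
    """Decide whether an on-demand cluster is reused or recreated.

    Mirrors the timeToLive behavior described above: if the next data
    slice arrives while the idle cluster is still within its timeToLive
    window, the same cluster processes it; otherwise the idle cluster is
    deleted and a new one is created just-in-time.
    """
    idle = next_slice_arrives - last_job_finished
    if idle <= time_to_live:
        return "reuse existing cluster"
    return "delete, then create new cluster"

t0 = datetime(2020, 3, 18, 9, 0)
ttl = timedelta(minutes=30)
print(cluster_action(t0, t0 + timedelta(minutes=10), ttl))  # reuse existing cluster
print(cluster_action(t0, t0 + timedelta(hours=2), ttl))     # delete, then create new cluster
```

A longer timeToLive trades idle-cluster cost for faster turnaround on closely spaced slices.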
@@ -190,13 +190,13 @@ In this article, you configure the Hive activity to create an on-demand HDInsigh
 |Property |Value |
 |---------|---------|
 |Name | Enter a name for the data factory. This name must be globally unique.|
-|Subscription | Select your Azure subscription. |
-|Resource group | Select **Use existing** and then select the resource group you created using the PowerShell script. |
 |Version | Leave at **V2**. |
+|Subscription | Select your Azure subscription. |
+|Resource group | Select the resource group you created using the PowerShell script. |
 |Location | The location is automatically set to the location you specified while creating the resource group earlier. For this tutorial, the location is set to **East US**. |
 |Enable GIT|Uncheck this box.|
 
-![Create Azure Data Factory using Azure portal](./media/hdinsight-hadoop-create-linux-clusters-adf/create-data-factory-portal.png "Create Azure Data Factory using Azure portal")
+![Create Azure Data Factory using Azure portal](./media/hdinsight-hadoop-create-linux-clusters-adf/azure-portal-create-data-factory.png "Create Azure Data Factory using Azure portal")
 
 4. Select **Create**. Creating a data factory might take between 2 and 4 minutes.
 
@@ -210,8 +210,8 @@ In this article, you configure the Hive activity to create an on-demand HDInsigh
 
 In this section, you author two linked services within your data factory.
 
-- An **Azure Storage linked service** that links an Azure storage account to the data factory. This storage is used by the on-demand HDInsight cluster. It also contains the Hive script that is run on the cluster.
-- An **on-demand HDInsight linked service**. Azure Data Factory automatically creates an HDInsight cluster and runs the Hive script. It then deletes the HDInsight cluster after the cluster is idle for a preconfigured time.
+* An **Azure Storage linked service** that links an Azure storage account to the data factory. This storage is used by the on-demand HDInsight cluster. It also contains the Hive script that is run on the cluster.
+* An **on-demand HDInsight linked service**. Azure Data Factory automatically creates an HDInsight cluster and runs the Hive script. It then deletes the HDInsight cluster after the cluster is idle for a preconfigured time.
 
 ### Create an Azure Storage linked service
 
@@ -278,26 +278,26 @@ In this section, you author two linked services within your data factory.
 
 ![Create a pipeline in Azure Data Factory](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-data-factory-create-pipeline.png "Create a pipeline in Azure Data Factory")
 
-2. In the **Activities** toolbox, expand **HDInsight**, and drag the **Hive** activity to the pipeline designer surface. In the **General** tab, provide a name for the activity.
+1. In the **Activities** toolbox, expand **HDInsight**, and drag the **Hive** activity to the pipeline designer surface. In the **General** tab, provide a name for the activity.
 
 ![Add activities to Data Factory pipeline](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-data-factory-add-hive-pipeline.png "Add activities to Data Factory pipeline")
 
-3. Make sure you have the Hive activity selected, select the **HDI Cluster** tab, and from the **HDInsight Linked Service** drop-down list, select the linked service you created earlier, **HDInsightLinkedService**, for HDInsight.
+1. Make sure you have the Hive activity selected, select the **HDI Cluster** tab, and from the **HDInsight Linked Service** drop-down list, select the linked service you created earlier, **HDInsightLinkedService**, for HDInsight.
 
 ![Provide HDInsight cluster details for the pipeline](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-hive-activity-select-hdinsight-linked-service.png "Provide HDInsight cluster details for the pipeline")
 
-4. Select the **Script** tab and complete the following steps:
+1. Select the **Script** tab and complete the following steps:
 
     1. For **Script Linked Service**, select **HDIStorageLinkedService** from the drop-down list. This value is the storage linked service you created earlier.
 
     1. For **File Path**, select **Browse Storage** and navigate to the location where the sample Hive script is available. If you ran the PowerShell script earlier, this location should be `adfgetstarted/hivescripts/partitionweblogs.hql`.
 
     ![Provide Hive script details for the pipeline](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-data-factory-provide-script-path.png "Provide Hive script details for the pipeline")
 
-    1. Under **Advanced** > **Parameters**, select **Auto-fill from script**. This option looks for any parameters in the Hive script that require values at runtime.
-
+    1. Under **Advanced** > **Parameters**, select **Auto-fill from script**. This option looks for any parameters in the Hive script that require values at runtime.
+
     1. In the **value** text box, add the existing folder in the format `wasbs://adfgetstarted@<StorageAccount>.blob.core.windows.net/outputfolder/`. The path is case-sensitive. This is the path where the output of the script will be stored. The `wasbs` schema is necessary because storage accounts now have secure transfer required enabled by default.
-
+
     ![Provide parameters for the Hive script](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-data-factory-provide-script-parameters.png "Provide parameters for the Hive script")
 
 1. Select **Validate** to validate the pipeline. Select the **>>** (right arrow) button to close the validation window.
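The `wasbs` URI entered for the script parameter follows a fixed shape. A small helper (illustrative only, not an Azure SDK function; the account name below is a placeholder) shows how its pieces fit together:

```python
def wasbs_path(container: str, storage_account: str, folder: str) -> str:
    """Build a wasbs:// URI like the script-parameter value above.

    wasbs (TLS) rather than wasb is needed because new storage accounts
    have "secure transfer required" enabled by default. The resulting
    path is case-sensitive.
    """
    return f"wasbs://{container}@{storage_account}.blob.core.windows.net/{folder}/"

# Placeholder account name -- substitute your own storage account:
print(wasbs_path("adfgetstarted", "mystorageaccount", "outputfolder"))
# wasbs://adfgetstarted@mystorageaccount.blob.core.windows.net/outputfolder/
```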
@@ -314,7 +314,7 @@ In this section, you author two linked services within your data factory.
 
 ![Trigger the Azure Data Factory pipeline](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-data-factory-trigger-pipeline.png "Trigger the Azure Data Factory pipeline")
 
-2. Select **Finish** in the pop-up side bar.
+2. Select **OK** in the pop-up side bar.
 
 ## Monitor a pipeline
 
@@ -332,11 +332,11 @@ In this section, you author two linked services within your data factory.
 
 1. To verify the output, in the Azure portal navigate to the storage account that you used for this tutorial. You should see the following folders or containers:
 
-    - You see an **adfgetstarted/outputfolder** that contains the output of the Hive script that was run as part of the pipeline.
+    * You see an **adfgetstarted/outputfolder** that contains the output of the Hive script that was run as part of the pipeline.
 
-    - You see an **adfhdidatafactory-\<linked-service-name>-\<timestamp>** container. This container is the default storage location of the HDInsight cluster that was created as part of the pipeline run.
+    * You see an **adfhdidatafactory-\<linked-service-name>-\<timestamp>** container. This container is the default storage location of the HDInsight cluster that was created as part of the pipeline run.
 
-    - You see an **adfjobs** container that has the Azure Data Factory job logs.
+    * You see an **adfjobs** container that has the Azure Data Factory job logs.
 
 ![Verify the Azure Data Factory pipeline output](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-data-factory-verify-output.png "Verify the Azure Data Factory pipeline output")
 
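The container names above follow a predictable scheme, which a rough classifier can sketch. This is illustrative only: the `classify_container` helper and the assumed digits-only timestamp format are hypothetical, not part of any Azure tooling:

```python
import re

# Assumed shape of the default-storage container described above:
# adfhdidatafactory-<linked-service-name>-<timestamp>. Treating the
# timestamp as digits is an assumption for illustration.
ON_DEMAND_CONTAINER = re.compile(r"^adfhdidatafactory-[a-z0-9]+-\d+$")

def classify_container(name: str) -> str:
    """Rough classification of the containers a pipeline run leaves behind."""
    if name == "adfjobs":
        return "Data Factory job logs"
    if ON_DEMAND_CONTAINER.match(name):
        return "on-demand HDInsight cluster default storage"
    return "other"

print(classify_container("adfjobs"))
print(classify_container("adfhdidatafactory-hdinsightlinkedservice-1584500000"))
```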
@@ -359,7 +359,8 @@ Alternatively, you can delete the entire resource group that you created for thi
 1. Enter the resource group name to confirm deletion, and then select **Delete**.
 
 ## Next steps
+
 In this article, you learned how to use Azure Data Factory to create an on-demand HDInsight cluster and run [Apache Hive](https://hive.apache.org/) jobs. Advance to the next article to learn how to create HDInsight clusters with custom configuration.
 
 > [!div class="nextstepaction"]
->[Create Azure HDInsight clusters with custom configuration](hdinsight-hadoop-provision-linux-clusters.md)
+> [Create Azure HDInsight clusters with custom configuration](hdinsight-hadoop-provision-linux-clusters.md)

articles/synapse-analytics/sql-data-warehouse/load-data-from-azure-blob-storage-using-polybase.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ Log in to the [Azure portal](https://portal.azure.com/).
 
 ## Create a blank database
 
-A SQL pool is created with a defined set of [compute resources]memory-concurrency-limits.md). The database is created within an [Azure resource group](../../azure-resource-manager/management/overview.md) and in an [Azure SQL logical server](../../sql-database/sql-database-features.md).
+A SQL pool is created with a defined set of [compute resources](memory-concurrency-limits.md). The database is created within an [Azure resource group](../../azure-resource-manager/management/overview.md) and in an [Azure SQL logical server](../../sql-database/sql-database-features.md).
 
 Follow these steps to create a blank database.
 
0 commit comments
