
Commit d5f4f76

Merge pull request #80691 from dagiro/mvc44
Mvc44
2 parents: 744e613 + fb97033

File tree

2 files changed: +11 −16 lines

articles/hdinsight/hdinsight-hadoop-create-linux-clusters-adf.md

Lines changed: 8 additions & 11 deletions
@@ -13,7 +13,7 @@ ms.date: 04/18/2019
  # Tutorial: Create on-demand Apache Hadoop clusters in HDInsight using Azure Data Factory
  [!INCLUDE [selector](../../includes/hdinsight-create-linux-cluster-selector.md)]

- In this article, you learn how to create a [Apache Hadoop](https://hadoop.apache.org/) cluster, on demand, in Azure HDInsight using Azure Data Factory. You then use data pipelines in Azure Data Factory to run Hive jobs and delete the cluster. By the end of this tutorial, you learn how to operationalize a big data job run where cluster creation, job run, and cluster deletion are performed on a schedule.
+ In this tutorial, you learn how to create an [Apache Hadoop](https://hadoop.apache.org/) cluster, on demand, in Azure HDInsight using Azure Data Factory. You then use data pipelines in Azure Data Factory to run Hive jobs and delete the cluster. By the end of this tutorial, you know how to operationalize a big data job run in which cluster creation, job run, and cluster deletion are performed on a schedule.

  This tutorial covers the following tasks:
@@ -37,15 +37,15 @@ If you don't have an Azure subscription, [create a free account](https://azure.m
  ## Create preliminary Azure objects

- In this section, you create various objects that will be used for the HDInsight cluster you create on-demand. The created storage account will contain the sample [HiveQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual) script (`hivescript.hql`) that you use to simulate a sample [Apache Hive](https://hive.apache.org/) job that runs on the cluster.
+ In this section, you create various objects that will be used for the HDInsight cluster you create on demand. The storage account you create will contain the sample [HiveQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual) script (`partitionweblogs.hql`) that you use to simulate a sample [Apache Hive](https://hive.apache.org/) job that runs on the cluster.

  This section uses an Azure PowerShell script to create the storage account and copy over the required files within the storage account. The Azure PowerShell sample script in this section performs the following tasks:

  1. Signs in to Azure.
  2. Creates an Azure resource group.
  3. Creates an Azure Storage account.
  4. Creates a Blob container in the storage account.
- 5. Copies the sample HiveQL script (**hivescript.hql**) the Blob container. The script is available at [https://hditutorialdata.blob.core.windows.net/adfv2hiveactivity/hivescripts/hivescript.hql](https://hditutorialdata.blob.core.windows.net/adfhiveactivity/script/partitionweblogs.hql). The sample script is already available in another public Blob container. The PowerShell script below makes a copy of these files into the Azure Storage account it creates.
+ 5. Copies the sample HiveQL script (**partitionweblogs.hql**) to the Blob container. The script is available at [https://hditutorialdata.blob.core.windows.net/adfhiveactivity/script/partitionweblogs.hql](https://hditutorialdata.blob.core.windows.net/adfhiveactivity/script/partitionweblogs.hql). The sample script is already available in another public Blob container. The PowerShell script below makes a copy of these files into the Azure Storage account it creates.

  > [!WARNING]
  > Storage account kind `BlobStorage` cannot be used for HDInsight clusters.
@@ -151,7 +151,7 @@ Write-host "`nScript completed" -ForegroundColor Green
  4. On the **Resources** tile, you see one resource listed unless you share the resource group with other projects. That resource is the storage account with the name you specified earlier. Select the storage account name.
  5. Select the **Blobs** tile.
  6. Select the **adfgetstarted** container. You see a folder called **hivescripts**.
- 7. Open the folder and make sure it contains the sample script file, **hivescript.hql**.
+ 7. Open the folder and make sure it contains the sample script file, **partitionweblogs.hql**.

  ## Understand the Azure Data Factory activity
@@ -286,11 +286,11 @@ In this section, you author two linked services within your data factory.
  1. For **Script Linked Service**, select **HDIStorageLinkedService** from the drop-down list. This value is the storage linked service you created earlier.

- 1. For **File Path**, select **Browse Storage** and navigate to the location where the sample Hive script is available. If you ran the PowerShell script earlier, this location should be `adfgetstarted/hivescripts/hivescript.hql`.
+ 1. For **File Path**, select **Browse Storage** and navigate to the location where the sample Hive script is available. If you ran the PowerShell script earlier, this location should be `adfgetstarted/hivescripts/partitionweblogs.hql`.

  ![Provide Hive script details for the pipeline](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-data-factory-provide-script-path.png "Provide Hive script details for the pipeline")

- 1. Under **Advanced** > **Parameters**, select **Auto-fill from script**. This option looks for any parameters in the Hive script that require values at runtime. The script you use (**hivescript.hql**) has an **Output** parameter. Provide the **value** in the format `wasb://adfgetstarted@<StorageAccount>.blob.core.windows.net/outputfolder/` to point to an existing folder on your Azure Storage. The path is case-sensitive. This is the path where the output of the script will be stored.
+ 1. Under **Advanced** > **Parameters**, select **Auto-fill from script**. This option looks for any parameters in the Hive script that require values at runtime. The script you use (**partitionweblogs.hql**) has an **Output** parameter. Provide the **value** in the format `wasb://adfgetstarted@<StorageAccount>.blob.core.windows.net/outputfolder/` to point to an existing folder in your Azure Storage account. The path is case-sensitive. This is the path where the output of the script will be stored.

  ![Provide parameters for the Hive script](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-data-factory-provide-script-parameters.png "Provide parameters for the Hive script")

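Both the file path and the Hive script's **Output** parameter point into Azure Blob storage using the `wasb://<container>@<account>.blob.core.windows.net/<path>` addressing scheme described above. As a minimal illustration of that convention (the helper function and the `mystorageaccount` account name are hypothetical, not part of the tutorial):

```python
def wasb_uri(container: str, account: str, path: str) -> str:
    """Build a wasb:// URI for a location in an Azure Blob container.

    Note that the path component is case-sensitive, as the tutorial warns.
    """
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"

# The Output value for the sample script, pointing at an existing folder:
print(wasb_uri("adfgetstarted", "mystorageaccount", "outputfolder/"))
# wasb://adfgetstarted@mystorageaccount.blob.core.windows.net/outputfolder/
```

Any folder in the `adfgetstarted` container of your own storage account would work, as long as it already exists and the path's casing matches exactly.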
@@ -334,7 +334,7 @@ In this section, you author two linked services within your data factory.
  ![Verify the Azure Data Factory pipeline output](./media/hdinsight-hadoop-create-linux-clusters-adf/hdinsight-data-factory-verify-output.png "Verify the Azure Data Factory pipeline output")

- ## Clean up the tutorial
+ ## Clean up resources

  With the on-demand HDInsight cluster creation, you do not need to explicitly delete the HDInsight cluster. The cluster is deleted based on the configuration you provided while creating the pipeline. However, even after the cluster is deleted, the storage accounts associated with the cluster continue to exist. This behavior is by design so that you can keep your data intact. However, if you do not want to persist the data, you may delete the storage account you created.
@@ -352,11 +352,8 @@ Alternatively, you can delete the entire resource group that you created for thi
  1. Enter the resource group name to confirm deletion, and then select **Delete**.

-
  ## Next steps
  In this article, you learned how to use Azure Data Factory to create on-demand HDInsight cluster and run [Apache Hive](https://hive.apache.org/) jobs. Advance to the next article to learn how to create HDInsight clusters with custom configuration.

  > [!div class="nextstepaction"]
- >[Create Azure HDInsight clusters with custom configuration](hdinsight-hadoop-provision-linux-clusters.md)
-
-
+ >[Create Azure HDInsight clusters with custom configuration](hdinsight-hadoop-provision-linux-clusters.md)

articles/hdinsight/spark/TOC.yml

Lines changed: 3 additions & 5 deletions
@@ -29,11 +29,7 @@
  - name: Create a machine learning app
    href: apache-spark-ipython-notebook-machine-learning.md
  - name: Create an Apache Spark app in IntelliJ
-   href: apache-spark-create-standalone-application.md
- - name: Use VSCode to run Apache Spark queries
-   href: ../hdinsight-for-vscode.md
- - name: Monitor cluster availability with Ambari and Azure Monitor logs
-   href: ../hdinsight-cluster-availability.md
+   href: apache-spark-create-standalone-application.md
  - name: Samples
    items:
    - name: .NET samples
@@ -52,6 +48,8 @@
    href: ../hdinsight-component-versioning.md
  - name: HDInsight 4.0
    href: ../hdinsight-version-release.md
+ - name: Monitor cluster availability with Ambari and Azure Monitor logs
+   href: ../hdinsight-cluster-availability.md
  - name: How to
    items:
    - name: Use cluster storage

0 commit comments
