
Commit 2133229

Merge pull request #108327 from hrasheed-msft/etl_updates

article updates

2 parents 15f2572 + fe53cce

articles/hdinsight/hdinsight-sales-insights-etl.md

Lines changed: 36 additions & 37 deletions
@@ -32,37 +32,37 @@ Download [Power BI Desktop](https://www.microsoft.com/download/details.aspx?id=4

    ![Open Azure Cloud Shell](./media/hdinsight-sales-insights-etl/hdinsight-sales-insights-etl-click-cloud-shell.png)

1. In the **Select environment** drop-down menu, choose **Bash**.
-1. Sign in to your Azure account and set the subscription.
-1. Set up the resource group for the project.
-    1. Choose a unique name for the resource group.
-    1. Run the following code snippet in Cloud Shell to set variables that will be used in later steps:
-
-    ```azurecli-interactive
-    resourceGroup="<RESOURCE GROUP NAME>"
-    subscriptionID="<SUBSCRIPTION ID>"
-
-    az account set --subscription $subscriptionID
-    az group create --name $resourceGroup --location westus
-    ```
+1. List your subscriptions by typing the command `az account list --output table`. Note the ID of the subscription that you will use for this project.
+1. Set the subscription that you will use for this project, and set the `subscriptionID` variable, which will be used later:
+
+    ```cli
+    subscriptionID="<SUBSCRIPTION ID>"
+    az account set --subscription $subscriptionID
+    ```
+
+1. Create a new resource group for the project, and set the `resourceGroup` variable, which will be used later:
+
+    ```cli
+    resourceGroup="<RESOURCE GROUP NAME>"
+    az group create --name $resourceGroup --location westus
+    ```

1. Download the data and scripts for this tutorial from the [HDInsight sales insights ETL repository](https://github.com/Azure-Samples/hdinsight-sales-insights-etl) by entering the following commands in Cloud Shell:

-    ```azurecli-interactive
+    ```cli
    git clone https://github.com/Azure-Samples/hdinsight-sales-insights-etl.git
    cd hdinsight-sales-insights-etl
    ```

-1. Enter `ls` at the shell prompt to see that the following files and directories have been created:
+1. Enter `ls` at the shell prompt to verify that the following files and directories have been created:

-    ```output
-    /salesdata/
-    /scripts/
-    /templates/
+    ```
+    salesdata scripts templates
    ```

### Deploy Azure resources needed for the pipeline

-1. Add execute permissions for the `chmod +x scripts/*.sh` script.
+1. Add execute permissions for all of the scripts by typing `chmod +x scripts/*.sh`.
1. Use the command `./scripts/resources.sh <RESOURCE_GROUP_NAME> <LOCATION>` to run the script to deploy the following resources in Azure:

1. An Azure Blob storage account. This account will hold the company sales data.
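
The updated steps in the hunk above tell the reader to run `az account list --output table` without showing its output. As a sketch of what to expect (the row values below are placeholders, not real subscription IDs):

```cli
# Lists every subscription your signed-in account can access.
az account list --output table
```

```
Name         CloudName   SubscriptionId                        State    IsDefault
-----------  ----------  ------------------------------------  -------  -----------
Contoso Dev  AzureCloud  00000000-0000-0000-0000-000000000000  Enabled  True
```

The value in the `SubscriptionId` column is what the `subscriptionID` variable should hold.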
@@ -74,21 +74,23 @@ Download [Power BI Desktop](https://www.microsoft.com/download/details.aspx?id=4

    Cluster creation can take around 20 minutes.

-    The `resources.sh` script contains the following command. This command uses an Azure Resource Manager template (`resourcestemplate.json`) to create the specified resources with the desired configuration.
+    The `resources.sh` script contains the following commands. You don't need to run these commands again if you already executed the script in the previous step.

-    ```azurecli-interactive
-    az group deployment create --name ResourcesDeployment \
-        --resource-group $resourceGroup \
-        --template-file resourcestemplate.json \
-        --parameters "@resourceparameters.json"
-    ```
+    * `az group deployment create` - This command uses an Azure Resource Manager template (`resourcestemplate.json`) to create the specified resources with the desired configuration.

-    The `resources.sh` script also uploads the sales data .csv files into the newly created Blob storage account by using this command:
+        ```cli
+        az group deployment create --name ResourcesDeployment \
+            --resource-group $resourceGroup \
+            --template-file resourcestemplate.json \
+            --parameters "@resourceparameters.json"
+        ```

-    ```
-    az storage blob upload-batch -d rawdata \
-        --account-name <BLOB STORAGE NAME> -s ./ --pattern *.csv
-    ```
+    * `az storage blob upload-batch` - This command uploads the sales data .csv files into the newly created Blob storage account:
+
+        ```cli
+        az storage blob upload-batch -d rawdata \
+            --account-name <BLOB STORAGE NAME> -s ./ --pattern *.csv
+        ```

    The default password for SSH access to the clusters is `Thisisapassword1`. If you want to change the password, go to the `resourcesparameters.json` file and change the password for the `sparksshPassword`, `sparkClusterLoginPassword`, `llapClusterLoginPassword`, and `llapsshPassword` parameters.
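
If you run the deployment command yourself rather than through `resources.sh`, the Azure CLI also accepts inline `KEY=VALUE` overrides after a parameter file, with later `--parameters` values taking precedence. A minimal sketch of changing the cluster passwords that way (the placeholder value is illustrative):

```cli
# Inline overrides take precedence over values from resourceparameters.json.
az group deployment create --name ResourcesDeployment \
    --resource-group $resourceGroup \
    --template-file resourcestemplate.json \
    --parameters "@resourceparameters.json" \
    --parameters sparksshPassword='<NEW PASSWORD>' \
    --parameters sparkClusterLoginPassword='<NEW PASSWORD>' \
    --parameters llapClusterLoginPassword='<NEW PASSWORD>' \
    --parameters llapsshPassword='<NEW PASSWORD>'
```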
@@ -109,7 +111,7 @@ The default password for SSH access to the clusters is `Thisisapassword1`. If yo

> [!Note]
> After you know the names of the storage accounts, you can get the account keys by using the following command at the Azure Cloud Shell prompt:
-> ```azurecli-interactive
+> ```cli
> az storage account keys list \
>     --account-name <STORAGE NAME> \
>     --resource-group $rg \
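
As a follow-on to the note above, one standard Azure CLI pattern (a sketch reusing the note's placeholders) captures the first key directly into a shell variable with a JMESPath query:

```cli
# Store the first account key; assumes $rg is set to the resource group name,
# as in the note above.
storageKey=$(az storage account keys list \
    --account-name <STORAGE NAME> \
    --resource-group $rg \
    --query '[0].value' \
    --output tsv)
```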
@@ -118,17 +120,14 @@ The default password for SSH access to the clusters is `Thisisapassword1`. If yo

### Create a data factory

-Azure Data Factory is a tool that helps automate Azure pipelines. It's not the only way to accomplish these tasks, but it's a great way to automate the processes. For more information on Azure Data Factory, see the [Azure Data Factory documentation](https://azure.microsoft.com/services/data-factory/).
+Azure Data Factory is a tool that helps automate Azure Pipelines. It's not the only way to accomplish these tasks, but it's a great way to automate the processes. For more information on Azure Data Factory, see the [Azure Data Factory documentation](https://azure.microsoft.com/services/data-factory/).

This data factory will have one pipeline with two activities:

- The first activity will copy the data from Azure Blob storage to the Data Lake Storage Gen 2 storage account to mimic data ingestion.
- The second activity will transform the data in the Spark cluster. The script transforms the data by removing unwanted columns. It also appends a new column that calculates the revenue that a single transaction generates.

-To set up your Azure Data Factory pipeline, run the `adf.sh` script:
-
-1. Use `chmod +x adf.sh` to add execute permissions on the file.
-1. Use `./adf.sh` to run the script.
+To set up your Azure Data Factory pipeline, run the `adf.sh` script by typing `./adf.sh`.

This script does the following things:
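
Once `adf.sh` completes, a hypothetical spot check (not part of the tutorial's scripts) is to confirm that the data factory resource was created in the resource group:

```cli
# Assumes $resourceGroup is still set from the earlier steps.
az resource list \
    --resource-group $resourceGroup \
    --resource-type Microsoft.DataFactory/factories \
    --output table
```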
