
Commit e3e356e

committed: article updates
1 parent 5ad2be2

1 file changed: +37 −36 lines changed


articles/hdinsight/hdinsight-sales-insights-etl.md

Lines changed: 37 additions & 36 deletions
@@ -32,37 +32,39 @@ Download [Power BI Desktop](https://www.microsoft.com/download/details.aspx?id=4
![Open Azure Cloud Shell](./media/hdinsight-sales-insights-etl/hdinsight-sales-insights-etl-click-cloud-shell.png)
1. In the **Select environment** drop-down menu, choose **Bash**.
1. List your subscriptions by typing the command `az account list --output table`. Note the ID of the subscription that you will use for this project.

1. Set the subscription that you will use for this project, and set the `subscriptionID` variable, which will be used later:

    ```cli
    subscriptionID="<SUBSCRIPTION ID>"
    az account set --subscription $subscriptionID
    ```
1. Create a new resource group with a unique name for the project, and set the `resourceGroup` variable, which will be used later:

    ```cli
    resourceGroup="<RESOURCE GROUP NAME>"
    az group create --name $resourceGroup --location westus
    ```
1. Download the data and scripts for this tutorial from the [HDInsight sales insights ETL repository](https://github.com/Azure-Samples/hdinsight-sales-insights-etl) by entering the following commands in Cloud Shell:
    ```cli
    git clone https://github.com/Azure-Samples/hdinsight-sales-insights-etl.git
    cd hdinsight-sales-insights-etl
    ```
1. Enter `ls` at the shell prompt to verify that the following files and directories have been created:

    ```output
    salesdata  scripts  templates
    ```
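As a scripted alternative to eyeballing the `ls` output, a shell sketch like the following can verify the expected layout. The `mkdir` line stands in for the clone step so the snippet is self-contained; in Cloud Shell you would run the loop inside the cloned repository instead:

```shell
# Stand-in for the cloned repo so this sketch runs anywhere;
# in Cloud Shell, cd into hdinsight-sales-insights-etl instead.
repo=/tmp/hdinsight-sales-insights-etl-demo
mkdir -p "$repo"/salesdata "$repo"/scripts "$repo"/templates

# Fail fast if any expected directory from the tutorial is missing.
missing=0
for d in salesdata scripts templates; do
  [ -d "$repo/$d" ] || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "layout OK"
```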
### Deploy Azure resources needed for the pipeline
1. Add execute permissions for all of the scripts by typing `chmod +x scripts/*.sh`.

1. Use the command `./scripts/resources.sh <RESOURCE_GROUP_NAME> <LOCATION>` to run the script that deploys the following resources in Azure:

    1. An Azure Blob storage account. This account will hold the company sales data.
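Because `resources.sh` takes positional arguments and the deployment runs for roughly 20 minutes, a small pre-flight wrapper (hypothetical, not part of the repository) can catch a missing argument before anything is created:

```shell
# Hypothetical pre-flight check; the real work is done by ./scripts/resources.sh.
check_args() {
  resourceGroupArg="$1"
  locationArg="$2"
  if [ -z "$resourceGroupArg" ] || [ -z "$locationArg" ]; then
    echo "usage: resources.sh <RESOURCE_GROUP_NAME> <LOCATION>" >&2
    return 1
  fi
  echo "deploying to resource group '$resourceGroupArg' in '$locationArg'"
}

check_args salesinsights westus
```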
@@ -74,21 +76,23 @@ Download [Power BI Desktop](https://www.microsoft.com/download/details.aspx?id=4
Cluster creation can take around 20 minutes.
The `resources.sh` script contains the following commands. You don't need to run these commands yourself if you already executed the script in the previous step.

* `az group deployment create` - This command uses an Azure Resource Manager template (`resourcestemplate.json`) to create the specified resources with the desired configuration.

    ```cli
    az group deployment create --name ResourcesDeployment \
        --resource-group $resourceGroup \
        --template-file resourcestemplate.json \
        --parameters "@resourceparameters.json"
    ```
* `az storage blob upload-batch` - This command uploads the sales data .csv files into the newly created Blob storage account:

    ```cli
    az storage blob upload-batch -d rawdata \
        --account-name <BLOB STORAGE NAME> -s ./ --pattern *.csv
    ```
The default password for SSH access to the clusters is `Thisisapassword1`. If you want to change the password, go to the `resourcesparameters.json` file and change the password for the `sparksshPassword`, `sparkClusterLoginPassword`, `llapClusterLoginPassword`, and `llapsshPassword` parameters.
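If you pick a new password, a quick shell check can catch obvious problems before you edit the parameters file. The rules encoded below (at least 10 characters, with upper-case, lower-case, and numeric characters) are an assumption based on common HDInsight guidance; check the current Azure documentation for the authoritative requirements:

```shell
# Assumed complexity rules, for illustration only:
# length >= 10, plus at least one upper-case, lower-case, and numeric character.
password="Thisisapassword1"
ok=true
[ "${#password}" -ge 10 ] || ok=false
case "$password" in *[A-Z]*) ;; *) ok=false ;; esac
case "$password" in *[a-z]*) ;; *) ok=false ;; esac
case "$password" in *[0-9]*) ;; *) ok=false ;; esac
if $ok; then echo "password passes the assumed checks"; else echo "password fails the assumed checks"; fi
```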
@@ -109,7 +113,7 @@ The default password for SSH access to the clusters is `Thisisapassword1`. If yo
> [!Note]
> After you know the names of the storage accounts, you can get the account keys by using the following command at the Azure Cloud Shell prompt:
> ```cli
> az storage account keys list \
> --account-name <STORAGE NAME> \
> --resource-group $rg \
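The keys command returns JSON by default. As a sketch of extracting the first key value with standard shell tools, using a sample payload whose shape is assumed here for illustration (in practice, the Azure CLI's `--query "[0].value" -o tsv` is simpler):

```shell
# Sample of the JSON shape assumed for the keys-list output (not real keys).
keysJson='[{"keyName": "key1", "value": "abc123=="}, {"keyName": "key2", "value": "def456=="}]'

# Pull the first "value" field out of the payload.
storageKey=$(printf '%s' "$keysJson" | grep -o '"value": "[^"]*"' | head -n1 | sed 's/"value": "//; s/"$//')
echo "first key: $storageKey"
```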
@@ -125,10 +129,7 @@ This data factory will have one pipeline with two activities:
- The first activity will copy the data from Azure Blob storage to the Data Lake Storage Gen 2 storage account to mimic data ingestion.
- The second activity will transform the data in the Spark cluster. The script transforms the data by removing unwanted columns. It also appends a new column that calculates the revenue that a single transaction generates.
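The second activity's transformation can be sketched in miniature with standard shell tools. The column names and the revenue formula (quantity × unit price) below are assumptions for illustration; the actual Spark script in the repository defines the real schema:

```shell
# Tiny stand-in for the sales data (column names are assumed for illustration).
cat > /tmp/sales.csv <<'EOF'
order_id,region,quantity,unit_price
1,west,3,10.50
2,east,2,4.00
EOF

# Drop an unwanted column (region) and append a computed revenue column,
# mirroring what the Spark transform does at scale.
awk -F, 'NR==1 {print $1","$3","$4",revenue"; next}
         {printf "%s,%s,%s,%.2f\n", $1, $3, $4, $3*$4}' /tmp/sales.csv > /tmp/sales_transformed.csv
cat /tmp/sales_transformed.csv
```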
To set up your Azure Data Factory pipeline, run the `adf.sh` script by typing `./adf.sh`.
This script does the following things:
