1. In the **Select environment** drop-down menu, choose **Bash**.
1. List your subscriptions by typing the command `az account list --output table`. Note the ID of the subscription that you will use for this project.
1. Set the subscription that you will use for this project, and set the `subscriptionID` variable, which will be used later:

    ```cli
    subscriptionID="<SUBSCRIPTION ID>"
    az account set --subscription $subscriptionID
    ```
1. Create a new resource group for the project, and set the `resourceGroup` variable, which will be used later:

    ```cli
    resourceGroup="<RESOURCE GROUP NAME>"
    az group create --name $resourceGroup --location westus
    ```
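    If you want to confirm that the group was created before moving on, `az group exists` prints `true` once the group is available:

    ```cli
    az group exists --name $resourceGroup
    ```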
1. Download the data and scripts for this tutorial from the [HDInsight sales insights ETL repository](https://github.com/Azure-Samples/hdinsight-sales-insights-etl) by entering the following commands in Cloud Shell:
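    A minimal form of those commands, assuming you clone into the default directory that `git` creates from the repository name:

    ```cli
    git clone https://github.com/Azure-Samples/hdinsight-sales-insights-etl.git
    cd hdinsight-sales-insights-etl
    ```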
1. Enter `ls` at the shell prompt to verify that the following files and directories have been created:

    ```output
    salesdata scripts templates
    ```
### Deploy Azure resources needed for the pipeline
1. Add execute permissions for all of the scripts by typing `chmod +x scripts/*.sh`.
1. Run `./scripts/resources.sh <RESOURCE_GROUP_NAME> <LOCATION>` to deploy the following resources in Azure:
    1. An Azure Blob storage account. This account will hold the company sales data.
Cluster creation can take around 20 minutes.
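While you wait, you can optionally list what the deployment has created so far:

```cli
az resource list --resource-group $resourceGroup --output table
```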
The `resources.sh` script contains the following commands. You don't need to run them again if you already executed the script in the previous step.
* `az group deployment create` - This command uses an Azure Resource Manager template (`resourcestemplate.json`) to create the specified resources with the desired configuration.

    ```cli
    az group deployment create --name ResourcesDeployment \
        --resource-group $resourceGroup \
        --template-file resourcestemplate.json \
        --parameters "@resourceparameters.json"
    ```

The `resources.sh` script also uploads the sales data .csv files into the newly created Blob storage account.
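That upload is typically done with `az storage blob upload-batch`; a sketch of the likely form follows, where the destination container name and the placeholders are assumptions (see `resources.sh` for the exact invocation):

```cli
az storage blob upload-batch \
    --destination rawdata \
    --source salesdata \
    --account-name <BLOB STORAGE NAME> \
    --account-key <BLOB STORAGE KEY>
```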
The default password for SSH access to the clusters is `Thisisapassword1`. If you want to change the password, go to the `resourcesparameters.json` file and change the password for the `sparksshPassword`, `sparkClusterLoginPassword`, `llapClusterLoginPassword`, and `llapsshPassword` parameters.
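In a standard Resource Manager parameter file, each of those entries takes roughly the following shape. This is a sketch based on the common parameter-file format, so check `resourcesparameters.json` for the exact layout:

```json
{
  "parameters": {
    "sparksshPassword": { "value": "<NEW PASSWORD>" },
    "sparkClusterLoginPassword": { "value": "<NEW PASSWORD>" },
    "llapClusterLoginPassword": { "value": "<NEW PASSWORD>" },
    "llapsshPassword": { "value": "<NEW PASSWORD>" }
  }
}
```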
> [!Note]
> After you know the names of the storage accounts, you can get the account keys by using the following command at the Azure Cloud Shell prompt:
> ```cli
> az storage account keys list \
>     --account-name <STORAGE NAME> \
>     --resource-group $rg
> ```
### Create a data factory
Azure Data Factory is a tool that helps automate data pipelines in Azure. It's not the only way to accomplish these tasks, but it's a great way to automate these processes. For more information on Azure Data Factory, see the [Azure Data Factory documentation](https://azure.microsoft.com/services/data-factory/).
This data factory will have one pipeline with two activities:
- The first activity will copy the data from Azure Blob storage to the Data Lake Storage Gen2 storage account to mimic data ingestion.
- The second activity will transform the data in the Spark cluster. The script transforms the data by removing unwanted columns. It also appends a new column that calculates the revenue that a single transaction generates.
To set up your Azure Data Factory pipeline, run the `adf.sh` script by typing `./adf.sh`.