articles/lab-services/class-type-big-data-analytics.md
Lines changed: 37 additions & 38 deletions
@@ -1,59 +1,61 @@
1
1
---
2
-
title: Set up a lab to teach big data analytics using Azure Lab Services | Microsoft Docs
3
-
description: Learn how to set up a lab to teach the big data analytics using Docker deployment of Hortonworks Data Platform (HDP).
4
-
author: nicolela
5
-
ms.topic: how-to
6
-
ms.date: 03/08/2022
7
-
ms.custom: devdivchpfy22
2
+
title: Set up big data analytics lab
3
+
titleSuffix: Azure Lab Services
4
+
description: Learn how to set up a lab in Azure Lab Services to teach big data analytics using a Docker deployment of Hortonworks Data Platform (HDP).
5
+
services: lab-services
8
6
ms.service: lab-services
7
+
author: ntrogh
8
+
ms.author: nicktrog
9
+
ms.topic: how-to
10
+
ms.date: 04/25/2023
9
11
---
10
12
11
-
# Set up a lab for big data analytics using Docker deployment of HortonWorks Data Platform
13
+
# Set up a lab for big data analytics in Azure Lab Services using Docker deployment of Hortonworks Data Platform
This article shows you how to set up a lab to teach a big data analytics class. A big data analytics class teaches students to learn how to handle large volumes of data. It also teaches them to apply machine and statistical learning algorithms to derive data insights. A key objective for students is to learn how to use data analytics tools, such as [Apache Hadoop's open-source software package](https://hadoop.apache.org/). The software package provides tools for storing, managing, and processing big data.
17
+
This article shows you how to set up a lab to teach a big data analytics class. A big data analytics class teaches users how to handle large volumes of data. It also teaches them to apply machine and statistical learning algorithms to derive data insights. A key objective is to learn how to use data analytics tools, such as [Apache Hadoop's open-source software package](https://hadoop.apache.org/). The software package provides tools for storing, managing, and processing big data.
16
18
17
-
In this lab, students will use a popular commercial version of Hadoop provided by [Cloudera](https://www.cloudera.com/), called [Hortonworks Data Platform (HDP)](https://www.cloudera.com/products/hdp.html). Specifically, students will use [HDP Sandbox 3.0.1](https://www.cloudera.com/tutorials/getting-started-with-hdp-sandbox/1.html) that's a simplified, easy-to-use version of the platform. HDP Sandbox 3.0.1 is also free of cost and is intended for learning and experimentation. Although this class may use either Windows or Linux virtual machines (VM) with HDP Sandbox deployed. This article will show you how to use Windows.
19
+
In this lab, lab users work with a popular commercial version of Hadoop provided by [Cloudera](https://www.cloudera.com/), called [Hortonworks Data Platform (HDP)](https://www.cloudera.com/products/hdp.html). Specifically, they use [HDP Sandbox 3.0.1](https://www.cloudera.com/tutorials/getting-started-with-hdp-sandbox/1.html), a simplified, easy-to-use version of the platform. HDP Sandbox 3.0.1 is free of cost and is intended for learning and experimentation. Although this class can use either Windows or Linux virtual machines (VMs) with HDP Sandbox deployed, this article shows you how to use Windows.
18
20
19
-
Another interesting aspect is that we'll deploy HDP Sandbox on the lab VMs using [Docker](https://www.docker.com/) containers. Each Docker container provides its own isolated environment for software applications to run inside. Conceptually, Docker containers are like nested VMs and can be used to easily deploy and run a wide variety of software applications based on container images provided on [Docker Hub](https://www.docker.com/products/docker-hub). Cloudera's deployment script for HDP Sandbox automatically pulls the [HDP Sandbox 3.0.1 Docker image](https://hub.docker.com/r/hortonworks/sandbox-hdp) from Docker Hub and runs two Docker containers:
21
+
Another interesting aspect is that you deploy the HDP Sandbox on the lab VMs using [Docker](https://www.docker.com/) containers. Each Docker container provides its own isolated environment for software applications to run inside. Conceptually, Docker containers are like nested VMs and can be used to easily deploy and run a wide variety of software applications based on container images provided on [Docker Hub](https://www.docker.com/products/docker-hub). Cloudera's deployment script for HDP Sandbox automatically pulls the [HDP Sandbox 3.0.1 Docker image](https://hub.docker.com/r/hortonworks/sandbox-hdp) from Docker Hub and runs two Docker containers:
20
22
21
23
- sandbox-hdp
22
24
- sandbox-proxy
23
25
24
-
## Lab configuration
26
+
## Prerequisites
25
27
26
-
To set up this lab, you need an Azure subscription to get started. If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/) before you begin.
28
+
[!INCLUDE [must have subscription](./includes/lab-services-class-type-subscription.md)]
29
+
30
+
## Lab configuration
27
31
28
32
### Lab plan settings
29
33
30
-
Once you've an Azure subscription, you can create a new lab plan in Azure Lab Services. For more information about creating a new lab plan, see the tutorial on [how to set up a lab plan](./quick-create-resources.md). You can also use an existing labplan.
34
+
[!INCLUDE [must have lab plan](./includes/lab-services-class-type-lab-plan.md)]
31
35
32
-
Enable your lab plan settings as described in the following table. For more information about how to enable Azure Marketplace images, see [Specify the Azure Marketplace images available to lab creators](./specify-marketplace-images.md).
36
+
This lab uses a Windows 10 Pro Azure Marketplace image as the base VM image. You first need to enable this image in your lab plan so that lab creators can select it as the base image for their lab.
33
37
34
-
| Lab plan setting | Instructions |
35
-
| ------------------- | ------------ |
36
-
|Marketplace image| Enable the **Windows 10 Pro** image.|
38
+
Follow these steps to [make Azure Marketplace images available to lab creators](specify-marketplace-images.md). Select the **Windows 10 Pro** Azure Marketplace image.
37
39
38
40
### Lab settings
39
41
40
-
For instructions on how to create a lab, see [Tutorial: Set up a lab](tutorial-setup-lab.md). Use the following settings when creating the lab.
42
+
Create a lab for your lab plan. [!INCLUDE [create lab](./includes/lab-services-class-type-lab.md)] Use the following settings when creating the lab.
41
43
42
44
| Lab settings | Value/instructions |
43
45
| ------------ | ------------------ |
44
46
|Virtual Machine Size|**Medium (Nested Virtualization)**. This VM size is best suited for relational databases, in-memory caching, and analytics. The size also supports nested virtualization.|
45
-
|Virtual Machine Image| Windows 10 Pro|
46
-
47
+
|Virtual Machine Image|**Windows 10 Pro**|
48
+
47
49
> [!NOTE]
48
-
> We need to use Medium (Nested Virtualization) since deploying HDP Sandbox using Docker requires Windows Hyper-V with nested virtualization and at least 10 GB of RAM.
50
+
> Use the Medium (Nested Virtualization) VM size because deploying HDP Sandbox using Docker requires Windows Hyper-V with nested virtualization and at least 10 GB of RAM.
49
51
50
52
## Template machine configuration
51
53
52
-
To set up the template machine, we'll:
54
+
To set up the template machine:
53
55
54
-
- Install Docker
55
-
- Deploy HDP Sandbox
56
-
- Use PowerShell and Windows Task Scheduler to automatically start the Docker containers
56
+
1. Install Docker
57
+
1. Deploy HDP Sandbox
58
+
1. Use PowerShell and Windows Task Scheduler to automatically start the Docker containers
57
59
58
60
### Install Docker
59
61
@@ -77,7 +79,7 @@ To use Docker containers, you must first install Docker Desktop on the template
77
79
78
80
### Deploy HDP Sandbox
79
81
80
-
In this section, you'll deploy HDP Sandbox and then access HDP Sandbox using the browser.
82
+
Next, deploy HDP Sandbox and then access HDP Sandbox using the browser.
81
83
82
84
1. Ensure that you have installed [Git Bash](https://gitforwindows.org/) as listed in the [Prerequisites section](https://www.cloudera.com/tutorials/sandbox-deployment-and-install-guide/3.html#prerequisites) of the guide. It's recommended for completing the next steps.
83
85
@@ -97,26 +99,23 @@ In this section, you'll deploy HDP Sandbox and then access HDP Sandbox using the
97
99
> [!NOTE]
98
100
> These instructions assume that you have first mapped the local IP address of the sandbox environment to `sandbox-hdp.hortonworks.com` in the hosts file on your template VM. If you **don't** do this mapping, you can access the Sandbox Welcome page by navigating to `http://localhost:8080`.
99
101
100
-
### Automatically start Docker containers when students log in
102
+
### Automatically start Docker containers when lab users sign in
101
103
102
-
To provide an easy to use, experience for students, we'll use a PowerShell script that automatically:
104
+
To provide an easy-to-use experience for lab users, create a PowerShell script that automatically:
103
105
104
-
- Starts the HDP Sandbox Docker containers when a student starts and connects to their lab VM.
105
-
- Launches the browser and navigates to the Sandbox Welcome Page.
106
+
1. Starts the HDP Sandbox Docker containers when a lab user starts and connects to their lab VM.
107
+
1. Launches the browser and navigates to the Sandbox Welcome page.
106
108
107
-
We'll also use Windows Task Scheduler to automatically run this script when a student logs into their VM.
108
-
To set up a Task Scheduler, follow these steps: [Big Data Analytics scripting](https://aka.ms/azlabs/scripts/BigDataAnalytics).
109
+
Use Windows Task Scheduler to automatically run this script when a lab user signs in to their VM. To set up the scheduled task, follow these steps: [Big Data Analytics scripting](https://aka.ms/azlabs/scripts/BigDataAnalytics).
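
As a rough illustration of what such a startup script does, the following Python sketch starts the two sandbox containers and then opens the welcome page. It is not the PowerShell script from the linked repository. It assumes that Python and the Docker CLI are available on the lab VM, that the containers keep the names `sandbox-hdp` and `sandbox-proxy`, and that the welcome page is reachable at `http://localhost:8080`. The function names `build_start_command` and `start_sandbox` are hypothetical.

```python
import subprocess
import webbrowser

# Container names created by the HDP Sandbox deployment (taken from the article).
CONTAINERS = ["sandbox-hdp", "sandbox-proxy"]
# Welcome page address when no hosts-file mapping is configured (taken from the article).
WELCOME_URL = "http://localhost:8080"


def build_start_command(containers):
    """Return the `docker start` command line for the given container names."""
    return ["docker", "start", *containers]


def start_sandbox(containers=CONTAINERS, url=WELCOME_URL):
    """Start the sandbox containers, then open the welcome page in the default browser."""
    # check=True raises CalledProcessError if the Docker CLI reports a failure.
    subprocess.run(build_start_command(containers), check=True)
    webbrowser.open(url)
```

Calling `start_sandbox()` from a script that Task Scheduler runs at sign-in would mirror the two steps listed above. Keeping the command construction in a separate function makes it possible to check the command without Docker installed.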
109
110
110
111
## Cost estimate
111
112
112
-
If you would like to estimate the cost of this lab, you can use the following example:
113
+
This section provides a cost estimate for running this class for 25 users. There are 20 hours of scheduled class time, and each user gets 10 hours of quota for homework or assignments outside scheduled class time. The virtual machine size is **Medium (Nested Virtualization)**, which is 55 lab units.
113
114
114
-
For a class of 25 students with 20 hours of scheduled class time and 10 hours of quota for homework or assignments, the price for the lab would be:
115
-
116
-
25 students \* (20 + 10) hours \* 55 Lab Units \* 0.01 USD per hour = 412.50 USD
>Cost estimate is for example purposes only. For current details on pricing, see [Azure Lab Services Pricing](https://azure.microsoft.com/pricing/details/lab-services/).
117
+
>[!IMPORTANT]
118
+
>The cost estimate is for example purposes only. For current pricing information, see [Azure Lab Services pricing](https://azure.microsoft.com/pricing/details/lab-services/).
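
The arithmetic behind the example estimate can be sketched in a few lines of Python. The inputs are the figures quoted in this diff (25 users, 20 scheduled hours, 10 quota hours, 55 lab units, and an example rate of 0.01 USD per lab unit per hour). The function name `estimate_lab_cost` is hypothetical, and the rate is an illustrative value, not current pricing.

```python
def estimate_lab_cost(users, scheduled_hours, quota_hours, lab_units, usd_per_unit_hour):
    """Return the estimated lab cost in USD for the whole class."""
    # Each user can consume the scheduled class time plus their personal quota.
    hours_per_user = scheduled_hours + quota_hours
    return users * hours_per_user * lab_units * usd_per_unit_hour


# Example values from the article; the hourly rate is an assumption for illustration.
cost = estimate_lab_cost(
    users=25,
    scheduled_hours=20,
    quota_hours=10,
    lab_units=55,
    usd_per_unit_hour=0.01,
)
print(f"{cost:.2f} USD")  # prints "412.50 USD"
```

Changing any single input, such as the class size or the quota hours, shows how the estimate scales linearly with each factor.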