
Commit 3caeecc

Merge pull request #5 from MicrosoftCloudEssentials-LearningHub/azure-ml
terraform infra for azure ml
2 parents 80167fb + 5663af1 commit 3caeecc

8 files changed: +286 additions, -1 deletion

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -8,12 +8,12 @@
 # Crash log files
 crash.log
 crash.*.log
+.terraform.lock.hcl
 
 # Exclude all .tfvars files, which are likely to contain sensitive data, such as
 # password, private keys, and other secrets. These should not be part of version
 # control as they are data points which are potentially sensitive and subject
 # to change depending on the environment.
-*.tfvars
 *.tfvars.json
 
 # Ignore override files as they are usually used to override resources locally and so
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
# Demonstration: Deploying Azure Resources for an ML Platform

Costa Rica

[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
[brown9804](https://github.com/brown9804)

Last updated: 2025-04-29

------------------------------------------

> This repository contains Terraform configurations for setting up an Azure Machine Learning workspace along with compute clusters and supporting resources that form the core of an ML platform.
> `Remember, managing your infrastructure as code (IaC) not only ensures consistency but also provides version control, reproducibility, and collaboration benefits, all of which are essential for scalable ML operations.`
> For additional Terraform templates covering various Azure services, check out [this repository](https://github.com/MicrosoftCloudEssentials-LearningHub/AzureTerraformTemplates-v0.0.0). Explore and borrow ideas as needed!

> [!TIP]
> **About Infrastructure via Terraform**: Terraform is an IaC tool that lets you define and provision cloud resources in a high-level configuration language. Keeping your infrastructure code under source control alongside your application code makes environments reproducible across development, testing, and production. Microsoft also offers other IaC tools, such as Bicep and ARM templates, giving you flexibility in how you manage your Azure resources.

<p align="center">
<img width="550" alt="Azure Machine Learning architecture" src="https://github.com/user-attachments/assets/8933eb5c-7cc9-4d06-978c-64cb755a48ee">
</p>

<details>
<summary><b>List of References </b> (Click to expand)</summary>

- [Azure Machine Learning Documentation](https://learn.microsoft.com/en-us/azure/machine-learning/)
- [Terraform Azure Provider Documentation](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs)
- [Azure Terraform Templates](https://github.com/MicrosoftCloudEssentials-LearningHub/AzureTerraformTemplates-v0.0.0)
</details>

<details>
<summary><b>Table of Contents </b> (Click to expand)</summary>

- [Overview](#overview)
- [Configuring Access with Azure CLI](#configuring-access-with-azure-cli)
- [Configure Remote Storage for Terraform Deployment](#configure-remote-storage-for-terraform-deployment)
- [How to Execute the Deployment](#how-to-execute-the-deployment)
</details>

## Overview

```plaintext
.
├── README.md
├── src
│   ├── main.tf
│   ├── variables.tf
│   ├── provider.tf
│   ├── terraform.tfvars
│   ├── remote-storage.tf
│   └── outputs.tf
```

- **main.tf** *(Main Terraform configuration file)*: Contains the core infrastructure code that provisions your Azure Machine Learning workspace, compute clusters, and other related services.
- **variables.tf** *(Variable definitions)*: Defines variables to parameterize your configurations. This includes settings for workspace names, compute configurations, and other environment-specific parameters.
- **provider.tf** *(Provider configurations)*: Specifies the necessary settings for the Azure provider so Terraform can authenticate and manage your Azure resources.
- **terraform.tfvars** *(Variable values)*: Holds the actual values for the variables defined in `variables.tf`. Adjust these values according to the environment you’re targeting (development, staging, production).
- **remote-storage.tf** *(Remote state storage configuration)*: Configures a remote backend (such as Azure Blob Storage) for storing Terraform’s state file securely, ensuring reliable collaboration.
- **outputs.tf** *(Output values)*: Defines outputs to display resource endpoints, IDs, and other key details after a successful deployment.
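
The variables declared in `variables.tf` and the values assigned in `terraform.tfvars` only take effect in resources that reference them as `var.<name>`. A minimal sketch, assuming you want the resource group, region, and workspace name to come from those variables rather than from literal strings, would replace the matching blocks in `main.tf` with something like the following (the `example` labels mirror the ones used in this commit; the rest of `main.tf` stays as it is):

```hcl
# Sketch: resource group and workspace parameterized by variables.tf.
# Values are supplied by terraform.tfvars (resource_group_name, location, workspace_name).
resource "azurerm_resource_group" "example" {
  name     = var.resource_group_name
  location = var.location
}

resource "azurerm_machine_learning_workspace" "example" {
  name                    = var.workspace_name
  location                = azurerm_resource_group.example.location
  resource_group_name     = azurerm_resource_group.example.name
  application_insights_id = azurerm_application_insights.example.id
  key_vault_id            = azurerm_key_vault.example.id
  storage_account_id      = azurerm_storage_account.example.id

  identity {
    type = "SystemAssigned"
  }
}
```

Replace the existing blocks rather than adding these next to them, since Terraform rejects two resources with the same address. Renaming an already-deployed resource group forces replacement of the group and of every resource whose `resource_group_name` depends on it, so it is worth settling on names before the first `terraform apply`.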

## Configuring Access with Azure CLI

> Deploying Azure Machine Learning resources requires an authenticated identity with sufficient permissions. In many cases, you will use a service principal that has been assigned the appropriate roles.

To list available service principals, run:

```sh
az ad sp list --query "[].{Name:displayName, AppId:appId, ObjectId:id}" --output table
```

The following example shows how to pass the Object ID of the service principal you retrieved to your Terraform configuration as a variable value:

```hcl
ml_service_principal_id = "12345678-1234-1234-1234-1234567890ab"
```
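
A value like `ml_service_principal_id` only reaches the configuration if a matching variable is declared; otherwise Terraform warns about a value for an undeclared variable and ignores it. The sketch below assumes the value lives in `terraform.tfvars`, declares the variable, and uses it in a role assignment scoped to the workspace. The resource label `ml_sp_workspace` and the choice of the built-in `AzureML Data Scientist` role are illustrative assumptions, not part of this commit.

```hcl
# Hypothetical addition to variables.tf: declares the variable that
# terraform.tfvars assigns as ml_service_principal_id.
variable "ml_service_principal_id" {
  type        = string
  description = "Object ID of the service principal that needs access to the ML workspace"
}

# Hypothetical role assignment: grants that service principal a built-in
# Azure ML role on the workspace. Pick the least-privileged role you need.
resource "azurerm_role_assignment" "ml_sp_workspace" {
  scope                = azurerm_machine_learning_workspace.example.id
  role_definition_name = "AzureML Data Scientist"
  principal_id         = var.ml_service_principal_id
}
```

Creating role assignments requires the identity running Terraform to hold a role that includes `Microsoft.Authorization/roleAssignments/write` on the target scope, such as Owner or User Access Administrator.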

## Configure Remote Storage for Terraform Deployment

> For robust state management and collaboration, configuring a remote backend for Terraform is essential. This section outlines how to use Azure Blob Storage for remote state storage.

1. **Create an Azure Storage Account**:
   - Use the Azure portal or CLI to set up a new storage account if you do not already have one.
   - Note down the storage account name and access key.
2. **Create a Storage Container**:
   - Within your storage account, create a container dedicated to holding your Terraform state file.
3. **Configure Terraform Backend**:
   - In the `remote-storage.tf` file (located in the `src` folder), include the backend configuration to connect to your Azure Blob Storage container.
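
A minimal backend sketch for step 3, assuming placeholder names for the resource group, storage account, and container created in steps 1 and 2. All four values are hypothetical and must be replaced with your own.

```hcl
# Sketch of an azurerm remote backend; every value here is a placeholder.
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"                 # hypothetical resource group holding the state account
    storage_account_name = "tfstatestorage001"          # hypothetical storage account from step 1
    container_name       = "tfstate"                    # hypothetical container from step 2
    key                  = "azure-ml.terraform.tfstate" # blob name for this configuration's state
  }
}
```

Backend blocks cannot reference variables, so the values must be literals or be supplied with `terraform init -backend-config=...`. Rather than writing the access key into a file, you can export it as the `ARM_ACCESS_KEY` environment variable. Because `provider.tf` already has a `terraform` block, this one can live in its own file: Terraform accepts several `terraform` blocks but only one backend per configuration. After adding it, rerun `terraform init`, which offers to copy any existing local state to the new backend.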

## How to Execute the Deployment

```mermaid
graph TD;
A[az login] --> B(terraform init)
B --> C{Terraform Provisioning Stage}
C -->|Review| D[terraform plan]
C -->|Deploy Resources| E[terraform apply]
C -->|Tear Down Infrastructure| F[terraform destroy]
```

> [!IMPORTANT]
> Before executing, update `terraform.tfvars` with your own configuration values. This repository provisions an Azure Machine Learning workspace, compute clusters,
> and the supporting resources needed to run ML experiments. Video walk-throughs of the deployment steps are linked below. <br/>
> *Note: Once your ML experiments are complete, remember to scale down compute clusters or delete the resource group to control costs.*
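
`compute_name` is declared in `variables.tf` and set in `terraform.tfvars`, but none of the resource blocks in this commit use it. A sketch of a compute cluster that consumes it might look like this; the resource label, VM size, priority, and node counts are assumptions to adjust for your quota and workload. Setting `min_node_count = 0` lets the cluster scale down to zero nodes when idle, which addresses the cost note above.

```hcl
# Sketch: a compute cluster attached to the workspace in main.tf.
# VM size, priority, and node counts are assumptions, not values from this commit.
resource "azurerm_machine_learning_compute_cluster" "example" {
  name                          = var.compute_name
  location                      = azurerm_resource_group.example.location
  machine_learning_workspace_id = azurerm_machine_learning_workspace.example.id
  vm_priority                   = "Dedicated"
  vm_size                       = "Standard_DS2_v2"

  scale_settings {
    min_node_count                       = 0       # no nodes (and no node cost) while idle
    max_node_count                       = 2
    scale_down_nodes_after_idle_duration = "PT30S" # ISO 8601 duration before idle nodes are released
  }

  identity {
    type = "SystemAssigned"
  }
}
```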

1. **Log in to Azure**: Navigate to your Terraform directory and log in to your Azure account. The `az login` command opens a browser window for authentication.

```sh
cd ./infrastructure/azMachineLearning/src/
```
```sh
az login
```

https://github.com/user-attachments/assets/aad4e0e6-46bb-457d-a768-0eedf6a9d2ba

2. **Initialize Terraform**: Set up your working directory and install the necessary provider plugins.
```sh
terraform init
```

https://github.com/user-attachments/assets/e56ed69c-7a82-48fd-ba72-bbd9f862175d

3. **Review the Deployment Plan**: Preview the changes Terraform will make. Terraform also loads `terraform.tfvars` automatically, so the `-var-file` flag is optional here.
```sh
terraform plan -var-file terraform.tfvars
```

https://github.com/user-attachments/assets/bf2faa70-7ee4-4722-9e21-024873a75ac7

4. **Apply the Configuration**: Deploy the specified Azure resources.

```sh
terraform apply -var-file terraform.tfvars
```

<img width="550" alt="image" src="https://github.com/user-attachments/assets/5b1a08db-0a2e-46d9-832a-f2b2d0c9ccbd" />

5. **Destroy the Infrastructure (if needed)**: Clean up resources by tearing down the deployment.
```sh
terraform destroy -var-file terraform.tfvars
```

<div align="center">
<h3 style="color: #4CAF50;">Total Visitors</h3>
<img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
</div>
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "example" {
  name     = "RGbrownML"
  location = "East US 2"
}

resource "azurerm_application_insights" "example" {
  name                = "wwsbrownai"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  application_type    = "web"
}

resource "azurerm_key_vault" "example" {
  name                = "wsbrownkeyvault"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "premium"
}

resource "azurerm_storage_account" "example" {
  name                     = "wsbrownsa"
  location                 = azurerm_resource_group.example.location
  resource_group_name      = azurerm_resource_group.example.name
  account_tier             = "Standard"
  account_replication_type = "GRS"
}

resource "azurerm_machine_learning_workspace" "example" {
  name                    = "wsbrownml"
  location                = azurerm_resource_group.example.location
  resource_group_name     = azurerm_resource_group.example.name
  application_insights_id = azurerm_application_insights.example.id
  key_vault_id            = azurerm_key_vault.example.id
  storage_account_id      = azurerm_storage_account.example.id

  identity {
    type = "SystemAssigned"
  }
}
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
resource "azurerm_storage_account" "ml_storage" {
  name                     = "mlstorageacct123"
  resource_group_name      = azurerm_resource_group.example.name # the only resource group declared in this configuration
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "ml_container" {
  name                  = "ml-artifacts"
  storage_account_id    = azurerm_storage_account.ml_storage.id # a container must name its parent storage account
  container_access_type = "private"
}
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
output "client_config" {
  value = {
    client_id       = data.azurerm_client_config.current.client_id
    tenant_id       = data.azurerm_client_config.current.tenant_id
    subscription_id = data.azurerm_client_config.current.subscription_id
  }
}

output "resource_group_name" {
  value = azurerm_resource_group.example.name
}

output "application_insights_id" {
  value = azurerm_application_insights.example.id
}

output "key_vault_id" {
  value = azurerm_key_vault.example.id
}

output "storage_account_id" {
  value = azurerm_storage_account.example.id
}

output "ml_workspace_id" {
  value = azurerm_machine_learning_workspace.example.id
}

output "ml_workspace_name" {
  value = azurerm_machine_learning_workspace.example.name
}
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# provider.tf
# This file configures the Azure provider to interact with Azure resources.
# It specifies the required provider and its version, along with provider-specific configurations.

terraform {
  required_version = ">= 1.8, < 2.0"
  # Specify the required provider and its version
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm" # Source of the AzureRM provider
      version = "~> 4.16.0"         # Allows any 4.16.x release of the AzureRM provider
    }
  }
}

provider "azurerm" {
  features {}                           # Required block; left empty, the provider uses its default feature behaviors
  subscription_id = var.subscription_id # Set via terraform.tfvars; AzureRM 4.x requires a subscription ID
}
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
resource_group_name = "ml-platform-rg"
location            = "eastus2"
workspace_name      = "ml-workspace"
compute_name        = "ml-compute-cluster"
subscription_id     = "your-subscription_id"
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# variables.tf
# This file defines the input variables used in the Terraform configuration.
# Each variable includes a description, type, and optional default value.

variable "subscription_id" {
  description = "The Azure subscription ID to use for the AzureRM provider."
  type        = string
}

variable "resource_group_name" {
  type        = string
  description = "Name of the resource group"
}

variable "location" {
  type        = string
  description = "Azure region"
  default     = "eastus"
}

variable "workspace_name" {
  type        = string
  description = "Name of the Azure ML workspace"
}

variable "compute_name" {
  type        = string
  description = "Name of the compute cluster"
}
