Skip to content

Commit d9587db

Browse files
authored
Merge pull request #94 from STRIDES/update_genai_notebooks_cj
Update genai notebooks - Cameron J.
2 parents 4464584 + 824e543 commit d9587db

File tree

17 files changed

+2609
-231
lines changed

17 files changed

+2609
-231
lines changed

notebooks/GenAI/.gitignore

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
__pycache__
2+
.venv
3+
.env
4+
microsoft-earnings_embeddings.csv
5+
embedding_demos/p1.py
Lines changed: 336 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,336 @@
1+
# Setting Up Azure Environment for Azure GenAI Cloud Lab
2+
3+
**Skill Level: Beginner**
4+
5+
This guide will help you set up your Azure environment to complete the activities in the [GenAI](../) directory of the NIH Cloud Lab.
6+
The purpose of this guide is to walk you through an automated deployment of the resources needed to carry out these activities.
7+
This automated approach utilizes a pre-built [ARM template](arm_resources.json) file, which serves as an alternative approach
8+
to manually deploying and configuring resources via the Azure portal.
9+
10+
## Page Contents
11+
+ [Learning Objectives](#learning_objectives)
12+
+ [Prerequisites](#prerequisites)
13+
+ [Resources and Pricing](#resources_and_pricing)
14+
+ [Get Started](#get_started)
15+
+ [Conclusion](#conclusion)
16+
+ [Clean Up](#clean_up)
17+
18+
## Learning Objectives <a name="learning_objectives"></a>
19+
20+
1. Configure PowerShell or Azure CLI
21+
- Step-by-step instructions to set up and configure PowerShell and Azure CLI for the neccessary Azure resource deployments.
22+
2. Deploy Resources Using an ARM Template
23+
- Detailed guidance on deploying the necessary resources in Azure using an ARM template for the [GenAI](../) directory.
24+
3. Upload Local Files to Azure Storage Account
25+
- Instructions on how to upload files from the [search_documents](../search_documents/) directory to an Azure Storage Account Blob container.
26+
4. Acquire Keys and Secrets for .env Variables
27+
- Steps to obtain keys and secrets from deployed resources and use them in your .env files for the tutorials in the [GenAI](../) directory.
28+
29+
## Prerequisites <a name="prerequisites"></a>
30+
31+
- An active Azure subscription
32+
- PowerShell installed on your machine (option 1)
33+
- Azure CLI installed (option 2)
34+
35+
### Powershell (option 1) vs. Azure CLI (option 2)
36+
37+
Choosing between Azure CLI and PowerShell comes down to personal preference and the working environment:
38+
39+
- **Cloud Environments**: For users working in the cloud, such as with Azure Machine Learning or Azure VMs, Azure CLI may be a more suitable option.
40+
- ***Note***: If users are utilizing any of these environments, please skip Step 1 and move directly to Step 2 using Azure CLI (option 2).
41+
- **Local Environments**: For users working on a local machine, both Azure CLI and PowerShell are viable options. The choice depends on personal preference.
42+
- ***Note***: If users are utilizing Azure CLI, please skip Step 1 and move directly to Step 2.
43+
44+
## Resources and Pricing <a name="resources_and_pricing"></a>
45+
46+
Provided is a list of resources that will be deployed by the provided ARM template along with the estimated cost breakdown for each resource.
47+
***An ARM Template is a JSON file that defines the infrastructure and configuration for your Azure project***. It allows you to deploy, manage, and configure
48+
all the resources for your solution in a single, coordinated operation. When executing the provided ARM template, actual costs may vary depending on usage
49+
and the Azure pricing model for each resource. Please find the resources that will be deployed below.
50+
51+
### Resources Deployed
52+
1. **Azure Storage Account**
53+
- **Resource Type**: Storage Account (Standard_LRS)
54+
- **Purpose**: This resource is used to store and manage files from [search_documents](../search_documents/) in a single container.
55+
- **Estimated Cost**: $0.018 per GB/$18.40 per 1000 GB per month
56+
57+
2. **Azure AI Search**
58+
- **Resource Type**: Cognitive Search (Basic)
59+
- **Purpose**: This resource provides AI search capabilities for the GenAI tutorials, including indexing and querying.
60+
- **Estimated Cost**: $0.10 per hour/$73.73 per month
61+
62+
3. **Azure OpenAI**
63+
- **Resource Type**: Cognitive Services (Standard)
64+
- **Purpose**: This resource provides access to OpenAI models, including GPT-4 and embeddings for AI processing.
65+
- **Models Deployed**:
66+
- **Model**: gpt-4o-mini
67+
- **Version**: 2024-07-18
68+
- **Cost per 1M Tokens**: $0.15 input/$0.60 output
69+
- **Model**: text-embedding-3-small
70+
- **Version**: 1
71+
- **Cost per 1K Tokens**: $0.00002
72+
- **Estimated Cost**: Varies based on model usage and API calls. Please refer to [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/?msockid=3df6a53ac4916aa73e41b1e3c5c36bd4) for more details.
73+
74+
Please refer to the [Azure Pricing Calculator](https://azure.microsoft.com/en-us/pricing/calculator/) for a more detailed and personalized estimate based on your specific usage patterns and region.
75+
76+
## Get Started <a name="get_started"></a>
77+
78+
### 1. Setting Up the Azure Module in PowerShell
79+
80+
First, you need to install the Azure module in PowerShell to connect to your Azure account.
81+
82+
```powershell
83+
# Install the Az module (if using PowerShell)
84+
Install-Module -Name Az -AllowClobber -Force
85+
```
86+
87+
### 2. Logging into Azure
88+
89+
You can log into your Azure account either using PowerShell or Azure CLI.
90+
91+
**Using PowerShell**
92+
```powershell
93+
# Log into your Azure account
94+
Connect-AzAccount
95+
```
96+
**Using Azure CLI**
97+
```powershell
98+
# Log into your Azure account
99+
!az login
100+
```
101+
102+
### 3. Setting Variables
103+
104+
Set the following variables, which you'll need throughout the setup process.
105+
106+
**Using PowerShell**
107+
```powershell
108+
# Variables
109+
$resourceGroupName="nihcloudlabrg"
110+
$location="eastus2"
111+
$templateFilePath="Path To ./arm_resources.json"
112+
$storageAccountName="cloudlabstgacct"
113+
$containerName="cloudlabdocuments"
114+
$localFilePath="Path To ../search_documents"
115+
$searchServiceName="cloudlabsearch"
116+
$openAIResourceName="cloudlabaoai"
117+
```
118+
**Using Azure CLI**
119+
```powershell
120+
# Variables
121+
resourceGroupName = 'nihcloudlabrg'
122+
location = 'eastus2'
123+
templateFilePath = "arm_resources.json"
124+
storageAccountName = "cloudlabstgacct"
125+
containerName = "cloudlabdocuments"
126+
localFilePath = "../search_documents"
127+
searchServiceName = "cloudlabsearch"
128+
openAIResourceName = "cloudlabaoai"
129+
openAImodel_name = "gpt-4o-mini"
130+
openAIEmbeddingmodel_name = "text-embedding-3-small"
131+
```
132+
133+
### 4. Creating an Empty Resource Group
134+
135+
Create an empty resource group where the ARM template will deploy the necessary resources.
136+
137+
**Using PowerShell**
138+
```powershell
139+
# Create a resource group
140+
New-AzResourceGroup -Name $resourceGroupName -Location $location
141+
```
142+
**Using Azure CLI**
143+
```powershell
144+
# Create a resource group
145+
! az group create --name {resourceGroupName} --location {location}
146+
```
147+
148+
### 5. Deploying the ARM Template
149+
150+
Deploy the [ARM template](arm_resources.json) to create the Azure Storage Account, Azure AI Search, and Azure OpenAI resources.
151+
152+
***Using PowerShell***
153+
```powershell
154+
# Deploy the ARM template
155+
New-AzResourceGroupDeployment -ResourceGroupName $resourceGroupName -TemplateFile $templateFilePath
156+
```
157+
***Using Azure CLI***
158+
```powershell
159+
# Deploy the ARM template
160+
!az deployment group create \
161+
--resource-group {resourceGroupName} \
162+
--template-file {templateFilePath} \
163+
--parameters accounts_cloudlabaoai_name={openAIResourceName} \
164+
--parameters model_name={openAImodel_name} \
165+
--parameters embeddingModel_name={openAIEmbeddingmodel_name} \
166+
--parameters searchServices_cloudlabsearch_name={searchServiceName} \
167+
--parameters storageAccounts_genaicloudlab_name={storageAccountName}
168+
169+
```
170+
171+
### 6. Uploading Local Files to Azure Storage
172+
173+
Upload your local files to the blob container in the Azure Storage Account.
174+
175+
**Using PowerShell**
176+
```powershell
177+
# Get storage account context
178+
$storageContext = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName).Context
179+
180+
# Upload all files in the directory
181+
Get-ChildItem -Path $localFilePath -File | ForEach-Object {
182+
Set-AzStorageBlobContent -File $_.FullName -Container $containerName -Context $storageContext
183+
}
184+
```
185+
**Using Azure CLI**
186+
```powershell
187+
# Get storage account key
188+
storageAccountKey=!az storage account keys list --resource-group {resourceGroupName} --account-name {storageAccountName} --output tsv --query "[0].value"
189+
```
190+
191+
```bash magic_args="-s \"$localFilePath\" \"$storageAccountName\" \"$storageAccountKey\" \"$containerName\""
192+
for file in $1/*;
193+
do
194+
az storage blob upload --account-name $2 --account-key $3 --container-name $4 --file "$file" --name $(basename "$file")
195+
done
196+
```
197+
198+
### 7. Retrieving API Keys
199+
200+
Retrieve the API keys for each service created by the ARM template deployment. These secrets are confidential and should be handled appropriately.
201+
Once the output is received, the values should be added to your `.env` file, which should be created in the [GenAI](../) directory.
202+
Note that this `.env` file is already added to the `.gitignore` file, which tells Git which files or directories to ignore in a project,
203+
preventing them from being tracked or included in version control. Adding `.env` to `.gitignore` is crucial because it prevents sensitive information
204+
like API keys and passwords from being exposed in your version control system.
205+
206+
**Azure Storage Account**
207+
208+
***Using PowerShell***
209+
```powershell
210+
# Get the storage account key
211+
$storageAccountKey = (Get-AzStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccountName)[0].Value
212+
# Construct the Blob connection string
213+
$connectionString = "DefaultEndpointsProtocol=https;AccountName=$storageAccountName;AccountKey=$storageAccountKey;EndpointSuffix=core.windows.net"
214+
# Output the connection string
215+
Write-Output $connectionString
216+
```
217+
***Using Azure CLI***
218+
```powershell
219+
# Get the storage account key that was made before in step 6
220+
!echo "BLOB_CONTAINER_NAME={containerName}" >> .env
221+
!echo "BLOB_ACCOUNT_NAME={storageAccountName}" >> .env
222+
# Construct the Blob connection string
223+
connectionString=f"DefaultEndpointsProtocol=https;AccountName={storageAccountName};AccountKey={storageAccountKey[0]};EndpointSuffix=core.windows.net"
224+
!echo "BLOB_CONNECTION_STRING={connectionString}" >> .env
225+
226+
```
227+
228+
Lets take a look at our .env file!
229+
230+
```powershell
231+
!cat .env
232+
```
233+
234+
You now have the secrets and variables set in a .env. If you run into any errors you can also copy them to your .env file using the info below:
235+
- ***BLOB_CONTAINER_NAME*** = Use the value of `$containerName` or `containerName`.
236+
- ***BLOB_ACCOUNT_NAME*** = Use the value of `$storageAccountName` or `storageAccountName`.
237+
- ***BLOB_CONNECTION_STRING*** = Use the value of `$connectionString ` or `connectionString`.
238+
239+
**Azure AI Search**
240+
241+
***Using PowerShell***
242+
```powershell
243+
# Acquire the AI Search Admin Key
244+
$adminKeys = Get-AzSearchAdminKeyPair -ResourceGroupName $resourceGroupName -ServiceName $searchServiceName
245+
Write-Output $adminKeys
246+
# Construct the AI Search Admin Key
247+
$searchServiceEndpoint="https://$searchServiceName.search.windows.net"
248+
Write-Output $searchServiceEndpoint
249+
```
250+
***Using Azure CLI***
251+
```powershell
252+
# Acquire the AI Search Admin Key
253+
searchServiceKey = !az search admin-key show --resource-group {resourceGroupName} --service-name {searchServiceName} --query primaryKey -o tsv
254+
!echo "AZURE_SEARCH_API_KEY={searchServiceKey[0]}" >> .env
255+
# Construct the AI Search endpoint
256+
searchServiceEndpoint=f"https://{searchServiceName}.search.windows.net"
257+
!echo "AZURE_SEARCH_SERVICE_ENDPOINT={searchServiceEndpoint}" >> .env
258+
```
259+
260+
Lets take a look at our .env file!
261+
262+
```powershell
263+
!cat .env
264+
```
265+
266+
<!-- #region -->
267+
You now have the secrets and variables set in a .env. If you run into any errors you can also copy them to your .env file using the info below:
268+
- ***AZURE_SEARCH_ADMIN_KEY*** = Use the value of `$searchServiceKey` or `searchServiceKey`.
269+
- ***AZURE_SEARCH_ENDPOINT*** = Use the value of `$searchServiceEndpoint` or `searchServiceEndpoint`.
270+
271+
272+
**Azure OpenAI**
273+
274+
***Using PowerShell***
275+
<!-- #endregion -->
276+
```powershell
277+
# Get the Azure OpenAI key 1
278+
$openAIKey = az cognitiveservices account keys list --resource-group $resourceGroupName --name $openAIResourceName --query "key1" --output tsv
279+
Write-Output $openAIKey
280+
# Construct the Azure OpenAI endpoint
281+
$openAIEndpoint = "https://$openAIResourceName.openai.azure.com/"
282+
Write-Output $openAIEndpoint
283+
```
284+
***Using Azure CLI***
285+
```powershell
286+
!echo "AZURE_GPT_DEPLOYMENT={openAImodel_name}" >> .env
287+
!echo "AZURE_EMBEDDINGS_DEPLOYMENT={openAIEmbeddingmodel_name}" >>.env
288+
289+
# Construct the Azure OpenAI endpoint
290+
openAIEndpoint = f"https://{openAIResourceName}.openai.azure.com/"
291+
!echo "AZURE_OPENAI_ENDPOINT={openAIEndpoint}" >> .env
292+
# Get the Azure OpenAI key
293+
openAIKey=!az cognitiveservices account keys list --resource-group {resourceGroupName} --name {openAIResourceName} --query "key1" --output tsv
294+
!echo "AZURE_OPENAI_API_KEY={openAIKey[0]}" >> .env
295+
296+
```
297+
298+
```powershell
299+
cat .env
300+
```
301+
302+
<!-- #region -->
303+
You now have the secrets and variables set in a .env. If you run into any errors you can also copy them to your .env file using the info below:
304+
- ***AZURE_GPT_DEPLOYMENT*** = Use the value of `gpt-4o-mini`.
305+
- ***AZURE_EMBEDDINGS_DEPLOYMENT*** = Use the value of `text-embedding-3-small`.
306+
- ***AZURE_OPENAI_ENDPOINT*** = Use the value of `$openAIEndpoint` or `openAIEndpoint`.
307+
- ***AZURE_OPENAI_KEY*** = Use the value of `$openAIKey` or `openAIKey`.
308+
309+
310+
**Note**: To find the ***API version (Azure_OPENAI_VERSION)*** for your resource in the Azure OpenAI playground, follow these steps:
311+
1. **Navigate to Deployments**: In the left side panel of the Azure OpenAI playground, click on “Deployments.”
312+
2. **Select the Model Deployment**: Click on the specific model deployment you are working with.
313+
3. **Locate the Endpoint Section**: In the endpoint section, you will see the Target URI.
314+
4. **Find the API Version**: Look for the part of the URL that looks similar to `api-version=2025-01-01-preview`. This will be your API version. This may differ between models
315+
316+
Your final local `.env` file should look something like this:
317+
<!-- #endregion -->
318+
```powershell
319+
!cat .env
320+
```
321+
322+
## Conclusion <a name="conclusion"></a>
323+
324+
Congratulations on completing the Azure setup! During this process, we established a new resource group dedicated to the NIH Cloud Lab environment and
325+
configured three Azure resources in your tenant using an ARM template file. The resources include:
326+
327+
- An Azure Storage Account with a deployed Blob container and files uploaded from `../search_documents`
328+
- Azure AI Search
329+
- Azure OpenAI with deployed `gpt-4o-mini` and `text-embedding-3-small` models
330+
331+
Additionally, we configured `.env` variables in your local `.env` file, which is added to `.gitignore` by default.
332+
333+
You are now ready to proceed with the GenAI tutorials!
334+
335+
## Clean Up <a name="clean_up"></a>
336+
No clean up neccessary, as the created resources will be used for tutorials found in [GenAI](../) folder, specifically [embeddings demos](../embedding_demos/readme.md) and [AI Search RAG chatbot](../notebooks/AISearch_RAG_chatbot.ipynb) tutorial .

0 commit comments

Comments
 (0)