
Commit b690929

Merge pull request #458 from microsoft/psl-gpt4.1update
chore: update GPT model config to gpt4.1
2 parents 0c8dd4b + 4d1b7d5 commit b690929

12 files changed: +45 -36 lines changed


.github/workflows/deploy.yml

Lines changed: 2 additions & 1 deletion
@@ -9,6 +9,7 @@ on:
       - main
       - dev
       - demo
+
   schedule:
     - cron: '0 9,21 * * *' # Runs at 9:00 AM and 9:00 PM GMT

@@ -142,7 +143,7 @@ jobs:
             environmentName="${{ env.SOLUTION_PREFIX }}" \
             secondaryLocation="northcentralus" \
             deploymentType="GlobalStandard" \
-            gptModelName="gpt-4o" \
+            gptModelName="gpt-4.1" \
             azureOpenaiAPIVersion="2024-05-01-preview" \
             gptDeploymentCapacity=${{ env.GPT_MIN_CAPACITY }} \
             embeddingModel="text-embedding-ada-002" \

README.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ _Note: This is not meant to outline all costs as selected SKUs, scaled use, cust
 | [Azure AI Search](https://learn.microsoft.com/en-us/azure/search/) | Standard tier, S1. Pricing is based on the number of documents and operations. Information retrieval at scale for vector and text content in traditional or generative search scenarios. | [Pricing](https://azure.microsoft.com/pricing/details/search/) |
 | [Azure Storage Account](https://learn.microsoft.com/en-us/azure/storage/blobs/) | Standard tier, LRS. Pricing is based on storage and operations. Blob storage in the clopud, optimized for storing massive amounts of unstructured data. | [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/) |
 | [Azure Key Vault](https://learn.microsoft.com/en-us/azure/key-vault/) | Standard tier. Pricing is based on the number of operations. Maintain keys that access and encrypt your cloud resources, apps, and solutions. | [Pricing](https://azure.microsoft.com/pricing/details/key-vault/) |
-| [Azure AI Services](https://learn.microsoft.com/en-us/azure/ai-services/) | S0 tier, defaults to gpt-4o and text-embedding-ada-002 models. Pricing is based on token count. | [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/) |
+| [Azure AI Services](https://learn.microsoft.com/en-us/azure/ai-services/) | S0 tier, defaults to gpt-4.1 and text-embedding-ada-002 models. Pricing is based on token count. | [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/) |
 | [Azure Container App](https://learn.microsoft.com/en-us/azure/container-apps/) | Consumption tier with 0.5 CPU, 1GiB memory/storage. Pricing is based on resource allocation, and each month allows for a certain amount of free usage. Allows you to run containerized applications without worrying about orchestration or infrastructure. | [Pricing](https://azure.microsoft.com/pricing/details/container-apps/) |
 | [Azure Container Registry](https://learn.microsoft.com/en-us/azure/container-registry/) | Basic tier. Build, store, and manage container images and artifacts in a private registry for all types of container deployments | [Pricing](https://azure.microsoft.com/pricing/details/container-registry/) |
 | [Log analytics](https://learn.microsoft.com/en-us/azure/azure-monitor/) | Pay-as-you-go tier. Costs based on data ingested. Collect and analyze on telemetry data generated by Azure. | [Pricing](https://azure.microsoft.com/pricing/details/monitor/) |

docs/AzureGPTQuotaSettings.md

Lines changed: 1 addition & 1 deletion
@@ -5,6 +5,6 @@
 3. **Go to** the `Management Center` from the bottom-left navigation menu.
 4. Select `Quota`
    - Click on the `GlobalStandard` dropdown.
-   - Select the required **GPT model** (`GPT-4, GPT-4o`) or **Embeddings model** (`text-embedding-ada-002`).
+   - Select the required **GPT model** (`GPT-4.1`) or **Embeddings model** (`text-embedding-ada-002`).
    - Choose the **region** where the deployment is hosted.
 5. Request More Quota or delete any unused model deployments as needed.

docs/CustomizingAzdParameters.md

Lines changed: 3 additions & 1 deletion
@@ -13,7 +13,7 @@ By default this template will use the environment name as the prefix to prevent
 | `AZURE_ENV_NAME` | string | `docgen` | Sets the environment name prefix for all Azure resources. |
 | `AZURE_ENV_SECONDARY_LOCATION` | string | `eastus2` | Specifies a secondary Azure region. |
 | `AZURE_ENV_MODEL_DEPLOYMENT_TYPE` | string | `Standard` | Defines the model deployment type (allowed: `Standard`, `GlobalStandard`). |
-| `AZURE_ENV_MODEL_NAME` | string | `gpt-4o` | Specifies the GPT model name (allowed: `gpt-4`, `gpt-4o`). |
+| `AZURE_ENV_MODEL_NAME` | string | `gpt-4.1` | Specifies the GPT model name (allowed: `gpt-4`, `gpt-4o`). |
 | `AZURE_ENV_MODEL_VERSION` | string | `2024-05-13` | Set the Azure model version (allowed values: `2024-08-06`). |
 | `AZURE_ENV_OPENAI_API_VERSION` | string | `2024-05-01-preview` | Specifies the API version for Azure OpenAI. |
 | `AZURE_ENV_MODEL_CAPACITY` | integer | `30` | Sets the GPT model capacity (based on what's available in your subscription). |

@@ -23,8 +23,10 @@ By default this template will use the environment name as the prefix to prevent
 | `AZURE_ENV_LOG_ANALYTICS_WORKSPACE_ID` | string | `<Existing Workspace Id>` | Reuses an existing Log Analytics Workspace instead of creating a new one. |


+
 ## How to Set a Parameter

+
 To customize any of the above values, run the following command **before** `azd up`:

 ```bash
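
As the doc notes, these values are set with `azd env set` before `azd up`; a minimal sketch using the new defaults from this change (the values shown are only illustrative):

```bash
# Override the GPT model settings via azd environment variables (example values)
azd env set AZURE_ENV_MODEL_NAME "gpt-4.1"
azd env set AZURE_ENV_MODEL_VERSION "2025-04-14"
azd env set AZURE_ENV_MODEL_CAPACITY "30"
azd up
```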

docs/DeploymentGuide.md

Lines changed: 5 additions & 4 deletions
@@ -105,8 +105,8 @@ When you start the deployment, most parameters will have **default values**, but
 | **Environment Name** | A **3–20 character alphanumeric** value used to generate a unique ID to prefix the resources. | `byctemplate` |
 | **Secondary Location** | A **less busy** region for **CosmosDB**, useful in case of availability constraints. | `eastus2` |
 | **Deployment Type** | Model deployment type (allowed: `Standard`, `GlobalStandard`). | `GlobalStandard` |
-| **GPT Model** | Choose from **gpt-4**, **gpt-4o**. | `gpt-4o` |
-| **GPT Model Version** | Version of the GPT model to use (e.g., `2024-08-06`). | `2024-05-13` |
+| **GPT Model** | The GPT model used by the app | `gpt-4.1` |
+| **GPT Model Version** | The GPT Version used by the app | `2024-05-13` |
 | **OpenAI API Version** | Azure OpenAI API version used for deployments. | `2024-05-01-preview` |
 | **GPT Model Deployment Capacity** | Configure the capacity for **GPT model deployments** (in thousands). | `30k` |
 | **Embedding Model** | The embedding model used by the app. | `text-embedding-ada-002` |

@@ -115,13 +115,14 @@ When you start the deployment, most parameters will have **default values**, but
 | **Existing Log Analytics Workspace** | If reusing a Log Analytics Workspace, specify the ID. | *(none)* |


+
 </details>

 <details>
 <summary><b>[Optional] Quota Recommendations</b></summary>

-By default, the _Gpt-4o model capacity_ in deployment is set to _30k tokens_, so we recommend:
-- **For Global Standard | GPT-4o** - the capacity to at least 150k tokens post-deployment for optimal performance.
+By default, the _Gpt-4.1 model capacity_ in deployment is set to _30k tokens_, so we recommend:
+- **For Global Standard | GPT-4.1** - the capacity to at least 150k tokens post-deployment for optimal performance.

 - **For Standard | GPT-4** - ensure a minimum of 30k–40k tokens for best results.


docs/QuotaCheck.md

Lines changed: 5 additions & 4 deletions
@@ -1,7 +1,8 @@
 ## Check Quota Availability Before Deployment

 Before deploying the accelerator, **ensure sufficient quota availability** for the required model.
-> **For Global Standard | GPT-4o - the capacity to at least 150k tokens post-deployment for optimal performance.**
+
+> **For Global Standard |GPT-4.1- the capacity to at least 150k tokens post-deployment for optimal performance.**

 > **For Standard | GPT-4 - ensure a minimum of 30k–40k tokens for best results.**

@@ -13,7 +14,7 @@ azd auth login

 ### 📌 Default Models & Capacities:
 ```
-gpt-4o:30, text-embedding-ada-002:80, gpt-4:30
+gpt-4.1:30, text-embedding-ada-002:80, gpt-4:30
 ```
 ### 📌 Default Regions:
 ```

@@ -39,15 +40,15 @@ eastus, uksouth, eastus2, northcentralus, swedencentral, westus, westus2, southc
 ```
 ✔️ Check specific model(s) in default regions:
 ```
-./quota_check_params.sh --models gpt-4o:30,text-embedding-ada-002:80
+./quota_check_params.sh --models gpt-4.1:30,text-embedding-ada-002:80
 ```
 ✔️ Check default models in specific region(s):
 ```
 ./quota_check_params.sh --regions eastus,westus
 ```
 ✔️ Passing Both models and regions:
 ```
-./quota_check_params.sh --models gpt-4o:30 --regions eastus,westus2
+./quota_check_params.sh --models gpt-4.1:30 --regions eastus,westus2
 ```
 ✔️ All parameters combined:
 ```
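
The "All parameters combined" invocation would presumably combine the same flags shown above, for example (model capacities and regions are only illustrative):

```
./quota_check_params.sh --models gpt-4.1:30,text-embedding-ada-002:80 --regions eastus,westus2
```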

infra/main.bicep

Lines changed: 3 additions & 3 deletions
@@ -26,10 +26,10 @@ param secondaryLocation string
 param deploymentType string = 'GlobalStandard'

 @description('Name of the GPT model to deploy:')
-param gptModelName string = 'gpt-4o'
+param gptModelName string = 'gpt-4.1'

 @description('Version of the GPT model to deploy:')
-param gptModelVersion string = '2024-05-13'
+param gptModelVersion string = '2025-04-14'

 param azureOpenaiAPIVersion string = '2024-05-01-preview'


@@ -385,7 +385,7 @@ module appserviceModule 'deploy_app_service.bicep' = {
     aiSearchService: aifoundry.outputs.aiSearchService
     AzureSearchKey: keyVault.getSecret('AZURE-SEARCH-KEY')
     AzureOpenAIEndpoint:aifoundry.outputs.aiServicesTarget
-    AzureOpenAIModel: gptModelName //'gpt-4o-mini'
+    AzureOpenAIModel: gptModelName
     AzureOpenAIKey:keyVault.getSecret('AZURE-OPENAI-KEY')
     azureOpenAIApiVersion: azureOpenaiAPIVersion //'2024-02-15-preview'
     AZURE_OPENAI_RESOURCE:aifoundry.outputs.aiServicesName

infra/main.bicepparam

Lines changed: 4 additions & 2 deletions
@@ -4,9 +4,11 @@ param AZURE_LOCATION = readEnvironmentVariable('AZURE_LOCATION', '')
 param environmentName = readEnvironmentVariable('AZURE_ENV_NAME', 'env_name')
 param secondaryLocation = readEnvironmentVariable('AZURE_ENV_SECONDARY_LOCATION', 'eastus2')
 param deploymentType = readEnvironmentVariable('AZURE_ENV_MODEL_DEPLOYMENT_TYPE', 'GlobalStandard')
-param gptModelName = readEnvironmentVariable('AZURE_ENV_MODEL_NAME', 'gpt-4o')
-param gptModelVersion = readEnvironmentVariable('AZURE_ENV_MODEL_VERSION', '2024-05-13')
+
+param gptModelName = readEnvironmentVariable('AZURE_ENV_MODEL_NAME', 'gpt-4.1')
+param gptModelVersion = readEnvironmentVariable('AZURE_ENV_MODEL_VERSION', '2025-04-14')
 param azureOpenaiAPIVersion = readEnvironmentVariable('AZURE_ENV_OPENAI_API_VERSION', '2024-05-01-preview')
+
 param gptDeploymentCapacity = int(readEnvironmentVariable('AZURE_ENV_MODEL_CAPACITY', '30'))
 param embeddingModel = readEnvironmentVariable('AZURE_ENV_EMBEDDING_MODEL_NAME', 'text-embedding-ada-002')
 param imageTag = readEnvironmentVariable('AZURE_ENV_IMAGETAG', 'latest')

infra/main.json

Lines changed: 16 additions & 16 deletions
@@ -4,8 +4,8 @@
   "metadata": {
     "_generator": {
       "name": "bicep",
-      "version": "0.35.1.17967",
-      "templateHash": "3433053339326968482"
+      "version": "0.36.1.42791",
+      "templateHash": "5449809042324258772"
     }
   },
   "parameters": {

@@ -41,14 +41,14 @@
     },
     "gptModelName": {
       "type": "string",
-      "defaultValue": "gpt-4o",
+      "defaultValue": "gpt-4.1",
       "metadata": {
         "description": "Name of the GPT model to deploy:"
       }
     },
     "gptModelVersion": {
       "type": "string",
-      "defaultValue": "2024-05-13",
+      "defaultValue": "2025-04-14",
       "metadata": {
         "description": "Version of the GPT model to deploy:"
       }

@@ -361,8 +361,8 @@
       "metadata": {
         "_generator": {
           "name": "bicep",
-          "version": "0.35.1.17967",
-          "templateHash": "14416829741819681429"
+          "version": "0.36.1.42791",
+          "templateHash": "8965508470098961595"
         }
       },
       "parameters": {

@@ -456,8 +456,8 @@
       "metadata": {
         "_generator": {
           "name": "bicep",
-          "version": "0.35.1.17967",
-          "templateHash": "14711167186840027914"
+          "version": "0.36.1.42791",
+          "templateHash": "15511025830087119739"
         }
       },
       "parameters": {

@@ -601,8 +601,8 @@
       "metadata": {
         "_generator": {
           "name": "bicep",
-          "version": "0.35.1.17967",
-          "templateHash": "3118038315112495212"
+          "version": "0.36.1.42791",
+          "templateHash": "8750828267619251070"
         }
       },
       "parameters": {

@@ -1448,8 +1448,8 @@
       "metadata": {
         "_generator": {
           "name": "bicep",
-          "version": "0.35.1.17967",
-          "templateHash": "12684246002053954621"
+          "version": "0.36.1.42791",
+          "templateHash": "11115444345720629816"
         }
       },
       "parameters": {

@@ -1688,8 +1688,8 @@
       "metadata": {
         "_generator": {
           "name": "bicep",
-          "version": "0.35.1.17967",
-          "templateHash": "16988932665267526316"
+          "version": "0.36.1.42791",
+          "templateHash": "9597436405986955034"
         }
       },
       "parameters": {

@@ -2191,8 +2191,8 @@
       "metadata": {
         "_generator": {
           "name": "bicep",
-          "version": "0.35.1.17967",
-          "templateHash": "12799194170352887919"
+          "version": "0.36.1.42791",
+          "templateHash": "14768176812719476461"
         }
       },
       "parameters": {

infra/scripts/index_scripts/02_process_data.py

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ def get_secrets_from_kv(kv_name, secret_name):
 openai_api_key = get_secrets_from_kv(key_vault_name, "AZURE-OPENAI-KEY")
 openai_api_base = get_secrets_from_kv(key_vault_name, "AZURE-OPENAI-ENDPOINT")
 openai_api_version = get_secrets_from_kv(key_vault_name, "AZURE-OPENAI-PREVIEW-API-VERSION")
-deployment = get_secrets_from_kv(key_vault_name, "AZURE-OPEN-AI-DEPLOYMENT-MODEL") # "gpt-4o-mini"
+deployment = get_secrets_from_kv(key_vault_name, "AZURE-OPEN-AI-DEPLOYMENT-MODEL")


 # Function: Get Embeddings
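
For a quick sanity check of the value that ends up in `deployment`, the same Key Vault secret can be read with the Azure CLI; a minimal sketch, with a placeholder vault name:

```bash
# Read the deployment-model secret used above (replace <key-vault-name> with your vault)
az keyvault secret show \
  --vault-name "<key-vault-name>" \
  --name "AZURE-OPEN-AI-DEPLOYMENT-MODEL" \
  --query value -o tsv
```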
