Commit fc61e6e

Merge branch 'MicrosoftDocs:main' into cosmos-update-diagnostics-settings-cli-script
2 parents 5db5a88 + 9df751c commit fc61e6e

127 files changed: +908 additions, −414 deletions

articles/advisor/advisor-reference-performance-recommendations.md

Lines changed: 38 additions & 118 deletions
Large diffs are not rendered by default.

articles/ai-services/openai/how-to/dynamic-quota.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ author: mrbullwinkle
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 01/30/2024
+ms.date: 06/27/2024
 ms.author: mbullwin
 ---

@@ -34,7 +34,7 @@ For dynamic quota, consider scenarios such as:

 ### When does dynamic quota come into effect?

-The Azure OpenAI backend decides if, when, and how much extra dynamic quota is added or removed from different deployments. It isn't forecasted or announced in advance, and isn't predictable. Azure OpenAI lets your application know there's more quota available by responding with an HTTP 429 and not letting more API calls through. To take advantage of dynamic quota, your application code must be able to issue more requests as HTTP 429 responses become infrequent.
+The Azure OpenAI backend decides if, when, and how much extra dynamic quota is added or removed from different deployments. It isn't forecasted or announced in advance, and isn't predictable. To take advantage of dynamic quota, your application code must be able to issue more requests as HTTP 429 responses become infrequent. Azure OpenAI lets your application know when you've hit your quota limit by responding with an HTTP 429 and not letting more API calls through.

 ### How does dynamic quota change costs?
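The reworded paragraph in this hunk describes the client-side contract: keep issuing requests, back off on HTTP 429, and ramp back up as 429 responses become infrequent. As an illustration only (not part of the commit), a generic retry-with-exponential-backoff wrapper might look like the sketch below; the `RateLimitError` class and the delay constants are hypothetical stand-ins for whatever your SDK raises on a 429:

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an SDK raises on HTTP 429."""


def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Call send_request, retrying with exponential backoff on HTTP 429.

    Delays grow 1x, 2x, 4x, ... of base_delay, plus a little jitter so
    many clients don't retry in lockstep.
    """
    for attempt in range(max_retries):
        try:
            return send_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # Out of retries; surface the 429 to the caller.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

In practice `send_request` would be a closure around your real API call; the retry budget and base delay should be tuned to your workload.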

articles/ai-services/openai/how-to/migration.md

Lines changed: 5 additions & 5 deletions
@@ -74,7 +74,7 @@ client = AzureOpenAI(
 )

 response = client.chat.completions.create(
-    model="gpt-35-turbo", # model = "deployment_name".
+    model="gpt-35-turbo", # model = "deployment_name"
     messages=[
         {"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},

@@ -135,7 +135,7 @@ deployment_name='REPLACE_WITH_YOUR_DEPLOYMENT_NAME' #This will correspond to the
 # Send a completion call to generate an answer
 print('Sending a test completion job')
 start_phrase = 'Write a tagline for an ice cream shop. '
-response = client.completions.create(model=deployment_name, prompt=start_phrase, max_tokens=10)
+response = client.completions.create(model=deployment_name, prompt=start_phrase, max_tokens=10) # model = "deployment_name"
 print(response.choices[0].text)
 ```

@@ -221,7 +221,7 @@ async def main():
         api_version = "2024-02-01",
         azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
     )
-    response = await client.chat.completions.create(model="gpt-35-turbo", messages=[{"role": "user", "content": "Hello world"}])
+    response = await client.chat.completions.create(model="gpt-35-turbo", messages=[{"role": "user", "content": "Hello world"}]) # model = model deployment name

     print(response.model_dump_json(indent=2))

@@ -246,7 +246,7 @@ client = AzureOpenAI(
 )

 completion = client.chat.completions.create(
-    model="deployment-name", # gpt-35-instant
+    model="deployment-name", # model = "deployment_name"
     messages=[
         {
             "role": "user",

@@ -281,7 +281,7 @@ client = openai.AzureOpenAI(
 )

 completion = client.chat.completions.create(
-    model=deployment,
+    model=deployment, # model = "deployment_name"
     messages=[
         {
             "role": "user",
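Every comment this commit touches in the migration article repeats the same point: with the 1.x client against Azure OpenAI, the `model` parameter takes your *deployment* name, not the underlying model name. A tiny lookup helper (an illustration only; the deployment names below are invented placeholders) can make that mapping explicit in application code:

```python
# Hypothetical mapping from OpenAI model names to Azure deployment names.
# The deployment names are placeholders; substitute your resource's own.
DEPLOYMENTS = {
    "gpt-35-turbo": "my-gpt35-deployment",
    "gpt-4": "my-gpt4-deployment",
}


def resolve_deployment(model_name: str) -> str:
    """Return the Azure deployment name to pass as `model=` in the 1.x client."""
    try:
        return DEPLOYMENTS[model_name]
    except KeyError:
        raise ValueError(f"no Azure deployment configured for {model_name!r}")
```

The result would then be passed as, for example, `model=resolve_deployment("gpt-4")` in `client.chat.completions.create(...)`.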

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 7 additions & 5 deletions
@@ -3,7 +3,7 @@ title: Azure OpenAI Service Provisioned Throughput Units (PTU) onboarding
 description: Learn about provisioned throughput units onboarding and Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 05/02/2024
+ms.date: 06/25/2024
 manager: nitinme
 author: mrbullwinkle
 ms.author: mbullwin

@@ -44,11 +44,13 @@ The **Provisioned** option and the capacity planner are only available in certai
 |---|---|
 |Model | OpenAI model you plan to use. For example: GPT-4 |
 | Version | Version of the model you plan to use, for example 0614 |
-| Prompt tokens | Number of tokens in the prompt for each call |
-| Generation tokens | Number of tokens generated by the model on each call |
-| Peak calls per minute | Peak concurrent load to the endpoint measured in calls per minute|
+| Peak calls per min | The number of calls per minute that are expected to be sent to the model |
+| Tokens in prompt call | The number of tokens in the prompt for each call to the model. Calls with larger prompts utilize more of the PTU deployment. Currently this calculator assumes a single prompt value, so for workloads with wide variance, we recommend benchmarking your deployment on your traffic to determine the most accurate estimate of PTU needed for your deployment. |
+| Tokens in model response | The number of tokens generated from each call to the model. Calls with larger generation sizes utilize more of the PTU deployment. Currently this calculator assumes a single response value, so for workloads with wide variance, we recommend benchmarking your deployment on your traffic to determine the most accurate estimate of PTU needed for your deployment. |

-After you fill in the required details, select **Calculate** to view the suggested PTU for your scenario.
+After you fill in the required details, select the **Calculate** button in the output column.
+
+The values in the output column are the estimated PTU required for the provided workload inputs. The first output value represents the estimated PTU required for the workload, rounded to the nearest PTU scale increment. The second output value represents the raw estimated PTU required for the workload. The token totals are calculated using the following equation: `Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)`.

 :::image type="content" source="../media/how-to/provisioned-onboarding/capacity-calculator.png" alt-text="Screenshot of the Azure OpenAI Studio landing page." lightbox="../media/how-to/provisioned-onboarding/capacity-calculator.png":::
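The token equation added by this commit can be sketched in code. Only the equation itself comes from the changed text; the throughput factor (`tokens_per_ptu_per_min`), the minimum deployment size, and the 50-PTU scale increment below are illustrative assumptions, not official calculator values:

```python
import math

PTU_MIN = 100        # assumed minimum deployment size (illustrative)
PTU_INCREMENT = 50   # assumed scale increment (illustrative)


def estimate_ptu(peak_calls_per_min: int,
                 tokens_in_prompt_call: int,
                 tokens_in_model_response: int,
                 tokens_per_ptu_per_min: float = 300.0):
    """Return (rounded, raw) PTU estimates for a workload.

    Token totals follow the calculator's equation:
    Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)
    """
    total_tokens_per_min = peak_calls_per_min * (
        tokens_in_prompt_call + tokens_in_model_response
    )
    raw_ptu = total_tokens_per_min / tokens_per_ptu_per_min
    # Round up to the nearest increment, but never below the minimum size.
    rounded_ptu = max(PTU_MIN, math.ceil(raw_ptu / PTU_INCREMENT) * PTU_INCREMENT)
    return rounded_ptu, raw_ptu
```

Like the calculator itself, this assumes a single prompt/response size; for workloads with wide variance, benchmark against real traffic instead.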


articles/api-management/v2-service-tiers-overview.md

Lines changed: 5 additions & 0 deletions
@@ -53,7 +53,12 @@ The v2 tiers are available in the following regions:
 * France Central
 * Germany West Central
 * North Europe
+* West Europe
+* UK South
+* UK West
 * Central India
+* Brazil South
+* Australia Central
 * Australia East
 * Australia Southeast
 * East Asia

articles/application-gateway/application-gateway-faq.yml

Lines changed: 4 additions & 1 deletion
@@ -6,7 +6,7 @@ metadata:
   author: greg-lindsay
   ms.service: application-gateway
   ms.topic: faq
-  ms.date: 03/15/2024
+  ms.date: 06/27/2024
   ms.author: greglin
   ms.custom: references_regions, devx-track-azurepowershell
 title: Frequently asked questions about Application Gateway

@@ -557,6 +557,9 @@ sections:
       - question: Which ports are supported for TLS/TCP listeners?
        answer: The same list of [allowed port range and exceptions](application-gateway-components.md#ports) apply for the Layer 4 proxy too.

+      - question: How can I use the same port number for Public and Private TLS/TCP proxy listeners?
+        answer: The use of a common port for TLS/TCP listeners is currently not supported.
+
   - name: Configuration - ingress controller for AKS
     questions:
       - question: What is an ingress controller?

articles/application-gateway/configuration-frontend-ip.md

Lines changed: 4 additions & 1 deletion
@@ -5,7 +5,7 @@ services: application-gateway
 author: greg-lindsay
 ms.service: application-gateway
 ms.topic: conceptual
-ms.date: 09/14/2023
+ms.date: 06/27/2024
 ms.author: greglin
 ---

@@ -38,6 +38,9 @@ A frontend IP address is associated to a *listener*, which checks for incoming r

 You can create private and public listeners with the same port number. However, be aware of any network security group (NSG) associated with the Application Gateway subnet. Depending on your NSG's configuration, you might need an allow-inbound rule with **Destination IP addresses** as your application gateway's public and private frontend IPs. When you use the same port, your application gateway changes the **Destination** of the inbound flow to the frontend IPs of your gateway.

+> [!NOTE]
+> Currently, the use of the same port number for public and private TCP/TLS protocol or IPv6 listeners is not supported.
+
 **Inbound rule**:

 - **Source**: According to your requirement
