
Commit 736c45e

Merge pull request #263304 from mrbullwinkle/mrb_01_16_2024_limits
[Azure OpenAI] Update limits
2 parents: 3fad711 + 65d9adf

5 files changed: 19 additions and 6 deletions


articles/ai-services/openai/concepts/models.md

Lines changed: 3 additions & 0 deletions

@@ -148,6 +148,9 @@ These models can only be used with Embedding API requests.
 | `text-embedding-ada-002` (version 2) | Australia East <br> Canada East <br> East US <br> East US2 <br> France Central <br> Japan East <br> North Central US <br> Norway East <br> South Central US <br> Sweden Central <br> Switzerland North <br> UK South <br> West Europe <br> West US |8,191 | Sep 2021 | 1,536 |
 | `text-embedding-ada-002` (version 1) | East US <br> South Central US <br> West Europe |2,046 | Sep 2021 | 1,536 |
 
+> [!NOTE]
+> When sending an array of inputs for embedding, the max number of input items in the array per call to the embedding endpoint is 2048.
+
 ### DALL-E models (Preview)
 
 | Model ID | Feature Availability | Max Request (characters) |
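
The note added above introduces a per-request cap on array inputs. As an illustration only (not part of this commit), a batching helper along these lines keeps each call to the embeddings endpoint at or under 2048 input items; the endpoint, key, API version, and deployment name are placeholders.

```python
# Illustrative sketch, not part of this commit: batch a large list of strings
# into arrays of at most 2048 items per call to the embeddings endpoint.
# Endpoint, key, API version, and deployment name below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",
    api_key="YOUR-API-KEY",
    api_version="2023-05-15",
)

MAX_ARRAY_SIZE = 2048  # max input items per embeddings call

def embed_all(texts, deployment="text-embedding-ada-002"):
    """Return one embedding vector per input string, batching requests."""
    vectors = []
    for start in range(0, len(texts), MAX_ARRAY_SIZE):
        batch = texts[start:start + MAX_ARRAY_SIZE]
        response = client.embeddings.create(input=batch, model=deployment)
        vectors.extend(item.embedding for item in response.data)
    return vectors
```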

articles/ai-services/openai/how-to/embeddings.md

Lines changed: 4 additions & 2 deletions

@@ -6,7 +6,7 @@ description: Learn how to generate embeddings with Azure OpenAI
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 11/06/2023
+ms.date: 01/16/2024
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -126,7 +126,9 @@ return $response.data.embedding
 
 ### Verify inputs don't exceed the maximum length
 
-The maximum length of input text for our latest embedding models is 8192 tokens. You should verify that your inputs don't exceed this limit before making a request.
+- The maximum length of input text for our latest embedding models is 8192 tokens. You should verify that your inputs don't exceed this limit before making a request.
+- If sending an array of inputs in a single embedding request the max array size is 2048.
+
 
 ## Limitations & risks
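
The two bullets added above describe limits the caller should verify client-side. A minimal pre-flight check along these lines (an illustration, not part of this commit) enforces both the 8,192-token input limit and the 2048-item array limit for `text-embedding-ada-002`, using the `cl100k_base` encoding from `tiktoken`:

```python
# Illustrative pre-flight check, not part of this commit. Assumes the
# text-embedding-ada-002 limits described above: 8,192 tokens per input item
# and at most 2048 items in a single request array.
import tiktoken

MAX_TOKENS_PER_INPUT = 8192
MAX_ARRAY_SIZE = 2048

def validate_embedding_inputs(inputs: list[str]) -> None:
    if len(inputs) > MAX_ARRAY_SIZE:
        raise ValueError(f"Too many inputs: {len(inputs)} > {MAX_ARRAY_SIZE}")
    encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by ada-002
    for i, text in enumerate(inputs):
        n_tokens = len(encoding.encode(text))
        if n_tokens > MAX_TOKENS_PER_INPUT:
            raise ValueError(
                f"Input {i} is {n_tokens} tokens; limit is {MAX_TOKENS_PER_INPUT}"
            )
```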

articles/ai-services/openai/how-to/switching-endpoints.md

Lines changed: 3 additions & 3 deletions

@@ -7,7 +7,7 @@ ms.author: mbullwin
 ms.service: azure-ai-openai
 ms.custom: devx-track-python
 ms.topic: how-to
-ms.date: 11/22/2023
+ms.date: 01/06/2023
 manager: nitinme
 ---
 
@@ -166,7 +166,7 @@ embedding = client.embeddings.create(
 
 ## Azure OpenAI embeddings multiple input support
 
-OpenAI currently allows a larger number of array inputs with text-embedding-ada-002. Azure OpenAI currently supports input arrays up to 16 for text-embedding-ada-002 Version 2. Both require the max input token limit per API request to remain under 8191 for this model.
+OpenAI and Azure OpenAI currently support input arrays up to 2048 input items for text-embedding-ada-002. Both require the max input token limit per API request to remain under 8191 for this model.
 
 <table>
 <tr>
@@ -190,7 +190,7 @@ embedding = client.embeddings.create(
 <td>
 
 ```python
-inputs = ["A", "B", "C"] #max array size=16
+inputs = ["A", "B", "C"] #max array size=2048
 
 embedding = client.embeddings.create(
 input=inputs,
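
The snippet in the diff is truncated at the `input=inputs,` argument. As a separate illustration (not part of this commit), the sketch below shows the same array-style call completed against both services with the 1.x Python library; the keys, endpoint, and deployment name are placeholders, and for Azure OpenAI the `model` argument is the deployment name.

```python
# Illustrative sketch, not part of this commit: the same embeddings call accepts
# an input array (up to 2048 items) whether the client targets OpenAI or
# Azure OpenAI; only client construction and the model/deployment name differ.
# Keys, endpoint, and deployment name are placeholders.
from openai import OpenAI, AzureOpenAI

openai_client = OpenAI(api_key="YOUR-OPENAI-KEY")
azure_client = AzureOpenAI(
    api_key="YOUR-AZURE-OPENAI-KEY",
    api_version="2023-05-15",
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",
)

inputs = ["A", "B", "C"]  # max array size=2048

openai_embedding = openai_client.embeddings.create(
    input=inputs,
    model="text-embedding-ada-002",  # OpenAI model name
)

azure_embedding = azure_client.embeddings.create(
    input=inputs,
    model="text-embedding-ada-002",  # Azure OpenAI deployment name (placeholder)
)
```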

articles/ai-services/openai/includes/embeddings-python.md

Lines changed: 2 additions & 0 deletions

@@ -340,6 +340,8 @@ len(decode)
 
 Now that we understand more about how tokenization works we can move on to embedding. It is important to note, that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an [Azure Database to support Vector Search](../../../cosmos-db/mongodb/vcore/vector-search.md). As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
 
+In the example below we are calling the embedding model once per every item that we want to embed. When working with large embedding projects you can alternatively pass the model an array of inputs to embed rather than one input at a time. When you pass the model an array of inputs the max number of input items per call to the embedding endpoint is 2048.
+
 # [OpenAI Python 0.28.1](#tab/python)
 
 ```python
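
The paragraph added in this hunk points out that you can pass the model an array of inputs rather than one input at a time. Separate from the truncated tab snippet above, here is an illustrative sketch (not part of this commit) of that batched approach with the 1.x client; the endpoint, key, deployment name, and sample DataFrame are placeholders standing in for the tutorial's data.

```python
# Illustrative sketch, not part of this commit: embed a text column in batches
# of at most 2048 inputs per call instead of one row at a time. The endpoint,
# key, deployment name, and sample DataFrame are placeholders.
import pandas as pd
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",
    api_key="YOUR-API-KEY",
    api_version="2023-05-15",
)

df_bills = pd.DataFrame({"text": ["example bill text one", "example bill text two"]})

batch_size = 2048  # max input items per embeddings call
embeddings = []
for start in range(0, len(df_bills), batch_size):
    batch = df_bills["text"].iloc[start:start + batch_size].tolist()
    response = client.embeddings.create(input=batch, model="text-embedding-ada-002")
    embeddings.extend(item.embedding for item in response.data)

df_bills["ada_v2"] = embeddings
```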

articles/ai-services/openai/quotas-limits.md

Lines changed: 7 additions & 1 deletion

@@ -10,7 +10,7 @@ ms.custom:
 - ignite-2023
 - references_regions
 ms.topic: conceptual
-ms.date: 01/12/2024
+ms.date: 01/16/2024
 ms.author: mbullwin
 ---
 
@@ -37,8 +37,14 @@ The following sections provide you with a quick guide to the default quotas and
 | Max training job time (job will fail if exceeded) | 720 hours |
 | Max training job size (tokens in training file) x (# of epochs) | 2 Billion |
 | Max size of all files per upload (Azure OpenAI on your data) | 16 MB |
+| Max number of inputs in array with `/embeddings` | 2048 |
+| Max number of `/chat/completions` messages | 2048 |
+| Max number of `/chat/completions` functions | 128 |
+| Max number of `/chat/completions` tools | 128 |
 | Maximum number of Provisioned throughput units per deployment | 100,000 |
 
+
+
 ## Regional quota limits
 
 The default quota for models varies by model and region. Default quota limits are subject to change.
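
Several of the new rows are per-request caps rather than per-resource quotas. As an illustration only (not part of this commit), a client-side check along these lines could reject an oversized request before it is sent; the request dictionary shape mirrors the chat completions API, and the helper itself is hypothetical.

```python
# Illustrative helper, not part of this commit: sanity-check a chat completions
# request body against the per-request caps listed in the table above.
MAX_MESSAGES = 2048
MAX_FUNCTIONS = 128
MAX_TOOLS = 128

def check_chat_request(request: dict) -> None:
    if len(request.get("messages", [])) > MAX_MESSAGES:
        raise ValueError(f"messages exceeds the {MAX_MESSAGES}-item limit")
    if len(request.get("functions", [])) > MAX_FUNCTIONS:
        raise ValueError(f"functions exceeds the {MAX_FUNCTIONS}-item limit")
    if len(request.get("tools", [])) > MAX_TOOLS:
        raise ValueError(f"tools exceeds the {MAX_TOOLS}-item limit")
```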
