
Commit 736c45e

Merge pull request #263304 from mrbullwinkle/mrb_01_16_2024_limits
[Azure OpenAI] Update limits
2 parents: 3fad711 + 65d9adf

5 files changed: 19 additions and 6 deletions


articles/ai-services/openai/concepts/models.md

Lines changed: 3 additions & 0 deletions

@@ -148,6 +148,9 @@ These models can only be used with Embedding API requests.
 | `text-embedding-ada-002` (version 2) | Australia East <br> Canada East <br> East US <br> East US2 <br> France Central <br> Japan East <br> North Central US <br> Norway East <br> South Central US <br> Sweden Central <br> Switzerland North <br> UK South <br> West Europe <br> West US |8,191 | Sep 2021 | 1,536 |
 | `text-embedding-ada-002` (version 1) | East US <br> South Central US <br> West Europe |2,046 | Sep 2021 | 1,536 |
 
+> [!NOTE]
+> When sending an array of inputs for embedding, the max number of input items in the array per call to the embedding endpoint is 2048.
+
 ### DALL-E models (Preview)
 
 | Model ID | Feature Availability | Max Request (characters) |
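
The note added above introduces a per-request cap on array inputs. As an illustration only (not part of this commit), a batching helper along these lines keeps each call to the embeddings endpoint at or under 2048 input items; the endpoint, key, API version, and deployment name are placeholders.

```python
# Illustrative sketch, not part of this commit: batch a large list of strings
# into arrays of at most 2048 items per call to the embeddings endpoint.
# Endpoint, key, API version, and deployment name below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",
    api_key="YOUR-API-KEY",
    api_version="2023-05-15",
)

MAX_ARRAY_SIZE = 2048  # max input items per embeddings call

def embed_all(texts, deployment="text-embedding-ada-002"):
    """Return one embedding vector per input string, batching requests."""
    vectors = []
    for start in range(0, len(texts), MAX_ARRAY_SIZE):
        batch = texts[start:start + MAX_ARRAY_SIZE]
        response = client.embeddings.create(input=batch, model=deployment)
        vectors.extend(item.embedding for item in response.data)
    return vectors
```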

articles/ai-services/openai/how-to/embeddings.md

Lines changed: 4 additions & 2 deletions

@@ -6,7 +6,7 @@ description: Learn how to generate embeddings with Azure OpenAI
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 11/06/2023
+ms.date: 01/16/2024
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -126,7 +126,9 @@ return $response.data.embedding
 
 ### Verify inputs don't exceed the maximum length
 
-The maximum length of input text for our latest embedding models is 8192 tokens. You should verify that your inputs don't exceed this limit before making a request.
+- The maximum length of input text for our latest embedding models is 8192 tokens. You should verify that your inputs don't exceed this limit before making a request.
+- If sending an array of inputs in a single embedding request the max array size is 2048.
+
 
 ## Limitations & risks
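
The two bullets added above describe limits the caller should verify client-side. A minimal pre-flight check along these lines (an illustration, not part of this commit) enforces both the 8,192-token input limit and the 2048-item array limit for `text-embedding-ada-002`, using the `cl100k_base` encoding from `tiktoken`:

```python
# Illustrative pre-flight check, not part of this commit. Assumes the
# text-embedding-ada-002 limits described above: 8,192 tokens per input item
# and at most 2048 items in a single request array.
import tiktoken

MAX_TOKENS_PER_INPUT = 8192
MAX_ARRAY_SIZE = 2048

def validate_embedding_inputs(inputs: list[str]) -> None:
    if len(inputs) > MAX_ARRAY_SIZE:
        raise ValueError(f"Too many inputs: {len(inputs)} > {MAX_ARRAY_SIZE}")
    encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by ada-002
    for i, text in enumerate(inputs):
        n_tokens = len(encoding.encode(text))
        if n_tokens > MAX_TOKENS_PER_INPUT:
            raise ValueError(
                f"Input {i} is {n_tokens} tokens; limit is {MAX_TOKENS_PER_INPUT}"
            )
```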

articles/ai-services/openai/how-to/switching-endpoints.md

Lines changed: 3 additions & 3 deletions

@@ -7,7 +7,7 @@ ms.author: mbullwin
 ms.service: azure-ai-openai
 ms.custom: devx-track-python
 ms.topic: how-to
-ms.date: 11/22/2023
+ms.date: 01/06/2023
 manager: nitinme
 ---
 
@@ -166,7 +166,7 @@ embedding = client.embeddings.create(
 
 ## Azure OpenAI embeddings multiple input support
 
-OpenAI currently allows a larger number of array inputs with text-embedding-ada-002. Azure OpenAI currently supports input arrays up to 16 for text-embedding-ada-002 Version 2. Both require the max input token limit per API request to remain under 8191 for this model.
+OpenAI and Azure OpenAI currently support input arrays up to 2048 input items for text-embedding-ada-002. Both require the max input token limit per API request to remain under 8191 for this model.
 
 <table>
 <tr>
@@ -190,7 +190,7 @@ embedding = client.embeddings.create(
 <td>
 
 ```python
-inputs = ["A", "B", "C"] #max array size=16
+inputs = ["A", "B", "C"] #max array size=2048
 
 embedding = client.embeddings.create(
 input=inputs,
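
The snippet in the diff is truncated at the `input=inputs,` argument. As a separate illustration (not part of this commit), the sketch below shows the same array-style call completed against both services with the 1.x Python library; the keys, endpoint, and deployment name are placeholders, and for Azure OpenAI the `model` argument is the deployment name.

```python
# Illustrative sketch, not part of this commit: the same embeddings call accepts
# an input array (up to 2048 items) whether the client targets OpenAI or
# Azure OpenAI; only client construction and the model/deployment name differ.
# Keys, endpoint, and deployment name are placeholders.
from openai import OpenAI, AzureOpenAI

openai_client = OpenAI(api_key="YOUR-OPENAI-KEY")
azure_client = AzureOpenAI(
    api_key="YOUR-AZURE-OPENAI-KEY",
    api_version="2023-05-15",
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",
)

inputs = ["A", "B", "C"]  # max array size=2048

openai_embedding = openai_client.embeddings.create(
    input=inputs,
    model="text-embedding-ada-002",  # OpenAI model name
)

azure_embedding = azure_client.embeddings.create(
    input=inputs,
    model="text-embedding-ada-002",  # Azure OpenAI deployment name (placeholder)
)
```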

articles/ai-services/openai/includes/embeddings-python.md

Lines changed: 2 additions & 0 deletions

@@ -340,6 +340,8 @@ len(decode)
 
 Now that we understand more about how tokenization works we can move on to embedding. It is important to note, that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an [Azure Database to support Vector Search](../../../cosmos-db/mongodb/vcore/vector-search.md). As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
 
+In the example below we are calling the embedding model once per every item that we want to embed. When working with large embedding projects you can alternatively pass the model an array of inputs to embed rather than one input at a time. When you pass the model an array of inputs the max number of input items per call to the embedding endpoint is 2048.
+
 # [OpenAI Python 0.28.1](#tab/python)
 
 ```python
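
The paragraph added in this hunk points out that you can pass the model an array of inputs rather than one input at a time. Separate from the truncated tab snippet above, here is an illustrative sketch (not part of this commit) of that batched approach with the 1.x client; the endpoint, key, deployment name, and sample DataFrame are placeholders standing in for the tutorial's data.

```python
# Illustrative sketch, not part of this commit: embed a text column in batches
# of at most 2048 inputs per call instead of one row at a time. The endpoint,
# key, deployment name, and sample DataFrame are placeholders.
import pandas as pd
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",
    api_key="YOUR-API-KEY",
    api_version="2023-05-15",
)

df_bills = pd.DataFrame({"text": ["example bill text one", "example bill text two"]})

batch_size = 2048  # max input items per embeddings call
embeddings = []
for start in range(0, len(df_bills), batch_size):
    batch = df_bills["text"].iloc[start:start + batch_size].tolist()
    response = client.embeddings.create(input=batch, model="text-embedding-ada-002")
    embeddings.extend(item.embedding for item in response.data)

df_bills["ada_v2"] = embeddings
```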

articles/ai-services/openai/quotas-limits.md

Lines changed: 7 additions & 1 deletion

@@ -10,7 +10,7 @@ ms.custom:
 - ignite-2023
 - references_regions
 ms.topic: conceptual
-ms.date: 01/12/2024
+ms.date: 01/16/2024
 ms.author: mbullwin
 ---
 
@@ -37,8 +37,14 @@ The following sections provide you with a quick guide to the default quotas and
 | Max training job time (job will fail if exceeded) | 720 hours |
 | Max training job size (tokens in training file) x (# of epochs) | 2 Billion |
 | Max size of all files per upload (Azure OpenAI on your data) | 16 MB |
+| Max number of inputs in array with `/embeddings` | 2048 |
+| Max number of `/chat/completions` messages | 2048 |
+| Max number of `/chat/completions` functions | 128 |
+| Max number of `/chat/completions` tools | 128 |
 | Maximum number of Provisioned throughput units per deployment | 100,000 |
 
+
+
 ## Regional quota limits
 
 The default quota for models varies by model and region. Default quota limits are subject to change.
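
Several of the new rows are per-request caps rather than per-resource quotas. As an illustration only (not part of this commit), a client-side check along these lines could reject an oversized request before it is sent; the request dictionary shape mirrors the chat completions API, and the helper itself is hypothetical.

```python
# Illustrative helper, not part of this commit: sanity-check a chat completions
# request body against the per-request caps listed in the table above.
MAX_MESSAGES = 2048
MAX_FUNCTIONS = 128
MAX_TOOLS = 128

def check_chat_request(request: dict) -> None:
    if len(request.get("messages", [])) > MAX_MESSAGES:
        raise ValueError(f"messages exceeds the {MAX_MESSAGES}-item limit")
    if len(request.get("functions", [])) > MAX_FUNCTIONS:
        raise ValueError(f"functions exceeds the {MAX_FUNCTIONS}-item limit")
    if len(request.get("tools", [])) > MAX_TOOLS:
        raise ValueError(f"tools exceeds the {MAX_TOOLS}-item limit")
```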
