articles/cognitive-services/openai/concepts/models.md (2 additions, 2 deletions)
@@ -222,11 +222,11 @@ These models can only be used with Completions API requests.
 These models can only be used with Embedding API requests.
 
 > [!NOTE]
-> We strongly recommend using `text-embedding-ada-002 (version 2)`. It is the only model/version to provide parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using version 1 you should migrate to version 2 to take advantage of the latest weights/updated token limit.
+> We strongly recommend using `text-embedding-ada-002 (Version 2)`. It is the only model/version to provide parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, see [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using Version 1, you should migrate to Version 2 to take advantage of the latest weights and updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done with the same version of the model.
 
 | Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
 | --- | --- | --- | --- | --- |
-| text-embedding-ada-002 (version 2) | East US, South Central US, West Europe | N/A | 8,191 | Sep 2021 |
+| text-embedding-ada-002 (version 2) | East US, South Central US | N/A | 8,191 | Sep 2021 |
 | text-embedding-ada-002 (version 1) | East US, South Central US, West Europe | N/A | 4,095 | Sep 2021 |
 | text-similarity-ada-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020 |
 | text-similarity-babbage-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
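The "Max Request (tokens)" column above is the per-request input limit. A minimal pre-flight check can be sketched as follows; the 4-characters-per-token ratio is a rough heuristic assumed here for illustration, not an official figure, so use `tiktoken` when you need exact counts:

```python
# Max request sizes from the table above, keyed by model/version.
MAX_REQUEST_TOKENS = {
    "text-embedding-ada-002 (version 2)": 8191,
    "text-embedding-ada-002 (version 1)": 4095,
    "text-similarity-ada-001": 2046,
    "text-similarity-babbage-001": 2046,
}

def fits_request_limit(text: str, model: str) -> bool:
    """Rough pre-flight check before sending `text` to the Embedding API.

    Approximates the token count as len(text) / 4; this heuristic is an
    assumption, not part of the service. Use tiktoken for exact counts.
    """
    return len(text) / 4 <= MAX_REQUEST_TOKENS[model]
```

A text that clearly exceeds the Version 1 limit would still fit under Version 2's larger limit, which is one practical reason to migrate.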
articles/cognitive-services/openai/tutorials/embeddings.md (6 additions, 6 deletions)
@@ -25,11 +25,11 @@ In this tutorial, you learn how to:
 > * Install Azure OpenAI and other dependent Python libraries.
 > * Download the BillSum dataset and prepare it for analysis.
 > * Create environment variables for your resources endpoint and API key.
-> * Use the **text-embedding-ada-002 (version 2)** model
+> * Use the **text-embedding-ada-002 (Version 2)** model.
 > * Use [cosine similarity](../concepts/understand-embeddings.md) to rank search results.
 
 > [!Important]
-> We strongly recommend using `text-embedding-ada-002 (version 2)`. It is the only model/version to provide parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using version 1 you should migrate to version 2 to take advantage of the latest weights/updated token limit.
+> We strongly recommend using `text-embedding-ada-002 (Version 2)`. It is the only model/version to provide parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, see [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using Version 1, you should migrate to Version 2 to take advantage of the latest weights and updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done with the same version of the model.
 
 ## Prerequisites
@@ -39,7 +39,7 @@ In this tutorial, you learn how to:
 * <a href="https://www.python.org/" target="_blank">Python 3.7.1 or later version</a>
 * The following Python libraries: openai, num2words, matplotlib, plotly, scipy, scikit-learn, pandas, tiktoken.
 * [Jupyter Notebooks](https://jupyter.org/)
-* An Azure OpenAI resource with the **text-embedding-ada-002 (version 2)** model deployed. This model is currently only available in [certain regions](../concepts/models.md#model-summary-table-and-region-availability). If you don't have a resource the process of creating one is documented in our [resource deployment guide](../how-to/create-resource.md).
+* An Azure OpenAI resource with the **text-embedding-ada-002 (Version 2)** model deployed. This model is currently only available in [certain regions](../concepts/models.md#model-summary-table-and-region-availability). If you don't have a resource, the process of creating one is documented in our [resource deployment guide](../how-to/create-resource.md).
 
 ## Set up
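Before the setup steps, the prerequisite libraries can be sanity-checked as importable. A minimal sketch (the helper name is ours; note that the pip package scikit-learn is imported as `sklearn`):

```python
import importlib.util

def missing_modules(module_names):
    """Return the names in `module_names` that are not importable in this environment."""
    return [m for m in module_names if importlib.util.find_spec(m) is None]

# Import names for the tutorial's prerequisites; scikit-learn imports as `sklearn`.
required = ["openai", "num2words", "matplotlib", "plotly",
            "scipy", "sklearn", "pandas", "tiktoken"]
to_install = missing_modules(required)
if to_install:
    print("Missing:", ", ".join(to_install))
```

Anything the check prints still needs a `pip install` before the notebook will run.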
@@ -356,7 +356,7 @@ len(decode)
 Now that we understand more about how tokenization works, we can move on to embedding. It is important to note that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an Azure Database. As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
 
 ```python
-df_bills['ada_v2'] = df_bills["text"].apply(lambda x: get_embedding(x, engine='text-embedding-ada-002')) # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (version 2) model
+df_bills['ada_v2'] = df_bills["text"].apply(lambda x: get_embedding(x, engine='text-embedding-ada-002')) # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model
 ```
 
 ```python
@@ -367,14 +367,14 @@ df_bills
 
 :::image type="content" source="../media/tutorials/embed-text-documents.png" alt-text="Screenshot of the formatted results from df_bills command." lightbox="../media/tutorials/embed-text-documents.png":::
 
-As we run the search code block below, we'll embed the search query *"Can I get information on cable company tax revenue?"* with the same **text-embedding-ada-002 (version 2)** model. Next we'll find the closest bill embedding to the newly embedded text from our query ranked by [cosine similarity](../concepts/understand-embeddings.md).
+As we run the search code block below, we'll embed the search query *"Can I get information on cable company tax revenue?"* with the same **text-embedding-ada-002 (Version 2)** model. Next we'll find the bill embeddings closest to the newly embedded query text, ranked by [cosine similarity](../concepts/understand-embeddings.md).
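The ranking described above can be sketched in plain Python. The tutorial's own code uses helper functions from the openai library; this standalone version only illustrates the cosine-similarity math, and both function names are ours:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_embedding, doc_embeddings):
    """Return document indices ordered from most to least similar to the query."""
    return sorted(range(len(doc_embeddings)),
                  key=lambda i: cosine_similarity(query_embedding, doc_embeddings[i]),
                  reverse=True)
```

Because cosine similarity compares directions, not magnitudes, a vector and any positive multiple of it score 1.0; this is why embeddings from different model versions cannot be meaningfully compared against each other.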
 
 ```python
 # search through the reviews for a specific product