## Embeddings models

> [!IMPORTANT]
> We strongly recommend using `text-embedding-ada-002 (Version 2)`. This model/version provides parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.

Currently, we offer three families of Embeddings models for different functionalities:

- [Similarity](#similarity-embedding)
- [Text search](#text-search-embedding)
These models can only be used with Embedding API requests.

> [!NOTE]
> We strongly recommend using `text-embedding-ada-002 (Version 2)`. This model/version provides parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.

| Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
| --- | --- | --- | --- | --- |
| text-embedding-ada-002 (version 2) | East US, South Central US | N/A | 8,191 | Sep 2021 |
| text-embedding-ada-002 (version 1) | East US, South Central US, West Europe | N/A | 4,095 | Sep 2021 |
| text-similarity-ada-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020 |
| text-similarity-babbage-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
| text-similarity-curie-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020 |
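In Azure OpenAI, the model version behind a request is fixed by the *deployment* you call rather than by a model parameter in the request body. As a minimal sketch of what a deployment-scoped embeddings request might look like — the resource name, deployment name, API version, and helper function names below are illustrative assumptions, not values from this article:

```python
import json
import urllib.request

# Hypothetical API version for illustration; check your resource for the
# versions it actually supports.
API_VERSION = "2022-12-01"

def build_embeddings_url(resource: str, deployment: str,
                         api_version: str = API_VERSION) -> str:
    # The deployment name (not the URL) determines which model/version runs,
    # e.g. a deployment backed by text-embedding-ada-002 (version 2).
    return (f"https://{resource}.openai.azure.com/openai/deployments/"
            f"{deployment}/embeddings?api-version={api_version}")

def get_embedding(resource: str, deployment: str, api_key: str,
                  text: str) -> list:
    # Sends one document/query to the Embedding API and returns its vector.
    req = urllib.request.Request(
        build_embeddings_url(resource, deployment),
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]
```

Because Version 1 and Version 2 embeddings are not interchangeable, a practical consequence of this design is that document embedding and document search should always go through the same deployment.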
articles/cognitive-services/openai/concepts/understand-embeddings.md (+2 −2 lines changed)
manager: nitinme
ms.service: cognitive-services
ms.subservice: openai
ms.topic: tutorial
ms.date: 03/22/2023
author: mrbullwinkle
ms.author: mbullwin
recommendations: false
One method of identifying similar documents is to count the number of common words between documents. Unfortunately, this approach doesn't scale: as documents grow in size, the number of common words detected tends to increase even between completely disparate topics. For this reason, cosine similarity offers a more effective alternative.

From a mathematical perspective, cosine similarity measures the cosine of the angle between two vectors projected in a multi-dimensional space. This is beneficial because two documents that are far apart by Euclidean distance (because of size, for example) can still have a small angle between them and therefore a high cosine similarity. For more information on cosine similarity and the underlying formula, see the [Wikipedia article on cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).

Azure OpenAI embeddings rely on cosine similarity to compute similarity between documents and a query.
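The calculation described above can be sketched in a few lines; the function name and sample vectors here are hypothetical, chosen so that the two "documents" differ in magnitude (length) but not in direction:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# doc_b points in the same direction as doc_a but is twice as long
# (analogous to a longer document on the same topic). Euclidean distance
# separates them, yet cosine similarity still treats them as identical.
doc_a = np.array([1.0, 2.0, 3.0])
doc_b = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(doc_a, doc_b))  # 1.0 (up to floating-point error)
```

Orthogonal vectors, by contrast, score 0, and opposite directions score −1, which is why cosine similarity is a natural fit for ranking embedding matches.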