Commit b58edca

Merge pull request #231806 from mrbullwinkle/mrb_03_22_2023_openai_ada_v2
[Cognitive Services] [Azure OpenAI] Update to ada_v2_model
2 parents dd0d41b + 75f858e commit b58edca

7 files changed: +107 -84 lines changed

articles/cognitive-services/openai/concepts/models.md

Lines changed: 12 additions & 5 deletions
@@ -4,11 +4,11 @@ titleSuffix: Azure OpenAI
 description: Learn about the different model capabilities that are available with Azure OpenAI.
 ms.service: cognitive-services
 ms.topic: conceptual
-ms.date: 03/21/2023
+ms.date: 03/31/2023
 ms.custom: event-tier1-build-2022, references_regions
 manager: nitinme
-author: ChrisHMSFT
-ms.author: chrhoder
+author: mrbullwinkle #ChrisHMSFT
+ms.author: mbullwin #chrhoder
 recommendations: false
 keywords:
 ---
@@ -128,7 +128,10 @@ Cushman is powerful, yet fast. While Davinci is stronger when it comes to analyz
 
 ## Embeddings models
 
-Currently, we offer three families of Embeddings models for different functionalities:
+> [!IMPORTANT]
+> We strongly recommend using `text-embedding-ada-002 (Version 2)`. This model/version provides parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.
+
+Currently, we offer three families of Embeddings models for different functionalities:
 
 - [Similarity](#similarity-embedding)
 - [Text search](#text-search-embedding)
@@ -221,9 +224,13 @@ These models can only be used with Completions API requests.
 
 These models can only be used with Embedding API requests.
 
+> [!NOTE]
+> We strongly recommend using `text-embedding-ada-002 (Version 2)`. This model/version provides parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.
+
 | Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
 | --- | --- | --- | --- | --- |
-| text-embedding-ada-002 | East US, South Central US, West Europe | N/A |2,046 | Sep 2021 |
+| text-embedding-ada-002 (version 2) | East US, South Central US | N/A |8,191 | Sep 2021 |
+| text-embedding-ada-002 (version 1) | East US, South Central US, West Europe | N/A |4,095 | Sep 2021 |
 | text-similarity-ada-001| East US, South Central US, West Europe | N/A | 2,046 | Aug 2020 |
 | text-similarity-babbage-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
 | text-similarity-curie-001 | East US, South Central US, West Europe | N/A | 2046 | Aug 2020 |

articles/cognitive-services/openai/concepts/understand-embeddings.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: openai
 ms.topic: tutorial
-ms.date: 12/06/2022
+ms.date: 03/22/2023
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -28,7 +28,7 @@ Embeddings make it easier to do machine learning on large inputs representing wo
 
 One method of identifying similar documents is to count the number of common words between documents. Unfortunately, this approach doesn't scale since an expansion in document size is likely to lead to a greater number of common words detected even among completely disparate topics. For this reason, cosine similarity can offer a more effective alternative.
 
-From a mathematic perspective, cosine similarity measures the cosine of the angle between two vectors projected in a multi-dimensional space. This is beneficial because if two documents are far apart by Euclidean distance because of size, they could still have a smaller angle between them and therefore higher cosine similarity.
+From a mathematic perspective, cosine similarity measures the cosine of the angle between two vectors projected in a multi-dimensional space. This is beneficial because if two documents are far apart by Euclidean distance because of size, they could still have a smaller angle between them and therefore higher cosine similarity. For more information on cosine similarity and the [underlying formula](https://en.wikipedia.org/wiki/Cosine_similarity).
 
 Azure OpenAI embeddings rely on cosine similarity to compute similarity between documents and a query.
 
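Note on the cosine similarity paragraph touched in `understand-embeddings.md`: the claim that two vectors can be far apart by Euclidean distance yet have a small angle between them can be checked with a minimal sketch. The vectors and the names `short_doc`, `long_doc`, and `other_doc` are made up for illustration; they are not real embeddings, and this is not necessarily how Azure OpenAI computes similarity internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors: long_doc points in the same direction as
# short_doc but is ten times larger; other_doc points elsewhere.
short_doc = [1.0, 2.0, 3.0]
long_doc = [10.0, 20.0, 30.0]
other_doc = [3.0, -1.0, 0.5]

# Large Euclidean distance, yet the angle between the vectors is zero.
print("euclidean(short, long):", math.dist(short_doc, long_doc))
print("cosine(short, long):   ", cosine_similarity(short_doc, long_doc))

# Smaller Euclidean distance, but a much lower cosine similarity.
print("euclidean(short, other):", math.dist(short_doc, other_doc))
print("cosine(short, other):   ", cosine_similarity(short_doc, other_doc))
```

Here `long_doc` is farther from `short_doc` than `other_doc` is by Euclidean distance, but it has the higher cosine similarity because only direction, not magnitude, enters the measure.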
