Commit 55a3eba

Merge pull request #249408 from mohitp930/mp8252023-freshness-pass-155158
Azure OpenAI Freshness Pass - User Story: 141588
2 parents 40744a4 + cfb86c4 commit 55a3eba

1 file changed: +11 -8 lines changed

articles/ai-services/openai/concepts/understand-embeddings.md

Lines changed: 11 additions & 8 deletions
@@ -1,7 +1,7 @@
11
---
22
title: Azure OpenAI Service embeddings
33
titleSuffix: Azure OpenAI - embeddings and cosine similarity
4-
description: Learn more about Azure OpenAI embeddings API for document search and cosine similarity
4+
description: Learn more about how the Azure OpenAI embeddings API uses cosine similarity for document search and to measure similarity between texts.
55
services: cognitive-services
66
manager: nitinme
77
ms.service: azure-ai-openai
@@ -13,26 +13,29 @@ recommendations: false
1313
ms.custom:
1414
---
1515

16-
# Understanding embeddings in Azure OpenAI Service
16+
# Understand embeddings in Azure OpenAI Service
1717

18-
An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar. Embeddings power vector similarity search in Azure Databases such as [Azure Cosmos DB for MongoDB vCore](../../../cosmos-db/mongodb/vcore/vector-search.md).
18+
An embedding is a special format of data representation that machine learning models and algorithms can easily use. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar. Embeddings power vector similarity search in Azure Databases such as [Azure Cosmos DB for MongoDB vCore](../../../cosmos-db/mongodb/vcore/vector-search.md).
1919

2020
## Embedding models
2121

22-
Different Azure OpenAI embedding models are specifically created to be good at a particular task. **Similarity embeddings** are good at capturing semantic similarity between two or more pieces of text. **Text search embeddings** help measure whether long documents are relevant to a short query. **Code search embeddings** are useful for embedding code snippets and embedding natural language search queries.
22+
Different Azure OpenAI embedding models are created to be good at a particular task:
23+
24+
- **Similarity embeddings** are good at capturing semantic similarity between two or more pieces of text.
25+
- **Text search embeddings** help measure whether long documents are relevant to a short query.
26+
- **Code search embeddings** are useful for embedding code snippets and embedding natural language search queries.
2327

24-
Embeddings make it easier to do machine learning on large inputs representing words by capturing the semantic similarities in a vector space. Therefore, we can use embeddings to determine if two text chunks are semantically related or similar, and provide a score to assess similarity.
28+
Embeddings make it easier to do machine learning on large inputs representing words by capturing the semantic similarities in a vector space. Therefore, you can use embeddings to determine if two text chunks are semantically related or similar, and provide a score to assess similarity.
2529

2630
## Cosine similarity
2731

2832
Azure OpenAI embeddings rely on cosine similarity to compute similarity between documents and a query.
2933

30-
From a mathematic perspective, cosine similarity measures the cosine of the angle between two vectors projected in a multi-dimensional space. This is beneficial because if two documents are far apart by Euclidean distance because of size, they could still have a smaller angle between them and therefore higher cosine similarity. For more information about cosine similarity equations, see [this article on Wikipedia](https://en.wikipedia.org/wiki/Cosine_similarity).
34+
From a mathematic perspective, cosine similarity measures the cosine of the angle between two vectors projected in a multidimensional space. This measurement is beneficial, because if two documents are far apart by Euclidean distance because of size, they could still have a smaller angle between them and therefore higher cosine similarity. For more information about cosine similarity equations, see [Cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).
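
To illustrate the point about size, here is a minimal sketch that compares cosine similarity with Euclidean distance. The vectors `short_doc` and `long_doc` are made-up three-dimensional stand-ins for embeddings (real embeddings have many more dimensions), and the helper names are hypothetical, not part of any Azure OpenAI SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Illustrative values only: long_doc points in the same direction
# as short_doc but has ten times the magnitude.
short_doc = [1.0, 2.0, 3.0]
long_doc = [10.0, 20.0, 30.0]

print("cosine similarity:", cosine_similarity(short_doc, long_doc))
print("euclidean distance:", euclidean_distance(short_doc, long_doc))
```

Under these assumptions, the two vectors are far apart by Euclidean distance, yet the angle between them is zero, so their cosine similarity is close to 1.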
3135

32-
An alternative method of identifying similar documents is to count the number of common words between documents. Unfortunately, this approach doesn't scale since an expansion in document size is likely to lead to a greater number of common words detected even among completely disparate topics. For this reason, cosine similarity can offer a more effective alternative.
36+
An alternative method of identifying similar documents is to count the number of common words between documents. This approach doesn't scale since an expansion in document size is likely to lead to a greater number of common words detected even among disparate topics. For this reason, cosine similarity can offer a more effective alternative.
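
The scaling problem with counting common words can be sketched as follows. The sample sentences and the `common_word_count` helper are hypothetical and use naive whitespace tokenization, so treat this as an illustration rather than a realistic text pipeline.

```python
def common_word_count(doc_a, doc_b):
    """Count distinct words that appear in both documents."""
    words_a = set(doc_a.lower().split())
    words_b = set(doc_b.lower().split())
    return len(words_a & words_b)


# Two unrelated topics: cooking and astronomy (made-up sample text).
cooking_short = "stir the sauce and add salt"
astronomy_short = "the telescope tracks a distant comet"

# Longer versions of the same documents, still on unrelated topics.
cooking_long = cooking_short + " then it is ready to serve in a bowl on the table"
astronomy_long = astronomy_short + " and it is visible in the sky on a clear night"

short_overlap = common_word_count(cooking_short, astronomy_short)
long_overlap = common_word_count(cooking_long, astronomy_long)

print("overlap (short documents):", short_overlap)
print("overlap (long documents):", long_overlap)
```

In this sketch the overlap grows as the documents get longer, even though the topics remain unrelated, because the extra shared words are mostly common function words such as "it", "is", and "on".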
3337

3438
## Next steps
3539

3640
* Learn more about using Azure OpenAI and embeddings to perform document search with our [embeddings tutorial](../tutorials/embeddings.md).
3741
* Store your embeddings and perform vector (similarity) search using [Azure Cosmos DB for MongoDB vCore](../../../cosmos-db/mongodb/vcore/vector-search.md) or [Azure Cosmos DB for NoSQL](../../../cosmos-db/rag-data-openai.md)
38-
