Commit 9c50882

add conceptual content for background
1 parent 2f8b792 commit 9c50882

3 files changed

+42
-5
lines changed

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
---
2+
title: Azure OpenAI embeddings tutorial
3+
titleSuffix: Azure OpenAI - embeddings and cosine similarity
4+
description: Learn more about the Azure OpenAI embeddings API for document search and cosine similarity
5+
services: cognitive-services
6+
manager: nitinme
7+
ms.service: cognitive-services
8+
ms.subservice: openai
9+
ms.topic: tutorial
10+
ms.date: 12/06/2022
11+
author: mrbullwinkle
12+
ms.author: mbullwin
13+
recommendations: false
14+
ms.custom:
15+
---
16+
17+
# Understanding embeddings in Azure OpenAI
18+
19+
An embedding is a special format of data representation that machine learning models and algorithms can use easily. An embedding is an information-dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space correlates with the semantic similarity between the two inputs in their original format. For example, if two texts are similar, their vector representations should also be similar.
20+
21+
## Embedding models
22+
23+
Different Azure OpenAI embedding models are created to be good at a particular task. **Similarity embeddings** are good at capturing semantic similarity between two or more pieces of text. **Text search embeddings** help measure whether long documents are relevant to a short query. **Code search embeddings** are useful for embedding code snippets and embedding natural language search queries.
24+
25+
Embeddings make it easier to do machine learning on large inputs representing words by capturing the semantic similarities in a vector space. Therefore, we can use embeddings to determine if two text chunks are semantically related or similar, and provide a score to assess similarity.
26+
27+
## Cosine similarity
28+
29+
One method of identifying similar documents is to count the number of common words between documents. Unfortunately, this approach doesn't scale since an expansion in document size is likely to lead to a greater number of common words detected even among completely disparate topics. For this reason, cosine similarity can offer a more effective alternative.
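The scaling problem described above can be sketched with a few lines of Python. This is only an illustration: the `common_word_count` helper, the sample sentences, and the filler words are all made up for this example and aren't part of any Azure OpenAI API.

```python
def common_word_count(doc_a: str, doc_b: str) -> int:
    """Count the distinct words that appear in both documents."""
    words_a = set(doc_a.lower().split())
    words_b = set(doc_b.lower().split())
    return len(words_a & words_b)


# Two short texts on unrelated topics (hypothetical sample data).
short_a = "the senate passed the tax bill"
short_b = "the recipe calls for two eggs"

# Longer documents tend to pick up the same generic words, whatever the topic.
filler = " it is in of and to a for that with on as"
long_a = short_a + filler
long_b = short_b + filler

print(common_word_count(short_a, short_b))  # prints 1
print(common_word_count(long_a, long_b))    # prints 13
```

The topics are exactly as unrelated in the longer pair as in the shorter pair, yet the raw overlap count grows with document length.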
30+
31+
From a mathematical perspective, cosine similarity measures the cosine of the angle between two vectors projected in a multi-dimensional space. This is beneficial because if two documents are far apart by Euclidean distance because of size, they could still have a small angle between them and therefore a high cosine similarity.
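As a minimal sketch of this idea, the cosine similarity of two nonzero vectors is their dot product divided by the product of their lengths. The vectors below are invented three-dimensional stand-ins, not real embeddings, and the helper names are our own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical vectors: long_doc points in the same direction as short_doc
# but is ten times larger; other_doc is close by but points elsewhere.
short_doc = [1.0, 2.0, 0.0]
long_doc = [10.0, 20.0, 0.0]
other_doc = [2.0, 1.0, 2.0]

print(euclidean_distance(short_doc, long_doc), cosine_similarity(short_doc, long_doc))
print(euclidean_distance(short_doc, other_doc), cosine_similarity(short_doc, other_doc))
```

Here `long_doc` is much farther from `short_doc` than `other_doc` is by Euclidean distance, but its cosine similarity to `short_doc` is higher because the two vectors point in the same direction.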
32+
33+
Azure OpenAI embeddings rely on cosine similarity to compute similarity between documents and a query.
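To illustrate how a query can be compared against documents, the following sketch ranks a few documents by cosine similarity to a query. It assumes the embeddings have already been computed; the vectors, document names, and the `rank_by_similarity` helper are hypothetical, and no Azure OpenAI API call is made.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up three-dimensional vectors standing in for real embeddings.
doc_embeddings = {
    "tax bill": [0.9, 0.1, 0.0],
    "health bill": [0.1, 0.9, 0.1],
    "transport bill": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(query_embedding, doc_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these sample vectors, the document whose vector points most nearly in the same direction as the query is listed first.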
34+
35+
## Next steps
36+
37+
Learn more about using Azure OpenAI and embeddings to perform document search with our [embeddings tutorial](../tutorials/embeddings.md).

articles/cognitive-services/openai/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ items:
1818
href: ./concepts/models.md
1919
- name: Content filtering
2020
href: ./concepts/content-filter.md
21+
- name: Embeddings
22+
href: ./concepts/embeddings.md
2123
- name: How-to
2224
items:
2325
- name: Create a resource

articles/cognitive-services/openai/tutorials/embeddings.md

Lines changed: 3 additions & 5 deletions
@@ -26,7 +26,7 @@ In this tutorial, you learn how to:
2626
> * Download the BillSum dataset and prepare it for analysis.
2727
> * Create environment variables for your resource's endpoint and API key.
2828
> * Use the **text-search-curie-doc-001** and **text-search-curie-query-001** models.
29-
> * Use cosine similarity to return search results.
29+
> * Use [cosine similarity](../concepts/understand-embeddings.md) to rank search results.
3030
3131
## Prerequisites
3232

@@ -134,7 +134,7 @@ openai.api_version = "2022-06-01-preview"
134134

135135
url = openai.api_base + "/openai/deployments?api-version=2022-06-01-preview"
136136

137-
r = requests.get(url, headers={"api-key": apiKey})
137+
r = requests.get(url, headers={"api-key": API_KEY})
138138

139139
print(r.text)
140140
```
@@ -475,7 +475,7 @@ df_bills
475475

476476
:::image type="content" source="../media/tutorials/embed-text-documents.png" alt-text="Screenshot of the formatted results from df_bills command." lightbox="../media/tutorials/embed-text-documents.png":::
477477

478-
At the time of search (live compute), we'll embed the search query using the corresponding *query* model (text-search-query-001). Next find the closest embedding in the database, ranked by cosine similarity.
478+
At the time of search (live compute), we'll embed the search query using the corresponding *query* model (text-search-curie-query-001). Next, find the closest embedding in the database, ranked by [cosine similarity](../concepts/understand-embeddings.md).
479479

480480
```python
481481
# search through the reviews for a specific product
@@ -523,10 +523,8 @@ If you created an OpenAI resource solely for completing this tutorial and want t
523523
- [Portal](../../cognitive-services-apis-create-account.md#clean-up-resources)
524524
- [Azure CLI](../../cognitive-services-apis-create-account-cli.md#clean-up-resources)
525525

526-
527526
## Next steps
528527

529528
Learn more about Azure OpenAI's models:
530529
> [!div class="nextstepaction"]
531530
> [Next steps button](../concepts/models.md)
532-
