You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-services/openai/concepts/understand-embeddings.md
+5-3Lines changed: 5 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -7,7 +7,7 @@ manager: nitinme
7
7
ms.service: cognitive-services
8
8
ms.subservice: openai
9
9
ms.topic: tutorial
10
-
ms.date: 03/22/2023
10
+
ms.date: 09/12/2023
11
11
author: mrbullwinkle
12
12
ms.author: mbullwin
13
13
recommendations: false
@@ -16,7 +16,7 @@ ms.custom:
16
16
17
17
# Understanding embeddings in Azure OpenAI Service
18
18
19
-
An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar.
19
+
An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar. Embeddings power vector similarity search in Azure Databases such as [Azure Cosmos DB for MongoDB vCore](../../../cosmos-db/mongodb/vcore/vector-search.md).
20
20
21
21
## Embedding models
22
22
@@ -34,4 +34,6 @@ An alternative method of identifying similar documents is to count the number of
34
34
35
35
## Next steps
36
36
37
-
Learn more about using Azure OpenAI and embeddings to perform document search with our [embeddings tutorial](../tutorials/embeddings.md).
37
+
* Learn more about using Azure OpenAI and embeddings to perform document search with our [embeddings tutorial](../tutorials/embeddings.md).
38
+
* Store your embeddings and perform vector (similarity) search using [Azure Cosmos DB for MongoDB vCore](../../../cosmos-db/mongodb/vcore/vector-search.md) or [Azure Cosmos DB for NoSQL](../../../cosmos-db/rag-data-openai.md)
Copy file name to clipboardExpand all lines: articles/ai-services/openai/how-to/embeddings.md
+5-2Lines changed: 5 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -7,7 +7,7 @@ manager: nitinme
7
7
ms.service: cognitive-services
8
8
ms.subservice: openai
9
9
ms.topic: how-to
10
-
ms.date: 5/9/2023
10
+
ms.date: 9/12/2023
11
11
author: ChrisHMSFT
12
12
ms.author: chrhoder
13
13
recommendations: false
@@ -16,7 +16,8 @@ keywords:
16
16
---
17
17
# Learn how to generate embeddings with Azure OpenAI
18
18
19
-
An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar.
19
+
An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar. Embeddings power vector similarity search in Azure Databases such as [Azure Cosmos DB for MongoDB vCore](../../../cosmos-db/mongodb/vcore/vector-search.md).
20
+
20
21
21
22
## How to get embeddings
22
23
@@ -93,3 +94,5 @@ Our embedding models may be unreliable or pose social risks in certain cases, an
93
94
94
95
* Learn more about using Azure OpenAI and embeddings to perform document search with our [embeddings tutorial](../tutorials/embeddings.md).
95
96
* Learn more about the [underlying models that power Azure OpenAI](../concepts/models.md).
97
+
* Store your embeddings and perform vector (similarity) search using [Azure Cosmos DB for MongoDB vCore](../../../cosmos-db/mongodb/vcore/vector-search.md) or [Azure Cosmos DB for NoSQL](../../../cosmos-db/rag-data-openai.md)
Copy file name to clipboardExpand all lines: articles/ai-services/openai/tutorials/embeddings.md
+3-2Lines changed: 3 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -7,7 +7,7 @@ manager: nitinme
7
7
ms.service: cognitive-services
8
8
ms.subservice: openai
9
9
ms.topic: tutorial
10
-
ms.date: 06/14/2023
10
+
ms.date: 09/12/2023
11
11
author: mrbullwinkle #noabenefraim
12
12
ms.author: mbullwin
13
13
recommendations: false
@@ -333,7 +333,7 @@ len(decode)
333
333
1466
334
334
```
335
335
336
-
Now that we understand more about how tokenization works we can move on to embedding. It is important to note, that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an Azure Database. As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
336
+
Now that we understand more about how tokenization works we can move on to embedding. It is important to note, that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an [Azure Database to support Vector Search](../../../cosmos-db/mongodb/vcore/vector-search.md). As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
337
337
338
338
```python
339
339
df_bills['ada_v2'] = df_bills["text"].apply(lambdax : get_embedding(x, engine='text-embedding-ada-002')) # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model
@@ -398,3 +398,4 @@ If you created an OpenAI resource solely for completing this tutorial and want t
398
398
Learn more about Azure OpenAI's models:
399
399
> [!div class="nextstepaction"]
400
400
> [Azure OpenAI Service models](../concepts/models.md)
401
+
* Store your embeddings and perform vector (similarity) search using [Azure Cosmos DB for MongoDB vCore](../../../cosmos-db/mongodb/vcore/vector-search.md) or [Azure Cosmos DB for NoSQL](../../../cosmos-db/rag-data-openai.md)
Copy file name to clipboardExpand all lines: articles/cosmos-db/mongodb/vcore/vector-search.md
+6-4Lines changed: 6 additions & 4 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -19,9 +19,9 @@ Use vector search in Azure Cosmos DB for MongoDB vCore to seamlessly integrate y
19
19
20
20
## What is vector search?
21
21
22
-
Vector search is a method that helps you find similar items based on their data characteristics rather than by exact matches on a property field. This technique is useful in applications such as searching for similar text, finding related images, making recommendations, or even detecting anomalies. It works by taking the vector representations (lists of numbers) of your data that you created by using a machine learning model by using or an embeddings API. Examples of embeddings APIs are [Azure OpenAI Embeddings](/azure/ai-services/openai/how-to/embeddings) or [Hugging Face on Azure](https://azure.microsoft.com/solutions/hugging-face-on-azure/). It then measures the distance between the data vectors and your query vector. The data vectors that are closest to your query vector are the ones that are found to be most similar semantically.
22
+
Vector search is a method that helps you find similar items based on their data characteristics rather than by exact matches on a property field. This technique is useful in applications such as searching for similar text, finding related images, making recommendations, or even detecting anomalies. It works by taking the [vector representations](../../../ai-services/openai/concepts/understand-embeddings.md) (lists of numbers) of your data that you created by using a machine learning model by using or an embeddings API. Examples of embeddings APIs are [Azure OpenAI Embeddings](/azure/ai-services/openai/how-to/embeddings) or [Hugging Face on Azure](https://azure.microsoft.com/solutions/hugging-face-on-azure/). It then measures the distance between the data vectors and your query vector. The data vectors that are closest to your query vector are the ones that are found to be most similar semantically.
23
23
24
-
By integrating vector search capabilities natively, you can unlock the full potential of your data in applications that are built on top of the OpenAI API. You can also create custom-built solutions that use vector embeddings.
24
+
By integrating vector search capabilities natively, you can unlock the full potential of your data in applications that are built on top of the [OpenAI API](../../../ai-services/openai/concepts/understand-embeddings.md). You can also create custom-built solutions that use vector embeddings.
25
25
26
26
## Use the createIndexes template to create a vector index
27
27
@@ -97,7 +97,7 @@ This command creates a `vector-ivf` index against the `vectorContent` property i
97
97
98
98
### Add vectors to your database
99
99
100
-
To add vectors to your database's collection, you first need to create the embeddings by using your own model, [Azure OpenAI Embeddings](../../../cognitive-services/openai/tutorials/embeddings.md), or another API (such as [Hugging Face on Azure](https://azure.microsoft.com/solutions/hugging-face-on-azure/)). In this example, new documents are added through sample embeddings:
100
+
To add vectors to your database's collection, you first need to create the [embeddings](../../../ai-services/openai/concepts/understand-embeddings.md) by using your own model, [Azure OpenAI Embeddings](../../../cognitive-services/openai/tutorials/embeddings.md), or another API (such as [Hugging Face on Azure](https://azure.microsoft.com/solutions/hugging-face-on-azure/)). In this example, new documents are added through sample embeddings:
101
101
102
102
```javascript
103
103
db.exampleCollection.insertMany([
@@ -201,7 +201,9 @@ In this example, `vectorIndex` is returned with all the `cosmosSearch` parameter
201
201
202
202
## Next steps
203
203
204
-
This guide demonstrates how to create a vector index, add documents that have vector data, perform a similarity search, and retrieve the index definition. By using vector search, you can efficiently store, index, and query high-dimensional vector data directly in Azure Cosmos DB for MongoDB vCore. Vector search enables you to unlock the full potential of your data via vector embeddings, and it empowers you to build more accurate, efficient, and powerful applications.
204
+
This guide demonstrates how to create a vector index, add documents that have vector data, perform a similarity search, and retrieve the index definition. By using vector search, you can efficiently store, index, and query high-dimensional vector data directly in Azure Cosmos DB for MongoDB vCore. Vector search enables you to unlock the full potential of your data via [vector embeddings](../../../ai-services/openai/concepts/understand-embeddings.md), and it empowers you to build more accurate, efficient, and powerful applications.
205
205
206
206
> [!div class="nextstepaction"]
207
207
> [Build AI apps with Azure Cosmos DB for MongoDB vCore vector search](vector-search-ai.md)
208
+
* Learn more about [Azure OpenAI embeddings](../../../ai-services/openai/concepts/understand-embeddings.md)
209
+
* Learn how to [generate embeddings using Azure OpenAI](../../../ai-services/openai/tutorials/embeddings.md)
0 commit comments