Commit 883fd4c

Link Vector Search with Embedding Generation as a next step
1 parent 86f19c7 commit 883fd4c

1 file changed: +3 -2 lines changed

articles/ai-services/openai/tutorials/embeddings.md

Lines changed: 3 additions & 2 deletions
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: openai
 ms.topic: tutorial
-ms.date: 06/14/2023
+ms.date: 09/12/2023
 author: mrbullwinkle #noabenefraim
 ms.author: mbullwin
 recommendations: false
@@ -333,7 +333,7 @@ len(decode)
 1466
 ```
 
-Now that we understand more about how tokenization works we can move on to embedding. It is important to note, that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an Azure Database. As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
+Now that we understand more about how tokenization works we can move on to embedding. It is important to note, that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an [Azure Database to support Vector Search](../../../cosmos-db/mongodb/vcore/vector-search.md). As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
 
 ```python
 df_bills['ada_v2'] = df_bills["text"].apply(lambda x : get_embedding(x, engine = 'text-embedding-ada-002')) # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model
@@ -398,3 +398,4 @@ If you created an OpenAI resource solely for completing this tutorial and want t
 Learn more about Azure OpenAI's models:
 > [!div class="nextstepaction"]
 > [Azure OpenAI Service models](../concepts/models.md)
+- Perform Vector (similarity) search with your embeddings using [Azure Cosmos DB for MongoDB vCore](../../../cosmos-db/mongodb/vcore/vector-search.md) or [Azure Cosmos DB for NoSQL](../../../cosmos-db/rag-data-openai.md)
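
Reviewer note: the new next-step bullet points readers at vector (similarity) search over the stored embeddings. For context, the idea can be sketched locally in plain Python as below. This is a minimal illustration, not the Cosmos DB API and not code from the tutorial: the function names `cosine_similarity` and `top_k`, the document ids, and the 3-dimensional toy vectors are all hypothetical stand-ins for real `ada_v2` embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real ada_v2 vectors are much longer.
stored = {
    "bill_a": [1.0, 0.0, 0.0],
    "bill_b": [0.8, 0.6, 0.0],
    "bill_c": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.0, 0.0], stored, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A vector database does the same ranking, but typically with an index so that it does not have to score every stored vector on each query.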
