After setting the environment variables, you may need to close and reopen Jupyter notebooks (or whichever IDE you're using) for the environment variables to be accessible.

Run the following code in your preferred Python IDE:

## Import libraries and list models
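As a quick check that a restarted kernel can see the variables, you can read one back. The variable name below is an assumption for illustration; use whichever names you set in the previous step:

```python
import os

# Illustrative variable name - use whichever name you exported earlier.
# An empty result usually means the notebook/IDE was started before the
# variable was set, so restart it and try again.
api_key = os.getenv("AZURE_OPENAI_API_KEY", "")
print("key visible:", bool(api_key))
```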
Now that we understand more about how tokenization works, we can move on to embedding. Before searching, we'll embed the text documents and save the corresponding embeddings. We embed each chunk using a **doc model**, in this case `text-search-curie-doc-001`. These embeddings can be stored locally or in an Azure database. As a result, each tech document has its corresponding embedding vector in the new `curie_search` column on the right side of the DataFrame.
:::image type="content" source="../media/tutorials/embed-text-documents.png" alt-text="Screenshot of the formatted results from df_bills command." lightbox="../media/tutorials/embed-text-documents.png":::
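Stripped of the service call, the storage step looks like the following sketch. Here `fake_embed` is a stand-in for `get_embedding(..., engine="text-search-curie-doc-001")` so the example runs offline, and the sample texts and column names are illustrative:

```python
import pandas as pd

def fake_embed(text):
    # Stand-in for get_embedding(text, engine="text-search-curie-doc-001");
    # returns a tiny deterministic vector instead of calling the service.
    return [float(len(text)), float(text.count(" ") + 1)]

df_bills = pd.DataFrame({"text": [
    "An act to amend the tax code.",
    "A bill regulating cable companies.",
]})

# Each chunk's embedding vector lands in a new column of the DataFrame.
df_bills["curie_search"] = df_bills["text"].apply(fake_embed)
print(df_bills)
```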
At the time of search (live compute), we'll embed the search query using the corresponding **query model** (`text-search-curie-query-001`). Next, we find the closest embeddings in the database, ranked by [cosine similarity](../concepts/understand-embeddings.md).
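Cosine similarity itself is just the dot product of two vectors divided by the product of their magnitudes. A toy sketch, with made-up three-dimensional vectors standing in for real embeddings:

```python
import numpy as np

def cosine_sim(a, b):
    # Dot product over the product of magnitudes; 1.0 means the
    # vectors point in exactly the same direction.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = [0.1, 0.3, 0.5]                      # toy query embedding
docs = {
    "doc_a": [0.2, 0.6, 1.0],                # same direction as the query
    "doc_b": [0.9, 0.1, 0.0],                # mostly orthogonal to it
}

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranked)  # doc_a comes first
```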
In our example, the user provides the query "can I get information on cable company tax revenue". The query is passed through a function that embeds it with the corresponding **query model** and finds the closest embeddings among the documents embedded in the previous step.
```python
# search through the documents for the text most similar to a user query
def search_docs(df, user_query, top_n=3):
    # Embed the query with the *query* model, then rank every document
    # by cosine similarity between its stored embedding and the query's.
    embedding = get_embedding(user_query, engine="text-search-curie-query-001")
    df["similarities"] = df["curie_search"].apply(lambda x: cosine_similarity(x, embedding))
    return df.sort_values("similarities", ascending=False).head(top_n)

res = search_docs(df_bills, "can i get information on cable company tax revenue", top_n=4)
```
:::image type="content" source="../media/tutorials/query-result.png" alt-text="Screenshot of the formatted results of res once the search query has been run." lightbox="../media/tutorials/query-result.png":::
Finally, we'll show the top result from document search based on the user query against the entire knowledge base. The search returns the "Taxpayer's Right to View Act of 1993" as the top result, as shown in Figure 4, with a cosine similarity score of 0.36 between the query and the document: