Commit ec76e94 ("updates")
1 parent 49b8d64

File tree: 1 file changed (+11, -9 lines)

articles/cognitive-services/openai/tutorials/embeddings.md

Lines changed: 11 additions & 9 deletions
````diff
@@ -86,7 +86,7 @@ setx AZURE_OPENAI_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"
 ```
 
 ```CMD
-setx AZURE_OPENAI_API_KEY_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"
+setx AZURE_OPENAI_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"
 ```
 
 # [PowerShell](#tab/powershell)
````
````diff
@@ -96,7 +96,7 @@ setx AZURE_OPENAI_API_KEY_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"
 ```
 
 ```powershell
-[System.Environment]::SetEnvironmentVariable('AZURE_OPENAI_API_KEY_ENDPOINT', 'REPLACE_WITH_YOUR_ENDPOINT_HERE', 'User')
+[System.Environment]::SetEnvironmentVariable('AZURE_OPENAI_ENDPOINT', 'REPLACE_WITH_YOUR_ENDPOINT_HERE', 'User')
 ```
 
 # [Bash](#tab/bash)
````
````diff
@@ -106,12 +106,14 @@ echo export AZURE_OPENAI_API_KEY="REPLACE_WITH_YOUR_KEY_VALUE_HERE" >> /etc/envi
 ```
 
 ```Bash
-echo export AZURE_OPENAI_API_KEY_ENDPOINT="REPLACE_WITH_YOUR_ENDPOINT_HERE" >> /etc/environment && source /etc/environment
+echo export AZURE_OPENAI_ENDPOINT="REPLACE_WITH_YOUR_ENDPOINT_HERE" >> /etc/environment && source /etc/environment
 ```
 
 ---
 
-1. Run the following code in your preferred Python IDE:
+After setting the environment variables, you may need to close and reopen Jupyter notebooks or whichever IDE you are using so that the environment variables become accessible.
+
+Run the following code in your preferred Python IDE:
 
 ## Import libraries and list models
 
````
````diff
@@ -128,7 +130,7 @@ from openai.embeddings_utils import get_embedding, cosine_similarity
 from transformers import GPT2TokenizerFast
 
 API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
-RESOURCE_ENDPOINT = os.getenv("AZURE_OPENAI_API_KEY_ENDPOINT")
+RESOURCE_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
 
 openai.api_type = "azure"
 openai.api_key = API_KEY
````
````diff
@@ -327,7 +329,7 @@ len(understand_tokenization)
 1480
 ```
 
-Now that we understand more about how tokenization works we can move on to embedding. Before searching, we'll embed the text documents and save the corresponding embedding. We embed each chunk using a *doc* model, in this case `text-search-curie-doc-001`. These embeddings can be stored locally or in an Azure DB. As a result, each tech document has its corresponding embedding vector in the new curie search column on the right side of the DataFrame.
+Now that we understand more about how tokenization works, we can move on to embedding. Before searching, we'll embed the text documents and save the corresponding embeddings. We embed each chunk using a **doc model**, in this case `text-search-curie-doc-001`. These embeddings can be stored locally or in an Azure database. As a result, each tech document has its corresponding embedding vector in the new `curie_search` column on the right side of the DataFrame.
 
 ```python
 df_bills['curie_search'] = df_bills["text"].apply(lambda x : get_embedding(x, engine = 'text-search-curie-doc-001'))
````
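The embedding step above applies `get_embedding` row by row and stores one vector per chunk in a new DataFrame column. The same pattern can be sketched without an Azure resource by swapping in a stand-in embedding function (the function below is hypothetical and only mimics the shape of the output):

```python
import pandas as pd


def fake_doc_embedding(text):
    # Hypothetical stand-in for get_embedding(text, engine='text-search-curie-doc-001');
    # it returns a small fixed-length vector so the pattern runs offline.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


# Toy frame standing in for df_bills from the tutorial.
df_bills = pd.DataFrame({"text": ["cable company tax revenue", "highway funding act"]})

# One embedding vector per document chunk, stored in a new column,
# mirroring the df_bills['curie_search'] assignment in the diff above.
df_bills["curie_search"] = df_bills["text"].apply(fake_doc_embedding)
```

With a real deployment, only `fake_doc_embedding` changes; the `apply`-per-row structure stays the same.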
````diff
@@ -341,9 +343,9 @@ df_bills
 
 :::image type="content" source="../media/tutorials/embed-text-documents.png" alt-text="Screenshot of the formatted results from df_bills command." lightbox="../media/tutorials/embed-text-documents.png":::
 
-At the time of search (live compute), we'll embed the search query using the corresponding *query* model (`text-search-query-001`). Next find the closest embedding in the database, ranked by [cosine similarity](../concepts/understand-embeddings.md).
+At the time of search (live compute), we'll embed the search query using the corresponding **query model** (`text-search-query-001`). Next, we find the closest embedding in the database, ranked by [cosine similarity](../concepts/understand-embeddings.md).
 
-In our example, the user provides the query "can I get information on cable company tax revenue". The query is passed through a function that embeds the query with the corresponding *query model* and finds the embedding closest to it from the previously embedded documents in the previous step.
+In our example, the user provides the query "can I get information on cable company tax revenue". The query is passed through a function that embeds it with the corresponding **query model** and finds the closest match among the document embeddings created in the previous step.
 
 ```python
 # search through the reviews for a specific product
````
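The search step described above ranks the stored doc embeddings against the query embedding by cosine similarity. A minimal sketch of that ranking in plain NumPy, independent of `openai.embeddings_utils` (the function names here are hypothetical, not the tutorial's API):

```python
import numpy as np


def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors; 1.0 means identical direction.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_documents(query_embedding, doc_embeddings, top_n=4):
    # Score every stored document embedding against the query embedding
    # and return (index, score) pairs, best match first.
    scored = [(i, cosine_similarity(query_embedding, e))
              for i, e in enumerate(doc_embeddings)]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]
```

In the tutorial, `search_docs` performs the same scoring over the `curie_search` column and sorts descending, so the top row is the closest document.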
````diff
@@ -370,7 +372,7 @@ res = search_docs(df_bills, "can i get information on cable company tax revenue"
 
 :::image type="content" source="../media/tutorials/query-result.png" alt-text="Screenshot of the formatted results of res once the search query has been run." lightbox="../media/tutorials/query-result.png":::
 
-Finally, we'll show the top result from document search based on user query against the entire knowledge base. This returns the top result of the "Taxpayer's Right to View Act of 1993", as shown in Figure 4. This document has a cosine similarity score of 0.36 between the query and the document. :
+Finally, we'll show the top result from the document search of the user query against the entire knowledge base. The top result is the "Taxpayer's Right to View Act of 1993", as shown in Figure 4. This document has a cosine similarity score of 0.36 between the query and the document:
 
 ```python
 res["summary"][9]
````
