
Commit 31626c2

update based on feedback
1 parent c4c65bb commit 31626c2

File tree

2 files changed (+8, -8 lines)


articles/cognitive-services/openai/concepts/models.md

Lines changed: 2 additions & 2 deletions
@@ -222,11 +222,11 @@ These models can only be used with Completions API requests.
 These models can only be used with Embedding API requests.
 
 > [!NOTE]
-> We strongly recommend using `text-embedding-ada-002 (version 2)`. It is the only model/version to provide parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using version 1 you should migrate to version 2 to take advantage of the latest weights/updated token limit.
+> We strongly recommend using `text-embedding-ada-002 (Version 2)`. It is the only model/version to provide parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.
 
 | Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
 | --- | --- | --- | --- | --- |
-| text-embedding-ada-002 (version 2) | East US, South Central US, West Europe | N/A | 8,191 | Sep 2021 |
+| text-embedding-ada-002 (version 2) | East US, South Central US | N/A | 8,191 | Sep 2021 |
 | text-embedding-ada-002 (version 1) | East US, South Central US, West Europe | N/A | 4,095 | Sep 2021 |
 | text-similarity-ada-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020 |
 | text-similarity-babbage-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
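A note on the caveat this commit adds: mixing versions silently breaks search, because vectors produced by Version 1 and Version 2 are not comparable. A minimal sketch of keeping both sides on one deployment, assuming the pre-1.0 `openai` Python package used in the related tutorial and a hypothetical deployment name (not something specified by this commit):

```python
from openai.embeddings_utils import get_embedding, cosine_similarity

# Assumes openai.api_type / api_base / api_key / api_version are already
# configured for Azure OpenAI, as in the embeddings tutorial.
# Hypothetical deployment name; it must point at the *same*
# text-embedding-ada-002 version for documents and for queries.
EMBEDDING_DEPLOYMENT = "text-embedding-ada-002"

documents = ["Bill text one...", "Bill text two..."]

# Embed the documents once and keep the vectors.
doc_vectors = [get_embedding(doc, engine=EMBEDDING_DEPLOYMENT) for doc in documents]

# Embed the query with the identical deployment, then rank by cosine similarity.
query_vector = get_embedding("cable company tax revenue", engine=EMBEDDING_DEPLOYMENT)
scores = [cosine_similarity(vec, query_vector) for vec in doc_vectors]
```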

articles/cognitive-services/openai/tutorials/embeddings.md

Lines changed: 6 additions & 6 deletions
@@ -25,11 +25,11 @@ In this tutorial, you learn how to:
 > * Install Azure OpenAI and other dependent Python libraries.
 > * Download the BillSum dataset and prepare it for analysis.
 > * Create environment variables for your resources endpoint and API key.
-> * Use the **text-embedding-ada-002 (version 2)** model
+> * Use the **text-embedding-ada-002 (Version 2)** model
 > * Use [cosine similarity](../concepts/understand-embeddings.md) to rank search results.
 
 > [!Important]
-> We strongly recommend using `text-embedding-ada-002 (version 2)`. It is the only model/version to provide parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using version 1 you should migrate to version 2 to take advantage of the latest weights/updated token limit.
+> We strongly recommend using `text-embedding-ada-002 (Version 2)`. It is the only model/version to provide parity with OpenAI's `text-embedding-ada-002`. To learn more about the improvements offered by this model, please refer to [OpenAI's blog post](https://openai.com/blog/new-and-improved-embedding-model). Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.
 
 ## Prerequisites
 

@@ -39,7 +39,7 @@ In this tutorial, you learn how to:
 * <a href="https://www.python.org/" target="_blank">Python 3.7.1 or later version</a>
 * The following Python libraries: openai, num2words, matplotlib, plotly, scipy, scikit-learn, pandas, tiktoken.
 * [Jupyter Notebooks](https://jupyter.org/)
-* An Azure OpenAI resource with the **text-embedding-ada-002 (version 2)** model deployed. This model is currently only available in [certain regions](../concepts/models.md#model-summary-table-and-region-availability). If you don't have a resource the process of creating one is documented in our [resource deployment guide](../how-to/create-resource.md).
+* An Azure OpenAI resource with the **text-embedding-ada-002 (Version 2)** model deployed. This model is currently only available in [certain regions](../concepts/models.md#model-summary-table-and-region-availability). If you don't have a resource the process of creating one is documented in our [resource deployment guide](../how-to/create-resource.md).
 
 ## Set up
 
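The objectives hunk above mentions creating environment variables for the resource endpoint and API key, which the set-up section that follows this point wires into the client. A rough sketch of that configuration, assuming the pre-1.0 `openai` package; the variable names and API version here are placeholders and the tutorial's own values may differ:

```python
import os
import openai

# Hypothetical environment variable names; use whatever names you exported
# for your Azure OpenAI endpoint and key.
openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")  # e.g. https://<resource>.openai.azure.com/
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
openai.api_version = "2022-12-01"  # assumed API version; check the tutorial for the current one
```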

@@ -356,7 +356,7 @@ len(decode)
 Now that we understand more about how tokenization works we can move on to embedding. It is important to note that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an Azure Database. As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
 
 ```python
-df_bills['ada_v2'] = df_bills["text"].apply(lambda x : get_embedding(x, engine = 'text-embedding-ada-002')) # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (version 2) model
+df_bills['ada_v2'] = df_bills["text"].apply(lambda x : get_embedding(x, engine = 'text-embedding-ada-002')) # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model
 ```
 
 ```python
@@ -367,14 +367,14 @@ df_bills
 
 :::image type="content" source="../media/tutorials/embed-text-documents.png" alt-text="Screenshot of the formatted results from df_bills command." lightbox="../media/tutorials/embed-text-documents.png":::
 
-As we run the search code block below, we'll embed the search query *"Can I get information on cable company tax revenue?"* with the same **text-embedding-ada-002 (version 2)** model. Next we'll find the closest bill embedding to the newly embedded text from our query ranked by [cosine similarity](../concepts/understand-embeddings.md).
+As we run the search code block below, we'll embed the search query *"Can I get information on cable company tax revenue?"* with the same **text-embedding-ada-002 (Version 2)** model. Next we'll find the closest bill embedding to the newly embedded text from our query ranked by [cosine similarity](../concepts/understand-embeddings.md).
 
 ```python
 # search through the bills for the text closest to a user query
 def search_docs(df, user_query, top_n=3, to_print=True):
     embedding = get_embedding(
         user_query,
-        engine="text-embedding-ada-002" # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (version 2) model
+        engine="text-embedding-ada-002" # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model
     )
     df["similarities"] = df.ada_v2.apply(lambda x: cosine_similarity(x, embedding))
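The hunk ends before `search_docs` sorts and returns its top matches, so only the query-embedding call is shown. A hypothetical invocation, assuming `get_embedding` and `cosine_similarity` come from `openai.embeddings_utils` as elsewhere in the tutorial and that `df_bills` already carries the `ada_v2` column built above:

```python
from openai.embeddings_utils import get_embedding, cosine_similarity  # pre-1.0 openai package

# Rank the bills against the tutorial's example query. The deployment used inside
# search_docs must be the same Version 2 deployment that produced the ada_v2 column.
res = search_docs(df_bills, "Can I get information on cable company tax revenue?", top_n=4)
```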
