Commit 035e993

update
1 parent a9abead commit 035e993

2 files changed, with 8 additions and 8 deletions.

articles/ai-services/openai/includes/embeddings-python.md

Lines changed: 3 additions & 3 deletions
@@ -86,7 +86,7 @@ source /etc/environment
8686

8787
---
8888

89-
After setting the environment variables, you may need to close and reopen Jupyter notebooks or whatever IDE you're using in order for the environment variables to be accessible. While we strongly recommend using Jupyter Notebooks, if for some reason you cannot you'll need to modify any code that is returning a pandas dataframe by using `print(dataframe_name)` rather than just calling the `dataframe_name` directly as is often done at the end of a code block.
89+
After setting the environment variables, you might need to close and reopen Jupyter notebooks or whatever IDE you're using in order for the environment variables to be accessible. While we strongly recommend using Jupyter Notebooks, if for some reason you can't you'll need to modify any code that is returning a pandas dataframe by using `print(dataframe_name)` rather than just calling the `dataframe_name` directly as is often done at the end of a code block.
9090

9191
Run the following code in your preferred Python IDE:
9292

@@ -339,9 +339,9 @@ len(decode)
339339
1466
340340
```
341341

342-
Now that we understand more about how tokenization works we can move on to embedding. It is important to note, that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an [Azure Database to support Vector Search](../../../cosmos-db/mongodb/vcore/vector-search.md). As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
342+
Now that we understand more about how tokenization works we can move on to embedding. It's important to note, that we haven't actually tokenized the documents yet. The `n_tokens` column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an [Azure Database to support Vector Search](../../../cosmos-db/mongodb/vcore/vector-search.md). As a result, each bill will have its own corresponding embedding vector in the new `ada_v2` column on the right side of the DataFrame.
343343

344-
In the example below we are calling the embedding model once per every item that we want to embed. When working with large embedding projects you can alternatively pass the model an array of inputs to embed rather than one input at a time. When you pass the model an array of inputs the max number of input items per call to the embedding endpoint is 2048.
344+
In the example below we're calling the embedding model once per every item that we want to embed. When working with large embedding projects you can alternatively pass the model an array of inputs to embed rather than one input at a time. When you pass the model an array of inputs the max number of input items per call to the embedding endpoint is 2048.
345345

346346
# [OpenAI Python 1.x](#tab/python-new)
347347
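
The paragraph above mentions passing the model an array of inputs, with at most 2048 items per call. A minimal sketch of that batching pattern follows. The helper names `batched` and `embed_all` are hypothetical, and `embed_batch` stands in for whatever function actually calls the embeddings endpoint.

```python
from typing import Callable, Iterator, List, Sequence

MAX_INPUTS_PER_CALL = 2048  # per-call item limit stated in the paragraph above


def batched(items: Sequence[str], size: int = MAX_INPUTS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
) -> List[List[float]]:
    """Embed every text, calling `embed_batch` once per batch rather than once per item.

    `embed_batch` is a hypothetical callable: it takes a list of strings and
    returns one vector per string, in the same order.
    """
    vectors: List[List[float]] = []
    for batch in batched(texts):
        vectors.extend(embed_batch(batch))
    return vectors
```

With the OpenAI Python 1.x client, `embed_batch` could plausibly wrap `client.embeddings.create(input=batch, model=...)` and return `[item.embedding for item in response.data]`. The deployment name passed as `model` depends on your own resource. Each individual input still has to stay under the 8,192-token limit.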

articles/ai-services/openai/tutorials/fine-tune.md

Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
11
---
22
title: Azure OpenAI Service fine-tuning gpt-3.5-turbo
33
titleSuffix: Azure OpenAI
4-
description: Learn how to use Azure OpenAI's latest fine-tuning capabilities with gpt-3.5-turbo
4+
description: Learn how to use Azure OpenAI's latest fine-tuning capabilities with gpt-3.5-turbo.
55
#services: cognitive-services
66
manager: nitinme
77
ms.service: azure-ai-openai
@@ -13,7 +13,7 @@ recommendations: false
1313
ms.custom:
1414
---
1515

16-
# Azure OpenAI GPT 3.5 Turbo fine-tuning tutorial
16+
# Azure OpenAI GPT-3.5 Turbo fine-tuning tutorial
1717

1818
This tutorial walks you through fine-tuning a `gpt-35-turbo-0613` model.
1919

@@ -118,7 +118,7 @@ For this example we'll modify this slightly by changing to:
118118

119119
While these three examples are helpful to give you the general format, if you want to steer your custom fine-tuned model to respond in a similar way you would need more examples. Generally you want **at least 50 high quality examples** to start out. However, it is entirely possible to have a use case that might require 1,000's of high quality training examples to be successful.
120120

121-
In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data, without first pruning the dataset for only the highest quality examples you could end up with a model that performs much worse than expected.
121+
In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data, without first pruning the dataset for only the highest quality examples, you could end up with a model that performs much worse than expected.
122122

123123
You'll need to create two files `training_set.jsonl` and `validation_set.jsonl`.
124124
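
To illustrate the two files mentioned above, here is a rough sketch that writes examples to a `.jsonl` file and counts how many parse cleanly. It assumes the chat fine-tuning format, one JSON object per line with a `messages` list of `role`/`content` pairs. The helpers `write_jsonl` and `count_valid_examples` are hypothetical, and the role check is deliberately strict for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List

# Assumption: only these roles appear in the training examples.
ALLOWED_ROLES = {"system", "user", "assistant"}


def write_jsonl(path: str, examples: List[Dict]) -> None:
    """Write one JSON object per line, as the JSONL format requires."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for example in examples:
            handle.write(json.dumps(example) + "\n")


def count_valid_examples(path: str) -> int:
    """Return the number of examples, raising ValueError on a malformed line."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {line_number}: missing 'messages' list")
            for message in messages:
                role_ok = message.get("role") in ALLOWED_ROLES
                content_ok = isinstance(message.get("content"), str)
                if not (role_ok and content_ok):
                    raise ValueError(f"line {line_number}: bad role or content")
            count += 1
    return count
```

Running `count_valid_examples("training_set.jsonl")` before uploading gives a quick check against the guideline of at least 50 high-quality examples. It says nothing about the quality of those examples, which still needs manual review.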

@@ -564,7 +564,7 @@ Alternatively, you can deploy your fine-tuned model using any of the other commo
564564
| resource_group | The resource group name for your Azure OpenAI resource |
565565
| resource_name | The Azure OpenAI resource name |
566566
| model_deployment_name | The custom name for your new fine-tuned model deployment. This is the name that will be referenced in your code when making chat completion calls. |
567-
| fine_tuned_model | Retrieve this value from your fine-tuning job results in the previous step. It will look like `gpt-35-turbo-0613.ft-b044a9d3cf9c4228b5d393567f693b83`. You will need to add that value to the deploy_data json. |
567+
| fine_tuned_model | Retrieve this value from your fine-tuning job results in the previous step. It will look like `gpt-35-turbo-0613.ft-b044a9d3cf9c4228b5d393567f693b83`. You'll need to add that value to the deploy_data json. |
568568

569569
[!INCLUDE [Fine-tuning deletion](../includes/fine-tune.md)]
570570

@@ -667,7 +667,7 @@ print(response['choices'][0]['message']['content'])
667667

668668
## Delete deployment
669669

670-
Unlike other types of Azure OpenAI models, fine-tuned/customized models have [an hourly hosting cost](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/#pricing) associated with them once they are deployed. It is **strongly recommended** that once you're done with this tutorial and have tested a few chat completion calls against your fine-tuned model, that you **delete the model deployment**.
670+
Unlike other types of Azure OpenAI models, fine-tuned/customized models have [an hourly hosting cost](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/#pricing) associated with them once they're deployed. It's **strongly recommended** that once you're done with this tutorial and have tested a few chat completion calls against your fine-tuned model, that you **delete the model deployment**.
671671

672672
Deleting the deployment won't affect the model itself, so you can re-deploy the fine-tuned model that you trained for this tutorial at any time.
673673
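
As a hedged sketch of the cleanup step, the function below builds the Azure Resource Manager URL for a deployment from the same values listed in the deployment table (`resource_group`, `resource_name`, `model_deployment_name`). The helper name `deployment_url` and the default `api_version` are assumptions, so check them against the current Azure REST reference before relying on them.

```python
from urllib.parse import quote


def deployment_url(
    subscription_id: str,
    resource_group: str,
    resource_name: str,
    model_deployment_name: str,
    api_version: str = "2023-05-01",  # assumption: confirm the current version
) -> str:
    """Build the management-plane URL that identifies one model deployment."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.CognitiveServices"
        f"/accounts/{quote(resource_name, safe='')}"
        f"/deployments/{quote(model_deployment_name, safe='')}"
        f"?api-version={api_version}"
    )
```

Sending an HTTP `DELETE` request to this URL with an `Authorization: Bearer <token>` header should remove the deployment while leaving the fine-tuned model itself intact. The function only builds the string and makes no network call, so it is safe to run while experimenting.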
