
Commit bcf322c

Merge pull request #1987 from mrbullwinkle/mrb_12_12_2024_stored_completions

[Azure OpenAI] Stored Completions

2 parents: eb1cc7e + a8c9da5
13 files changed: +215 −4 lines

articles/ai-services/openai/how-to/fine-tuning.md

Lines changed: 20 additions & 2 deletions
@@ -5,9 +5,9 @@ description: Learn how to create your own customized model with Azure OpenAI Ser
 #services: cognitive-services
 manager: nitinme
 ms.service: azure-ai-openai
-ms.custom: build-2023, build-2023-dataai, devx-track-python
+ms.custom: build-2023, build-2023-dataai, devx-track-python, references_regions
 ms.topic: how-to
-ms.date: 11/11/2024
+ms.date: 12/13/2024
 author: mrbullwinkle
 ms.author: mbullwin
 zone_pivot_groups: openai-fine-tuning
@@ -44,6 +44,20 @@ We use LoRA, or low rank approximation, to fine-tune models in a way that reduce

 ::: zone-end

+## Global Standard
+
+Azure OpenAI fine-tuning supports [global standard deployments](./deployment-types.md#global-standard) in East US2, North Central US, and Sweden Central for:
+
+- `gpt-4o-2024-08-06`
+- `gpt-4o-mini-2024-07-18`
+
+Global standard fine-tuned deployments offer [cost savings](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/), but custom model weights may temporarily be stored outside the geography of your Azure OpenAI resource.
+
+:::image type="content" source="../media/fine-tuning/global-standard.png" alt-text="Screenshot of the global standard deployment user experience with a fine-tuned model." lightbox="../media/fine-tuning/global-standard.png":::
+
+Global Standard fine-tuning deployments currently do not support vision and structured outputs.
+
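Before deploying a fine-tuned model as global standard, it can help to validate the region/model pair against the support matrix above. A minimal sketch that encodes the matrix from this section (the dictionary and helper name are ours, not part of any SDK):

```python
# Global standard fine-tuning support matrix from this article.
# NOTE: this list is a point-in-time snapshot; check the docs for updates.
GLOBAL_STANDARD_FINE_TUNING = {
    "eastus2": {"gpt-4o-2024-08-06", "gpt-4o-mini-2024-07-18"},
    "northcentralus": {"gpt-4o-2024-08-06", "gpt-4o-mini-2024-07-18"},
    "swedencentral": {"gpt-4o-2024-08-06", "gpt-4o-mini-2024-07-18"},
}

def supports_global_standard(region: str, model: str) -> bool:
    """Return True if the fine-tuned model supports a global standard deployment."""
    normalized = region.lower().replace(" ", "")
    return model in GLOBAL_STANDARD_FINE_TUNING.get(normalized, set())

print(supports_global_standard("Sweden Central", "gpt-4o-2024-08-06"))  # True
```

If the check passes, create the deployment through the deployment experience shown above.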
 ## Vision fine-tuning

 Fine-tuning is also possible with images in your JSONL files. Just as you can send one or many image inputs to chat completions, you can include those same message types within your training data. Images can be provided either as publicly accessible URLs or data URIs containing [base64 encoded images](/azure/ai-services/openai/how-to/gpt-with-vision?tabs=rest#call-the-chat-completion-apis).
@@ -79,6 +93,10 @@ Images containing the following will be excluded from your dataset and not used
 > [!IMPORTANT]
 > For the vision fine-tuning face screening process: we screen for faces/people in order to skip those images from training the model. The screening capability leverages face detection **WITHOUT** face identification, which means we don't create facial templates or measure specific facial geometry, and the technology used to screen for faces is incapable of uniquely identifying the individuals. To learn more about data and privacy for Face, see [Data and privacy for Face - Azure AI services | Microsoft Learn](/legal/cognitive-services/computer-vision/imageanalysis-data-privacy-security?context=%2Fazure%2Fai-services%2Fcomputer-vision%2Fcontext%2Fcontext).

+## Prompt caching
+
+Azure OpenAI fine-tuning supports prompt caching with select models. Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. To learn more about prompt caching, see [getting started with prompt caching](./prompt-caching.md).
+
 ## Troubleshooting

 ### How do I enable fine-tuning?

articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 5 additions & 2 deletions
@@ -6,15 +6,15 @@ services: cognitive-services
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 10/18/2024
+ms.date: 12/15/2024
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
 ---

 # Prompt caching

-Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context refers to the input you send to the model as part of your chat completions request. Rather than reprocessing the same input tokens over and over again, the service can retain a temporary cache of processed input token computations to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost. For supported models, cached tokens are billed at a [50% discount on input token pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) for Standard deployment types and up to [100% discount on input tokens](/azure/ai-services/openai/concepts/provisioned-throughput) for Provisioned deployment types.
+Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context refers to the input you send to the model as part of your chat completions request. Rather than reprocessing the same input tokens over and over again, the service can retain a temporary cache of processed input token computations to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost. For supported models, cached tokens are billed at a [discount on input token pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) for Standard deployment types and up to [100% discount on input tokens](/azure/ai-services/openai/concepts/provisioned-throughput) for Provisioned deployment types.

 Caches are typically cleared within 5-10 minutes of inactivity and are always removed within one hour of the cache's last use. Prompt caches are not shared between Azure subscriptions.

@@ -28,6 +28,9 @@ Currently only the following models support prompt caching with Azure OpenAI:
 - `gpt-4o-2024-08-06`
 - `gpt-4o-mini-2024-07-18`

+> [!NOTE]
+> Prompt caching is now also available as part of model fine-tuning for `gpt-4o` and `gpt-4o-mini`. Refer to the fine-tuning section of the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) for details.
+
 ## API support

 Official support for prompt caching was first added in API version `2024-10-01-preview`. At this time, only the o1 model family supports the `cached_tokens` API response parameter.
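When the service does return `cached_tokens`, you can read it from the response's usage details. A minimal sketch, assuming the OpenAI Python SDK's `usage.prompt_tokens_details.cached_tokens` shape (the helper name is ours, and the field may be absent on older API versions or unsupported models, so we fall back to 0):

```python
from types import SimpleNamespace

def cached_token_count(response) -> int:
    """Return the number of cached prompt tokens, or 0 if the field is absent."""
    usage = getattr(response, "usage", None)
    details = getattr(usage, "prompt_tokens_details", None)
    return getattr(details, "cached_tokens", 0) or 0

# Demonstration with a mocked response object (no service call):
mock = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens_details=SimpleNamespace(cached_tokens=1024))
)
print(cached_token_count(mock))  # 1024
```

Comparing this count against total prompt tokens is a quick way to confirm that your shared prompt prefix is actually hitting the cache.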
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
---
title: 'How to use Azure OpenAI Service stored completions & distillation'
titleSuffix: Azure OpenAI
description: Learn how to use stored completions & distillation with Azure OpenAI
manager: nitinme
ms.service: azure-ai-openai
ms.topic: how-to
ms.custom: references_regions
ms.date: 12/12/2024
author: mrbullwinkle
ms.author: mbullwin
recommendations: false
---
# Azure OpenAI stored completions & distillation (preview)

Stored completions allow you to capture the conversation history from chat completions sessions to use as datasets for [evaluations](./evaluations.md) and [fine-tuning](./fine-tuning.md).

## Stored completions support

### API support

- `2024-10-01-preview`

### Model support

- `gpt-4o-2024-08-06`

### Regional availability

- Sweden Central

## Configure stored completions

To enable stored completions for your Azure OpenAI deployment, set the `store` parameter to `True`. Use the `metadata` parameter to enrich your stored completion dataset with additional information.
# [Python (Microsoft Entra ID)](#tab/python-secure)

```python
import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_ad_token_provider=token_provider,
    api_version="2024-10-01-preview"
)

completion = client.chat.completions.create(
    model="gpt-4o",  # replace with model deployment name
    store=True,
    metadata={
        "user": "admin",
        "category": "docs-test",
    },
    messages=[
        {"role": "system", "content": "Provide a clear and concise summary of the technical content, highlighting key concepts and their relationships. Focus on the main ideas and practical implications."},
        {"role": "user", "content": "Ensemble methods combine multiple machine learning models to create a more robust and accurate predictor. Common techniques include bagging (training models on random subsets of data), boosting (sequentially training models to correct previous errors), and stacking (using a meta-model to combine base model predictions). Random Forests, a popular bagging method, create multiple decision trees using random feature subsets. Gradient Boosting builds trees sequentially, with each tree focusing on correcting the errors of previous trees. These methods often achieve better performance than single models by reducing overfitting and variance while capturing different aspects of the data."}
    ]
)

print(completion.choices[0].message)
```
# [Python (API Key)](#tab/python-key)

[!INCLUDE [Azure key vault](~/reusable-content/ce-skilling/azure/includes/ai-services/security/azure-key-vault.md)]

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-10-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

completion = client.chat.completions.create(
    model="gpt-4o",  # replace with model deployment name
    store=True,
    metadata={
        "user": "admin",
        "category": "docs-test",
    },
    messages=[
        {"role": "system", "content": "Provide a clear and concise summary of the technical content, highlighting key concepts and their relationships. Focus on the main ideas and practical implications."},
        {"role": "user", "content": "Ensemble methods combine multiple machine learning models to create a more robust and accurate predictor. Common techniques include bagging (training models on random subsets of data), boosting (sequentially training models to correct previous errors), and stacking (using a meta-model to combine base model predictions). Random Forests, a popular bagging method, create multiple decision trees using random feature subsets. Gradient Boosting builds trees sequentially, with each tree focusing on correcting the errors of previous trees. These methods often achieve better performance than single models by reducing overfitting and variance while capturing different aspects of the data."}
    ]
)

print(completion.choices[0].message)
```

---
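If you enable stored completions across several call sites, the `store` and `metadata` arguments used in the tabs above can be factored into a small helper. A hedged sketch (`build_stored_kwargs` is our name, not part of the OpenAI SDK; pass the result to `client.chat.completions.create(**kwargs)`):

```python
def build_stored_kwargs(model: str, messages: list, **metadata) -> dict:
    """Build keyword arguments for a chat completions call with storage enabled.

    Metadata values are coerced to strings, since metadata is a set of
    string key/value tags used for filtering in the Stored Completions pane.
    """
    return {
        "model": model,  # deployment name
        "store": True,
        "metadata": {str(k): str(v) for k, v in metadata.items()},
        "messages": messages,
    }

kwargs = build_stored_kwargs(
    "gpt-4o",
    [{"role": "user", "content": "Summarize ensemble methods."}],
    user="admin",
    category="docs-test",
)
print(kwargs["store"])  # True
```

Centralizing the metadata tags this way keeps them consistent, which matters later when you filter stored completions for distillation or evaluation.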
Once stored completions are enabled for an Azure OpenAI deployment, they'll begin to show up in the [Azure AI Foundry portal](https://oai.azure.com) in the **Stored Completions** pane.

:::image type="content" source="../media/stored-completions/stored-completions.png" alt-text="Screenshot of the stored completions user experience." lightbox="../media/stored-completions/stored-completions.png":::
## Distillation

Distillation allows you to turn your stored completions into a fine-tuning dataset. A common use case is to use stored completions with a larger, more powerful model for a particular task, and then use those stored completions to train a smaller model on high-quality examples of model interactions.

Distillation requires a minimum of 10 stored completions, though it's recommended to provide hundreds to thousands of stored completions for the best results.

1. From the **Stored Completions** pane in the [Azure AI Foundry portal](https://oai.azure.com), use the **Filter** options to select the completions you want to train your model with.

2. To begin distillation, select **Distill**.

    :::image type="content" source="../media/stored-completions/distill.png" alt-text="Screenshot of the stored completions user experience with distill highlighted." lightbox="../media/stored-completions/distill.png":::

3. Pick which model you would like to fine-tune with your stored completion dataset.

    :::image type="content" source="../media/stored-completions/fine-tune.png" alt-text="Screenshot of the stored completion distillation model selection." lightbox="../media/stored-completions/fine-tune.png":::

4. Confirm which version of the model you want to fine-tune:

    :::image type="content" source="../media/stored-completions/version.png" alt-text="Screenshot of the stored completion distillation version." lightbox="../media/stored-completions/version.png":::

5. A `.jsonl` file with a randomly generated name will be created as a training dataset from your stored completions. Select the file > **Next**.

    > [!NOTE]
    > Stored completion distillation training files cannot be accessed directly and cannot be exported externally/downloaded.

    :::image type="content" source="../media/stored-completions/file-name.png" alt-text="Screenshot of the stored completion training dataset jsonl file." lightbox="../media/stored-completions/file-name.png":::

The rest of the steps correspond to the typical Azure OpenAI fine-tuning steps. To learn more, see our [fine-tuning getting started guide](./fine-tuning.md).
## Evaluation

The [evaluation](./evaluations.md) of large language models is a critical step in measuring their performance across various tasks and dimensions. This is especially important for fine-tuned models, where assessing the performance gains (or losses) from training is crucial. Thorough evaluations can help your understanding of how different versions of the model may impact your application or scenario.

Stored completions can be used as a dataset for running evaluations.

1. From the **Stored Completions** pane in the [Azure AI Foundry portal](https://oai.azure.com), use the **Filter** options to select the completions you want to be part of your evaluation dataset.

2. To configure the evaluation, select **Evaluate**.

    :::image type="content" source="../media/stored-completions/evaluate.png" alt-text="Screenshot of the stored completion pane with evaluate selected." lightbox="../media/stored-completions/evaluate.png":::

3. This launches the **Evaluations** pane with a prepopulated `.jsonl` file that has a randomly generated name and is created as an evaluation dataset from your stored completions.

    > [!NOTE]
    > Stored completion evaluation data files cannot be accessed directly and cannot be exported externally/downloaded.

    :::image type="content" source="../media/stored-completions/evaluation-data.png" alt-text="Screenshot of the evaluations pane." lightbox="../media/stored-completions/evaluation-data.png":::

To learn more about evaluations, see [getting started with evaluations](./evaluations.md).
## Troubleshooting

### Do I need special permissions to use stored completions?

Stored completions access is controlled via two DataActions:

- `Microsoft.CognitiveServices/accounts/OpenAI/stored-completions/read`
- `Microsoft.CognitiveServices/accounts/OpenAI/stored-completions/action`

By default, `Cognitive Services OpenAI Contributor` has access to both of these permissions:

:::image type="content" source="../media/stored-completions/permissions.png" alt-text="Screenshot of stored completions permissions." lightbox="../media/stored-completions/permissions.png":::
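When auditing a custom role, you can check whether its granted DataActions cover the two stored completions permissions listed above. A simplified sketch (our own helper, not an Azure SDK call; real Azure RBAC evaluation also handles `NotDataActions` and richer wildcard rules, so treat this as a first-pass check only):

```python
# The two DataActions from this article.
REQUIRED_DATA_ACTIONS = {
    "Microsoft.CognitiveServices/accounts/OpenAI/stored-completions/read",
    "Microsoft.CognitiveServices/accounts/OpenAI/stored-completions/action",
}

def grants_stored_completions(data_actions) -> bool:
    """Return True if every required DataAction is covered.

    Handles exact matches and simple trailing-"*" wildcards only.
    """
    def matches(granted: str, required: str) -> bool:
        if granted.endswith("*"):
            return required.startswith(granted[:-1])
        return granted == required

    return all(
        any(matches(g, r) for g in data_actions)
        for r in REQUIRED_DATA_ACTIONS
    )

print(grants_stored_completions(["Microsoft.CognitiveServices/accounts/OpenAI/*"]))  # True
```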
### How do I delete stored data?

Data can be deleted by deleting the associated Azure OpenAI resource. If you wish to delete only stored completion data, you must open a case with customer support.

### How much stored completion data can I store?

You can store a maximum of 10 GB of data.

### Can I prevent stored completions from ever being enabled on a subscription?

You'll need to open a case with customer support to disable stored completions at the subscription level.

### TypeError: Completions.create() got an unexpected keyword argument 'store'

This error occurs when you're running an older version of the OpenAI client library that predates the release of the stored completions feature. Run `pip install openai --upgrade`.
