
Commit 5cf2aec

Commit message: update
1 parent c740954 commit 5cf2aec

File tree: 2 files changed (+22, -21 lines)

articles/ai-foundry/how-to/fine-tuning-cost-management.md

Lines changed: 19 additions & 20 deletions
@@ -1,7 +1,7 @@
 ---
 title: 'Fine-tuning cost management'
 titleSuffix: Azure OpenAI
-description: Learn about cost management
+description: Learn about the training and hosting costs associated with fine-tuning
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
@@ -16,7 +16,7 @@ show_latex: true
 Fine-tuning can be intimidating: unlike base models, where you're just paying for input and output tokens for inferencing, fine-tuning requires training your custom models and dealing with hosting. This guide is intended to help you better understand the costs of fine-tuning and how to manage them.
 
 > [!NOTE]
-> The prices in this article are for example purposes only. In some cases they may match current pricing, but you should refer to the official [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service) for exact pricing details.
+> The prices in this article are for example purposes only. In some cases they may match current pricing, but you should refer to the official [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service) for exact pricing details to use in the formulas provided in this article.
 
 ## Upfront investment - training your model
 
@@ -26,7 +26,7 @@ This is the one-time, fixed cost associated with teaching a base model your spec
 
 **Supervised Fine-Tuning (SFT) & Preference Fine-Tuning (DPO)**
 
-It's straightforward to estimate the costs for SFT & DPO. You are charged based on the number of tokens in your training file, and the number of epochs for your training job.
+It's straightforward to estimate the costs for SFT & DPO. You're charged based on the number of tokens in your training file, and the number of epochs for your training job.
 
 $$
 \text{price} = \text{\# training tokens} \times \text{\# epochs} \times \text{price per token}
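
As a quick sketch of this formula (the token counts, epoch count, and per-million-token rate below are assumed placeholders; take real prices from the pricing page):

```python
# Minimal sketch of the SFT/DPO training cost formula above.
def sft_training_cost(training_tokens: int, epochs: int, price_per_million_tokens: float) -> float:
    """price = # training tokens x # epochs x price per token."""
    return training_tokens * epochs * (price_per_million_tokens / 1_000_000)

# Assumed example: 2M training tokens, 2 epochs, $3.00 per 1M training tokens.
print(sft_training_cost(2_000_000, 2, 3.00))  # 12.0 -> $12.00
```
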
@@ -37,7 +37,7 @@ In general, smaller models and more recent models have lower prices per token th
 We offer both regional and global training for SFT; if you don't need data residency, global training allows you to train at a discounted rate.
 
 > [!IMPORTANT]
-> We do not charge you for time spent in queue, failed jobs, jobs cancelled prior to training beginning, or data safety checks.
+> We don't charge you for time spent in queue, failed jobs, jobs canceled prior to training beginning, or data safety checks.
 
 #### Example: Supervised Fine-Tuning
 
@@ -47,7 +47,7 @@ Projecting the costs to fine-tune a model that takes natural language and output
 2. Configure training run:
    - Select base model (GPT-4.1)
    - Specify global training
-   - Set hyperparameters to "default". Algorithmically, training is set to 2 epochs.
+   - Set hyperparameters to **default**. Algorithmically, training is set to 2 epochs.
 
 Training runs for 2 hours, 15 minutes
 
@@ -76,10 +76,10 @@ $$
 Let's project the cost to train a customer service chatbot.
 
 - Submit fine-tuning job: Time when the fine-tuning job was submitted: 02:00 hrs
-- Data pre-processing completed: It took 30mins for data pre-processing to be completed which includes data safety checks. This time is not used for billing.
+- Data preprocessing completed: It took 30 minutes for data preprocessing to be completed, which includes data safety checks. This time isn't used for billing.
 - Training started: Training starts at 02:30 hours
 - Training completed: Training is completed at 06:30 hours
-- Model creation: A deployable model checkpoint is created after training completion which included post-training model safety steps. This time is not used for billing.
+- Model creation: A deployable model checkpoint is created after training completion, which includes post-training model safety steps. This time isn't used for billing.
 
 **Final Calculation**:
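
As a hedged sketch of the calculation implied above (the per-hour training rate is left symbolic; take the real rate from the pricing page): only the time between training start and training completion is billable, so

$$
\text{billable time} = 06{:}30 - 02{:}30 = 4 \text{ hours}, \qquad \text{price} = 4 \times \text{price per training hour}
$$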

@@ -94,13 +94,13 @@ Your one-time investment to create this `o4-mini` custom fine tuned model would
 Let's project the cost to train a customer service chatbot.
 
 - Submit fine tuning job: Time when FT job has started say 02:00 hrs
-- Data pre-processing completed: It took 30mins for data pre-processing to be completed which includes data safety checks. This time is not used for billing.
+- Data preprocessing completed: It took 30 minutes for data preprocessing to be completed, which includes data safety checks. This time isn't used for billing.
 - Training started: Training starts at 02:30 hours
 - Training completed: Training is completed at 06:30 hours
 - Model grader pricing:
-  - Total Input tokens used for grading – 5 M tokens
-  - Total Output tokens used for grading – 4.9 M tokens
-- Model creation: A deployable model checkpoint is created after training completion which included post-training model safety steps. This time is not used for billing.
+  - Total Input tokens used for grading – 5 million tokens
+  - Total Output tokens used for grading – 4.9 million tokens
+- Model creation: A deployable model checkpoint is created after training completion, which includes post-training model safety steps. This time isn't used for billing.
 
 **Final Calculation**:
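
A hedged sketch of the grader portion of this calculation, assuming grading tokens are billed at the grader model's per-token inference rates (rates left symbolic):

$$
\text{grader cost} = 5\text{M} \times \text{input price per token} + 4.9\text{M} \times \text{output price per token}
$$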

@@ -122,21 +122,21 @@ $$
 
 Your one-time investment to create this o4-mini custom fine tuned model would be **$427.06**
 
-### Managing costs and spend limits when using RFT
+### Managing costs and spending limits when using RFT
 
-To control your spend we recommend:
+To control your spending, we recommend:
 
 - Start with shorter runs to understand how your configuration affects time – Use configuration `reasoning effort` – Low, smaller validation data sets
 - Use a reasonable number of validation examples and `eval_samples`. Avoid validating more often than you need.
 - Choose the smallest grader model that meets your quality requirements.
 - Adjust `compute_multiplier` to balance convergence speed and cost.
 - Monitor your run in the Foundry portal or via the API. You can pause or cancel at any time.
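
The cost-control levers above can be pictured together as one configuration sketch; the dictionary shape below is illustrative only, not the actual job payload:

```python
# Illustrative grouping of the RFT cost-control knobs discussed above; the key
# names follow the article's backticked terms, but the structure is an
# assumption, not the real API shape.
rft_cost_controls = {
    "reasoning_effort": "low",   # shorter reasoning chains -> shorter, cheaper runs
    "eval_samples": 50,          # keep validation small; don't validate more than needed
    "compute_multiplier": 0.5,   # trade convergence speed against billable training time
}
```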

-As RFT jobs can lead to high training costs, we are capping the pricing for per training job billing to **$5000** which means this will be the maximum amount that a job can cost before we end the job. The training will be paused and a deployable checkpoint will be created. Users can validate the training job, metrics, logs and then decide to resume the job to complete further. If the user decides to resume the job, billing will continue for the job and subsequently no further price limits would be placed on the training job.
+As RFT jobs can lead to high training costs, we're capping per-job billing at **$5,000**; this is the maximum amount a job can cost before we end it. Training is paused and a deployable checkpoint is created. You can review the training job, metrics, and logs, and then decide whether to resume. If you resume the job, billing continues and no further price limit is placed on the training job.
 
 ### Job failures and cancellations
 
-You are not billed for work lost due to our error. However, if you cancel a run, you'll be charged for the work completed up to that point.
+You aren't billed for work lost due to our error. However, if you cancel a run, you'll be charged for the work completed up to that point.
 
 **Example**: the run trains for 2 hours, writes a checkpoint, trains for 1 more hour, but then fails. Only the 2 hours of training up to the checkpoint are billable.
 
@@ -147,9 +147,9 @@ After your model is trained, you can deploy it any number of times using the fol
 We have four options for hosting:
 
 - **Standard**: pay per-token at the same rate as base model Standard deployments with an additional $1.70/hour hosting fee. Offers regional data residency guarantees.
-- **Global Standard**: pay per-token at the same rate as base model Global Standard deployments with an additional $1.70/hour hosting fee. Does not offer data residency guarantees. Offers higher throughput than Standard deployments.
+- **Global Standard**: pay per-token at the same rate as base model Global Standard deployments with an additional $1.70/hour hosting fee. Doesn't offer data residency guarantees. Offers higher throughput than Standard deployments.
 - **Regional Provisioned Throughput**: offers latency guarantees, designed for latency-sensitive workloads. Instead of paying per-token or an hourly hosting fee, deployments accrue PTU-hours based on the number of provisioned throughput units (PTU) assigned to the deployment, and billed at a rate determined by your agreements or reservations with Microsoft Azure.
-- **Developer Tier (Public Preview)**: pay per-token and without an hourly hosting fee but offers neither data residency nor availability guarantees. Designed for model candidate evaluation and proof of concepts, deployments are removed automatically after 24 hours regardless of usage but may be redeployed as needed.
+- **Developer Tier (Public Preview)**: pay per-token and without an hourly hosting fee but offers neither data residency nor availability guarantees. Developer Tier is designed for model candidate evaluation and proofs of concept; deployments are removed automatically after 24 hours regardless of usage but may be redeployed as needed.
 
 The right deployment type for your use case depends on weighing your AI requirements and where you are on your fine-tuning journey.
 
@@ -160,16 +160,15 @@ The right deployment type for your use case depends on weighing your AI requirem
 | Regional Provisioned Throughput | [SLA](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services) | [PTU](azure/ai-foundry/openai/concepts/provisioned-throughput?tabs=global-ptum#when-to-use-provisioned-throughput) | None | PTU/hour |
 | Developer Tier | None | - | Same as Global Standard | None |
 
-Pricing Structure for all models are called out in the Pricing page - Azure OpenAI Service - Pricing
-Microsoft Azure
+Full pricing information for all models is available on the [Azure OpenAI pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service).
 
 ### Example for o4-mini
 
 - Hosting charges: $1.70 per hour
 - Input Cost: $1.10 per 1M tokens
 - Output Cost: $4.40 per 1M tokens
 
-#### Worked Example: Monthly Usage of a Fine-Tuned Chatbot
+#### Example: Monthly Usage of a Fine-Tuned Chatbot
 
 Let's assume your chatbot handles 10,000 customer conversations in its first month:
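
To make the arithmetic concrete, here's a minimal sketch using the o4-mini example rates above; the per-conversation token counts and the 730-hour month are assumptions for illustration only:

```python
# Hedged sketch of monthly hosting + inference cost for a fine-tuned o4-mini
# deployment, using the example rates above. Token counts are assumptions.
HOSTING_PER_HOUR = 1.70     # $/hour hosting fee (Standard / Global Standard)
INPUT_PER_MILLION = 1.10    # $ per 1M input tokens
OUTPUT_PER_MILLION = 4.40   # $ per 1M output tokens

conversations = 10_000
input_tokens_per_conversation = 1_000   # assumption
output_tokens_per_conversation = 500    # assumption
hours_in_month = 730                    # assumption: deployment runs all month

hosting = HOSTING_PER_HOUR * hours_in_month                                    # $1,241.00
inference = (conversations * input_tokens_per_conversation / 1e6) * INPUT_PER_MILLION \
    + (conversations * output_tokens_per_conversation / 1e6) * OUTPUT_PER_MILLION  # $33.00
print(f"Total: ${hosting + inference:,.2f}")                                   # Total: $1,274.00
```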

articles/ai-foundry/toc.yml

Lines changed: 3 additions & 1 deletion
@@ -714,7 +714,9 @@ items:
   displayName: finetuning, fine-tuning
 - name: Tool calling
   href: openai/how-to/fine-tuning-functions.md?context=/azure/ai-foundry/context/context
-  displayName: finetuning, fine-tuning
+  displayName: finetuning, fine-tuning
+- name: Fine-tuning cost management
+  href: openai/how-to/fine-tuning-cost-management.md
 - name: Weights & Biases integration (preview)
   href: openai/how-to/weights-and-biases-integration.md?context=/azure/ai-foundry/context/context
   displayName: finetuning, fine-tuning
