Skip to content

Commit c740954

Browse files
committed
update
1 parent 0b73d35 commit c740954

File tree

1 file changed

+93
-64
lines changed

1 file changed

+93
-64
lines changed

articles/ai-foundry/how-to/fine-tuning-cost-management.md

Lines changed: 93 additions & 64 deletions
Original file line numberDiff line numberDiff line change
@@ -11,75 +11,85 @@ ms.author: mbullwin
1111
show_latex: true
1212
---
1313

14-
# Cost Management for Fine Tuning
14+
# Cost management for fine-tuning
1515

16-
Fine tuning can be intimidating: unlike base models, where you're just paying for input and output tokens for inferencing, fine tuning requires training your custom models and dealing with hosting. This guide is intended to help you better understand the costs of fine tuning and how to manage them.
16+
Fine-tuning can be intimidating: unlike base models, where you're just paying for input and output tokens for inferencing, fine-tuning requires training your custom models and dealing with hosting. This guide is intended to help you better understand the costs of fine-tuning and how to manage them.
17+
18+
> [!NOTE]
19+
> The prices in this article are for example purposes only. In some cases they may match current pricing, but you should refer to the official [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service) for exact pricing details.
1720
1821
## Upfront investment - training your model
1922

2023
This is the one-time, fixed cost associated with teaching a base model your specific requirements using your training data.
2124

2225
### The calculation formula
2326

24-
**Supervised Fine Tuning (SFT) & Preference Fine Tuning (DPO)**
27+
**Supervised Fine-Tuning (SFT) & Preference Fine-Tuning (DPO)**
2528

26-
It's straightforward to estimate the costs for SFT & DPO: you are charged based on the number of tokens in your training file, and the number of epochs for your training job.
29+
It's straightforward to estimate the costs for SFT & DPO. You are charged based on the number of tokens in your training file, and the number of epochs for your training job.
2730

2831
$$
2932
\text{price} = \text{\# training tokens} \times \text{\# epochs} \times \text{price per token}
3033
$$
3134

32-
In general, smaller models and more recent models have lower prices per token than larger, older models. To estimate the number of tokens in your file, you can use the tiktoken library – or, for a less precise estimate, one word is roughly equivalent to four tokens.
35+
In general, smaller models and more recent models have lower prices per token than larger, older models. To estimate the number of tokens in your file, you can use the [tiktoken library](https://github.com/openai/tiktoken) – or, for a less precise estimate, one word is roughly equivalent to four tokens.
3336

3437
We offer both regional and global training for SFT; if you don't need data residency, global training allows you to train at a discounted rate.
3538

36-
We do not charge you for time spent in queue, failed or cancelled jobs, or data safety checks.
39+
> [!IMPORTANT]
40+
> We do not charge you for time spent in queue, failed jobs, jobs cancelled prior to training beginning, or data safety checks.
41+
42+
#### Example: Supervised Fine-Tuning
43+
44+
Projecting the costs to fine-tune a model that takes natural language and outputs code.
3745

38-
#### Worked example: Supervised Fine Tuning
46+
1. Prepare the training data file: 500 prompt / response pairs, with a total of 1 million tokens, and a validation file with 20 examples and 40,000 tokens.
47+
2. Configure training run:
48+
- Select base model (GPT-4.1)
49+
- Specify global training
50+
- Set hyperparameters to "default". Algorithmically, training is set to 2 epochs.
3951

40-
Projecting the costs to fine tune a model for natural language to code.
52+
Training runs for 2 hours, 15 minutes
4153

42-
- Prepare the training data file: 500 prompt / response pairs, with a total of 1M tokens, and a validation file with 20 examples and 40K tokens
43-
- Configure training run:
44-
- select base model (GPT-4.1)
45-
- specify global training
46-
- set hyperparameters to "default". Algorithmically, training is set to 2 epochs.
47-
- Training runs for 2 hours, 15 minutes
48-
**Total cost**:
49-
`$2/1M tokens * 1M training tokens * 2 epochs = $4`
54+
**Total cost**:
55+
56+
$$
57+
\$2 \div 1\text{M tokens} \times 1\text{M training tokens} \times 2\text{ epochs} = \$4
58+
$$
5059

51-
### Reinforcement Fine Tuning (RFT)
60+
### Reinforcement Fine-Tuning (RFT)
5261

5362
The cost is determined by the time spent on training the model for Reinforcement fine tuning technique.
5463

55-
#### The formula is:
64+
**The formula is:**
5665

57-
```
58-
price = Time taken for trainingHourly training price + Grader inferencing per token if model grader is used
59-
```
66+
$$
67+
\text{price} = \text{Time taken for training} \times \text{Hourly training cost} + \text{Grader inferencing per token (if model grader is used)}
68+
$$
6069

61-
- **Time**: Total time in hours rounded to two decimal places (e.g., 1.25 hours)
62-
- **Hourly Training cost**: $100 per hour of core training time for o4-mini-2025-04-16
63-
- **Model grading**: Tokens used to grade outputs during training are billed separately at datazone rates once training is complete.
70+
- **Time**: Total time in hours rounded to two decimal places (for example, 1.25 hours).
71+
- **Hourly Training cost**: $100 per hour of core training time for `o4-mini-2025-04-16`.
72+
- **Model grading**: Tokens used to grade outputs during training are billed separately at data zone rates once training is complete.
6473

65-
#### Worked Example: Training model with graders (without Model grader – Score_model)
74+
#### Example: Training model with graders (without Model grader – Score_model)
6675

6776
Let's project the cost to train a customer service chatbot.
6877

69-
- Submit fine tuning job: Time when FT job has submitted: 02:00 hrs
78+
- Submit fine-tuning job: Time when the fine-tuning job was submitted: 02:00 hrs
7079
- Data pre-processing completed: It took 30mins for data pre-processing to be completed which includes data safety checks. This time is not used for billing.
7180
- Training started: Training starts at 02:30 hours
7281
- Training completed: Training is completed at 06:30 hours
7382
- Model creation: A deployable model checkpoint is created after training completion which included post-training model safety steps. This time is not used for billing.
7483

7584
**Final Calculation**:
76-
`Total time taken for training * Hourly training cost = 4 * 100 = $200`
7785

78-
Your one-time investment to create this o4-mini custom fine tuned model would be **$400.00**.
86+
$$
87+
\text{Total time taken for training} \times \text{Hourly training cost} = 4 \times 100 = \$200
88+
$$
7989

80-
---
90+
Your one-time investment to create this `o4-mini` custom fine tuned model would be **$400.00**.
8191

82-
#### Worked Example: Training model with Model ‘Score_model’ grader (o3-mini being used as grader)
92+
#### Example: Training a model with Model ‘Score_model’ grader (o3-mini being used as grader)
8393

8494
Let's project the cost to train a customer service chatbot.
8595

@@ -94,55 +104,61 @@ Let's project the cost to train a customer service chatbot.
94104

95105
**Final Calculation**:
96106

97-
```
98-
Total time taken for training * Hourly training cost = 4 * 100 = $400
99-
Grading costs = # Input tokens * Price per i/p token + Output tokens * price per o/p token
100-
= (5 * 1.1) + (4.9 * 4.4) = 5.5 + 21.56 = $27.06
101-
Total training costs = $427.06
102-
```
107+
$$
108+
\text{Total time taken for training} \times \text{Hourly training cost} = 4 \times 100 = \$400
109+
$$
110+
111+
$$
112+
\text{Grading costs} = \text{\# Input tokens} \times \text{Price per input token} + \text{Output tokens} \times \text{Price per output token}
113+
$$
114+
115+
$$
116+
= (5 \times 1.1) + (4.9 \times 4.4) = 5.5 + 21.56 = \$27.06
117+
$$
118+
119+
$$
120+
\text{Total training costs} = \$427.06
121+
$$
103122

104123
Your one-time investment to create this o4-mini custom fine tuned model would be **$427.06**
105124

106125
### Managing costs and spend limits when using RFT
107126

108127
To control your spend we recommend:
109128

110-
- Start with shorter runs to understand how your configuration affects time – Use configuration reasoning effort – Low, smaller validation data sets
111-
- Use a reasonable number of validation examples and eval_samples. Avoid validating more often than you need.
129+
- Start with shorter runs to understand how your configuration affects time – Use configuration `reasoning effort` – Low, smaller validation data sets
130+
- Use a reasonable number of validation examples and `eval_samples`. Avoid validating more often than you need.
112131
- Choose the smallest grader model that meets your quality requirements.
113-
- Adjust compute_multiplier to balance convergence speed and cost.
132+
- Adjust `compute_multiplier` to balance convergence speed and cost.
114133
- Monitor your run in the Foundry portal or via the API. You can pause or cancel at any time.
115134

116-
As RFT job can lead to high training costs, we are capping the pricing for per training job billing to **$5000** which means this will be the maximum amount that a job can cost before we end the job. The training will be paused and a deployable checkpoint will be created. Users can validate the training job, metrics, logs and then decide to resume the job to complete further. If the user decides to resume the job, billing will continue for the job and subsequently no further price limits would be placed on the training job.
135+
As RFT jobs can lead to high training costs, we are capping the pricing for per training job billing to **$5000** which means this will be the maximum amount that a job can cost before we end the job. The training will be paused and a deployable checkpoint will be created. Users can validate the training job, metrics, logs and then decide to resume the job to complete further. If the user decides to resume the job, billing will continue for the job and subsequently no further price limits would be placed on the training job.
117136

118137
### Job failures and cancellations
119138

120-
You are not billed for work lost due to our error. If you cancel a run, you'll be charged for the work completed up to that point.
121-
Example: the run trains for 2 hours, writes a checkpoint, trains for 1 more hour, but then fails. Only the 2 hours of training up to the checkpoint are billable.
122-
123-
### Waiting on resources
139+
You are not billed for work lost due to our error. However, if you cancel a run, you'll be charged for the work completed up to that point.
140+
141+
**Example**: the run trains for 2 hours, writes a checkpoint, trains for 1 more hour, but then fails. Only the 2 hours of training up to the checkpoint are billable.
124142

125-
If job is queued
126-
127-
### Part 2: Ongoing Operational Costs – Using Your Model
143+
## Ongoing operational costs – using your model
128144

129145
After your model is trained, you can deploy it any number of times using the following deployment types:
130146

131147
We have three options for hosting:
132148

133-
- **Standard**: pay per-token at the same rate as base model Standard deployments with an additional $1.70/hour hosting fee. Offers a 99.9% availability SLA and regional data residency guarantees.
134-
- **Global Standard**: pay per-token at the same rate as base model Global Standard deployments with an additional $1.70/hour hosting fee. Offers a 99.9% availability SLA but does not offer data residency guarantees. Offers higher throughputs than Standard deployments.
135-
- **Regional Provisioned Throughput**: offers latency guarantees in addition to the availability SLA, designed for latency-sensitive workloads. Instead of paying per-token or an hourly hosting fee, deployments accrue PTU-hours based on the number of provisioned throughput units (PTU) assigned to the deployment, and billed at a rate determined by your agreements or reservations with Microsoft Azure.
136-
- **Developer Tier (Public Preview)**: pay per-token and without an hourly hosting fee but offers neither data residency nor availability SLAs. Designed for model candidate evaluation and proof of concepts, deployments are removed automatically after 24 hours regardless of usage but may be redeployed as needed.
149+
- **Standard**: pay per-token at the same rate as base model Standard deployments with an additional $1.70/hour hosting fee. Offers regional data residency guarantees.
150+
- **Global Standard**: pay per-token at the same rate as base model Global Standard deployments with an additional $1.70/hour hosting fee. Does not offer data residency guarantees. Offers higher throughput than Standard deployments.
151+
- **Regional Provisioned Throughput**: offers latency guarantees, designed for latency-sensitive workloads. Instead of paying per-token or an hourly hosting fee, deployments accrue PTU-hours based on the number of provisioned throughput units (PTU) assigned to the deployment, and billed at a rate determined by your agreements or reservations with Microsoft Azure.
152+
- **Developer Tier (Public Preview)**: pay per-token and without an hourly hosting fee but offers neither data residency nor availability guarantees. Designed for model candidate evaluation and proof of concepts, deployments are removed automatically after 24 hours regardless of usage but may be redeployed as needed.
137153

138154
The right deployment type for your use case depends on weighing your AI requirements and where you are on your fine-tuning journey.
139155

140-
| Deployment Type | Availability SLA | Latency SLA | Token Rate | Hourly Rate |
156+
| Deployment Type | Availability | Latency | Token Rate | Hourly Rate |
141157
|----------------------------|------------------|-------------|--------------------|-----------------|
142-
| Standard | 99.9% | None | Same as base model | $1.70/hour |
143-
| Global Standard | 99.9% | None | Same as base model | $1.70/hour |
144-
| Regional Provisioned Throughput | 99.9% | Same as base model | None | PTU/hour |
145-
| Developer Tier | None | None | Same as Global Standard | None |
158+
| Standard | [SLA](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services) | - | Same as base model | $1.70/hour |
159+
| Global Standard | [SLA](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services) | - | Same as base model | $1.70/hour |
160+
| Regional Provisioned Throughput | [SLA](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services) | [PTU](azure/ai-foundry/openai/concepts/provisioned-throughput?tabs=global-ptum#when-to-use-provisioned-throughput) | None | PTU/hour |
161+
| Developer Tier | None | - | Same as Global Standard | None |
146162

147163
Pricing Structure for all models are called out in the Pricing page - Azure OpenAI Service - Pricing
148164
Microsoft Azure
@@ -161,13 +177,26 @@ Let's assume your chatbot handles 10,000 customer conversations in its first mon
161177
- Total Input: The user queries sent to the model total 20 million tokens.
162178
- Total Output: The model's responses to users total 40 million tokens.
163179

164-
**Input Cost Calculation**:
165-
`20 × $1.10 = $22.00`
180+
$$
181+
\textbf{Input Cost Calculation:} \quad 20 \times \$1.10 = \$22.00
182+
$$
183+
184+
$$
185+
\textbf{Output Cost Calculation:} \quad 40 \times \$4.40 = \$176.00
186+
$$
187+
188+
$$
189+
\textbf{Your total operational cost for the month would be:}
190+
$$
166191

167-
**Output Cost Calculation**:
168-
`40 × $4.40 = $176.00`
192+
$$
193+
= \text{Hosting charges} + \text{Token usage cost}
194+
$$
169195

170-
**Your total operational cost for the month would be**:
171-
`= Hosting charges + Token usage cost`
172-
`= ($1.70 * 30 days * 24 hours) + ($22 (Input) + $176 (Output))`
173-
`= $1422.00`
196+
$$
197+
= (1.70 \times 30 \times 24) + (22 + 176) = 1422.00
198+
$$
199+
200+
$$
201+
\text{Total Monthly Cost} = \$1422.00
202+
$$

0 commit comments

Comments
 (0)