Commit bcc13ec

Merge pull request #4980 from ssalgadodev/patch-369258
Update fine-tuning-overview.md
2 parents 01df958 + 0f38400 commit bcc13ec

3 files changed: +99 −23 lines changed

articles/ai-foundry/concepts/fine-tuning-overview.md

Lines changed: 99 additions & 23 deletions
@@ -1,5 +1,5 @@
---
-title: Fine-tuning in Azure AI Foundry portal
+title: Fine-tune models with Azure AI Foundry
titleSuffix: Azure AI Foundry
description: This article explains what fine-tuning is and under what circumstances you should consider doing it.
manager: scottpolly
@@ -8,7 +8,7 @@ ms.custom:
- build-2024
- code01
ms.topic: concept-article
-ms.date: 02/21/2025
+ms.date: 05/14/2025
ms.reviewer: keli19
ms.author: sgilley
author: sdgilley
@@ -17,23 +17,20 @@ author: sdgilley

# Fine-tune models with Azure AI Foundry

-Fine-tuning customizes a pretrained AI model with additional training on a specific task or dataset to improve performance, add new skills, or enhance accuracy. The result is a new, optimized GenAI model based on the provided examples.
+Fine-tuning customizes a pretrained AI model with additional training on a specific task or dataset to improve performance, add new skills, or enhance accuracy. The result is a new, optimized GenAI model based on the provided examples. This article walks you through use cases for fine-tuning and how it helps you in your GenAI journey.

-[!INCLUDE [feature-preview](../includes/feature-preview.md)]
Consider fine-tuning GenAI models to:

- Scale and adapt to specific enterprise needs
- Reduce false positives as tailored models are less likely to produce inaccurate or irrelevant responses
- Enhance the model's accuracy for domain-specific tasks
- Save time and resources with faster and more precise results
- Get more relevant and context-aware outcomes as models are fine-tuned for specific use cases

-[Azure AI Foundry](https://ai.azure.com) offers several models across model providers enabling you to get access to the latest and greatest in the market. You can discover supported models for fine-tuning through our model catalog by using the **Fine-tuning tasks** filter and selecting the model card to learn detailed information about each model. Specific models might be subjected to regional constraints. [View this list for more details](#supported-models-for-fine-tuning).
+[Azure AI Foundry](https://ai.azure.com) offers several models across model providers, enabling you to access the latest and greatest in the market. [View this list for more details](#supported-models-for-fine-tuning).

:::image type="content" source="../media/concepts/model-catalog-fine-tuning.png" alt-text="Screenshot of Azure AI Foundry model catalog and filtering by Fine-tuning tasks." lightbox="../media/concepts/model-catalog-fine-tuning.png":::

-This article walks you through use-cases for fine-tuning and how it helps you in your GenAI journey.

## Getting started with fine-tuning

When starting out on your generative AI journey, we recommend that you begin with prompt engineering and RAG to familiarize yourself with base models and their capabilities.
@@ -47,39 +44,107 @@ As you get comfortable and begin building your solution, it's important to under
- Is it difficult to fit enough examples in the context window to steer the model?
- Is there high latency?

Examples of failure with the base model and prompt engineering can help you identify the data to collect for fine-tuning and establish a performance baseline that you can evaluate and compare your fine-tuned model against. Having a baseline for performance without fine-tuning is essential for knowing whether fine-tuning improves model performance.

Here's an example:

-_A customer wants to use GPT-3.5 Turbo to turn natural language questions into queries in a specific, nonstandard query language. The customer provides guidance in the prompt ("Always return GQL") and uses RAG to retrieve the database schema. However, the syntax isn't always correct and often fails for edge cases. The customer collects thousands of examples of natural language questions and the equivalent queries for the database, including cases where the model failed before. The customer then uses that data to fine-tune the model. Combining the newly fine-tuned model with the engineered prompt and retrieval brings the accuracy of the model outputs up to acceptable standards for use._
+_A customer wants to use GPT-4o-Mini to turn natural language questions into queries in a specific, nonstandard query language. The customer provides guidance in the prompt ("Always return GQL") and uses RAG to retrieve the database schema. However, the syntax isn't always correct and often fails for edge cases. The customer collects thousands of examples of natural language questions and the equivalent queries for the database, including cases where the model failed before. The customer then uses that data to fine-tune the model. Combining the newly fine-tuned model with the engineered prompt and retrieval brings the accuracy of the model outputs up to acceptable standards for use._

-### Use cases
+## Use cases

Base models are already pretrained on vast amounts of data. Most of the time, you add instructions and examples to the prompt to get the quality responses that you're looking for; this process is called "few-shot learning." Fine-tuning allows you to train a model with many more examples that you can tailor to meet your specific use case, thus improving on few-shot learning. Fine-tuning can reduce the number of tokens in the prompt, leading to potential cost savings and lower-latency requests.

Turning natural language into a query language is just one use case where you can "show not tell" the model how to behave. Here are some other use cases:

- Improve the model's handling of retrieved data
- Steer the model to output content in a specific style, tone, or format
- Improve the accuracy when you look up information
- Reduce the length of your prompt
- Teach new skills (that is, natural language to code)

If you identify cost as your primary motivator, proceed with caution. Fine-tuning might reduce costs for certain use cases by shortening prompts or allowing you to use a smaller model. But there might be a higher upfront cost to training, and you have to pay for hosting your own custom model.

## Steps to fine-tune a model

At a high level, fine-tuning requires you to:

- Prepare and upload training data
- Train a new fine-tuned model
- Evaluate your newly trained model
- Deploy that model for inferencing
- Use the fine-tuned model in your application
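
With the Azure OpenAI Python SDK, that high-level flow might look like the following sketch. The endpoint, API version, base model name, and file names are illustrative assumptions; adjust them for your resource and region.

```python
# A minimal sketch of the fine-tuning workflow with the Azure OpenAI Python SDK.
# Endpoint, API version, base model, and file names are placeholders for illustration.
import os
import time

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # use the API version available in your region
)

# 1. Upload the prepared training data (JSONL).
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"), purpose="fine-tune"
)

# 2. Create the fine-tuning job against a base model that supports fine-tuning.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # example base model; check the supported models list
)

# 3. Poll until the job finishes, then review the resulting model and its metrics.
while True:
    job = client.fine_tuning.jobs.retrieve(job.id)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

print(job.status, job.fine_tuned_model)

# 4. Deploy the fine-tuned model (in Azure AI Foundry, deployment is a separate step
#    done in the portal, with the Azure CLI, or through the management REST API), and
#    then call it from your application like any other deployment.
```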

It's important to call out that fine-tuning is heavily dependent on the quality of data that you can provide. It's best practice to provide hundreds, if not thousands, of training examples to be successful and get your desired results.

:::image type="content" source="../media/concepts/data-pipeline.png" alt-text="Screenshot of the fine-tuning data pipeline for adapting pre-trained models to a specific task." lightbox="../media/concepts/data-pipeline.png":::

## Data preparation

### What data are you going to use for fine-tuning?

The fine-tuning process begins by selecting a pretrained model and preparing a relevant dataset tailored to the target task. This dataset should reflect the kind of inputs the model will see in deployment.

View and download [example datasets](https://github.com/Azure-Samples/AIFoundry-Customization-Datasets) to try out fine-tuning.

For example, if the goal is to fine-tune a model for sentiment analysis, the dataset would include labeled text examples categorized by sentiment (positive, negative, neutral). The model is then retrained on this dataset, adjusting its parameters to better align with the new task. This retraining process usually requires fewer computational resources than training a model from scratch, because it builds on the model's existing capabilities.
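
As an illustration of that sentiment-analysis scenario, a few labeled examples could be written to a JSONL file in the chat (messages) format used for fine-tuning chat models; the texts, labels, and file name here are hypothetical.

```python
# A minimal, hypothetical sentiment-analysis training set in chat (messages) format.
import json

examples = [
    ("The battery lasts all day and the screen is gorgeous.", "positive"),
    ("The app crashes every time I open the settings page.", "negative"),
    ("The package arrived on Tuesday.", "neutral"),
]

with open("sentiment_train.jsonl", "w", encoding="utf-8") as f:
    for text, label in examples:
        record = {
            "messages": [
                {"role": "system", "content": "Classify the sentiment as positive, negative, or neutral."},
                {"role": "user", "content": text},
                {"role": "assistant", "content": label},
            ]
        }
        f.write(json.dumps(record) + "\n")
```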

Even with a great use case, fine-tuning is only as good as the quality of the data that you're able to provide. Different models require different data volumes, but you often need to provide fairly large quantities of high-quality curated data in the correct formats. You can use the samples repository linked earlier to understand formatting requirements and data preparation.

To fine-tune a model for chat or question answering, your training dataset should reflect the types of interactions the model will handle. Here are some key elements to include in your dataset:

- **Prompts and responses**: Each entry should contain a prompt (for example, a user question) and a corresponding response (for example, the model's reply).
- **Contextual information**: For multi-turn conversations, include previous exchanges to help the model understand context and maintain coherence.
- **Diverse examples**: Cover a range of topics and scenarios to improve generalization and robustness.
- **Human-generated responses**: Use responses written by humans to teach the model how to generate natural and accurate replies.
- **Formatting**: Use a clear structure to separate prompts and responses, for example a delimiter such as `\n\n###\n\n`, and ensure the delimiter doesn't appear elsewhere in the content.
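
For chat-style fine-tuning, one common approach is to store each conversation as a single JSONL line in the chat messages format, with earlier turns included for context. The following is a hedged sketch; the conversation content and file name are invented for illustration.

```python
# Sketch: one multi-turn training example per JSONL line, in chat (messages) format.
# Earlier turns give the model the conversational context it needs.
import json

conversation = {
    "messages": [
        {"role": "system", "content": "You are a helpful support assistant for Contoso devices."},
        {"role": "user", "content": "My tablet won't turn on."},
        {"role": "assistant", "content": "Let's try a forced restart: hold the power button for 15 seconds."},
        {"role": "user", "content": "That worked, but the screen flickers now."},
        {"role": "assistant", "content": "Flickering after a restart usually means the display driver needs an update. Open Settings > System > Updates and install any pending updates."},
    ]
}

with open("chat_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(conversation) + "\n")
```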

### Best practices for data preparation

The more training examples you have, the better. Fine-tuning jobs won't proceed without at least 10 training examples, but such a small number isn't enough to noticeably influence model responses. It's best practice to provide hundreds, if not thousands, of training examples to be successful. 100 good-quality examples are better than 1000 poor examples.

In general, doubling the dataset size can lead to a linear increase in model quality. Keep in mind that low-quality examples can negatively impact performance. If you train the model on a large amount of internal data without first pruning the dataset for only the highest quality examples, you could end up with a model that performs much worse than expected.
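
Before you submit a training file, a quick sanity check on size and structure can catch problems early. The following is a minimal sketch; the 10-example minimum reflects the guidance above, and the file name is an assumption.

```python
# Sketch: count examples and verify each JSONL line has the expected chat structure.
import json

path = "training_data.jsonl"
valid_roles = {"system", "user", "assistant"}
count, problems = 0, []

with open(path, encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        if not line.strip():
            continue
        count += 1
        try:
            messages = json.loads(line)["messages"]
            assert messages and all(m["role"] in valid_roles for m in messages)
            assert messages[-1]["role"] == "assistant"  # each example ends with the target reply
        except (ValueError, KeyError, AssertionError):
            problems.append(i)

print(f"{count} examples found; aim for hundreds or thousands, not the bare minimum of 10.")
if problems:
    print(f"Check the formatting of these lines: {problems}")
```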

-### Steps to fine-tune a model
-Here are the general steps to fine-tune a model:
-1. Choose a model that supports your task.
-1. Prepare and upload training data.
-1. (Optional) Prepare and upload validation data.
-1. (Optional) Configure task parameters.
-1. Train your model.
-1. Once completed, review metrics and evaluate model. If the results don't meet your benchmark, then go back to step 2.
-1. Use your fine-tuned model.
-It's important to call out that fine-tuning is heavily dependent on the quality of data that you can provide. It's best practice to provide hundreds, if not thousands, of training examples to be successful and get your desired results.

### Best practices for data labeling

Accurate and consistent labeling is crucial for training the model. Follow these best practices:

- **Ensure data diversity**: Include all the typical variations, such as document formats (digital vs. scanned), layout differences, varying table sizes, and optional fields.
- **Define fields clearly**: Use semantically meaningful field names (for example, `effective_date`), especially for custom models, and follow consistent naming conventions like Pascal or camel case.
- **Maintain label consistency**: Ensure uniform labeling across documents, particularly for repeated values.
- **Split your data**: Separate training and validation sets to evaluate the model on unseen data and avoid overfitting, as shown in the sketch after this list.
- **Label at scale**: Aim for at least 50 labeled documents per class, where applicable.
- **Combine automation and review**: Use AI-generated labels to accelerate the process, focusing manual effort on complex or critical fields.
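
One simple way to split a labeled JSONL dataset into training and validation files, as suggested above, is shown in this sketch; the 80/20 ratio and file names are assumptions.

```python
# Sketch: shuffle a JSONL dataset and split it into training and validation files.
import random

random.seed(42)  # fixed seed so the split is reproducible

with open("labeled_data.jsonl", encoding="utf-8") as f:
    lines = [line for line in f if line.strip()]

random.shuffle(lines)
split = int(len(lines) * 0.8)  # 80% train, 20% validation

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.writelines(lines[:split])
with open("validation.jsonl", "w", encoding="utf-8") as f:
    f.writelines(lines[split:])

print(f"{split} training examples, {len(lines) - split} validation examples")
```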

## Model selection

Selecting the right model for fine-tuning is a critical decision that impacts performance, efficiency, and cost. Before making a choice, it's essential to clearly define the task and establish the desired performance metrics. A well-defined task ensures that the selected model aligns with specific requirements, optimizing effort and resources.

### Best practices for model selection

- **Choose models based on domain specificity and use case**: Start by evaluating industry-standard models for general capabilities, then assess models fine-tuned for your specific use case. If your task requires deep domain expertise, selecting a model tailored to your industry can improve accuracy and efficiency while reducing the need for extensive fine-tuning.
- **Assess model performance on leaderboards**: Review benchmark leaderboards to evaluate how pre-trained models perform on relevant tasks. Focus on key metrics such as accuracy, coherence, latency, and domain-specific benchmarks to identify a strong foundation for fine-tuning.
- **Experiment with model playgrounds**: Use interactive testing environments to assess the base model's performance on real-world use cases. By adjusting prompts, temperature, and other parameters, you can identify performance gaps before investing in fine-tuning.
- **Weigh trade-offs between model size, complexity, cost, and performance**: Larger models may offer superior accuracy but come with higher computational costs and latency. Consider the balance between efficiency and precision based on your deployment needs.

## Training and evaluation

Fine-tuning isn't merely a matter of retraining on a new dataset; it also involves careful consideration of various hyperparameters and techniques to balance accuracy and generalization. A key risk is overfitting, where a model becomes too narrowly adapted to training data, reducing its effectiveness on unseen inputs. To mitigate overfitting and optimize performance, fine-tuning requires adjusting parameters such as learning rate, regularization, batch size, number of epochs, and seed settings.
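
When you create a fine-tuning job, these knobs are typically exposed as job parameters. Here's a hedged sketch with the Azure OpenAI Python SDK; the specific values are illustrative starting points rather than recommendations, and the endpoint, API version, model, and file IDs are placeholders.

```python
# Sketch: pass hyperparameters and a seed when creating the fine-tuning job.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",       # example base model
    training_file="file-training-id",      # IDs returned by the earlier file uploads
    validation_file="file-validation-id",
    seed=42,                               # fixed seed for more reproducible runs
    hyperparameters={
        "n_epochs": 3,                     # passes over the training data
        "batch_size": 8,                   # examples per training step
        "learning_rate_multiplier": 0.5,   # scales the default learning rate
    },
)
print(job.id, job.status)
```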

## Use evaluations in fine-tuning

You should have clearly defined goals for what success with fine-tuning looks like. Ideally, these goals should go beyond qualitative measures and include quantitative metrics, such as using a holdout validation set, conducting user acceptance testing, or running A/B tests comparing the fine-tuned model to the base model.

Model training can be guided by metrics. For example, BLEU-4 was used to evaluate training when fine-tuning a model to generate chest X-ray reports, as seen in this paper. You can also monitor metrics while you train. If the loss curves aren't converging as expected, you can pause the job, analyze the metrics, and resume.

:::image type="content" source="../media/concepts/hyperparameter-tuning.png" alt-text="Screenshot of the fine-tuning data hyperparameter tuning and metrics used to guide model training." lightbox="../media/concepts/hyperparameter-tuning.png":::

**Use intermediate checkpoints for better model selection**. Save checkpoints at regular intervals (for example, every few epochs) and evaluate their performance. In some cases, an intermediate checkpoint may outperform the final model, allowing you to select the best version rather than relying solely on the last trained iteration.
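
While a job runs, you can watch its events (which include periodic training metrics such as loss) and, after it completes, list the intermediate checkpoints to compare against the final model. The following is a sketch; the job ID is a placeholder, and the checkpoints API may require a recent SDK and API version.

```python
# Sketch: monitor training events and list checkpoints for a fine-tuning job.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

job_id = "ftjob-example-id"  # placeholder job ID

# Recent events include status changes and periodic training metrics.
events = client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=20)
for event in events.data:
    print(event.created_at, event.message)

# If the loss isn't converging as expected, you can cancel, adjust, and start a new job:
# client.fine_tuning.jobs.cancel(job_id)

# After the job completes, compare intermediate checkpoints with the final model.
checkpoints = client.fine_tuning.jobs.checkpoints.list(fine_tuning_job_id=job_id)
for ckpt in checkpoints.data:
    print(ckpt.fine_tuned_model_checkpoint, ckpt.metrics)
```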

## Deployment and monitoring

- Choose a suitable deployment infrastructure, such as cloud-based platforms or on-premises servers.
- Continuously monitor the model's performance and make adjustments as needed.
- Consider regional deployment needs and latency requirements to meet enterprise SLAs. Implement security guardrails, such as private links, encryption, and access controls, to protect sensitive data and maintain compliance with organizational policies.
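
Once the fine-tuned model is deployed, your application calls it like any other Azure OpenAI deployment, by deployment name. The following is a minimal sketch; the deployment name, prompt, and API version are assumptions.

```python
# Sketch: call a deployed fine-tuned model from an application.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="my-finetuned-gql-assistant",  # the deployment name you chose when deploying
    messages=[
        {"role": "system", "content": "Always return GQL."},
        {"role": "user", "content": "Which customers placed more than five orders last month?"},
    ],
)
print(response.choices[0].message.content)
```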

## Supported models for fine-tuning

@@ -98,6 +163,17 @@ For the Azure OpenAI models that you can fine tune, supported regions for fine-t

[!INCLUDE [Fine-tune models](../../ai-services/openai/includes/fine-tune-models.md)]

## Best practices for fine-tuning

Here are some best practices that can help improve the efficiency and effectiveness of fine-tuning LLMs for various applications:

- **Start with a smaller model**: A common mistake is assuming that your application needs the newest, biggest, most expensive model. Especially for simpler tasks, start with smaller models and only try larger models if needed.
- **Select models based on domain needs**: Start with industry-standard models before considering fine-tuned versions for specific use cases. Use benchmark leaderboards to assess performance and test real-world scenarios in model playgrounds. Balance accuracy, cost, and efficiency to ensure the best fit for your deployment.
- **Collect a large, high-quality dataset**: LLMs are data-hungry and benefit from more diverse and representative data to fine-tune on. Because collecting and annotating large datasets can be costly and time-consuming, you can also use synthetic data generation techniques to increase the size and variety of your dataset. Make sure the synthetic data is relevant and consistent with your task and domain, and that it doesn't introduce noise or bias into the model.
- **Try fine-tuning subsets first**: To assess the value of getting more data, you can fine-tune models on subsets of your current dataset to see how performance scales with dataset size (see the sketch after this list). This approach can help you estimate the learning curve of your model and decide whether adding more data is worth the effort and cost. You can also compare the performance of your model with the pre-trained model or a baseline to see how much improvement you can achieve with fine-tuning.
- **Experiment with hyperparameters**: Iteratively adjust hyperparameters to optimize model performance. Hyperparameters, such as the learning rate, the batch size, and the number of epochs, can have a significant effect on the model's performance, so experiment with different values and combinations to find the best ones for your task and dataset.
- **Try different data formats**: Depending on the task, different data formats can have different impacts on the model's performance. For example, for a classification task, you can use a format that separates the prompt and the completion with a special token, such as `{"prompt": "Paris##\n", "completion": " city\n###\n"}`. Be sure to use formats suitable for your application.
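
For the subset experiment mentioned above, one simple approach is to carve out increasing fractions of the same shuffled dataset and run one fine-tuning job per subset; the fractions and file names here are illustrative.

```python
# Sketch: create 25%, 50%, and 100% subsets of a shuffled dataset to study
# how fine-tuned model quality scales with training data size.
import random

random.seed(0)

with open("train.jsonl", encoding="utf-8") as f:
    lines = [line for line in f if line.strip()]
random.shuffle(lines)

for fraction in (0.25, 0.5, 1.0):
    subset = lines[: int(len(lines) * fraction)]
    name = f"train_{int(fraction * 100)}pct.jsonl"
    with open(name, "w", encoding="utf-8") as f:
        f.writelines(subset)
    print(f"{name}: {len(subset)} examples")
    # Launch one fine-tuning job per subset file and compare evaluation results.
```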

## Related content

- [Fine-tune models using managed compute (preview)](../how-to/fine-tune-managed-compute.md)
2 binary image files changed (307 KB and 151 KB)
