articles/ai-foundry/concepts/fine-tuning-overview.md
Lines changed: 5 additions & 87 deletions
@@ -19,75 +19,27 @@ author: sdgilley

 Fine-tuning customizes a pretrained AI model with additional training on a specific task or dataset to improve performance, add new skills, or enhance accuracy. The result is a new, optimized GenAI model based on the provided examples. This article walks you through use-cases for fine-tuning and how it helps you in your GenAI journey.

-Consider fine-tuning GenAI models to:
-
-- Scale and adapt to specific enterprise needs
-- Reduce false positives as tailored models are less likely to produce inaccurate or irrelevant responses
-- Enhance the model's accuracy for domain-specific tasks
-- Save time and resources with faster and more precise results
-- Get more relevant and context-aware outcomes as models are fine-tuned for specific use cases
-
 [Azure AI Foundry](https://ai.azure.com/?cid=learnDocs) offers several models across model providers enabling you to get access to the latest and greatest in the market. [View this list for more details](#supported-models-for-fine-tuning).

-## Getting started with fine-tuning
-
-When starting out on your generative AI journey, we recommend you begin with prompt engineering and RAG to familiarize yourself with base models and its capabilities.
--[Prompt engineering](../../ai-services/openai/concepts/prompt-engineering.md) is a technique that involves designing prompts using tone and style details, example responses, and intent mapping for natural language processing models. This process improves accuracy and relevancy in responses, to optimize the performance of the model.
--[Retrieval-augmented generation (RAG)](../concepts/retrieval-augmented-generation.md) improves LLM performance by retrieving data from external sources and incorporating it into a prompt. RAG can help businesses achieve customized solutions while maintaining data relevance and optimizing costs.
-
-As you get comfortable and begin building your solution, it's important to understand where prompt engineering falls short and when you should try fine-tuning.
-
-- Is the base model failing on edge cases or exceptions?
-- Is the base model not consistently providing output in the right format?
-- Is it difficult to fit enough examples in the context window to steer the model?
-- Is there high latency?
-
-Examples of failure with the base model and prompt engineering can help you identify the data to collect for fine-tuning and establish a performance baseline that you can evaluate and compare your fine-tuned model against. Having a baseline for performance without fine-tuning is essential for knowing whether or not fine-tuning improves model performance.
-
-Here's an example:
+:::image type="content" source="../media/concepts/model-catalog-fine-tuning.png" alt-text="Screenshot of Azure AI Foundry model catalog and filtering by Fine-tuning tasks." lightbox="../media/concepts/model-catalog-fine-tuning.png":::

-_A customer wants to use GPT-4o-Mini to turn natural language questions into queries in a specific, nonstandard query language. The customer provides guidance in the prompt ("Always return GQL") and uses RAG to retrieve the database schema. However, the syntax isn't always correct and often fails for edge cases. The customer collects thousands of examples of natural language questions and the equivalent queries for the database, including cases where the model failed before. The customer then uses that data to fine-tune the model. Combining the newly fine-tuned model with the engineered prompt and retrieval brings the accuracy of the model outputs up to acceptable standards for use._
-
-## Use cases
-
-Base models are already pretrained on vast amounts of data. Most times you add instructions and examples to the prompt to get the quality responses that you're looking for - this process is called "few-shot learning." Fine-tuning allows you to train a model with many more examples that you can tailor to meet your specific use-case, thus improving on few-shot learning. Fine-tuning can reduce the number of tokens in the prompt leading to potential cost savings and requests with lower latency.
-
-Turning natural language into a query language is just one use case where you can "show not tell" the model how to behave. Here are some other use cases:
+## Getting started with fine-tuning

-- Improve the model's handling of retrieved data
-- Steer model to output content in a specific style, tone, or format
-- Improve the accuracy when you look up information
-- Reduce the length of your prompt
-- Teach new skills (that is, natural language to code)
+To find steps

-If you identify cost as your primary motivator, proceed with caution. Fine-tuning might reduce costs for certain use cases by shortening prompts or allowing you to use a smaller model. But there might be a higher upfront cost to training, and you have to pay for hosting your own custom model.

 ## Steps to fine-tune a model

-At a high level, finetuning requires you to:
-
-- Prepare and upload training data,
-- Train a new fine-tuned model,
-- Evaluate your newly trained model,
-- Deploy that model for inferencing, and
-- Use the fine-tuned model in your application
-
-It's important to call out that fine-tuning is heavily dependent on the quality of data that you can provide. It's best practice to provide hundreds, if not thousands, of training examples to be successful and get your desired results.
+For steps on fine-tuning a model within AI Foundry, see foundry models for serverless API Steps and foundry models for managed compute steps.

 :::image type="content" source="../media/concepts/data-pipeline.png" alt-text="Screenshot of the fine-tuning data pipeline for adapting pre-trained models to a specific task." lightbox="../media/concepts/data-pipeline.png":::

 ## Data preparation

-### What data are you going to use for fine-tuning?
-
 The fine-tuning process begins by selecting a pretrained model and preparing a relevant dataset tailored to the target task. This dataset should reflect the kind of inputs the model will see in deployment.

 Follow this link to view and download [example datasets](https://github.com/Azure-Samples/AIFoundry-Customization-Datasets) to try out fine-tuning.

-For example, if the goal is to fine-tune a model for sentiment analysis, the dataset would include labeled text examples categorized by sentiment (positive, negative, neutral). The model is then retrained on this dataset, adjusting its parameters to better align with the new task. This retraining process usually requires fewer computational resources compared to training a model from scratch, as it builds upon the existing capabilities.
-
-Even with a great use case, fine-tuning is only as good as the quality of the data that you're able to provide. Different models will require different data volumes, but you often need to provide fairly large quantities of high-quality curated data in the correct formats. You can use this samples repository to understand formatting conditions and data preparation.
-
 To fine-tune a model for chat or question answering, your training dataset should reflect the types of interactions the model will handle. Here are some key elements to include in your dataset:

 -**Prompts and responses**: Each entry should contain a prompt (e.g., a user question) and a corresponding response (e.g., the model’s reply).
@@ -96,38 +48,10 @@ To fine-tune a model for chat or question answering, your training dataset shoul
 -**Human-generated responses**: Use responses written by humans to teach the model how to generate natural and accurate replies.
 -**Formatting**: Use a clear structure to separate prompts and responses. For example, `\n\n###\n\n` and ensure the delimiter doesn't appear in the content.

-### Best Practices for data preparation
-
-The more training examples you have, the better. Fine tuning jobs will not proceed without at least 10 training examples, but such a small number isn't enough to noticeably influence model responses. It is best practice to provide hundreds, if not thousands, of training examples to be successful. 100 good-quality examples are better than 1000 poor examples.
-
-In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data, without first pruning the dataset for only the highest quality examples you could end up with a model that performs much worse than expected.
-
-### Best practices for data labelling
-
-Accurate and consistent labelling is crucial for training the model. Follow these best practices:
-
--**Ensure data diversity**: Include all the typical variations such as document formats (digital vs. scanned), layout differences, varying table sizes, and optional fields.
--**Define fields clearly**: Use semantically meaningful field names (e.g., effective_date), especially for custom models, and follow consistent naming conventions like Pascal or camel case.
--**Maintain label consistency**: Ensure uniform labelling across documents, particularly for repeated values.
--**Split your data**: Separate training and validation sets to evaluate the model on unseen data and avoid overfitting.
--**Label at scale**: Aim for at least 50 labelled documents per class, where applicable.
--**Combine automation and review**: Use AI-generated labels to accelerate the process, focusing manual effort on complex or critical fields.
-
 ## Model selection

 Selecting the right model for fine-tuning is a critical decision that impacts performance, efficiency, and cost. Before making a choice, it is essential to clearly define the task and establish the desired performance metrics. A well-defined task ensures that the selected model aligns with specific requirements, optimizing effort and resources.

-### Best practices for model selection
-
--**Choose models based on domain specificity and use case**: Start by evaluating industry-standard models for general capabilities, then assess models fine-tuned for your specific use case. If your task requires deep domain expertise, selecting a model tailored to your industry can improve accuracy and efficiency while reducing the need for extensive fine-tuning.
--**Assess model performance on leaderboards**: Review benchmark leaderboards to evaluate how pre-trained models perform on relevant tasks. Focus on key metrics such as accuracy, coherence, latency, and domain-specific benchmarks to identify a strong foundation for fine-tuning.
--**Experiment with model playgrounds**: Utilize interactive testing environments to assess the base model’s performance on real-world use cases. By adjusting prompts, temperature, and other parameters, you can identify performance gaps before investing in fine-tuning.
--**Weigh trade-offs between model size, complexity, cost, and performance**: Larger models may offer superior accuracy but come with higher computational costs and latency. Consider the balance between efficiency and precision based on your deployment needs.
-
-## Training and evaluation
-
-Fine-tuning isn't merely a matter of retraining on a new dataset; it also involves careful consideration of various hyperparameters and techniques to balance accuracy and generalization. A key risk is overfitting, where a model becomes too narrowly adapted to training data, reducing its effectiveness on unseen inputs. To mitigate overfitting and optimize performance, fine-tuning requires adjusting parameters such as learning rate, regularization, batch size, number of epochs, and seed settings.
-
 ## Use evaluations in fine-tuning

 You should have clearly defined goals for what success with fine-tuning looks like. Ideally, these should go beyond qualitative measures and include quantitative metrics, such as using a holdout validation set, conducting user acceptance testing, or A/B tests comparing the fine-tuned model to the base model.
@@ -153,13 +77,8 @@ Fine-tuning is available in specific Azure regions for some models that are depl

 For more information on fine-tuning using a managed compute (preview), see [Fine-tune models using managed compute (preview)](../how-to/fine-tune-managed-compute.md).

-For details about Azure OpenAI in Azure AI Foundry Models that are available for fine-tuning, see the [Azure OpenAI in Foundry Models documentation](../../ai-services/openai/concepts/models.md#fine-tuning-models) or the [Azure OpenAI models table](#fine-tuning-azure-openai-models) later in this guide.
+For details about Azure OpenAI in Azure AI Foundry Models that are available for fine-tuning, see the [Azure OpenAI in Foundry Models documentation.](../../ai-services/openai/concepts/models.md#fine-tuning-models)

-For the Azure OpenAI models that you can fine tune, supported regions for fine-tuning include North Central US, Sweden Central, and more.