Commit 90a9cd4

Update fine-tuning-overview.md
1 parent a2f8153 commit 90a9cd4


articles/ai-foundry/concepts/fine-tuning-overview.md

Lines changed: 44 additions & 30 deletions
@@ -19,64 +19,78 @@ author: sdgilley

Fine-tuning customizes a pretrained AI model with additional training on a specific task or dataset to improve performance, add new skills, or enhance accuracy. The result is a new, optimized GenAI model based on the provided examples. This article walks you through use cases for fine-tuning and how fine-tuning helps in your GenAI journey.

If you're just getting started with fine-tuning, we recommend **GPT-4.1** for complex skills like language translation, domain adaptation, or advanced code generation. For more focused tasks (such as classification, sentiment analysis, or content moderation), or when distilling knowledge from a more sophisticated model, start with **GPT-4.1-mini** for faster iteration and lower costs.

:::image type="content" source="../media/concepts/model-catalog-fine-tuning.png" alt-text="Screenshot of Azure AI Foundry model catalog and filtering by Fine-tuning tasks." lightbox="../media/concepts/model-catalog-fine-tuning.png":::

## Serverless or Managed Compute?
- **Serverless** lets you customize models using our capacity with consumption-based pricing starting at $1.70 per million input tokens. We optimize training for speed and scalability while handling all infrastructure management. This approach requires no GPU quotas and provides exclusive access to OpenAI models, though with fewer hyperparameter options than managed compute.
- **Managed compute** offers a wider range of models and advanced customization through Azure Machine Learning, but requires you to provide your own VMs for training and hosting. While this gives full control over resources, it demands high quotas that many customers lack, doesn't include OpenAI models, and can't use our multi-tenancy optimizations.

For most customers, serverless provides the best balance of ease of use, cost efficiency, and access to premium models. This article focuses on serverless options.
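As a back-of-the-envelope illustration of consumption-based billing, the sketch below multiplies token volume by a flat per-token price, using the $1.70 per million input tokens starting price quoted above. The two-epoch default and the helper function are illustrative assumptions, not a billing formula from the service; actual prices vary by model and region.

```python
def estimate_training_cost(num_examples: int, avg_tokens_per_example: int,
                           epochs: int = 2,
                           price_per_million_tokens: float = 1.70) -> float:
    """Rough cost estimate for serverless fine-tuning.

    Assumes a flat per-token training price. Real pricing varies by
    model, region, and billing model, so treat this as a ballpark only.
    """
    total_tokens = num_examples * avg_tokens_per_example * epochs
    return total_tokens / 1_000_000 * price_per_million_tokens

# 500 examples averaging 800 tokens each, trained for 2 epochs:
cost = estimate_training_cost(500, 800)
print(f"~${cost:.2f}")  # prints "~$1.36"
```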

For steps to fine-tune a model in AI Foundry, see [Fine-tune Models in AI Foundry](../how-to/fine-tune-serverless.md) or [Fine-tune models using managed compute](../how-to/fine-tune-managed-compute.md).

## Training Techniques

We offer three training techniques to optimize your models:
- **Supervised Fine Tuning (SFT):** Foundational technique that trains your model on input-output pairs, teaching it to produce desired responses for specific inputs.
  - *Best for:* Most use cases, including classification, generation, and task-specific adaptation.
  - *When to use:* Start here for most projects. SFT addresses the broadest range of fine-tuning scenarios and provides reliable results with clear input-output training data.
  - *Supported Models:* GPT 4o, 4o-mini, 4.1, 4.1-mini, 4.1-nano; Llama 2 and Llama 3.1; Phi 4, Phi-4-mini-instruct; Mistral Nemo, Ministral-3B, Mistral Large (2411); NTT Tsuzumi-7b
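For SFT, training data is typically supplied as a JSONL file of chat transcripts, one JSON object per line. The sketch below shows the general shape of one such record; the Contoso scenario is invented for illustration, and you should confirm the exact schema against the fine-tuning docs for your chosen model.

```python
import json

# One SFT training example in the chat-style JSONL format used for
# fine-tuning chat models (one JSON object per line of the .jsonl file).
# The system/user/assistant content here is a made-up illustration.
example = {
    "messages": [
        {"role": "system", "content": "You are a support assistant for Contoso products."},
        {"role": "user", "content": "How do I reset my router?"},
        {"role": "assistant", "content": "Hold the reset button for 10 seconds, then wait for the lights to stabilize."},
    ]
}

# Append the record as a single line of the training file.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```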

- **Direct Preference Optimization (DPO):** Trains models to prefer certain types of responses over others by learning from comparative feedback, without requiring a separate reward model.
  - *Best for:* Improving response quality, safety, and alignment with human preferences.
  - *When to use:* When you have examples of preferred vs. non-preferred outputs, or when you need to optimize for subjective qualities like helpfulness, harmlessness, or style.
  - *Supported Models:* GPT 4o, 4.1, 4.1-mini, 4.1-nano
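DPO data pairs one prompt with a preferred and a non-preferred completion. The record below is a sketch of that shape; the field names follow the commonly documented preference format, but verify them against the current fine-tuning docs before uploading, and treat the refund-policy text as invented.

```python
import json

# One DPO example: the same prompt with a preferred and a non-preferred
# completion. Field names are an assumption based on the published
# preference format; check the current docs for the exact schema.
pair = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize our refund policy in one sentence."}
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "Refunds are available within 30 days of purchase with proof of receipt."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "Refunds: see policy."}
    ],
}

line = json.dumps(pair)  # one line of the .jsonl training file
```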
- **Reinforcement Fine Tuning (RFT):** Uses reinforcement learning to optimize models based on reward signals, allowing for more complex optimization objectives.
  - *Best for:* Complex optimization scenarios where simple input-output pairs aren't sufficient.
  - *When to use:* Advanced use cases requiring optimization for metrics like user engagement, task completion rates, or other measurable outcomes. Requires more ML expertise to implement effectively.
  - *Supported Models:* o4-mini
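RFT replaces fixed target outputs with a reward signal. Purely to illustrate that idea, the toy grader below scores a model answer against a reference; real RFT graders are configured as part of the fine-tuning job, not written as client-side Python like this.

```python
def grade(answer: str, reference: str) -> float:
    """Toy reward: 1.0 for an exact match, partial credit for token overlap.

    Illustrative only -- real RFT reward/grader logic is defined in the
    fine-tuning job configuration, not in code like this.
    """
    if answer.strip().lower() == reference.strip().lower():
        return 1.0
    answer_tokens = set(answer.lower().split())
    reference_tokens = set(reference.lower().split())
    if not reference_tokens:
        return 0.0
    # Fraction of reference tokens that appear in the answer.
    return len(answer_tokens & reference_tokens) / len(reference_tokens)

print(grade("42 units", "42 units"))
print(grade("42 units roughly", "42 units exactly"))
```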

> Most customers should start with SFT, as it addresses the broadest range of fine-tuning use cases.

View and download [example datasets](https://github.com/Azure-Samples/AIFoundry-Customization-Datasets) to try out fine-tuning.

## Training Modalities

- **Text-to-Text (All Models):** All our models support standard text-to-text fine-tuning for language-based tasks.
- **Vision + Text (GPT 4o, 4.1):** Some models support vision fine-tuning, accepting both image and text inputs while producing text outputs. Use cases for vision fine-tuning include interpreting charts, graphs, and visual data; content moderation; visual quality assessment; document processing with mixed text and images; and product cataloging from photographs.
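A vision fine-tuning record interleaves image and text content parts in the user message. The sketch below shows the general chat-style shape; the URL is a placeholder, and the exact accepted image fields (URL vs. inline data, size limits) are defined by the service docs.

```python
import json

# One vision fine-tuning example: an image plus a text question, with the
# desired text answer. The chart URL and answer text are placeholders.
example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/q3-sales.png"}},
            ],
        },
        {"role": "assistant", "content": "Sales rose steadily through Q3, peaking in September."},
    ]
}

line = json.dumps(example)  # one line of the .jsonl training file
```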

## Model Comparison Table

| Model                | Modalities   | Techniques | Strengths                                                          |
|----------------------|--------------|------------|--------------------------------------------------------------------|
| GPT 4.1              | Text, Vision | SFT, DPO   | Superior performance on sophisticated tasks, nuanced understanding |
| GPT 4.1-mini         | Text         | SFT, DPO   | Fast iteration, cost-effective, good for simple tasks              |
| GPT 4.1-nano         | Text         | SFT, DPO   | Extremely fast and cheap, minimal resource usage                   |
| o4-mini              | Text         | RFT        | Reasoning model suited for complex logical tasks                   |
| Phi 4                | Text         | SFT        | Cost-effective option for simpler tasks                            |
| Ministral 3B         | Text         | SFT        | Low-cost option for faster iteration                               |
| Mistral Nemo         | Text         | SFT        | Balance between size and capability                                |
| Mistral Large (2411) | Text         | SFT        | Most capable Mistral model, better for complex tasks               |

## Model selection

1. **Define your use case:** Identify whether you need a highly capable general-purpose model (for example, GPT 4.1), a smaller cost-effective model for a specific task (GPT 4.1-mini or nano), or a complex reasoning model (o4-mini).
2. **Prepare your data:** Start with 50-100 high-quality examples for initial testing, scaling to 500+ examples for production models.
3. **Choose your technique:** Begin with Supervised Fine Tuning (SFT) unless you have specific requirements for reasoning models and RFT.
4. **Iterate and evaluate:** Fine-tuning is an iterative process: start with a baseline, measure performance, and refine your approach based on results.
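The iterate-and-evaluate step can be as simple as holding out part of your examples and scoring both the base and fine-tuned models on the holdout. The sketch below is an illustrative harness only: the example counts follow the guidance above, and exact-match accuracy stands in for whatever metric fits your task.

```python
import random

def split_dataset(examples, validation_fraction=0.2, seed=42):
    """Shuffle and split examples into train/validation sets (deterministic for a given seed)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predictions, references):
    """Stand-in metric: exact-match accuracy over the validation set."""
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)

# Hypothetical dataset of prompt/completion pairs.
examples = [{"prompt": f"q{i}", "completion": f"a{i}"} for i in range(100)]
train, validation = split_dataset(examples)
print(len(train), len(validation))  # prints "80 20"
```

Compare `accuracy` (or your real metric) for the base and fine-tuned models on `validation` after each training run, and only grow the dataset or adjust hyperparameters when the numbers justify it.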

For additional guidance on data preparation, evaluation strategies, and advanced techniques, see the main documentation page.

## Supported models for fine-tuning

Now that you know when to use fine-tuning for your use case, you can go to Azure AI Foundry to find models available to fine-tune.

Fine-tuning is available in specific Azure regions for some models that are deployed via standard deployments. To fine-tune such models, a user must have a hub/project in the region where the model is available for fine-tuning. See [Region availability for models in standard deployment](../how-to/deploy-models-serverless-availability.md) for detailed information.

For more information on fine-tuning using a managed compute (preview), see [Fine-tune models using managed compute (preview)](../how-to/fine-tune-managed-compute.md).

For details about Azure OpenAI in Azure AI Foundry Models that are available for fine-tuning, see the [Azure OpenAI in Foundry Models documentation](../../ai-services/openai/concepts/models.md#fine-tuning-models).

## Related content

- [Fine-tune models using managed compute (preview)](../how-to/fine-tune-managed-compute.md)
