articles/ai-services/openai/concepts/fine-tuning-considerations.md
+5 −5 lines changed: 5 additions & 5 deletions
@@ -55,15 +55,15 @@ Fine-tuning is suited for times when you have a small amount of data and want to
## Types of fine-tuning
-Azure offers multiple types of fine-tuning techniques:
+Azure AI Foundry offers multiple types of fine-tuning techniques:
-* **Supervised fine-tuning**: This allows you to provide custom data (prompt/completion or conversational chat, depending on the model) to teach the base model new skills. This process involves further training the model on a high-quality labelled dataset, where each data point is associated with the correct output or answer. The goal is to enhance the model's performance on a particular task by adjusting its parameters based on the labelled data.
+* **Supervised fine-tuning**: This allows you to provide custom data (prompt/completion or conversational chat, depending on the model) to teach the base model new skills. This process involves further training the model on a high-quality labelled dataset, where each data point is associated with the correct output or answer. The goal is to enhance the model's performance on a particular task by adjusting its parameters based on the labelled data. This technique works best when there are finite ways of solving a problem and you want to teach the model a particular task and improve its accuracy and conciseness.
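As an illustration of the labelled data this step consumes, the following sketch builds a small JSONL training file in the conversational chat format. The file name and example content are assumptions for illustration, and the choice of chat format (rather than prompt/completion) depends on the model you fine-tune.

```python
import json

# Hypothetical labelled examples in the conversational chat format.
# Each JSONL line pairs the input messages with the assistant reply
# the model should learn to produce.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise support assistant."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Open Settings > Security, select Reset password, and follow the emailed link."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a concise support assistant."},
            {"role": "user", "content": "Can I change my billing date?"},
            {"role": "assistant", "content": "Yes. Go to Billing > Payment schedule and pick a new date."},
        ]
    },
]

# Write one JSON object per line, as fine-tuning jobs expect JSONL input.
with open("sft_training_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```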
-* **Reinforcement fine-tuning**: This is a model customization technique, particularly beneficial for optimizing model behaviour in highly complex or dynamic environments, enabling the model to learn and adapt through iterative feedback and decision-making. For example, financial services providers can optimize the model for faster, more accurate risk assessments or personalized investment advice. In healthcare and pharmaceuticals, o3-mini can be tailored to accelerate drug discovery, enabling more efficient data analysis, hypothesis generation, and identification of promising compounds.
+* **Reinforcement fine-tuning**: This is a model customization technique, particularly beneficial for optimizing model behaviour in highly complex or dynamic environments, enabling the model to learn and adapt through iterative feedback and decision-making. For example, financial services providers can optimize the model for faster, more accurate risk assessments or personalized investment advice. In healthcare and pharmaceuticals, o3-mini can be tailored to accelerate drug discovery, enabling more efficient data analysis, hypothesis generation, and identification of promising compounds. RFT is a great way to fine-tune when there are an infinite or high number of ways to solve a problem. The grader rewards the model incrementally, which improves its reasoning.
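To make the idea of incremental rewards concrete, here is a minimal sketch of a grader that gives partial credit rather than a binary pass/fail. This is a standalone illustration of the grading concept only, not the Azure AI Foundry grader configuration; the function name, scoring rule, and sample data are hypothetical.

```python
def grade_response(predicted_compounds: set[str], reference_compounds: set[str]) -> float:
    """Hypothetical grader: reward the model incrementally for each correct
    item it identifies and penalize spurious ones, instead of scoring the
    whole answer as simply right or wrong."""
    if not reference_compounds:
        return 0.0
    correct = len(predicted_compounds & reference_compounds)
    spurious = len(predicted_compounds - reference_compounds)
    # Partial credit in [0, 1]: more correct items raise the reward,
    # incorrect extras lower it, never below zero.
    score = (correct - 0.5 * spurious) / len(reference_compounds)
    return max(0.0, min(1.0, score))


# Example: two of three reference compounds found, plus one spurious guess -> 0.5
print(grade_response({"aspirin", "ibuprofen", "foo"}, {"aspirin", "ibuprofen", "naproxen"}))
```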
-* **Direct Preference Optimization (DPO)**: This is another new alignment technique for large language models, designed to adjust model weights based on human preferences. Unlike Reinforcement Learning from Human Feedback (RLHF), DPO does not require fitting a reward model and uses binary preferences for training. This method is computationally lighter and faster, making it equally effective at alignment while being more efficient. DPO is especially useful in scenarios where subjective elements like tone, style, or specific content preferences are important.
+* **Direct Preference Optimization (DPO)**: This is another new alignment technique for large language models, designed to adjust model weights based on human preferences. Unlike Reinforcement Learning from Human Feedback (RLHF), DPO does not require fitting a reward model and uses binary preferences for training. This method is computationally lighter and faster, making it equally effective at alignment while being more efficient. With DPO, you provide both the preferred and the non-preferred response for each example in the training set.
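As a sketch of what preferred and non-preferred responses can look like in the training set, the snippet below writes one DPO-style example to a JSONL file. The field names (`input`, `preferred_output`, `non_preferred_output`) follow a commonly used preference format but are an assumption here; check the data-format reference for the model and API version you use.

```python
import json

# One hypothetical preference example: the same prompt paired with a preferred
# reply and a non-preferred reply, so DPO can learn which style to favour.
example = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize our refund policy."}
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "Refunds are available within 30 days of purchase with proof of payment."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "Refunds? Sure, whenever, just ask I guess."}
    ],
}

with open("dpo_training_data.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```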
-You can stack techniques: first using SFT to create a customized model – optimized for your use case – then using preference fine tuning to align the responses to your specific preferences. During the SFT step, you will focus on data quality and representativeness of the tasks, while the DPO step adjusts responses with specific comparisons.
+You can also stack techniques: first using SFT to create a customized model – optimized for your use case – then using preference fine-tuning to align the responses to your specific preferences. During the SFT step, you focus on data quality and representativeness of the tasks, while the DPO step adjusts responses with specific comparisons.
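A rough sketch of how this stacking could look with the OpenAI Python SDK against an Azure endpoint is shown below. The endpoint, key handling, API version, base model name, and the `method` payload are placeholders and assumptions; the exact parameters depend on the model and API version you use.

```python
from openai import AzureOpenAI

# Placeholder endpoint, key, and API version: adjust for your resource.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2025-02-01-preview",  # assumed; use a version that supports fine-tuning
)

# Step 1 (SFT): train a custom model on the labelled chat examples.
sft_file = client.files.create(file=open("sft_training_data.jsonl", "rb"), purpose="fine-tune")
sft_job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",  # assumed base model name
    training_file=sft_file.id,
)

# Step 2 (DPO): after the SFT job succeeds, align the resulting model with
# preference data. In practice, poll the SFT job until it completes before
# reading fine_tuned_model. The `method` payload shape is an assumption;
# check the current API reference for your API version.
dpo_file = client.files.create(file=open("dpo_training_data.jsonl", "rb"), purpose="fine-tune")
dpo_job = client.fine_tuning.jobs.create(
    model=sft_job.fine_tuned_model,  # output of step 1
    training_file=dpo_file.id,
    method={"type": "dpo", "dpo": {"hyperparameters": {"beta": 0.1}}},
)
print(dpo_job.id)
```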