
Commit cb421a7

Update fine-tuning-considerations.md
1 parent 3e802be commit cb421a7

File tree

1 file changed: +5 -5 lines changed


articles/ai-services/openai/concepts/fine-tuning-considerations.md

Lines changed: 5 additions & 5 deletions
@@ -55,15 +55,15 @@ Fine-tuning is suited for times when you have a small amount of data and want to

## Types of fine-tuning

-Azure offers multiple types of fine-tuning techniques:
+Azure AI Foundry offers multiple types of fine-tuning techniques:

-* **Supervised fine-tuning**: This allows you to provide custom data (prompt/completion or conversational chat, depending on the model) to teach the base model new skills. This process involves further training the model on a high-quality labelled dataset, where each data point is associated with the correct output or answer. The goal is to enhance the model's performance on a particular task by adjusting its parameters based on the labelled data.
+* **Supervised fine-tuning**: This allows you to provide custom data (prompt/completion or conversational chat, depending on the model) to teach the base model new skills. This process involves further training the model on a high-quality labelled dataset, where each data point is associated with the correct output or answer. The goal is to enhance the model's performance on a particular task by adjusting its parameters based on the labelled data. This technique works best when there are finite ways of solving a problem and you want to teach the model a particular task and improve its accuracy and conciseness.
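As a rough illustration of the labelled data this describes, supervised fine-tuning sets are commonly supplied as chat-formatted JSONL, one example per line. The sketch below is not part of this commit; the file name and example contents are assumptions.

```python
# Minimal sketch (assumed file name and contents): each JSONL line pairs a
# prompt with the correct assistant answer for supervised fine-tuning.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise support assistant."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Open Settings > Security, then select Reset password."},
        ]
    },
]

with open("sft_training.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```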

-* **Reinforcement fine-tuning**: This is a model customization technique, particularly beneficial for optimizing model behaviour in highly complex or dynamic environments, enabling the model to learn and adapt through iterative feedback and decision-making. For example, financial services providers can optimize the model for faster, more accurate risk assessments or personalized investment advice. In healthcare and pharmaceuticals, o3-mini can be tailored to accelerate drug discovery, enabling more efficient data analysis, hypothesis generation, and identification of promising compounds.
+* **Reinforcement fine-tuning**: This is a model customization technique, particularly beneficial for optimizing model behaviour in highly complex or dynamic environments, enabling the model to learn and adapt through iterative feedback and decision-making. For example, financial services providers can optimize the model for faster, more accurate risk assessments or personalized investment advice. In healthcare and pharmaceuticals, o3-mini can be tailored to accelerate drug discovery, enabling more efficient data analysis, hypothesis generation, and identification of promising compounds. RFT is a great way to fine-tune when there are many, or effectively infinite, ways to solve a problem. The grader rewards the model incrementally and improves its reasoning.
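To make the grader idea concrete, here is an illustrative sketch, not part of this commit and not a specific service API, of an incremental reward function for a drug-discovery-style task; the scoring rules are assumptions.

```python
# Illustrative grader sketch: returns a reward in [0, 1] so partially correct
# answers earn partial credit instead of a pass/fail score. All scoring rules
# here are assumptions for the example.
def grade_response(response: str, expected_compound: str, required_evidence: list[str]) -> float:
    reward = 0.0
    text = response.lower()
    if expected_compound.lower() in text:
        reward += 0.5  # credit for identifying the expected compound
    if required_evidence:
        cited = sum(1 for item in required_evidence if item.lower() in text)
        reward += 0.5 * cited / len(required_evidence)  # partial credit for cited evidence
    return reward


print(grade_response("Compound X-17 binds the target; see assay A.", "X-17", ["assay a", "assay b"]))  # 0.75
```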

-* **Direct Preference Optimization (DPO)**: This is another new alignment technique for large language models, designed to adjust model weights based on human preferences. Unlike Reinforcement Learning from Human Feedback (RLHF), DPO does not require fitting a reward model and uses binary preferences for training. This method is computationally lighter and faster, making it equally effective at alignment while being more efficient. DPO is especially useful in scenarios where subjective elements like tone, style, or specific content preferences are important.
+* **Direct Preference Optimization (DPO)**: This is another new alignment technique for large language models, designed to adjust model weights based on human preferences. Unlike Reinforcement Learning from Human Feedback (RLHF), DPO does not require fitting a reward model and uses binary preferences for training. This method is computationally lighter and faster, making it equally effective at alignment while being more efficient. To use the DPO technique, you provide both the preferred and the non-preferred response for each example in the training set.
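A sketch of what one preference pair can look like follows; it is not taken from this commit, and the field names follow the commonly used preference-pair JSONL layout, so treat them as assumptions and check the current service documentation.

```python
# Assumed field names and contents: one DPO record pairs a prompt with a
# preferred and a non-preferred assistant response.
import json

record = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize this week's release notes for customers."}
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "Here's a short, customer-friendly summary of what's new this week..."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "v2.3.1: refactored internal modules; misc fixes."}
    ],
}

with open("dpo_training.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```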

-You can stack techniques: first using SFT to create a customized model – optimized for your use case – then using preference fine tuning to align the responses to your specific preferences. During the SFT step, you will focus on data quality and representativeness of the tasks, while the DPO step adjusts responses with specific comparisons.
+You can also stack techniques: first using SFT to create a customized model – optimized for your use case – then using preference fine-tuning to align the responses to your specific preferences. During the SFT step, you will focus on data quality and representativeness of the tasks, while the DPO step adjusts responses with specific comparisons.
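A hedged sketch of how this stacking might be wired up with the OpenAI Python SDK is shown below; the model names, file IDs, and the `method` argument for the preference step are assumptions and should be verified against the current fine-tuning reference.

```python
# Sketch only: run an SFT job first, then start a preference (DPO) job that
# uses the SFT result as its base model. IDs, model names, and the `method`
# payload are assumptions to illustrate the flow.
from openai import OpenAI

client = OpenAI()

# Step 1: supervised fine-tuning on task-representative, high-quality data.
sft_job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",           # assumed base model name
    training_file="file-sft-training-id",     # assumed uploaded JSONL file ID
)

# Step 2 (after the SFT job finishes): align the customized model with DPO,
# using preference pairs as the training data.
dpo_job = client.fine_tuning.jobs.create(
    model="ft:gpt-4o-mini:sft-custom-model",  # assumed ID of the SFT output model
    training_file="file-dpo-preferences-id",  # assumed preference-pair JSONL file ID
    method={"type": "dpo", "dpo": {"hyperparameters": {"beta": 0.1}}},
)
```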

## Challenges and limitations of fine-tuning

0 commit comments
