Commit 238b809

Merge pull request #5108 from mrbullwinkle/patch-47
[Azure OpenAI] Update reinforcement-fine-tuning (Preview)
2 parents e1fb3fd + 6a5f0bb commit 238b809

1 file changed

+3
-3
lines changed

articles/ai-services/openai/how-to/reinforcement-fine-tuning.md

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 ---
-title: 'Customize o4-mini model with Azure OpenAI and reinforcement fine-tuning'
+title: 'Customize o4-mini model with Azure OpenAI and reinforcement fine-tuning (Preview)'
 description: Learn how to use reinforcement fine-tuning with Azure OpenAI
 manager: nitinme
 ms.service: azure-ai-openai
@@ -10,7 +10,7 @@ author: mrbullwinkle
 ms.author: mbullwin
 ---
 
-# Reinforcement fine-tuning (RFT) with Azure OpenAI o4-mini
+# Reinforcement fine-tuning (RFT) with Azure OpenAI o4-mini (Preview)
 
 Reinforcement fine-tuning (RFT) is a technique for improving reasoning models like o4-mini by training them through a reward-based process, rather than relying only on labeled data. By using feedback or "rewards" to guide learning, RFT helps models develop better reasoning and problem-solving skills, especially in cases where labeled examples are limited or complex behaviors are desired.
 
@@ -404,4 +404,4 @@ We also provide a grader check API that you can use to check the validity of you
 
 Aim for a few hundred examples initially and consider scaling up to around 1,000 examples if necessary. The dataset should be balanced, in terms of classes predicted, to avoid bias and ensure generalization.
 
-For the prompts, make sure to provide clear and detailed instructions, including specifying the response format and any constraints on the outputs (e.g. minimum length for explanations, only respond with true/false etc.)
+For the prompts, make sure to provide clear and detailed instructions, including specifying the response format and any constraints on the outputs (e.g. minimum length for explanations, only respond with true/false etc.)
