Commit aae5ea8

cline docs
1 parent ad42bb8

3 files changed: +447, -1 lines

docs.json

Lines changed: 2 additions & 1 deletion
@@ -835,7 +835,8 @@
       "guides/use-cases/comparing-top10-lmsys-models-with-portkey",
       "guides/use-cases/track-costs-using-metadata",
       "guides/use-cases/deepseek-r1",
-      "guides/use-cases/openai-computer-use"
+      "guides/use-cases/openai-computer-use",
+      "guides/use-cases/reinforcement-learning"
     ]
   }
 ]
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
---
title: "New cookbook"
---

This guide is for developers and ML practitioners who already know their way around OpenAI's APIs, have a basic understanding of reinforcement fine-tuning (RFT), and wish to use their fine-tuned models for research or other appropriate uses. OpenAI's services are not intended for the personalized treatment or diagnosis of any medical condition and are subject to our applicable terms.

Reinforcement fine-tuning (RFT) of reasoning models consists of running reinforcement learning on top of the models to improve their reasoning performance by exploring the solution space and reinforcing strategies that result in a higher reward. RFT helps the model make sharper decisions and interpret context more effectively.
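
To build intuition for what "exploring the solution space and reinforcing strategies that result in a higher reward" looks like, here is a deliberately simplified sketch of the loop. It is not OpenAI's actual training implementation, and `policy`, `grader`, and `reinforce` are hypothetical placeholders.

```python
# Conceptual sketch of the RFT idea (not OpenAI's training implementation).
# `policy`, `grader`, and `reinforce` are hypothetical placeholders.

def rft_step(policy, grader, prompt, reference, n_candidates=8):
    """Sample several reasoning rollouts, grade them, and reinforce the better ones."""
    candidates = [policy.sample(prompt) for _ in range(n_candidates)]

    # The grader maps each candidate answer to a scalar reward, e.g. 1.0 for a
    # correct outcome prediction and 0.0 otherwise.
    rewards = [grader(answer, reference) for answer in candidates]

    # Strategies (chains of reasoning) that scored above average are made more
    # likely; those that scored below average are made less likely.
    baseline = sum(rewards) / len(rewards)
    for answer, reward in zip(candidates, rewards):
        policy.reinforce(answer, advantage=reward - baseline)
```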

In this guide, we'll walk through how to apply RFT to the OpenAI o4-mini reasoning model, using a task from the life sciences research domain: predicting outcomes from doctor-patient transcripts and descriptions, an assessment required in many health research studies. We'll use a subset of the medical-o1-verifiable-problem dataset. You will learn the key steps to take in order to successfully run RFT jobs for your own use cases.
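
As a preview of how that dataset will be used, here is a rough sketch of pulling a small subset and shaping it into prompt/reference pairs. The Hugging Face repo id, column names, and record layout are assumptions; verify them against the dataset card and the current fine-tuning docs before running.

```python
# Sketch: pull a small subset of the medical-o1-verifiable-problem dataset and
# write it out as (prompt, reference_answer) records. The repo id, column
# names, and record layout are assumptions; verify them before running.
import json
from datasets import load_dataset

ds = load_dataset("FreedomIntelligence/medical-o1-verifiable-problem", split="train")
subset = ds.shuffle(seed=42).select(range(500))  # small, reproducible subset

with open("rft_train.jsonl", "w") as f:
    for row in subset:
        record = {
            "messages": [{"role": "user", "content": row["Open-ended Verifiable Question"]}],
            "reference_answer": row["Ground-True Answer"],
        }
        f.write(json.dumps(record) + "\n")
```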

Here's what we'll cover:

1. Setup
2. Gathering the dataset
3. Benchmarking the base model
4. Defining your grader
5. Training
6. Using your fine-tuned model

1. Setup

Even strong reasoning models can miss the mark when it comes to expert-level behavior, especially in domains like medicine, where nuance and exactness matter. Imagine a model trying to extract ICD-10 codes from a transcript: even if it understands the gist, it may not use the precise terminology expected by medical professionals.

Other great candidates for RFT include tasks like ledger normalization or tiering fraud risk: settings in which you want precise, reliable, and repeatable reasoning. Check out our RFT use cases guide for more examples.

In our case, we'll focus on teaching o4-mini to become better at predicting the outcomes of clinical conversations and descriptions. Specifically, we want to see whether RFT can boost the accuracy of those predictions.
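
To make "boost the accuracy" concrete, here is a minimal sketch of the kind of exact-match check we can run against the base model before training and against the fine-tuned model afterwards. It assumes the record shape from the dataset sketch above; the model name and the simple normalization are placeholder choices, not the tutorial's final benchmarking code.

```python
# Sketch: score a model's outcome predictions by exact match against the
# reference answers. Model name and normalization are placeholder choices.
from openai import OpenAI

client = OpenAI()

def normalize(text: str) -> str:
    return " ".join(text.lower().strip().split())

def exact_match_accuracy(examples, model="o4-mini"):
    correct = 0
    for ex in examples:
        response = client.responses.create(
            model=model,
            input=ex["messages"][0]["content"],
        )
        if normalize(response.output_text) == normalize(ex["reference_answer"]):
            correct += 1
    return correct / len(examples)
```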

Along the way, we'll talk about how to write effective graders, how they guide the model's learning, and how to watch out for classic reward-hacking pitfalls.
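
As a taste of the grader section, here is the general shape of a simple string-check grader that rewards an exact match against each item's reference answer. The field names are based on OpenAI's published grader format, but treat them as assumptions and confirm against the current API reference; a strict exact-match grader like this also leaves less room for reward hacking than a fuzzy one.

```python
# Sketch: a string-check grader that rewards an exact match between the model's
# final answer and the reference answer stored with each training item.
# Field names are assumptions based on OpenAI's published grader format;
# confirm against the current API reference.
grader = {
    "type": "string_check",
    "name": "outcome_exact_match",
    "operation": "eq",
    "input": "{{sample.output_text}}",
    "reference": "{{item.reference_answer}}",
}
```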
