Commit aae5ea8

cline docs
1 parent ad42bb8

3 files changed: +447, -1 lines

docs.json

Lines changed: 2 additions & 1 deletion
@@ -835,7 +835,8 @@
       "guides/use-cases/comparing-top10-lmsys-models-with-portkey",
       "guides/use-cases/track-costs-using-metadata",
       "guides/use-cases/deepseek-r1",
-      "guides/use-cases/openai-computer-use"
+      "guides/use-cases/openai-computer-use",
+      "guides/use-cases/reinforcement-learning"
     ]
   }
 ]
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
---
title: "New cookbook"
---

This guide is for developers and ML practitioners who already know their way around OpenAI's APIs, have a basic understanding of reinforcement fine-tuning (RFT), and wish to use their fine-tuned models for research or other appropriate uses. OpenAI's services are not intended for the personalized treatment or diagnosis of any medical condition and are subject to our applicable terms.

Reinforcement fine-tuning (RFT) of reasoning models consists of running reinforcement learning on top of the models to improve their reasoning performance by exploring the solution space and reinforcing strategies that result in a higher reward. RFT helps the model make sharper decisions and interpret context more effectively.
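
To build intuition for what "exploring the solution space and reinforcing strategies that result in a higher reward" looks like, here is a deliberately simplified sketch of the loop. It is not OpenAI's actual training implementation, and `policy`, `grader`, and `reinforce` are hypothetical placeholders.

```python
# Conceptual sketch of the RFT idea (not OpenAI's training implementation).
# `policy`, `grader`, and `reinforce` are hypothetical placeholders.

def rft_step(policy, grader, prompt, reference, n_candidates=8):
    """Sample several reasoning rollouts, grade them, and reinforce the better ones."""
    candidates = [policy.sample(prompt) for _ in range(n_candidates)]

    # The grader maps each candidate answer to a scalar reward, e.g. 1.0 for a
    # correct outcome prediction and 0.0 otherwise.
    rewards = [grader(answer, reference) for answer in candidates]

    # Strategies (chains of reasoning) that scored above average are made more
    # likely; those that scored below average are made less likely.
    baseline = sum(rewards) / len(rewards)
    for answer, reward in zip(candidates, rewards):
        policy.reinforce(answer, advantage=reward - baseline)
```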

In this guide, we'll walk through how to apply RFT to the OpenAI o4-mini reasoning model, using a task from the life sciences research domain: predicting outcomes from doctor-patient transcripts and descriptions, an assessment required in many health research studies. We'll use a subset of the medical-o1-verifiable-problem dataset. You will learn the key steps to take in order to successfully run RFT jobs for your own use cases.
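
As a preview of how that dataset will be used, here is a rough sketch of pulling a small subset and shaping it into prompt/reference pairs. The Hugging Face repo id, column names, and record layout are assumptions; verify them against the dataset card and the current fine-tuning docs before running.

```python
# Sketch: pull a small subset of the medical-o1-verifiable-problem dataset and
# write it out as (prompt, reference_answer) records. The repo id, column
# names, and record layout are assumptions; verify them before running.
import json
from datasets import load_dataset

ds = load_dataset("FreedomIntelligence/medical-o1-verifiable-problem", split="train")
subset = ds.shuffle(seed=42).select(range(500))  # small, reproducible subset

with open("rft_train.jsonl", "w") as f:
    for row in subset:
        record = {
            "messages": [{"role": "user", "content": row["Open-ended Verifiable Question"]}],
            "reference_answer": row["Ground-True Answer"],
        }
        f.write(json.dumps(record) + "\n")
```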

Here's what we'll cover:

1. Setup
2. Gathering the dataset
3. Benchmarking the base model
4. Defining your grader
5. Training
6. Using your fine-tuned model

1. Setup

Even strong reasoning models can miss the mark when it comes to expert-level behavior, especially in domains like medicine, where nuance and exactness matter. Imagine a model trying to extract ICD-10 codes from a transcript: even if it understands the gist, it may not use the precise terminology expected by medical professionals.

Other great candidates for RFT include tasks like ledger normalization or tiering fraud risk: settings in which you want precise, reliable, and repeatable reasoning. Check out our RFT use cases guide for more examples.

In our case, we'll focus on teaching o4-mini to become better at predicting the outcomes of clinical conversations and descriptions. Specifically, we want to see whether RFT can boost the accuracy of those predictions.
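
To make "boost the accuracy" concrete, here is a minimal sketch of the kind of exact-match check we can run against the base model before training and against the fine-tuned model afterwards. It assumes the record shape from the dataset sketch above; the model name and the simple normalization are placeholder choices, not the tutorial's final benchmarking code.

```python
# Sketch: score a model's outcome predictions by exact match against the
# reference answers. Model name and normalization are placeholder choices.
from openai import OpenAI

client = OpenAI()

def normalize(text: str) -> str:
    return " ".join(text.lower().strip().split())

def exact_match_accuracy(examples, model="o4-mini"):
    correct = 0
    for ex in examples:
        response = client.responses.create(
            model=model,
            input=ex["messages"][0]["content"],
        )
        if normalize(response.output_text) == normalize(ex["reference_answer"]):
            correct += 1
    return correct / len(examples)
```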

Along the way, we'll talk about how to write effective graders, how they guide the model's learning, and how to watch out for classic reward-hacking pitfalls.
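
As a taste of the grader section, here is the general shape of a simple string-check grader that rewards an exact match against each item's reference answer. The field names are based on OpenAI's published grader format, but treat them as assumptions and confirm against the current API reference; a strict exact-match grader like this also leaves less room for reward hacking than a fuzzy one.

```python
# Sketch: a string-check grader that rewards an exact match between the model's
# final answer and the reference answer stored with each training item.
# Field names are assumptions based on OpenAI's published grader format;
# confirm against the current API reference.
grader = {
    "type": "string_check",
    "name": "outcome_exact_match",
    "operation": "eq",
    "input": "{{sample.output_text}}",
    "reference": "{{item.reference_answer}}",
}
```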
