
Commit 6a9ae92

Commit message: update
1 parent 5924b11 commit 6a9ae92


2 files changed: +2 −2 lines changed


articles/ai-services/openai/how-to/fine-tuning.md

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ Images containing the following will be excluded from your dataset and not used

 Azure OpenAI fine-tuning supports prompt caching with select models. Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. To learn more about prompt caching, see [getting started with prompt caching](./prompt-caching.md).

-## Direct preference optimization (DPO)
+## Direct preference optimization (DPO) (preview)

 Direct preference optimization (DPO) is an alignment technique for large language models, used to adjust model weights based on human preferences. It differs from reinforcement learning from human feedback (RLHF) in that it does not require fitting a reward model and uses simpler binary data preferences for training. It is computationally lighter weight and faster than RLHF, while being equally effective at alignment.
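The DPO paragraph above mentions training on "binary data preferences". As a rough illustration of what a binary preference record can look like, the Python sketch below writes JSONL pairs consisting of one preferred and one non-preferred completion for the same prompt. The field names (`input`, `preferred_output`, `non_preferred_output`) and the file name are illustrative assumptions, not taken from this diff; see the fine-tuning guide for the exact schema Azure OpenAI expects.

```python
import json

# Minimal sketch of binary preference data for DPO-style fine-tuning.
# NOTE: the field names below are illustrative assumptions, not quoted from
# this commit; consult the Azure OpenAI fine-tuning docs for the real schema.
examples = [
    {
        "input": {
            "messages": [
                {"role": "user", "content": "Summarize the quarterly report in one sentence."}
            ]
        },
        # The completion a human rater preferred.
        "preferred_output": [
            {"role": "assistant", "content": "Revenue grew 12% on strong cloud demand."}
        ],
        # The completion the rater rejected. Only this binary signal is used;
        # no separate reward model is fitted.
        "non_preferred_output": [
            {"role": "assistant", "content": "The report contains many numbers."}
        ],
    }
]

# One JSON object per line (JSONL), the usual upload format for fine-tuning data.
with open("dpo_training.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```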

articles/ai-services/openai/whats-new.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ This article provides a summary of the latest releases and major documentation u

 ## December 2024

-### Preference fine-tuning (direct preference optimization)
+### Preference fine-tuning (preview)

 [Direct preference optimization (DPO)](./how-to/fine-tuning.md#direct-preference-optimization-dpo) is a new alignment technique for large language models, designed to adjust model weights based on human preferences. Unlike reinforcement learning from human feedback (RLHF), DPO does not require fitting a reward model and uses simpler data (binary preferences) for training. This method is computationally lighter and faster, making it equally effective at alignment while being more efficient. DPO is especially useful in scenarios where subjective elements like tone, style, or specific content preferences are important. We’re excited to announce the public preview of DPO in Azure OpenAI Service, starting with the `gpt-4o-2024-08-06` model.
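The whats-new entry above repeats the key claim that DPO "does not require fitting a reward model and uses simpler data (binary preferences)". For reference, this is the standard DPO objective from the literature (Rafailov et al., 2023), not taken from the Azure documentation in this diff: the policy $\pi_\theta$ is optimized directly on preference pairs against a frozen reference model $\pi_{\mathrm{ref}}$.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
-\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
$$

Here $y_w$ is the preferred completion, $y_l$ the non-preferred one, $\sigma$ the logistic function, and $\beta$ a coefficient controlling how far the tuned policy may drift from the reference; because the preference pairs enter the loss directly, no intermediate reward model needs to be trained.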
