
Commit 6a9ae92

Commit message: update
1 parent 5924b11 commit 6a9ae92


2 files changed: +2 −2 lines changed


articles/ai-services/openai/how-to/fine-tuning.md

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ Images containing the following will be excluded from your dataset and not used

 Azure OpenAI fine-tuning supports prompt caching with select models. Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. To learn more about prompt caching, see [getting started with prompt caching](./prompt-caching.md).

-## Direct preference optimization (DPO)
+## Direct preference optimization (DPO) (preview)

 Direct preference optimization (DPO) is an alignment technique for large language models, used to adjust model weights based on human preferences. It differs from reinforcement learning from human feedback (RLHF) in that it does not require fitting a reward model and uses simpler binary data preferences for training. It is computationally lighter weight and faster than RLHF, while being equally effective at alignment.
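The DPO paragraph above mentions training on "binary data preferences". As a rough illustration of what a binary preference record can look like, the Python sketch below writes JSONL pairs consisting of one preferred and one non-preferred completion for the same prompt. The field names (`input`, `preferred_output`, `non_preferred_output`) and the file name are illustrative assumptions, not taken from this diff; see the fine-tuning guide for the exact schema Azure OpenAI expects.

```python
import json

# Minimal sketch of binary preference data for DPO-style fine-tuning.
# NOTE: the field names below are illustrative assumptions, not quoted from
# this commit; consult the Azure OpenAI fine-tuning docs for the real schema.
examples = [
    {
        "input": {
            "messages": [
                {"role": "user", "content": "Summarize the quarterly report in one sentence."}
            ]
        },
        # The completion a human rater preferred.
        "preferred_output": [
            {"role": "assistant", "content": "Revenue grew 12% on strong cloud demand."}
        ],
        # The completion the rater rejected. Only this binary signal is used;
        # no separate reward model is fitted.
        "non_preferred_output": [
            {"role": "assistant", "content": "The report contains many numbers."}
        ],
    }
]

# One JSON object per line (JSONL), the usual upload format for fine-tuning data.
with open("dpo_training.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```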

articles/ai-services/openai/whats-new.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ This article provides a summary of the latest releases and major documentation u

 ## December 2024

-### Preference fine-tuning (direct preference optimization)
+### Preference fine-tuning (preview)

 [Direct preference optimization (DPO)](./how-to/fine-tuning.md#direct-preference-optimization-dpo) is a new alignment technique for large language models, designed to adjust model weights based on human preferences. Unlike reinforcement learning from human feedback (RLHF), DPO does not require fitting a reward model and uses simpler data (binary preferences) for training. This method is computationally lighter and faster, making it equally effective at alignment while being more efficient. DPO is especially useful in scenarios where subjective elements like tone, style, or specific content preferences are important. We’re excited to announce the public preview of DPO in Azure OpenAI Service, starting with the `gpt-4o-2024-08-06` model.
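The whats-new entry above repeats the key claim that DPO "does not require fitting a reward model and uses simpler data (binary preferences)". For reference, this is the standard DPO objective from the literature (Rafailov et al., 2023), not taken from the Azure documentation in this diff: the policy $\pi_\theta$ is optimized directly on preference pairs against a frozen reference model $\pi_{\mathrm{ref}}$.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
-\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
$$

Here $y_w$ is the preferred completion, $y_l$ the non-preferred one, $\sigma$ the logistic function, and $\beta$ a coefficient controlling how far the tuned policy may drift from the reference; because the preference pairs enter the loss directly, no intermediate reward model needs to be trained.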
