Commit d8cf48c

update
1 parent 19fd822 commit d8cf48c

2 files changed: +168 -0 lines changed
articles/ai-foundry/openai/how-to/fine-tuning-direct-preference-optimization.md

Lines changed: 10 additions & 0 deletions
@@ -65,6 +65,16 @@ Users can use preference fine tuning with base models as well as models that hav
4. Select hyperparameters. Defaults are recommended for initial experimentation.
5. Review the selections and create a fine tuning job.

## Direct preference optimization - REST API

```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -H 'task_type: chat' \
  --data '{ "model": "gpt-4.1-mini-2025-04-14", "training_file": "file-d02c607351994d29987aece550ac81c0", "validation_file": "file-d02c607351994d29987aece550ac81c0", "prompt_loss_weight": 0.1, "suffix": "Pause_Resume", "method":{ "type":"dpo", "dpo":{ "beta":0.1, "l2_multiplier":0.1 }}}'
```
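The same request can also be assembled from Python. The following is a minimal sketch that uses only the standard library. The helper name `build_dpo_request` is hypothetical, and the path, headers, and body fields are copied from the curl example above rather than taken from an SDK. The sketch only builds the request; sending it requires a real endpoint, key, and file IDs.

```python
import json
import urllib.request


def build_dpo_request(endpoint: str, api_key: str, training_file: str,
                      validation_file: str, beta: float = 0.1) -> urllib.request.Request:
    """Build (but do not send) the DPO job-creation request from the curl example."""
    body = {
        "model": "gpt-4.1-mini-2025-04-14",
        "training_file": training_file,
        "validation_file": validation_file,
        "prompt_loss_weight": 0.1,
        "suffix": "Pause_Resume",
        "method": {"type": "dpo", "dpo": {"beta": beta, "l2_multiplier": 0.1}},
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/openai/v1/fine_tuning/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "api-key": api_key,
            "Content-Type": "application/json",
            "task_type": "chat",  # header copied as-is from the curl example
        },
        method="POST",
    )


# Placeholder endpoint and key: substitute real values before sending.
request = build_dpo_request(
    "https://example.invalid",
    "placeholder-key",
    "file-d02c607351994d29987aece550ac81c0",
    "file-d02c607351994d29987aece550ac81c0",
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit the job. Keeping construction separate from sending makes the payload easy to inspect first.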
## Next steps

articles/ai-foundry/openai/how-to/reinforcement-fine-tuning.md

Lines changed: 158 additions & 0 deletions
@@ -379,6 +379,164 @@ You can deploy the fine tuning job which is completed or any intermittent checkp
When using your model, make sure to use the same instructions and structure as used during training. This keeps the model in distribution, and ensures that you see the same performance on your problems during inference as you achieved during training.
## REST API

### Create an RFT job

#### With score model grader

```json
{
  "model": "o4-mini-2025-04-16",
  "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
  "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
  "suffix": "TEST",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "eval_interval": 1,
        "eval_samples": 1,
        "compute_multiplier": 1,
        "reasoning_effort": "medium",
        "n_epochs": 1,
        "batch_size": 10,
        "learning_rate_multiplier": 1
      },
      "grader": {
        "type": "score_model",
        "name": "custom_grader",
        "input": [
          {
            "role": "developer",
            "content": "You are a mathematical evaluator. Given a reference target number, a list of input numbers, and a model\u0027s output (an arithmetic expression and its reported result), your task is to evaluate the correctness and closeness of the model\u0027s answer.\n\nInput values are passed in as **strings**, including the number list and target. You must:\n1. Convert the \u0060target\u0060 string to a number.\n2. Convert the \u0060numbers\u0060 string into a list of numbers.\n3. Parse and validate the \u0060output_expression\u0060 \u2014 ensure it is a valid arithmetic expression.\n4. Evaluate the expression and confirm it matches the model\u0027s reported \u0060output_result\u0060.\n5. Check that **all input numbers are used exactly once**.\n6. Compare the evaluated result with the target and assign a score.\n\nScoring Rules:\n- 5: Valid expression, correct number usage, exact match to target\n- 4: Off by \u00B11\n- 3: Off by \u00B12 to \u00B15\n- 2: Off by \u003E5\n- 1: Minor issues (e.g., small mismatch in numbers used)\n- 0: Major issues \u2014 invalid expression or number usage\n\nOutput Format:\nScore: \u003C0 - 5\u003E\nReasoning: \u003Cbrief justification\u003E\n\nOnly respond with the score and reasoning."
          },
          {
            "role": "user",
            "content": "{ \u0022target\u0022: {{item.target}}, \u0022numbers\u0022: {{item.nums}}, \u0022output\u0022: {{sample.output_text}} }"
          }
        ],
        "pass_threshold": 5,
        "range": [
          0,
          5
        ],
        "model": "o3-mini"
      },
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "math_expression",
          "schema": {
            "type": "object",
            "required": [
              "expression",
              "result"
            ],
            "properties": {
              "expression": {
                "type": "string",
                "description": "The mathematical expression to be evaluated."
              },
              "result": {
                "type": "string",
                "description": "The result of evaluating the mathematical expression."
              }
            },
            "additionalProperties": false
          },
          "strict": true
        }
      }
    }
  }
}
```
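The grader's `user` message above is a template: `{{item.target}}` and `{{item.nums}}` refer to fields of the training example, and `{{sample.output_text}}` refers to the model's output. The sketch below illustrates that substitution locally so a template can be previewed before a job is submitted. It is illustrative only: `render_template` is a hypothetical helper, the sample data is invented, and the service's own substitution rules may differ.

```python
import re

# Matches placeholders such as {{item.target}} or {{ sample.output_text }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render_template(template: str, item: dict, sample: dict) -> str:
    """Replace {{item.*}} and {{sample.*}} placeholders with values from the dicts."""
    namespaces = {"item": item, "sample": sample}

    def lookup(match: re.Match) -> str:
        root, *path = match.group(1).split(".")
        value = namespaces[root]   # KeyError if the namespace is unknown
        for key in path:
            value = value[key]     # KeyError if a field is missing
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


# Same template as the grader's user message, with the JSON escapes decoded.
template = '{ "target": {{item.target}}, "numbers": {{item.nums}}, "output": {{sample.output_text}} }'

# Invented example data; values are strings, as the developer prompt expects.
item = {"target": "24", "nums": "[4, 6]"}
sample = {"output_text": '{"expression": "4 * 6", "result": "24"}'}

print(render_template(template, item, sample))
```

A missing field raises `KeyError` here, which makes typos in placeholder names visible early.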
#### With string check grader

```json
{
  "model": "o4-mini-2025-04-16",
  "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
  "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
  "suffix": "TEST",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "eval_interval": 1,
        "eval_samples": 1,
        "compute_multiplier": 1,
        "reasoning_effort": "medium",
        "n_epochs": 1,
        "batch_size": 10,
        "learning_rate_multiplier": 1
      },
      "grader": {
        "name": "answer_string_check",
        "type": "string_check",
        "input": "{{item.reference_answer.final_answer}}",
        "operation": "eq",
        "reference": "{{sample.output_json.final_answer}}"
      }
    }
  }
}
```
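To see what the `eq` operation implies for training data, it can help to emulate it locally. The following is a minimal sketch and not the service's implementation. It assumes that `eq` is an exact, case-sensitive comparison that yields 1 on a match and 0 otherwise. The function name `string_check_eq` is hypothetical.

```python
def string_check_eq(input_text: str, reference: str) -> float:
    """Emulate a string_check grader with operation "eq".

    Assumption: the comparison is exact and case-sensitive, and the
    grader returns 1 for a match and 0 otherwise.
    """
    return 1.0 if input_text == reference else 0.0


# Exact matches score 1; stray whitespace or a case difference scores 0.
for candidate in ("42", " 42", "forty-two"):
    print(repr(candidate), string_check_eq(candidate, "42"))
```

Under this assumption, reference answers and model outputs need identical formatting. A stray leading space inside a template string would be enough to turn a correct answer into a score of 0.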
#### With text similarity grader

```json
{
  "model": "o4-mini-2025-04-16",
  "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
  "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
  "suffix": "TEST",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "eval_interval": 1,
        "eval_samples": 1,
        "compute_multiplier": 1,
        "reasoning_effort": "medium",
        "n_epochs": 1,
        "batch_size": 10,
        "learning_rate_multiplier": 1
      },
      "grader": {
        "name": "solution_similarity",
        "type": "text_similarity",
        "input": "{{sample.output_json.solution}}",
        "reference": "{{item.reference_answer.solution}}",
        "evaluation_metric": "bleu"
      }
    }
  }
}
```
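The three request bodies above differ only in the `grader` object (plus the optional `response_format` in the score model example). A small builder can make that shared structure explicit. This is a sketch under that observation: `build_rft_job` and `DEFAULT_HYPERPARAMETERS` are hypothetical names, and the default values are copied from the examples rather than from any documented defaults.

```python
import copy
import json
from typing import Optional

# Hyperparameter values copied from the example request bodies.
DEFAULT_HYPERPARAMETERS = {
    "eval_interval": 1,
    "eval_samples": 1,
    "compute_multiplier": 1,
    "reasoning_effort": "medium",
    "n_epochs": 1,
    "batch_size": 10,
    "learning_rate_multiplier": 1,
}


def build_rft_job(grader: dict, training_file: str, validation_file: str,
                  model: str = "o4-mini-2025-04-16", suffix: str = "TEST",
                  response_format: Optional[dict] = None) -> dict:
    """Return an RFT job body with the given grader plugged in."""
    reinforcement = {
        "hyperparameters": dict(DEFAULT_HYPERPARAMETERS),
        "grader": copy.deepcopy(grader),  # avoid aliasing the caller's dict
    }
    if response_format is not None:
        reinforcement["response_format"] = copy.deepcopy(response_format)
    return {
        "model": model,
        "training_file": training_file,
        "validation_file": validation_file,
        "suffix": suffix,
        "method": {"type": "reinforcement", "reinforcement": reinforcement},
    }


text_similarity_grader = {
    "name": "solution_similarity",
    "type": "text_similarity",
    "input": "{{sample.output_json.solution}}",
    "reference": "{{item.reference_answer.solution}}",
    "evaluation_metric": "bleu",
}

job = build_rft_job(
    text_similarity_grader,
    training_file="file-c6578ee2c5194ae99e33711e677d7aa9",
    validation_file="file-7ead313cc49e4e0480b9700bbd513bbc",
)
print(json.dumps(job, indent=2))
```

Swapping in the string check or score model grader only changes the first argument, which keeps grader experiments comparable.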
**Examples:** [Reference Jupyter Notebook](https://github.com/azure-ai-foundry/build-2025-demos/tree/main/Azure%20AI%20Model%20Customization/MSBuildRFTDemo).
### Validate grader

```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/alpha/graders/validate \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -d '{ "grader": { "name":"answer_string_check", "type":"string_check", "input":" {{item.reference_answer.final_answer}}", "operation":"eq", "reference":" {{sample.output_json.final_answer}}" } }'
```
### Run grader

```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/alpha/graders/run \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -d '{ "grader": { "name":"solution_similarity", "type":"string_check", "input": " {{item.reference_answer}}", "reference": " {{sample.output_text}}", "operation": "eq" }, "reference_answer": "yes", "model_sample": "yes" }'
```
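The validate and run calls share everything except the final path segment and the body, so both can be built by one helper. The sketch below assumes only the standard library. `build_grader_request` and `send` are hypothetical names, and the paths and body fields are copied from the two curl examples. Nothing is sent unless `send` is called with a real endpoint and key.

```python
import json
import urllib.request


def build_grader_request(endpoint: str, api_key: str, action: str,
                         payload: dict) -> urllib.request.Request:
    """Build a POST request for the grader 'validate' or 'run' endpoint."""
    if action not in ("validate", "run"):
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/openai/v1/fine_tuning/alpha/graders/" + action,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs real credentials)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Grader and test values copied from the "Run grader" curl example.
grader = {
    "name": "solution_similarity",
    "type": "string_check",
    "input": " {{item.reference_answer}}",
    "reference": " {{sample.output_text}}",
    "operation": "eq",
}

validate_request = build_grader_request(
    "https://example.invalid", "placeholder-key", "validate", {"grader": grader})
run_request = build_grader_request(
    "https://example.invalid", "placeholder-key", "run",
    {"grader": grader, "reference_answer": "yes", "model_sample": "yes"})
print(validate_request.full_url)
print(run_request.full_url)
```

Validating a grader before running it, and running it on a known sample before starting a job, can surface template mistakes without spending training compute.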
## Best practices

### Grader selection
