
Commit 201f17b

Merge pull request #6252 from mrbullwinkle/mrb_07_22_2025_fine_tuning
[Azure OpenAI] Fine-tuning updates
2 parents 03ec64b + d8cf48c commit 201f17b

File tree

5 files changed: +210 -0 lines changed


articles/ai-foundry/openai/how-to/fine-tuning-direct-preference-optimization.md

Lines changed: 10 additions & 0 deletions

@@ -65,6 +65,16 @@ Users can use preference fine tuning with base models as well as models that hav

4. Select hyperparameters; defaults are recommended for initial experimentation.
5. Review the selections and create a fine-tuning job.

## Direct preference optimization - REST API

```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -H 'task_type: chat' \
  --data '{ "model": "gpt-4.1-mini-2025-04-14", "training_file": "file-d02c607351994d29987aece550ac81c0", "validation_file": "file-d02c607351994d29987aece550ac81c0", "prompt_loss_weight": 0.1, "suffix": "Pause_Resume", "method":{ "type":"dpo", "dpo":{ "beta":0.1, "l2_multiplier":0.1 }}}'
```
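If you're scripting this call rather than using curl, it can help to build the request body programmatically. The sketch below constructs the same JSON payload in Python; `build_dpo_job_body` and the placeholder file IDs are illustrative, not part of any SDK.

```python
import json

# Hypothetical helper (not part of the Azure OpenAI API or SDK): builds the
# request body shown in the curl example above so the fields are easy to inspect.
def build_dpo_job_body(model, training_file, validation_file,
                       beta=0.1, l2_multiplier=0.1, suffix=None):
    body = {
        "model": model,
        "training_file": training_file,
        "validation_file": validation_file,
        # "dpo" selects direct preference optimization; beta controls how
        # strongly training favors the preferred responses.
        "method": {"type": "dpo",
                   "dpo": {"beta": beta, "l2_multiplier": l2_multiplier}},
    }
    if suffix is not None:
        body["suffix"] = suffix
    return body

payload = build_dpo_job_body(
    "gpt-4.1-mini-2025-04-14",
    training_file="file-your-training-file-id",      # placeholder ID
    validation_file="file-your-validation-file-id",  # placeholder ID
    suffix="Pause_Resume",
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed as the `--data` argument of the curl command above.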
## Next steps

articles/ai-foundry/openai/how-to/reinforcement-fine-tuning.md

Lines changed: 158 additions & 0 deletions

@@ -379,6 +379,164 @@ You can deploy the fine tuning job which is completed or any intermittent checkp

When using your model, make sure to use the same instructions and structure as used during training. This keeps the model in distribution, and ensures that you see the same performance on your problems during inference as you achieved during training.

## REST API

### Create an RFT job

#### With score model grader

```json
{
  "model": "o4-mini-2025-04-16",
  "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
  "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
  "suffix": "TEST",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "eval_interval": 1,
        "eval_samples": 1,
        "compute_multiplier": 1,
        "reasoning_effort": "medium",
        "n_epochs": 1,
        "batch_size": 10,
        "learning_rate_multiplier": 1
      },
      "grader": {
        "type": "score_model",
        "name": "custom_grader",
        "input": [
          {
            "role": "developer",
            "content": "You are a mathematical evaluator. Given a reference target number, a list of input numbers, and a model\u0027s output (an arithmetic expression and its reported result), your task is to evaluate the correctness and closeness of the model\u0027s answer.\n\nInput values are passed in as **strings**, including the number list and target. You must:\n1. Convert the \u0060target\u0060 string to a number.\n2. Convert the \u0060numbers\u0060 string into a list of numbers.\n3. Parse and validate the \u0060output_expression\u0060 \u2014 ensure it is a valid arithmetic expression.\n4. Evaluate the expression and confirm it matches the model\u0027s reported \u0060output_result\u0060.\n5. Check that **all input numbers are used exactly once**.\n6. Compare the evaluated result with the target and assign a score.\n\nScoring Rules:\n- 5: Valid expression, correct number usage, exact match to target\n- 4: Off by \u00B11\n- 3: Off by \u00B12 to \u00B15\n- 2: Off by \u003E5\n- 1: Minor issues (e.g., small mismatch in numbers used)\n- 0: Major issues \u2014 invalid expression or number usage\n\nOutput Format:\nScore: \u003C0 - 5\u003E\nReasoning: \u003Cbrief justification\u003E\n\nOnly respond with the score and reasoning."
          },
          {
            "role": "user",
            "content": "{ \u0022target\u0022: {{item.target}}, \u0022numbers\u0022: {{item.nums}}, \u0022output\u0022: {{sample.output_text}} }"
          }
        ],
        "pass_threshold": 5,
        "range": [
          0,
          5
        ],
        "model": "o3-mini"
      },
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "math_expression",
          "schema": {
            "type": "object",
            "required": [
              "expression",
              "result"
            ],
            "properties": {
              "expression": {
                "type": "string",
                "description": "The mathematical expression to be evaluated."
              },
              "result": {
                "type": "string",
                "description": "The result of evaluating the mathematical expression."
              }
            },
            "additionalProperties": false
          },
          "strict": true
        }
      }
    }
  }
}
```
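The grader's `input` messages use `{{item.*}}` and `{{sample.*}}` placeholders that the service fills from each training row and model sample. As a rough intuition for that substitution (the actual service-side templating is not documented here), a minimal stand-in might look like:

```python
import re

# Illustrative only: a toy stand-in for the service-side substitution of
# {{item.*}} and {{sample.*}} placeholders in grader input messages.
def render_template(template, item, sample):
    def repl(match):
        path = match.group(1).strip()
        scope, _, key = path.partition(".")
        value = {"item": item, "sample": sample}[scope]
        # Walk dotted keys, e.g. "reference_answer.final_answer".
        for part in key.split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\{\{(.*?)\}\}", repl, template)

message = render_template(
    '{ "target": {{item.target}}, "numbers": {{item.nums}}, "output": {{sample.output_text}} }',
    item={"target": "24", "nums": "[3, 8, 1]"},
    sample={"output_text": "(3 * 8) * 1 = 24"},
)
print(message)
```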
#### With string check grader

```json
{
  "model": "o4-mini-2025-04-16",
  "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
  "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
  "suffix": "TEST",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "eval_interval": 1,
        "eval_samples": 1,
        "compute_multiplier": 1,
        "reasoning_effort": "medium",
        "n_epochs": 1,
        "batch_size": 10,
        "learning_rate_multiplier": 1
      },
      "grader": {
        "name": "answer_string_check",
        "type": "string_check",
        "input": "{{item.reference_answer.final_answer}}",
        "operation": "eq",
        "reference": "{{sample.output_json.final_answer}}"
      }
    }
  }
}
```
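A string check grader is pass/fail: with the `eq` operation, the rendered `input` must match the `reference` exactly. The local model below is illustrative only (the real evaluation happens service-side); note how stray whitespace fails an exact match.

```python
# Illustrative local model of a string_check grader's comparison. Only "eq"
# and "ne" are sketched here; other operations are assumptions, not shown.
def string_check(input_text: str, reference: str, operation: str = "eq") -> bool:
    if operation == "eq":
        return input_text == reference   # exact, case-sensitive match
    if operation == "ne":
        return input_text != reference
    raise ValueError(f"unsupported operation: {operation}")

print(string_check("42", "42"))    # True: exact match
print(string_check("42", " 42"))   # False: leading whitespace breaks equality
```

This is why the template strings in the grader payload should not carry stray leading spaces.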
#### With text similarity grader

```json
{
  "model": "o4-mini-2025-04-16",
  "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
  "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
  "suffix": "TEST",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "eval_interval": 1,
        "eval_samples": 1,
        "compute_multiplier": 1,
        "reasoning_effort": "medium",
        "n_epochs": 1,
        "batch_size": 10,
        "learning_rate_multiplier": 1
      },
      "grader": {
        "name": "solution_similarity",
        "type": "text_similarity",
        "input": "{{sample.output_json.solution}}",
        "reference": "{{item.reference_answer.solution}}",
        "evaluation_metric": "bleu"
      }
    }
  }
}
```
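The `bleu` metric rewards partial overlap between the sample and the reference rather than demanding an exact match. For intuition only, here is a toy unigram-overlap score; it is not the grader's implementation, which uses full BLEU with higher-order n-grams and a brevity penalty that this sketch omits.

```python
from collections import Counter

# Toy clipped unigram precision, for intuition about overlap-based scoring.
# Not the service's BLEU implementation.
def unigram_precision(candidate: str, reference: str) -> float:
    cand = candidate.split()
    if not cand:
        return 0.0
    ref_counts = Counter(reference.split())
    # Clipped counts: each reference word can be matched at most as many
    # times as it appears in the reference.
    matched = sum(min(count, ref_counts[word])
                  for word, count in Counter(cand).items())
    return matched / len(cand)

print(unigram_precision("x = 3 + 4", "x = 3 + 4"))  # 1.0: identical strings
print(unigram_precision("x = 7", "x = 3 + 4"))      # 2/3: partial overlap
```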
519+
520+
**Examples:** [Reference Jupyter Notebook](https://github.com/azure-ai-foundry/build-2025-demos/tree/main/Azure%20AI%20Model%20Customization/MSBuildRFTDemo).
521+
522+
### Validate Grader
523+
524+
```bash
525+
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/alpha/graders/validate \
526+
-H "Content-Type: application/json" \
527+
-H "api-key: $AZURE_OPENAI_API_KEY" \
528+
-d '{ "grader": { "name":"answer_string_check", "type":"string_check", "input":" {{item.reference_answer.final_answer}}", "operation":"eq", "reference":" {{sample.output_json.final_answer}}" } }'
529+
```
530+
531+
### Run Grader
532+
533+
```bash
534+
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/alpha/graders/run \
535+
-H "Content-Type: application/json" \
536+
-H "api-key: $AZURE_OPENAI_API_KEY" \
537+
-d '{ "grader": { "name":"solution_similarity", "type":"string_check", "input": " {{item.reference_answer}}", "reference": " {{sample.output_text}}", "operation": "eq" }, "reference_answer": "yes", "model_sample": "yes" }'
538+
```
539+
382540
## Best practices
383541

384542
### Grader selection

articles/ai-foundry/openai/includes/fine-tuning-openai-in-ai-studio.md

Lines changed: 11 additions & 0 deletions

@@ -177,6 +177,17 @@ When each training epoch completes a checkpoint is generated. A checkpoint is a

:::image type="content" source="../media/fine-tuning/checkpoints.png" alt-text="Screenshot of checkpoints UI." lightbox="../media/fine-tuning/checkpoints.png":::

## Pause and resume

You can track progress in both fine-tuning views of the AI Foundry portal. You'll see your job go through the same statuses as normal fine-tuning jobs (queued, running, succeeded).

You can also review the results files while training runs, to get a peek at the progress and check whether your training is proceeding as expected.

> [!NOTE]
> During training you can view the logs and metrics and pause the job as needed. Pausing can be useful if metrics aren't converging or if you feel the model isn't learning at the right pace. Once the training job is paused, a deployable checkpoint is created after safety evaluations are complete. This checkpoint is available for you to deploy and use for inference, or you can resume the job to completion. The pause operation only applies to jobs that have been trained for at least one step and are in the *Running* state.

:::image type="content" source="../media/how-to/reinforcement-fine-tuning/pause.png" alt-text="Screenshot of the reinforcement fine-tuning with a running job." lightbox="../media/how-to/reinforcement-fine-tuning/pause.png":::

## Analyze your fine-tuned model

After fine-tuning is successfully completed, you can download a result file named _results.csv_ from the fine-tuned model page under the **Details** tab. You can use the result file to analyze the training and validation performance of your custom model.

articles/ai-foundry/openai/includes/fine-tuning-rest.md

Lines changed: 20 additions & 0 deletions

@@ -158,6 +158,26 @@ curl -X GET $AZURE_OPENAI_ENDPOINT/openai/fine_tuning/jobs/<YOUR-JOB-ID>?api-ver
-H "api-key: $AZURE_OPENAI_API_KEY"
```

## Pause and resume

During training you can view the logs and metrics and pause the job as needed. Pausing can be useful if metrics aren't converging or if you feel the model isn't learning at the right pace. Once the training job is paused, a deployable checkpoint is created after safety evaluations are complete. This checkpoint is available for you to deploy and use for inference, or you can resume the job to completion. The pause operation only applies to jobs that have been trained for at least one step and are in the *Running* state.

### Pause

```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs/{fine_tuning_job_id}/pause \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"
```

### Resume

```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs/{fine_tuning_job_id}/resume \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"
```
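Both calls differ only in the final path segment. If you're scripting them, a small helper keeps the two consistent; `job_control_url` below is a hypothetical convenience, not part of any SDK, and the job ID shown is a placeholder.

```python
import os

# Hypothetical helper: builds the request target for the pause/resume calls
# shown above. Sending the POST (with the same api-key header as the other
# examples) is left to your HTTP client of choice.
def job_control_url(endpoint: str, job_id: str, action: str) -> str:
    if action not in ("pause", "resume"):
        raise ValueError("action must be 'pause' or 'resume'")
    return f"{endpoint.rstrip('/')}/openai/v1/fine_tuning/jobs/{job_id}/{action}"

endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", "https://example.openai.azure.com")
print(job_control_url(endpoint, "ftjob-abc123", "pause"))  # placeholder job ID
```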
### List fine-tuning events

To examine the individual fine-tuning events that were generated during training:

articles/ai-foundry/openai/includes/fine-tuning-studio.md

Lines changed: 11 additions & 0 deletions

@@ -215,6 +215,17 @@ Your job might be queued behind other jobs on the system. Training your model ca

When each training epoch completes, a checkpoint is generated. A checkpoint is a fully functional version of a model which can both be deployed and used as the target model for subsequent fine-tuning jobs. Checkpoints can be particularly useful, as they may provide snapshots prior to overfitting. When a fine-tuning job completes you will have the three most recent versions of the model available to deploy.

## Pause and resume

You can track progress in both fine-tuning views of the AI Foundry portal. You'll see your job go through the same statuses as normal fine-tuning jobs (queued, running, succeeded).

You can also review the results files while training runs, to get a peek at the progress and check whether your training is proceeding as expected.

> [!NOTE]
> During training you can view the logs and metrics and pause the job as needed. Pausing can be useful if metrics aren't converging or if you feel the model isn't learning at the right pace. Once the training job is paused, a deployable checkpoint is created after safety evaluations are complete. This checkpoint is available for you to deploy and use for inference, or you can resume the job to completion. The pause operation only applies to jobs that have been trained for at least one step and are in the *Running* state.

:::image type="content" source="../media/how-to/reinforcement-fine-tuning/pause.png" alt-text="Screenshot of the reinforcement fine-tuning with a running job." lightbox="../media/how-to/reinforcement-fine-tuning/pause.png":::

## Analyze your custom model

Azure OpenAI attaches a result file named _results.csv_ to each fine-tuning job after it completes. You can use the result file to analyze the training and validation performance of your custom model. The file ID for the result file is listed for each custom model in the **Result file Id** column on the **Models** pane for Azure AI Foundry portal. You can use the file ID to identify and download the result file from the **Data files** pane of Azure AI Foundry portal.
