
Commit 201f17b

Merge pull request #6252 from mrbullwinkle/mrb_07_22_2025_fine_tuning
[Azure OpenAI] Fine-tuning updates
2 parents 03ec64b + d8cf48c commit 201f17b

File tree

5 files changed: +210 -0 lines changed


articles/ai-foundry/openai/how-to/fine-tuning-direct-preference-optimization.md

Lines changed: 10 additions & 0 deletions

@@ -65,6 +65,16 @@ Users can use preference fine tuning with base models as well as models that hav

4. Select hyperparameters; defaults are recommended for initial experimentation.
5. Review the selections and create a fine-tuning job.

## Direct preference optimization - REST API

```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -H 'task_type: chat' \
  --data '{ "model": "gpt-4.1-mini-2025-04-14", "training_file": "file-d02c607351994d29987aece550ac81c0", "validation_file": "file-d02c607351994d29987aece550ac81c0", "prompt_loss_weight": 0.1, "suffix": "Pause_Resume", "method":{ "type":"dpo", "dpo":{ "beta":0.1, "l2_multiplier":0.1 }}}'
```
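If you're scripting this call rather than using curl, it can help to build the request body programmatically. The sketch below constructs the same JSON payload in Python; `build_dpo_job_body` and the placeholder file IDs are illustrative, not part of any SDK.

```python
import json

# Hypothetical helper (not part of the Azure OpenAI API or SDK): builds the
# request body shown in the curl example above so the fields are easy to inspect.
def build_dpo_job_body(model, training_file, validation_file,
                       beta=0.1, l2_multiplier=0.1, suffix=None):
    body = {
        "model": model,
        "training_file": training_file,
        "validation_file": validation_file,
        # "dpo" selects direct preference optimization; beta controls how
        # strongly training favors the preferred responses.
        "method": {"type": "dpo",
                   "dpo": {"beta": beta, "l2_multiplier": l2_multiplier}},
    }
    if suffix is not None:
        body["suffix"] = suffix
    return body

payload = build_dpo_job_body(
    "gpt-4.1-mini-2025-04-14",
    training_file="file-your-training-file-id",      # placeholder ID
    validation_file="file-your-validation-file-id",  # placeholder ID
    suffix="Pause_Resume",
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed as the `--data` argument of the curl command above.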
## Next steps

articles/ai-foundry/openai/how-to/reinforcement-fine-tuning.md

Lines changed: 158 additions & 0 deletions

@@ -379,6 +379,164 @@ You can deploy the fine tuning job which is completed or any intermittent checkp

When using your model, make sure to use the same instructions and structure as used during training. This keeps the model in distribution, and ensures that you see the same performance on your problems during inference as you achieved during training.

## REST API

### Create an RFT job

#### With score model grader

```json
{
  "model": "o4-mini-2025-04-16",
  "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
  "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
  "suffix": "TEST",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "eval_interval": 1,
        "eval_samples": 1,
        "compute_multiplier": 1,
        "reasoning_effort": "medium",
        "n_epochs": 1,
        "batch_size": 10,
        "learning_rate_multiplier": 1
      },
      "grader": {
        "type": "score_model",
        "name": "custom_grader",
        "input": [
          {
            "role": "developer",
            "content": "You are a mathematical evaluator. Given a reference target number, a list of input numbers, and a model\u0027s output (an arithmetic expression and its reported result), your task is to evaluate the correctness and closeness of the model\u0027s answer.\n\nInput values are passed in as **strings**, including the number list and target. You must:\n1. Convert the \u0060target\u0060 string to a number.\n2. Convert the \u0060numbers\u0060 string into a list of numbers.\n3. Parse and validate the \u0060output_expression\u0060 \u2014 ensure it is a valid arithmetic expression.\n4. Evaluate the expression and confirm it matches the model\u0027s reported \u0060output_result\u0060.\n5. Check that **all input numbers are used exactly once**.\n6. Compare the evaluated result with the target and assign a score.\n\nScoring Rules:\n- 5: Valid expression, correct number usage, exact match to target\n- 4: Off by \u00B11\n- 3: Off by \u00B12 to \u00B15\n- 2: Off by \u003E5\n- 1: Minor issues (e.g., small mismatch in numbers used)\n- 0: Major issues \u2014 invalid expression or number usage\n\nOutput Format:\nScore: \u003C0 - 5\u003E\nReasoning: \u003Cbrief justification\u003E\n\nOnly respond with the score and reasoning."
          },
          {
            "role": "user",
            "content": "{ \u0022target\u0022: {{item.target}}, \u0022numbers\u0022: {{item.nums}}, \u0022output\u0022: {{sample.output_text}} }"
          }
        ],
        "pass_threshold": 5,
        "range": [
          0,
          5
        ],
        "model": "o3-mini"
      },
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "math_expression",
          "schema": {
            "type": "object",
            "required": [
              "expression",
              "result"
            ],
            "properties": {
              "expression": {
                "type": "string",
                "description": "The mathematical expression to be evaluated."
              },
              "result": {
                "type": "string",
                "description": "The result of evaluating the mathematical expression."
              }
            },
            "additionalProperties": false
          },
          "strict": true
        }
      }
    }
  }
}
```
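The grader's `input` messages use `{{item.*}}` and `{{sample.*}}` placeholders that the service fills from each training row and model sample. As a rough intuition for that substitution (the actual service-side templating is not documented here), a minimal stand-in might look like:

```python
import re

# Illustrative only: a toy stand-in for the service-side substitution of
# {{item.*}} and {{sample.*}} placeholders in grader input messages.
def render_template(template, item, sample):
    def repl(match):
        path = match.group(1).strip()
        scope, _, key = path.partition(".")
        value = {"item": item, "sample": sample}[scope]
        # Walk dotted keys, e.g. "reference_answer.final_answer".
        for part in key.split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\{\{(.*?)\}\}", repl, template)

message = render_template(
    '{ "target": {{item.target}}, "numbers": {{item.nums}}, "output": {{sample.output_text}} }',
    item={"target": "24", "nums": "[3, 8, 1]"},
    sample={"output_text": "(3 * 8) * 1 = 24"},
)
print(message)
```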
#### With string check grader

```json
{
  "model": "o4-mini-2025-04-16",
  "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
  "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
  "suffix": "TEST",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "eval_interval": 1,
        "eval_samples": 1,
        "compute_multiplier": 1,
        "reasoning_effort": "medium",
        "n_epochs": 1,
        "batch_size": 10,
        "learning_rate_multiplier": 1
      },
      "grader": {
        "name": "answer_string_check",
        "type": "string_check",
        "input": "{{item.reference_answer.final_answer}}",
        "operation": "eq",
        "reference": "{{sample.output_json.final_answer}}"
      }
    }
  }
}
```
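A string check grader is pass/fail: with the `eq` operation, the rendered `input` must match the `reference` exactly. The local model below is illustrative only (the real evaluation happens service-side); note how stray whitespace fails an exact match.

```python
# Illustrative local model of a string_check grader's comparison. Only "eq"
# and "ne" are sketched here; other operations are assumptions, not shown.
def string_check(input_text: str, reference: str, operation: str = "eq") -> bool:
    if operation == "eq":
        return input_text == reference   # exact, case-sensitive match
    if operation == "ne":
        return input_text != reference
    raise ValueError(f"unsupported operation: {operation}")

print(string_check("42", "42"))    # True: exact match
print(string_check("42", " 42"))   # False: leading whitespace breaks equality
```

This is why the template strings in the grader payload should not carry stray leading spaces.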
#### With text similarity grader

```json
{
  "model": "o4-mini-2025-04-16",
  "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
  "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
  "suffix": "TEST",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "hyperparameters": {
        "eval_interval": 1,
        "eval_samples": 1,
        "compute_multiplier": 1,
        "reasoning_effort": "medium",
        "n_epochs": 1,
        "batch_size": 10,
        "learning_rate_multiplier": 1
      },
      "grader": {
        "name": "solution_similarity",
        "type": "text_similarity",
        "input": "{{sample.output_json.solution}}",
        "reference": "{{item.reference_answer.solution}}",
        "evaluation_metric": "bleu"
      }
    }
  }
}
```
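The `bleu` metric rewards partial overlap between the sample and the reference rather than demanding an exact match. For intuition only, here is a toy unigram-overlap score; it is not the grader's implementation, which uses full BLEU with higher-order n-grams and a brevity penalty that this sketch omits.

```python
from collections import Counter

# Toy clipped unigram precision, for intuition about overlap-based scoring.
# Not the service's BLEU implementation.
def unigram_precision(candidate: str, reference: str) -> float:
    cand = candidate.split()
    if not cand:
        return 0.0
    ref_counts = Counter(reference.split())
    # Clipped counts: each reference word can be matched at most as many
    # times as it appears in the reference.
    matched = sum(min(count, ref_counts[word])
                  for word, count in Counter(cand).items())
    return matched / len(cand)

print(unigram_precision("x = 3 + 4", "x = 3 + 4"))  # 1.0: identical strings
print(unigram_precision("x = 7", "x = 3 + 4"))      # 2/3: partial overlap
```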
519+
520+
**Examples:** [Reference Jupyter Notebook](https://github.com/azure-ai-foundry/build-2025-demos/tree/main/Azure%20AI%20Model%20Customization/MSBuildRFTDemo).
521+
522+
### Validate Grader
523+
524+
```bash
525+
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/alpha/graders/validate \
526+
-H "Content-Type: application/json" \
527+
-H "api-key: $AZURE_OPENAI_API_KEY" \
528+
-d '{ "grader": { "name":"answer_string_check", "type":"string_check", "input":" {{item.reference_answer.final_answer}}", "operation":"eq", "reference":" {{sample.output_json.final_answer}}" } }'
529+
```
530+
531+
### Run Grader
532+
533+
```bash
534+
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/alpha/graders/run \
535+
-H "Content-Type: application/json" \
536+
-H "api-key: $AZURE_OPENAI_API_KEY" \
537+
-d '{ "grader": { "name":"solution_similarity", "type":"string_check", "input": " {{item.reference_answer}}", "reference": " {{sample.output_text}}", "operation": "eq" }, "reference_answer": "yes", "model_sample": "yes" }'
538+
```
539+
382540
## Best practices
383541

384542
### Grader selection

articles/ai-foundry/openai/includes/fine-tuning-openai-in-ai-studio.md

Lines changed: 11 additions & 0 deletions

@@ -177,6 +177,17 @@ When each training epoch completes a checkpoint is generated. A checkpoint is a

:::image type="content" source="../media/fine-tuning/checkpoints.png" alt-text="Screenshot of checkpoints UI." lightbox="../media/fine-tuning/checkpoints.png":::

## Pause and resume

You can track progress in both fine-tuning views of the AI Foundry portal. You'll see your job go through the same statuses as normal fine-tuning jobs (queued, running, succeeded).

You can also review the results files while training runs, to get a peek at the progress and check whether your training is proceeding as expected.

> [!NOTE]
> During training you can view the logs and metrics and pause the job as needed. Pausing can be useful if metrics aren't converging or if you feel the model isn't learning at the right pace. Once the training job is paused, a deployable checkpoint is created after safety evaluations are complete. This checkpoint is available for you to deploy and use for inference, or you can resume the job to completion. The pause operation only applies to jobs that have been trained for at least one step and are in the *Running* state.

:::image type="content" source="../media/how-to/reinforcement-fine-tuning/pause.png" alt-text="Screenshot of the reinforcement fine-tuning with a running job." lightbox="../media/how-to/reinforcement-fine-tuning/pause.png":::

## Analyze your fine-tuned model

After fine-tuning is successfully completed, you can download a result file named _results.csv_ from the fine-tuned model page under the **Details** tab. You can use the result file to analyze the training and validation performance of your custom model.

articles/ai-foundry/openai/includes/fine-tuning-rest.md

Lines changed: 20 additions & 0 deletions

@@ -158,6 +158,26 @@ curl -X GET $AZURE_OPENAI_ENDPOINT/openai/fine_tuning/jobs/<YOUR-JOB-ID>?api-ver
-H "api-key: $AZURE_OPENAI_API_KEY"
```

## Pause and resume

During training you can view the logs and metrics and pause the job as needed. Pausing can be useful if metrics aren't converging or if you feel the model isn't learning at the right pace. Once the training job is paused, a deployable checkpoint is created after safety evaluations are complete. This checkpoint is available for you to deploy and use for inference, or you can resume the job to completion. The pause operation only applies to jobs that have been trained for at least one step and are in the *Running* state.

### Pause

```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs/{fine_tuning_job_id}/pause \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"
```

### Resume

```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs/{fine_tuning_job_id}/resume \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"
```
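Both calls differ only in the final path segment. If you're scripting them, a small helper keeps the two consistent; `job_control_url` below is a hypothetical convenience, not part of any SDK, and the job ID shown is a placeholder.

```python
import os

# Hypothetical helper: builds the request target for the pause/resume calls
# shown above. Sending the POST (with the same api-key header as the other
# examples) is left to your HTTP client of choice.
def job_control_url(endpoint: str, job_id: str, action: str) -> str:
    if action not in ("pause", "resume"):
        raise ValueError("action must be 'pause' or 'resume'")
    return f"{endpoint.rstrip('/')}/openai/v1/fine_tuning/jobs/{job_id}/{action}"

endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", "https://example.openai.azure.com")
print(job_control_url(endpoint, "ftjob-abc123", "pause"))  # placeholder job ID
```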
### List fine-tuning events

To examine the individual fine-tuning events that were generated during training:

articles/ai-foundry/openai/includes/fine-tuning-studio.md

Lines changed: 11 additions & 0 deletions

@@ -215,6 +215,17 @@ Your job might be queued behind other jobs on the system. Training your model ca

When each training epoch completes, a checkpoint is generated. A checkpoint is a fully functional version of a model which can both be deployed and used as the target model for subsequent fine-tuning jobs. Checkpoints can be particularly useful, as they may provide snapshots prior to overfitting. When a fine-tuning job completes you will have the three most recent versions of the model available to deploy.

## Pause and resume

You can track progress in both fine-tuning views of the AI Foundry portal. You'll see your job go through the same statuses as normal fine-tuning jobs (queued, running, succeeded).

You can also review the results files while training runs, to get a peek at the progress and check whether your training is proceeding as expected.

> [!NOTE]
> During training you can view the logs and metrics and pause the job as needed. Pausing can be useful if metrics aren't converging or if you feel the model isn't learning at the right pace. Once the training job is paused, a deployable checkpoint is created after safety evaluations are complete. This checkpoint is available for you to deploy and use for inference, or you can resume the job to completion. The pause operation only applies to jobs that have been trained for at least one step and are in the *Running* state.

:::image type="content" source="../media/how-to/reinforcement-fine-tuning/pause.png" alt-text="Screenshot of the reinforcement fine-tuning with a running job." lightbox="../media/how-to/reinforcement-fine-tuning/pause.png":::

## Analyze your custom model

Azure OpenAI attaches a result file named _results.csv_ to each fine-tuning job after it completes. You can use the result file to analyze the training and validation performance of your custom model. The file ID for the result file is listed for each custom model in the **Result file Id** column on the **Models** pane for Azure AI Foundry portal. You can use the file ID to identify and download the result file from the **Data files** pane of Azure AI Foundry portal.
