update

mrbullwinkle · mrbullwinkle · commit d8cf48cc5836 · 2025-07-28T14:09:57.000-04:00
diff --git a/articles/ai-foundry/openai/how-to/fine-tuning-direct-preference-optimization.md b/articles/ai-foundry/openai/how-to/fine-tuning-direct-preference-optimization.md
@@ -65,6 +65,16 @@ Users can use preference fine tuning with base models as well as models that hav
 4. Select hyperparameters, defaults are recommended for initial experimentation.
 5. Review the selections and create a fine tuning job.
 
+## Direct preference optimization - REST API
+
+```bash
+curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs'
+-H "api-key: $AZURE_OPENAI_API_KEY" 
+-H 'Content-Type: application/json' 
+-H 'task_type: chat' 
+--data '{ "model": "gpt-4.1-mini-2025-04-14", "training_file": "file-d02c607351994d29987aece550ac81c0", "validation_file": "file-d02c607351994d29987aece550ac81c0", "prompt_loss_weight": 0.1, "suffix": "Pause_Resume", "method":{ "type":"dpo", "dpo":{ "beta":0.1, "l2_multiplier":0.1 }}}'
+
+```
 
 ## Next steps
 
diff --git a/articles/ai-foundry/openai/how-to/reinforcement-fine-tuning.md b/articles/ai-foundry/openai/how-to/reinforcement-fine-tuning.md
@@ -379,6 +379,164 @@ You can deploy the fine tuning job which is completed or any intermittent checkp
 
 When using your model, make sure to use the same instructions and structure as used during training. This keeps the model in distribution, and ensures that you see the same performance on your problems during inference as you achieved during training.
 
+## REST API
+
+### Create a RFT job
+
+#### With Score model grader
+
+```json
+{
+    "model": "o4-mini-2025-04-16",
+    "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
+    "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
+    "suffix": "TEST",
+    "method": {
+        "type": "reinforcement",
+        "reinforcement": {
+            "hyperparameters": {
+                "eval_interval": 1,
+                "eval_samples": 1,
+                "compute_multiplier": 1,
+                "reasoning_effort": "medium",
+                "n_epochs": 1,
+                "batch_size": 10,
+                "learning_rate_multiplier": 1
+            },
+            "grader": {
+                "type": "score_model",
+                "name": "custom_grader",
+                "input": [
+                    {
+                        "role": "developer",
+                        "content": "You are a mathematical evaluator. Given a reference target number, a list of input numbers, and a model\u0027s output (an arithmetic expression and its reported result), your task is to evaluate the correctness and closeness of the model\u0027s answer.\n\nInput values are passed in as **strings**, including the number list and target. You must:\n1. Convert the \u0060target\u0060 string to a number.\n2. Convert the \u0060numbers\u0060 string into a list of numbers.\n3. Parse and validate the \u0060output_expression\u0060 \u2014 ensure it is a valid arithmetic expression.\n4. Evaluate the expression and confirm it matches the model\u0027s reported \u0060output_result\u0060.\n5. Check that **all input numbers are used exactly once**.\n6. Compare the evaluated result with the target and assign a score.\n\nScoring Rules:\n- 5: Valid expression, correct number usage, exact match to target\n- 4: Off by \u00B11\n- 3: Off by \u00B12 to \u00B15\n- 2: Off by \u003E5\n- 1: Minor issues (e.g., small mismatch in numbers used)\n- 0: Major issues \u2014 invalid expression or number usage\n\nOutput Format:\nScore: \u003C0 - 5\u003E\nReasoning: \u003Cbrief justification\u003E\n\nOnly respond with the score and reasoning."
+                    },
+                    {
+                        "role": "user",
+                        "content": "{ \u0022target\u0022: {{item.target}}, \u0022numbers\u0022: {{item.nums}}, \u0022output\u0022: {{sample.output_text}} }"
+                    }
+                ],
+                "pass_threshold": 5,
+                "range": [
+                    0,
+                    5
+                ],
+                "model": "o3-mini"
+            },
+            "response_format": {
+                "type": "json_schema",
+                "json_schema": {
+                    "name": "math_expression",
+                    "schema": {
+                        "type": "object",
+                        "required": [
+                            "expression",
+                            "result"
+                        ],
+                        "properties": {
+                            "expression": {
+                                "type": "string",
+                                "description": "The mathematical expression to be evaluated."
+                            },
+                            "result": {
+                                "type": "string",
+                                "description": "The result of evaluating the mathematical expression."
+                            }
+                        },
+                        "additionalProperties": false
+                    },
+                    "strict": true
+                }
+            }
+        }
+    }
+}
+```
+
+#### With String Check grader
+
+```json
+{
+    "model": "o4-mini-2025-04-16",
+    "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
+    "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
+    "suffix": "TEST",
+    "method": {
+        "type": "reinforcement",
+        "reinforcement": {
+            "hyperparameters": {
+                "eval_interval": 1,
+                "eval_samples": 1,
+                "compute_multiplier": 1,
+                "reasoning_effort": "medium",
+                "n_epochs": 1,
+                "batch_size": 10,
+                "learning_rate_multiplier": 1
+            },
+            "grader": {
+                "name":"answer_string_check",
+                "type":"string_check",
+                "input":"{{item.reference_answer.final_answer}}",
+                "operation":"eq",
+                "reference":"{{sample.output_json.final_answer}}"
+            }
+        }
+    }
+}
+```
+
+#### Text similarity grader
+
+```json
+{
+    "model": "o4-mini-2025-04-16",
+    "training_file": "file-c6578ee2c5194ae99e33711e677d7aa9",
+    "validation_file": "file-7ead313cc49e4e0480b9700bbd513bbc",
+    "suffix": "TEST",
+    "method": {
+        "type": "reinforcement",
+        "reinforcement": {
+            "hyperparameters": {
+                "eval_interval": 1,
+                "eval_samples": 1,
+                "compute_multiplier": 1,
+                "reasoning_effort": "medium",
+                "n_epochs": 1,
+                "batch_size": 10,
+                "learning_rate_multiplier": 1
+            },
+            "grader": {
+              "name":"solution_similarity",
+              "type":"text_similarity",
+              "input":"{{sample.output_json.solution}}",
+              "reference":"{{item.reference_answer.solution}}",
+              "evaluation_metric":"bleu"
+            }
+        }
+    }
+}
+```
+
+**Examples:** [Reference Jupyter Notebook](https://github.com/azure-ai-foundry/build-2025-demos/tree/main/Azure%20AI%20Model%20Customization/MSBuildRFTDemo).
+
+### Validate Grader
+
+```bash
+curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/alpha/graders/validate \
+  -H "Content-Type: application/json" \
+  -H "api-key: $AZURE_OPENAI_API_KEY" \
+  -d '{ "grader": { "name":"answer_string_check", "type":"string_check", "input":" {{item.reference_answer.final_answer}}", "operation":"eq", "reference":" {{sample.output_json.final_answer}}" } }' 
+```
+
+### Run Grader
+
+```bash
+curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/alpha/graders/run \
+  -H "Content-Type: application/json" \
+  -H "api-key: $AZURE_OPENAI_API_KEY" \
+  -d '{ "grader": { "name":"solution_similarity", "type":"string_check", "input": " {{item.reference_answer}}", "reference": " {{sample.output_text}}", "operation": "eq" }, "reference_answer": "yes", "model_sample": "yes" }'
+```
+
 ## Best practices
 
 ### Grader selection