You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-foundry/openai/how-to/reinforcement-fine-tuning.md
+158Lines changed: 158 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -379,6 +379,164 @@ You can deploy the fine tuning job which is completed or any intermittent checkp
379
379
380
380
When using your model, make sure to use the same instructions and structure as used during training. This keeps the model in distribution, and ensures that you see the same performance on your problems during inference as you achieved during training.
"content": "You are a mathematical evaluator. Given a reference target number, a list of input numbers, and a model\u0027s output (an arithmetic expression and its reported result), your task is to evaluate the correctness and closeness of the model\u0027s answer.\n\nInput values are passed in as **strings**, including the number list and target. You must:\n1. Convert the \u0060target\u0060 string to a number.\n2. Convert the \u0060numbers\u0060 string into a list of numbers.\n3. Parse and validate the \u0060output_expression\u0060 \u2014 ensure it is a valid arithmetic expression.\n4. Evaluate the expression and confirm it matches the model\u0027s reported \u0060output_result\u0060.\n5. Check that **all input numbers are used exactly once**.\n6. Compare the evaluated result with the target and assign a score.\n\nScoring Rules:\n- 5: Valid expression, correct number usage, exact match to target\n- 4: Off by \u00B11\n- 3: Off by \u00B12 to \u00B15\n- 2: Off by \u003E5\n- 1: Minor issues (e.g., small mismatch in numbers used)\n- 0: Major issues \u2014 invalid expression or number usage\n\nOutput Format:\nScore: \u003C0 - 5\u003E\nReasoning: \u003Cbrief justification\u003E\n\nOnly respond with the score and reasoning."
0 commit comments