articles/ai-foundry/openai/how-to/reinforcement-fine-tuning.md
To use a score model grader, the input is a list of chat messages, each containing a role and content. The grader's output is truncated to the given range and defaults to 0 for all non-numeric outputs.
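For illustration, a score model grader configuration might look like the following. This is a sketch only: the exact field names and the `{{...}}` template placeholders are assumptions and should be checked against the grader API reference.

```json
{
    "type": "score_model",
    "name": "puzzle_score_grader",
    "model": "gpt-4o-2024-08-06",
    "input": [
        {
            "role": "user",
            "content": "Rate from 0 to 1 how well the answer {{sample.output_text}} solves {{item.question}}. Reply with only the number."
        }
    ],
    "range": [0, 1]
}
```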
### Custom Code Grader
The custom code grader lets you execute arbitrary Python code to grade the model output. The grader expects a `grade` function to be present that takes two arguments and returns a float value. Any other result (an exception, an invalid float value, and so on) is marked as invalid and returns a grade of 0.
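For example, a minimal `grade` function might look like the following. This is a sketch: the argument names `sample` and `item`, and the keys `output_text` and `target`, follow the example configuration shown later in this article.

```python
def grade(sample, item) -> float:
    # sample holds the model's output; item holds the training example.
    # Any exception or non-float result is treated as invalid and graded 0.
    try:
        output = sample["output_text"].strip()
        target = str(item["target"]).strip()
    except (KeyError, TypeError, AttributeError):
        return 0.0
    # Exact-match grading: 1.0 for a correct answer, 0.0 otherwise.
    return 1.0 if output == target else 0.0
```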
A multigrader object combines the output of multiple graders to produce a single score.
}
```
**Custom code grader** - This is a Python code grader where you can use any Python code to grade the training output.
The Python libraries supported by the custom code grader are:
```json
{
"type": "python",
"image_tag": "alpha",
"source": "import json\nimport re\n\ndef extract_numbers_from_expression(expression: str):\n return [int(num) for num in re.findall(r'-?\\d+', expression)]\n\ndef grade(sample, item) -> float:\n expression_str = sample['output_json']['expression']\n try:\n math_expr_eval = eval(expression_str)\n except Exception:\n return 0\n expr_nums_list = extract_numbers_from_expression(expression_str)\n input_nums_list = [int(x) for x in json.loads(item['nums'])]\n if sorted(expr_nums_list) != sorted(input_nums_list):\n return 0\n sample_result_int = int(sample['output_json']['result'])\n item_result_int = int(item['target'])\n if math_expr_eval != sample_result_int:\n return 1\n if sample_result_int == item_result_int:\n return 5\n if abs(sample_result_int - item_result_int) <= 1:\n return 4\n if abs(sample_result_int - item_result_int) <= 5:\n return 3\n return 2""
}
```
If you don't want to manually put your grading function in a string, you can also load it from a Python file by using `importlib` and `inspect`.
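One way to do this is sketched below, assuming your grader lives in a local file such as `my_grader.py` that defines a `grade` function (the file name and helper name here are illustrative):

```python
import importlib.util
import inspect


def load_grader_source(path: str, module_name: str = "my_grader") -> str:
    """Load a Python file as a module and return the source text of its grade function."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # inspect.getsource returns the exact text of the function definition,
    # ready to embed in the "source" field of the grader configuration.
    return inspect.getsource(module.grade)
```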
**Multi Grader** - A multigrader object combines the output of multiple graders to produce a single score.
```json
}
```
Example of the response format, which is an optional field:
If we need the response for the same puzzles problem used in the training data example, we can add the response format as shown below, where the fields `solution` and `final answer` are shared in structured outputs.