
Commit b3b51df

Merge pull request #7011 from NandiniMurali/patch-7
Python grader addition
2 parents 14c0fc9 + bf15faa commit b3b51df

File tree

1 file changed: +25 −3 lines

articles/ai-foundry/openai/how-to/reinforcement-fine-tuning.md

Lines changed: 25 additions & 3 deletions
@@ -188,6 +188,18 @@ Models which we're supporting as grader models are:

To use a score model grader, the input is a list of chat messages, each containing a role and content. The output of the grader will be truncated to the given range, and defaults to 0 for all non-numeric outputs.
### Custom Code Grader

The custom code grader allows you to execute arbitrary Python code to grade the model output. The grader expects a `grade` function to be present that takes in two arguments and returns a float value. Any other result (an exception, an invalid float value, and so on) is marked as invalid and returns a 0 grade.

```json
{
    "type": "python",
    "source": "def grade(sample, item):\n    return 1.0",
    "image_tag": "2025-05-08"
}
```
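The contract described above can be sanity-checked locally with a small harness. This is an illustrative simulation of the rules as stated (any exception or non-numeric result becomes a 0 grade), not the service's actual sandbox; `run_grader` is a hypothetical helper:

```python
def run_grader(source: str, sample: dict, item: dict) -> float:
    """Simulate the grader contract: run the source, call grade(sample, item),
    and coerce any failure or non-numeric result to a 0 grade."""
    namespace: dict = {}
    try:
        exec(source, namespace)                      # load the grading code
        result = float(namespace["grade"](sample, item))
    except Exception:                                # missing grade(), crash, bad return type
        return 0.0
    if result != result:                             # NaN is not a valid float grade
        return 0.0
    return result

print(run_grader("def grade(sample, item):\n    return 1.0", {}, {}))          # 1.0
print(run_grader("def grade(sample, item):\n    raise ValueError()", {}, {}))  # 0.0
```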
### Multi Grader

A multigrader object combines the output of multiple graders to produce a single score.
@@ -272,6 +284,19 @@ Models which we're supporting as grader models are `gpt-4o-2024-08-06` and `o3-mi

}
```
**Custom code grader** - This is a Python code grader where you can use any Python code to grade the training output.

The Python libraries supported by the custom code grader are:
```json
{
    "type": "python",
    "image_tag": "alpha",
    "source": "import json\nimport re\n\ndef extract_numbers_from_expression(expression: str):\n    return [int(num) for num in re.findall(r'-?\\d+', expression)]\n\ndef grade(sample, item) -> float:\n    expression_str = sample['output_json']['expression']\n    try:\n        math_expr_eval = eval(expression_str)\n    except Exception:\n        return 0\n    expr_nums_list = extract_numbers_from_expression(expression_str)\n    input_nums_list = [int(x) for x in json.loads(item['nums'])]\n    if sorted(expr_nums_list) != sorted(input_nums_list):\n        return 0\n    sample_result_int = int(sample['output_json']['result'])\n    item_result_int = int(item['target'])\n    if math_expr_eval != sample_result_int:\n        return 1\n    if sample_result_int == item_result_int:\n        return 5\n    if abs(sample_result_int - item_result_int) <= 1:\n        return 4\n    if abs(sample_result_int - item_result_int) <= 5:\n        return 3\n    return 2"
}
```
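Unescaping the `source` string above gives the function below; running it on a hand-made input shows the scoring ladder in action (the exact `sample`/`item` shapes here are inferred from the example itself):

```python
import json
import re

def extract_numbers_from_expression(expression: str):
    return [int(num) for num in re.findall(r'-?\d+', expression)]

def grade(sample, item) -> float:
    expression_str = sample['output_json']['expression']
    try:
        math_expr_eval = eval(expression_str)       # evaluate the model's arithmetic
    except Exception:
        return 0                                    # unparseable expression
    # The expression must use exactly the provided numbers.
    if sorted(extract_numbers_from_expression(expression_str)) != sorted(
            int(x) for x in json.loads(item['nums'])):
        return 0
    sample_result = int(sample['output_json']['result'])
    if math_expr_eval != sample_result:
        return 1                                    # self-inconsistent answer
    target = int(item['target'])
    if sample_result == target:
        return 5                                    # exact match
    if abs(sample_result - target) <= 1:
        return 4
    if abs(sample_result - target) <= 5:
        return 3
    return 2

# Expression uses exactly [3, 4, 2] and evaluates to the target 11 -> top score.
print(grade({'output_json': {'expression': '3 + 4 * 2', 'result': '11'}},
            {'nums': '[3, 4, 2]', 'target': '11'}))   # 5
```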
If you don't want to manually put your grading function in a string, you can also load it from a Python file using `importlib` and `inspect`.
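A minimal sketch of that approach, assuming your grader lives in its own file (the file name `my_grader.py` and its contents here are hypothetical; a temporary directory stands in for your project):

```python
import importlib.util
import inspect
import json
import pathlib
import tempfile

# Hypothetical stand-in for your real grader file (e.g. my_grader.py).
grader_code = "def grade(sample, item):\n    return 1.0\n"

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "my_grader.py"
    path.write_text(grader_code)

    # Import the file as a module, then recover the exact source text of grade().
    spec = importlib.util.spec_from_file_location("my_grader", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    source = inspect.getsource(module.grade)

grader = {"type": "python", "image_tag": "2025-05-08", "source": source}
print(json.dumps(grader))
```

Note that `inspect.getsource` on the function alone omits module-level imports; if `grade` depends on them, pass the whole module to `inspect.getsource` (or read the file directly) instead.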
**Multi Grader** - A multigrader object combines the output of multiple graders to produce a single score.

```json
@@ -294,9 +319,6 @@ Models which we're supporting as grader models are `gpt-4o-2024-08-06` and `o3-mi
}
```

> [!Note]
> Currently we don't support `multi` with a model grader as a sub grader. The `multi` grader is supported only with `text_similarity` and `string_check`.
Example of the response format, which is an optional field:

If we need the response for the same puzzle problem used in the training data example, we can add the response format as shown below, where the fields 'solution' and 'final answer' are shared in structured outputs.

0 commit comments

Comments
 (0)