articles/ai-foundry/openai/how-to/reinforcement-fine-tuning.md
To use a score model grader, the input is a list of chat messages, each containing a role and content. The grader's output is truncated to the given range and defaults to 0 for all non-numeric outputs.
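For illustration, a score model grader configuration might look like the following. This is a sketch only: the exact field names and the `{{...}}` template placeholders are assumptions and should be checked against the grader API reference.

```json
{
    "type": "score_model",
    "name": "puzzle_score_grader",
    "model": "gpt-4o-2024-08-06",
    "input": [
        {
            "role": "user",
            "content": "Rate from 0 to 1 how well the answer {{sample.output_text}} solves {{item.question}}. Reply with only the number."
        }
    ],
    "range": [0, 1]
}
```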
### Custom Code Grader
The custom code grader lets you execute arbitrary Python code to grade the model output. The grader expects a `grade` function to be present that takes two arguments and returns a float value. Any other result (an exception, an invalid float value, and so on) is marked as invalid and returns a grade of 0.
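For example, a minimal `grade` function might look like the following. This is a sketch: the argument names `sample` and `item`, and the keys `output_text` and `target`, follow the example configuration shown later in this article.

```python
def grade(sample, item) -> float:
    # sample holds the model's output; item holds the training example.
    # Any exception or non-float result is treated as invalid and graded 0.
    try:
        output = sample["output_text"].strip()
        target = str(item["target"]).strip()
    except (KeyError, TypeError, AttributeError):
        return 0.0
    # Exact-match grading: 1.0 for a correct answer, 0.0 otherwise.
    return 1.0 if output == target else 0.0
```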
A multigrader object combines the output of multiple graders to produce a single score.
}
```
**Custom code grader** - This is a Python code grader where you can use any Python code to grade the training output.
The Python libraries supported by the custom code grader are:
```json
{
"type": "python",
"image_tag": "alpha",
"source": "import json\nimport re\n\ndef extract_numbers_from_expression(expression: str):\n return [int(num) for num in re.findall(r'-?\\d+', expression)]\n\ndef grade(sample, item) -> float:\n expression_str = sample['output_json']['expression']\n try:\n math_expr_eval = eval(expression_str)\n except Exception:\n return 0\n expr_nums_list = extract_numbers_from_expression(expression_str)\n input_nums_list = [int(x) for x in json.loads(item['nums'])]\n if sorted(expr_nums_list) != sorted(input_nums_list):\n return 0\n sample_result_int = int(sample['output_json']['result'])\n item_result_int = int(item['target'])\n if math_expr_eval != sample_result_int:\n return 1\n if sample_result_int == item_result_int:\n return 5\n if abs(sample_result_int - item_result_int) <= 1:\n return 4\n if abs(sample_result_int - item_result_int) <= 5:\n return 3\n return 2""
}
```
If you don't want to manually put your grading function in a string, you can also load it from a Python file by using `importlib` and `inspect`.
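One way to do this is sketched below, assuming your grader lives in a local file such as `my_grader.py` that defines a `grade` function (the file name and helper name here are illustrative):

```python
import importlib.util
import inspect


def load_grader_source(path: str, module_name: str = "my_grader") -> str:
    """Load a Python file as a module and return the source text of its grade function."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # inspect.getsource returns the exact text of the function definition,
    # ready to embed in the "source" field of the grader configuration.
    return inspect.getsource(module.grade)
```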
**Multi Grader** - A multigrader object combines the output of multiple graders to produce a single score.
```json
}
```
Example of the response format, which is an optional field:
If we need the response for the same puzzles problem used in the training data example, we can add the response format as shown below, where the fields `solution` and `final answer` are shared in structured outputs.