File: articles/ai-services/openai/how-to/reinforcement-fine-tuning.md
Lines changed: 4 additions & 4 deletions
@@ -152,7 +152,7 @@ To evaluate how close the model-generated output is to the reference, scored wit
 
 ***Supported operations:***
 
-- `bleu` – computes bleu score between strings
+- `bleu` – computes BLEU score between strings
 - `Fuzzy_match` – fuzzy string match, using rapidfuzz
 - `gleu` – computes google BLEU score between strings
 - `meteor` – computes METEOR score between strings
@@ -209,7 +209,7 @@ A multigrader object combines the output of multiple graders to produce a single
 - `/` (division)
 - `^` (power)
 
-*Functions:*0
+*Functions:*
 - `min`
 - `max`
 - `abs`
@@ -219,7 +219,7 @@ A multigrader object combines the output of multiple graders to produce a single
 - `sqrt`
 - `log`
 
-When using the UX you're able to write a prompt and generate a valid grader and response format in json as needed. Grader is mandatory field to be entered while submitting a finetuning job. Response format is optional.
+When using the UX you're able to write a prompt and generate a valid grader and response format in json as needed. Grader is mandatory field to be entered while submitting a fine-tuning job. Response format is optional.
 
 > [!IMPORTANT]
 > Generating correct grader schema requires careful prompt authoring. You may find that your first few attempts generate invalid schemas or don't create a schema that will properly handle your training data. Grader is a mandatory field that must be entered while submitting a fine-tuning job. Response format is optional.
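Reviewer note: the arithmetic covered by the two multigrader hunks above (operators such as `/` and `^`, functions such as `min`, `max`, `abs`, `sqrt`, `log`) can be illustrated with a small local sketch. This is not the service's implementation. The helper name `evaluate_formula`, the score names `a` and `b`, support for `+`, `-`, `*`, and the use of the natural logarithm for `log` are all assumptions made for illustration.

```python
import ast
import math
import operator

# Functions visible in the diff context; the real grader may support more.
# Assumption: `log` is the natural logarithm.
ALLOWED_FUNCS = {"min": min, "max": max, "abs": abs, "sqrt": math.sqrt, "log": math.log}

# `/` and `^` appear in the diff context; `+`, `-`, `*` are assumed here.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate_formula(formula: str, scores: dict[str, float]) -> float:
    """Evaluate a multigrader-style formula over named grader scores (hypothetical helper)."""
    # Python spells power as `**`, so translate the `^` used in the formula.
    tree = ast.parse(formula.replace("^", "**"), mode="eval")

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name) and node.id in scores:
            return float(scores[node.id])
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in ALLOWED_FUNCS and not node.keywords):
            return float(ALLOWED_FUNCS[node.func.id](*(walk(a) for a in node.args)))
        # Anything else (attribute access, unknown names or functions) is rejected.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return walk(tree)
```

For example, `evaluate_formula("max(a, b) / 2", {"a": 0.4, "b": 0.8})` takes the higher of two grader scores and halves it. Walking the syntax tree instead of calling `eval` keeps the sketch limited to the listed operators and functions.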
@@ -340,7 +340,7 @@ During the training you can view the logs and RFT metrics and pause the job as n
 
 ### Guardrails on training spending
 
-As a RFT job can lead to high training costs, we automatically pause jobs once they have hit $5K in total training costs (training + grading). Users may deploy the most recent checkpoint or resume the training job. If the the user decides to resume the job, billing will continue for the job and subsequently no further price limits would be placed on the training job.
+As an RFT job can lead to high training costs, we automatically pause jobs once they have hit $5K in total training costs (training + grading). Users may deploy the most recent checkpoint or resume the training job. If the user decides to resume the job, billing will continue for the job and subsequently no further price limits would be placed on the training job.
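Reviewer note: the guardrail paragraph in this hunk describes a one-time pause at $5K, after which a resumed job runs without a further limit. A minimal sketch of that behavior, assuming a hypothetical `RftJob` class and assuming a paused job accrues no cost, might look like this. None of these names come from the service or its SDK.

```python
from dataclasses import dataclass

COST_LIMIT_USD = 5_000.0  # the "$5K" threshold stated in the text


@dataclass
class RftJob:
    """Hypothetical model of an RFT job's spending guardrail."""
    training_cost: float = 0.0
    grading_cost: float = 0.0
    status: str = "running"
    limit_lifted: bool = False  # set once the user resumes after the automatic pause

    @property
    def total_cost(self) -> float:
        # The guardrail counts training and grading costs together.
        return self.training_cost + self.grading_cost

    def add_cost(self, training: float = 0.0, grading: float = 0.0) -> None:
        if self.status != "running":
            return  # assumption: a paused job accrues no further cost
        self.training_cost += training
        self.grading_cost += grading
        # Pause once at the threshold, unless the limit was already lifted.
        if not self.limit_lifted and self.total_cost >= COST_LIMIT_USD:
            self.status = "paused"

    def resume(self) -> None:
        # Resuming continues billing and removes the price limit for this job.
        if self.status == "paused":
            self.status = "running"
            self.limit_lifted = True
```

In this sketch, a job that reaches the threshold is paused, and after `resume()` it keeps accruing cost without being paused again, which mirrors the "no further price limits" wording.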