
Commit 19fd822

update
1 parent f8b905c commit 19fd822

3 files changed: 42 additions, 0 deletions

articles/ai-foundry/openai/includes/fine-tuning-openai-in-ai-studio.md

Lines changed: 11 additions & 0 deletions
@@ -177,6 +177,17 @@ When each training epoch completes a checkpoint is generated. A checkpoint is a
177177

178178
:::image type="content" source="../media/fine-tuning/checkpoints.png" alt-text="Screenshot of checkpoints UI." lightbox="../media/fine-tuning/checkpoints.png":::
179179

180+
## Pause and resume
181+
182+
You can track progress in both fine-tuning views of the AI Foundry portal. Your job goes through the same statuses as normal fine-tuning jobs (queued, running, succeeded).
183+
184+
You can also review the results files while training runs to get a peek at the progress and check whether training is proceeding as expected.
185+
186+
> [!NOTE]
187+
> During training, you can view the logs and metrics and pause the job as needed. Pausing can be useful if metrics aren't converging or if you feel the model isn't learning at the right pace. After the training job is paused, a deployable checkpoint is created once safety evals are complete. This checkpoint is available for you to deploy and use for inference, or you can resume the job and run it to completion. The pause operation applies only to jobs that have been trained for at least one step and are in the *Running* state.
188+
189+
:::image type="content" source="../media/how-to/reinforcement-fine-tuning/pause.png" alt-text="Screenshot of a running reinforcement fine-tuning job." lightbox="../media/how-to/reinforcement-fine-tuning/pause.png":::
190+
180191
## Analyze your fine-tuned model
181192

182193
After fine-tuning is successfully completed, you can download a result file named _results.csv_ from the fine-tuned model page under the **Details** tab. You can use the result file to analyze the training and validation performance of your custom model.

articles/ai-foundry/openai/includes/fine-tuning-rest.md

Lines changed: 20 additions & 0 deletions
@@ -158,6 +158,26 @@ curl -X GET $AZURE_OPENAI_ENDPOINT/openai/fine_tuning/jobs/<YOUR-JOB-ID>?api-ver
158158
-H "api-key: $AZURE_OPENAI_API_KEY"
159159
```
160160

161+
## Pause and resume
162+
163+
During training, you can view the logs and metrics and pause the job as needed. Pausing can be useful if metrics aren't converging or if you feel the model isn't learning at the right pace. After the training job is paused, a deployable checkpoint is created once safety evals are complete. This checkpoint is available for you to deploy and use for inference, or you can resume the job and run it to completion. The pause operation applies only to jobs that have been trained for at least one step and are in the *Running* state.
164+
165+
### Pause
166+
167+
```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs/{fine_tuning_job_id}/pause \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"
```
172+
173+
### Resume
174+
175+
```bash
curl -X POST $AZURE_OPENAI_ENDPOINT/openai/v1/fine_tuning/jobs/{fine_tuning_job_id}/resume \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"
```
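
The pause and resume requests differ only in the final path segment, so a small wrapper can cover both. The following is a minimal sketch, not part of the commit: the function names `ft_job_action_url` and `ft_job_action` are hypothetical, and the sketch assumes the same `/openai/v1/fine_tuning/jobs/{fine_tuning_job_id}/...` path and `api-key` header used in the commands in this section.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and are not Azure tooling.

# Build the pause or resume URL for a fine-tuning job.
# Arguments: <endpoint> <job_id> <action>, where action is "pause" or "resume".
ft_job_action_url() {
  local endpoint="${1%/}"   # drop one trailing slash, if present
  local job_id="$2"
  local action="$3"

  case "$action" in
    pause|resume) ;;
    *) echo "unsupported action: $action" >&2; return 1 ;;
  esac

  if [ -z "$endpoint" ] || [ -z "$job_id" ]; then
    echo "endpoint and job id are required" >&2
    return 1
  fi

  printf '%s/openai/v1/fine_tuning/jobs/%s/%s\n' "$endpoint" "$job_id" "$action"
}

# Send the request. Assumes AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are set.
# Arguments: <job_id> <action>
ft_job_action() {
  local url
  url="$(ft_job_action_url "$AZURE_OPENAI_ENDPOINT" "$1" "$2")" || return 1
  curl -sS -X POST "$url" \
    -H "Content-Type: application/json" \
    -H "api-key: $AZURE_OPENAI_API_KEY"
}
```

For example, `ft_job_action "<YOUR-JOB-ID>" pause` would send the pause request, and the same call with `resume` would continue the job. Validating the action before calling `curl` keeps a typo from reaching the service.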
180+
161181
### List fine-tuning events
162182

163183
To examine the individual fine-tuning events that were generated during training:

articles/ai-foundry/openai/includes/fine-tuning-studio.md

Lines changed: 11 additions & 0 deletions
@@ -215,6 +215,17 @@ Your job might be queued behind other jobs on the system. Training your model ca
215215

216216
When each training epoch completes, a checkpoint is generated. A checkpoint is a fully functional version of a model that can be both deployed and used as the target model for subsequent fine-tuning jobs. Checkpoints can be particularly useful because they may provide snapshots prior to overfitting. When a fine-tuning job completes, the three most recent versions of the model are available to deploy.
217217

218+
## Pause and resume
219+
220+
You can track progress in both fine-tuning views of the AI Foundry portal. Your job goes through the same statuses as normal fine-tuning jobs (queued, running, succeeded).
221+
222+
You can also review the results files while training runs to get a peek at the progress and check whether training is proceeding as expected.
223+
224+
> [!NOTE]
225+
> During training, you can view the logs and metrics and pause the job as needed. Pausing can be useful if metrics aren't converging or if you feel the model isn't learning at the right pace. After the training job is paused, a deployable checkpoint is created once safety evals are complete. This checkpoint is available for you to deploy and use for inference, or you can resume the job and run it to completion. The pause operation applies only to jobs that have been trained for at least one step and are in the *Running* state.
226+
227+
:::image type="content" source="../media/how-to/reinforcement-fine-tuning/pause.png" alt-text="Screenshot of a running reinforcement fine-tuning job." lightbox="../media/how-to/reinforcement-fine-tuning/pause.png":::
228+
218229
## Analyze your custom model
219230

220231
Azure OpenAI attaches a result file named _results.csv_ to each fine-tuning job after it completes. You can use the result file to analyze the training and validation performance of your custom model. The file ID for the result file is listed for each custom model in the **Result file Id** column on the **Models** pane for Azure AI Foundry portal. You can use the file ID to identify and download the result file from the **Data files** pane of Azure AI Foundry portal.
