You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-services/openai/how-to/fine-tuning-deploy.md
+7-61Lines changed: 7 additions & 61 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -366,7 +366,9 @@ Azure OpenAI fine-tuning supports the following deployment types.
366
366
|GPT-35-Turbo-1106-finetune|East US2, North Central US, Sweden Central, Switzerland West|
367
367
|GPT-35-Turbo-0125-finetune|East US2, North Central US, Sweden Central, Switzerland West|
368
368
369
-
### Global Standard (preview)
369
+
### Global Standard
370
+
371
+
[Global standard](./deployment-types.md#global-standard) fine-tuned deployments offer [cost savings](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/), but custom model weights may temporarily be stored outside the geography of your Azure OpenAI resource.
370
372
371
373
| Models | Region |
372
374
|--|--|
@@ -375,72 +377,16 @@ Azure OpenAI fine-tuning supports the following deployment types.
375
377
|GPT-4o-finetune|East US2, North Central US, and Sweden Central|
376
378
|GPT-4o-mini-finetune|East US2, North Central US, and Sweden Central|
377
379
378
-
[Global standard](./deployment-types.md#global-standard) fine-tuned deployments offer [cost savings](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/), but custom model weights may temporarily be stored outside the geography of your Azure OpenAI resource.
379
-
380
380
:::image type="content" source="../media/fine-tuning/global-standard.png" alt-text="Screenshot of the global standard deployment user experience with a fine-tuned model." lightbox="../media/fine-tuning/global-standard.png":::
381
381
382
-
### Provisioned Managed (preview)
382
+
### Provisioned Managed
383
383
384
384
| Models | Region |
385
385
|--|--|
386
-
|GPT-4o-finetune|North Central US, Switzerland West|
387
-
|GPT-4o-mini-finetune|North Central US, Switzerland West|
388
-
389
-
-`gpt-4o-mini-2024-07-18`
390
-
-`gpt-4o-2024-08-06`
391
-
392
-
[Provisioned managed](./deployment-types.md#provisioned) fine-tuned deployments offer [predictable performance](../concepts/provisioned-throughput.md) for fine-tuned deployments. As part of public preview, provisioned managed deployments may be created regionally via the data-plane [REST API](../reference.md#data-plane-inference) version `2024-10-01` or newer. See below for examples.
393
-
394
-
#### Creating a Provisioned Managed deployment
395
-
396
-
To create a new deployment, make an HTTP PUT call via the [Deployments - Create or Update REST API](/rest/api/aiservices/accountmanagement/deployments/create-or-update?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP&preserve-view=true). The approach is similar to performing [cross region deployment](#cross-region-deployment) with the following exceptions:
397
-
398
-
- You must provide a `sku` name of `ProvisionedManaged`.
399
-
- The capacity must be declared in PTUs.
400
-
- The `api-version` must be `2024-10-01` or newer.
401
-
- The HTTP method should be `PUT`.
402
-
403
-
For example, to deploy a gpt-4o-mini model:
404
-
405
-
```bash
406
-
curl -X PUT "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.CognitiveServices/accounts/<RESOURCE_NAME>/deployments/<MODEL_DEPLOYMENT_NAME>api-version=2024-10-01" \
#### Scaling a fine-tuned model on Provisioned Managed
386
+
|GPT-4o-finetune|North Central US, Sweden Central|
387
+
|GPT-4o-mini-finetune|North Central US, Sweden Central|
423
388
424
-
To scale a fine-tuned provision managed deployment to increase or decrease PTU capacity, perform the same `PUT` REST API call as you did when [creating the deployment](#creating-a-provisioned-managed-deployment) and provide an updated `capacity` value for the `sku`. Keep in mind, provisioned deployments must scale in [minimum increments](../how-to/provisioned-throughput-onboarding.md#how-much-throughput-per-ptu-you-get-for-each-model).
425
-
426
-
For example, to scale the model deployed in the previous section from 25 to 40 PTU, make another `PUT` call and increase the capacity:
427
-
428
-
```bash
429
-
curl -X PUT "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.CognitiveServices/accounts/<RESOURCE_NAME>/deployments/<MODEL_DEPLOYMENT_NAME>api-version=2024-10-01" \
[Provisioned managed](./deployment-types.md#provisioned) fine-tuned deployments offer [predictable performance](../concepts/provisioned-throughput.md) for latency-sensitive agents and applications. They use the same regional provisioned throughput (PTU) capacity as base models, so if you already have regional PTU quota you can deploy your fine-tuned model in support regions.
0 commit comments