
Commit bc33227

Merge pull request #3176 from MicrosoftDocs/main
2/25/2025 PM Publish
2 parents 3cc23ef + 0acdc01

24 files changed: +1,073 −830 lines

articles/ai-foundry/model-inference/quotas-limits.md

Lines changed: 36 additions & 12 deletions
@@ -17,7 +17,7 @@ This article contains a quick reference and a detailed description of the quotas

 ## Quotas and limits reference

-The following sections provide you with a quick guide to the default quotas and limits that apply to Azure AI model's inference service in Azure AI services:
+Azure uses quotas and limits to prevent budget overruns due to fraud, and to honor Azure capacity constraints. Consider these limits as you scale for production workloads. The following sections provide you with a quick guide to the default quotas and limits that apply to Azure AI model's inference service in Azure AI services:

 ### Resource limits

@@ -28,12 +28,18 @@ The following sections provide you with a quick guide to the default quotas and

 ### Rate limits

-| Limit name | Limit value |
-| ---------- | ----------- |
-| Tokens per minute (Azure OpenAI models) | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
-| Tokens per minute (rest of models) | 200,000 |
-| Requests per minute (Azure OpenAI models) | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
-| Requests per minute (rest of models) | 1,000 |
+| Limit name | Applies to | Limit value |
+| -------------------- | ------------------- | ----------- |
+| Tokens per minute | Azure OpenAI models | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
+| Requests per minute | Azure OpenAI models | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
+| Tokens per minute | DeepSeek models | 5,000,000 |
+| Requests per minute | DeepSeek models | 5,000 |
+| Concurrent requests | DeepSeek models | 300 |
+| Tokens per minute | Rest of models | 200,000 |
+| Requests per minute | Rest of models | 1,000 |
+| Concurrent requests | Rest of models | 300 |
+
+You can [request increases to the default limits](#request-increases-to-the-default-limits). Due to high demand, limit increase requests are evaluated individually, per request.

 ### Other limits
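
The table caps concurrent requests at 300 for both DeepSeek models and the rest of the models. A minimal client-side sketch of honoring that cap with an `asyncio.Semaphore` follows; the endpoint URL shape, API version, header, and model name are illustrative assumptions, not values taken from this commit.

```python
import asyncio

import httpx

# Illustrative placeholders -- substitute your real resource endpoint, key, and model.
ENDPOINT = "https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview"
API_KEY = "<your-api-key>"
MAX_CONCURRENT = 300  # concurrent-request limit from the table above

semaphore = asyncio.Semaphore(MAX_CONCURRENT)


async def ask(client: httpx.AsyncClient, prompt: str) -> str:
    # The semaphore keeps at most MAX_CONCURRENT requests in flight at once,
    # so a large batch never exceeds the service-side concurrency limit.
    async with semaphore:
        response = await client.post(
            ENDPOINT,
            headers={"api-key": API_KEY},
            json={"model": "<model-name>", "messages": [{"role": "user", "content": prompt}]},
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]


async def main() -> None:
    async with httpx.AsyncClient(timeout=60.0) as client:
        answers = await asyncio.gather(*(ask(client, f"Question {i}") for i in range(500)))
    print(f"Received {len(answers)} answers")


if __name__ == "__main__":
    asyncio.run(main())
```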

@@ -49,6 +55,28 @@ Global Standard deployments use Azure's global infrastructure, dynamically routi

 The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.

+## Request increases to the default limits
+
+Limit increase requests are evaluated individually, per request. [Open an online customer support request](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest/). When requesting an endpoint limit increase, provide the following information:
+
+1. When opening the support request, select **Service and subscription limits (quotas)** as the **Issue type**.
+
+1. Select the subscription of your choice.
+
+1. Select **Cognitive Services** as **Quota type**.
+
+1. Select **Next**.
+
+1. On the **Additional details** tab, provide detailed reasons for the limit increase so that your request can be processed. Be sure to include the following information in your reason:
+
+    * Model name, model version (if applicable), and deployment type (SKU).
+    * Description of your scenario and workload.
+    * Rationale for the requested increase.
+    * Target throughput: tokens per minute, requests per minute, and so on.
+    * Planned timeline (by when you need the increased limits).
+
+1. Finally, select **Save and continue**.
+
 ## General best practices to remain within rate limits

 To minimize issues related to rate limits, it's a good idea to use the following techniques:
@@ -58,10 +86,6 @@ To minimize issues related to rate limits, it's a good idea to use the following
 - Test different load increase patterns.
 - Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.

-### Request increases to the default quotas and limits
-
-Quota increase requests can be submitted and evaluated per request. [Submit a service request](../../ai-services/cognitive-services-support-options.md?context=/azure/ai-services/openai/context/context).
-
 ## Next steps

-* Learn more about the [models available in the Azure AI model's inference service](./concepts/models.md)
+* Learn more about the [models available in the Azure AI model's inference service](./concepts/models.md)
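
The best-practices list above recommends retry logic and gradual load increases. A minimal sketch of retrying on 429 responses with exponential backoff and jitter, assuming the `openai` Python package; the endpoint, key, API version, and deployment name are placeholders.

```python
import random
import time

from openai import AzureOpenAI, RateLimitError

# Placeholder values -- substitute your real endpoint, key, and deployment.
client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)


def chat_with_backoff(prompt: str, max_retries: int = 5) -> str:
    """Retry rate-limited calls with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="<deployment-name>",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            # Sleep 1s, 2s, 4s, ... plus jitter so concurrent callers
            # don't all retry at the same moment.
            time.sleep(2**attempt + random.random())
    raise RuntimeError("Still rate limited after all retries")


print(chat_with_backoff("Hello"))
```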

articles/ai-services/openai/concepts/model-retirements.md

Lines changed: 6 additions & 2 deletions
@@ -4,7 +4,7 @@ titleSuffix: Azure OpenAI
 description: Learn about the model deprecations and retirements in Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 02/24/2025
+ms.date: 02/25/2025
 ms.custom:
 manager: nitinme
 author: mrbullwinkle
@@ -93,7 +93,7 @@ These models are currently available for use in Azure OpenAI Service.

 | Model | Version | Retirement date | Suggested replacements |
 | ---- | ---- | ---- | --- |
-| `dall-e-3` | 3 | No earlier than April 30, 2025 | |
+| `dall-e-3` | 3 | No earlier than June 30, 2025 | |
 | `gpt-35-turbo-16k`| 0613 | April 30, 2025 | `gpt-35-turbo` (0125) <br><br> `gpt-4o-mini`|
 | `gpt-35-turbo` | 1106 | No earlier than May 31, 2025 <br><br> Deployments set to [**Auto-update to default**](/azure/ai-services/openai/how-to/working-with-models?tabs=powershell#auto-update-to-default) will be automatically upgraded to version: `0125`, starting on January 21, 2025. | `gpt-35-turbo` (0125) <br><br> `gpt-4o-mini` |
 | `gpt-35-turbo` | 0125 | No earlier than May 31, 2025 | `gpt-4o-mini` |
@@ -171,6 +171,10 @@ If you're an existing customer looking for information about these models, see [

 ## Retirement and deprecation history

+## February 25, 2025
+
+- `dall-e-3` updated to no earlier than June 30, 2025.
+
 ## February 20, 2025

 - `o1-preview` updated to no earlier than April 2, 2025.

articles/ai-services/openai/how-to/fine-tuning-functions.md

Lines changed: 103 additions & 8 deletions
@@ -1,24 +1,118 @@
 ---
 title: Fine-tuning function calls with Azure OpenAI Service
-description: Learn how to improve function calling performance with Azure OpenAI fine-tuning
+description: Learn how to improve tool calling performance with Azure OpenAI fine-tuning
 #services: cognitive-services
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 09/05/2024
+ms.date: 02/20/2025
 author: mrbullwinkle
 ms.author: mbullwin
 ---

-# Fine-tuning and function calling
+# Fine-tuning and tool calling

-Models that use the chat completions API support [function calling](../how-to/function-calling.md). Unfortunately, functions defined in your chat completion calls don't always perform as expected. Fine-tuning your model with function calling examples can improve model output by enabling you to:
+Models that use the chat completions API support [tool calling](../how-to/function-calling.md). Unfortunately, functions defined in your chat completion calls don't always perform as expected. Fine-tuning your model with tool calling examples can improve model output by enabling you to:

 * Get similarly formatted responses even when the full function definition isn't present. (Allowing you to potentially save money on prompt tokens.)
 * Get more accurate and consistent outputs.

-## Constructing a training file
+> [!NOTE]
+> `function_call` and `functions` have been deprecated in favor of `tools`.
+> We recommend using the `tools` parameter instead.
+
+## Tool calling (recommended)
+
+### Constructing a training file
+
+When constructing a training file of tool calling examples, you would take a function definition like this:
+
+```json
+{
+  "messages": [
+    { "role": "user", "content": "What is the weather in San Francisco?" },
+    {
+      "role": "assistant",
+      "tool_calls": [
+        {
+          "id": "call_id",
+          "type": "function",
+          "function": {
+            "name": "get_current_weather",
+            "arguments": "{\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}"
+          }
+        }
+      ]
+    }
+  ],
+  "tools": [
+    {
+      "type": "function",
+      "function": {
+        "name": "get_current_weather",
+        "description": "Get the current weather",
+        "parameters": {
+          "type": "object",
+          "properties": {
+            "location": {
+              "type": "string",
+              "description": "The city and country, e.g. San Francisco, USA"
+            },
+            "format": { "type": "string", "enum": ["celsius", "fahrenheit"] }
+          },
+          "required": ["location", "format"]
+        }
+      }
+    }
+  ]
+}
+```
+
+And express the information as a single line within your `.jsonl` training file as below:
+
+```jsonl
+{"messages":[{"role":"user","content":"What is the weather in San Francisco?"},{"role":"assistant","tool_calls":[{"id":"call_id","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}"}}]}],"tools":[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather","parameters":{"type":"object","properties":{"location":{"type":"string","description":"The city and country, e.g. San Francisco, USA"},"format":{"type":"string","enum":["celsius","fahrenheit"]}},"required":["location","format"]}}}]}
+```
+
+As with all fine-tuning training, your example file requires at least 10 examples.
+
+### Optimize for cost
+
+If you're trying to use fewer prompt tokens after fine-tuning your model on the full function definitions, OpenAI recommends experimenting with the following:
+
+* Omit function and parameter descriptions: remove the description field from function and parameters.
+* Omit parameters: remove the entire properties field from the parameters object.
+* Omit function entirely: remove the entire function object from the tools array.
+
+### Optimize for quality
+
+Alternatively, if you're trying to improve the quality of the tool calling output, it's recommended that the function definitions present in the fine-tuning training dataset and subsequent chat completion calls remain identical.
+
+### Customize model responses to function outputs
+
+Fine-tuning based on tool calling examples can also be used to improve the model's response to function outputs. To accomplish this, you include examples consisting of function response messages and assistant response messages where the function response is interpreted and put into context by the assistant.
+
+```json
+{
+  "messages": [
+    {"role": "user", "content": "What is the weather in San Francisco?"},
+    {"role": "assistant", "tool_calls": [{"id": "call_id", "type": "function", "function": {"name": "get_current_weather", "arguments": "{\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}"}}]},
+    {"role": "tool", "tool_call_id": "call_id", "content": "21.0"},
+    {"role": "assistant", "content": "It is 21 degrees celsius in San Francisco, CA"}
+  ],
+  "tools": [] // same as before
+}
+```
+
+As with the example before, this example is artificially expanded for readability. The actual entry in the `.jsonl` training file would be a single line:
+
+```jsonl
+{"messages":[{"role":"user","content":"What is the weather in San Francisco?"},{"role":"assistant","tool_calls":[{"id":"call_id","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}"}}]},{"role":"tool","tool_call_id":"call_id","content":"21.0"},{"role":"assistant","content":"It is 21 degrees celsius in San Francisco, CA"}],"tools":[]}
+```
+
+## Function calling
+
+### Constructing a training file

 When constructing a training file of function calling examples, you would take a function definition like this:

@@ -51,19 +145,19 @@ And express the information as a single line within your `.jsonl` training file

 As with all fine-tuning training, your example file requires at least 10 examples.

-## Optimize for cost
+### Optimize for cost

 If you're trying to use fewer prompt tokens after fine-tuning your model on the full function definitions, OpenAI recommends experimenting with the following:

 * Omit function and parameter descriptions: remove the description field from function and parameters.
 * Omit parameters: remove the entire properties field from the parameters object.
 * Omit function entirely: remove the entire function object from the functions array.

-## Optimize for quality
+### Optimize for quality

 Alternatively, if you're trying to improve the quality of the function calling output, it's recommended that the function definitions present in the fine-tuning training dataset and subsequent chat completion calls remain identical.

-## Customize model responses to function outputs
+### Customize model responses to function outputs

 Fine-tuning based on function calling examples can also be used to improve the model's response to function outputs. To accomplish this, you include examples consisting of function response messages and assistant response messages where the function response is interpreted and put into context by the assistant.
@@ -85,6 +179,7 @@ As with the example before, this example is artificially expanded for readability

 {"messages": [{"role": "user", "content": "What is the weather in San Francisco?"}, {"role": "assistant", "function_call": {"name": "get_current_weather", "arguments": "{\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}"}}, {"role": "function", "name": "get_current_weather", "content": "21.0"}, {"role": "assistant", "content": "It is 21 degrees celsius in San Francisco, CA"}], "functions": []}
 ```

+
 ## Next steps

 * [Function calling fine-tuning scenarios](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/fine-tuning-with-function-calling-on-azure-openai-service/ba-p/4065968).
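
Both the tool calling and function calling halves of this article rely on the same mechanics: every training example must collapse onto one `.jsonl` line, and the file needs at least 10 examples. A minimal sketch of writing and sanity-checking such a file with only the standard library; the file name and the specific checks are illustrative assumptions.

```python
import json
from pathlib import Path

TRAINING_FILE = Path("tool_calling_train.jsonl")  # illustrative file name

# json.dumps collapses a pretty-printed example onto the single line the
# .jsonl training format requires.
example = {
    "messages": [{"role": "user", "content": "What is the weather in San Francisco?"}],
    "tools": [],  # populate with your function definitions
}
with TRAINING_FILE.open("w") as handle:
    handle.write(json.dumps(example) + "\n")


def validate(path: Path) -> None:
    """Check every line parses as JSON and report the example count."""
    lines = path.read_text().splitlines()
    for line_number, line in enumerate(lines, start=1):
        record = json.loads(line)  # raises ValueError if the line isn't valid JSON
        if "messages" not in record:
            raise ValueError(f"Line {line_number} is missing 'messages'")
    status = "meets" if len(lines) >= 10 else "does not meet"
    print(f"{len(lines)} example(s); file {status} the 10-example minimum")


validate(TRAINING_FILE)
```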
