
Commit 85b16b8

Merge pull request #279874 from MicrosoftDocs/main
Merge main to live, 4 AM
2 parents: 7722b9d + c5a43b4

File tree: 261 files changed, +926 -535 lines


articles/ai-studio/reference/reference-model-inference-api.md

Lines changed: 227 additions & 89 deletions
@@ -49,6 +49,13 @@ Models deployed to [serverless API endpoints](../how-to/deploy-models-serverless
> * [Mistral-Large](../how-to/deploy-models-mistral.md)
> * [Phi-3](../how-to/deploy-models-phi-3.md) family of models

Models deployed to [managed inference](../concepts/deployments-overview.md):

> [!div class="checklist"]
> * [Meta Llama 3 instruct](../how-to/deploy-models-llama.md) family of models
> * [Phi-3](../how-to/deploy-models-phi-3.md) family of models
> * Mixtral family of models

The API is compatible with Azure OpenAI model deployments.

## Capabilities

@@ -66,6 +73,65 @@ The API indicates how developers can consume predictions for the following modal
* [Image embeddings](reference-model-inference-images-embeddings.md): Creates an embedding vector representing the input text and image.

### Inference SDK support

You can use streamlined inference clients in the language of your choice to consume predictions from models running the Azure AI model inference API.

# [Python](#tab/python)

Install the package `azure-ai-inference` using your package manager, like pip:

```bash
pip install azure-ai-inference
```

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

model = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)
```
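
The client alone doesn't generate predictions. As a minimal sketch (mirroring the `complete` calls shown later in this article, and assuming the same environment variables), a chat completion request looks like this:

```python
from azure.ai.inference.models import SystemMessage, UserMessage

# Send a simple chat request with the client created above.
response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
)

# Print the first generated reply.
print(response.choices[0].message.content)
```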

# [JavaScript](#tab/javascript)

Install the package `@azure-rest/ai-inference` using npm:

```bash
npm install @azure-rest/ai-inference
```

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

```javascript
import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = new ModelClient(
    process.env.AZUREAI_ENDPOINT_URL,
    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
);
```
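
Similarly, as a minimal sketch that follows the `client.path(...).post(...)` pattern used later in this article, a chat completion request with the JavaScript client looks like this:

```javascript
// Send a simple chat request; the call shape mirrors the examples later in this article.
var response = await client.path("/chat/completions").post({
    body: {
        messages: [
            { role: "system", content: "You are a helpful assistant" },
            { role: "user", content: "How many languages are in the world?" },
        ],
    }
});

// Surface service errors before reading the payload.
if (isUnexpected(response)) {
    throw response.body.error;
}

console.log(response.body.choices[0].message.content);
```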

# [REST](#tab/rest)

Use the reference section to explore the API design and which parameters are available. For example, the reference section for [Chat completions](reference-model-inference-chat-completions.md) details how to use the route `/chat/completions` to generate predictions based on chat-formatted instructions:

__Request__

```HTTP/1.1
POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
```
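
The route expects a chat-formatted JSON payload. As an illustrative sketch (the field names mirror the chat completions examples elsewhere in this article), a minimal request body looks like this:

```JSON
{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "How many languages are in the world?"
        }
    ]
}
```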

---

### Extensibility

The Azure AI Model Inference API specifies a set of modalities and parameters that models can subscribe to. However, some models may have further capabilities beyond the ones the API indicates. In those cases, the API allows the developer to pass them as extra parameters in the payload.

@@ -74,6 +140,38 @@ By setting a header `extra-parameters: allow`, the API will attempt to pass any

The following example shows a request passing the parameter `safe_prompt`, which is supported by Mistral-Large but isn't specified in the Azure AI Model Inference API:

# [Python](#tab/python)

```python
response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    model_extras={
        "safe_prompt": True
    }
)
```

# [JavaScript](#tab/javascript)

```javascript
var messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "How many languages are in the world?" },
];

var response = await client.path("/chat/completions").post({
    body: {
        messages: messages,
        safe_prompt: true
    }
});
```

# [REST](#tab/rest)

__Request__

```HTTP/1.1
@@ -102,6 +200,8 @@ extra-parameters: allow
}
```

---

> [!TIP]
> Alternatively, you can set `extra-parameters: drop` to drop any unknown parameter in the request. Use this capability when you're sending requests with extra parameters that you know the model won't support but you want the request to complete anyway. A typical example is indicating the `seed` parameter.
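
As an illustrative sketch, the following request sets the header to `drop` while still including a `seed` value; the body shown here is only an example:

```HTTP/1.1
POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
extra-parameters: drop
```

```JSON
{
    "messages": [
        {
            "role": "user",
            "content": "How many languages are in the world?"
        }
    ],
    "seed": 42
}
```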

@@ -111,6 +211,71 @@ The Azure AI Model Inference API indicates a general set of capabilities but eac

The following example shows the response for a chat completion request that indicates the parameter `response_format` and asks for a reply in `JSON` format. Because the model doesn't support this capability, a 422 error is returned to the user.

# [Python](#tab/python)

```python
from azure.ai.inference.models import ChatCompletionsResponseFormat
from azure.core.exceptions import HttpResponseError
import json

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="How many languages are in the world?"),
        ],
        response_format={ "type": ChatCompletionsResponseFormat.JSON_OBJECT }
    )
except HttpResponseError as ex:
    if ex.status_code == 422:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "detail" in response:
            for offending in response["detail"]:
                param = ".".join(offending["loc"])
                value = offending["input"]
                print(
                    f"Looks like the model doesn't support the parameter '{param}' with value '{value}'"
                )
    else:
        raise ex
```

# [JavaScript](#tab/javascript)

```javascript
try {
    var messages = [
        { role: "system", content: "You are a helpful assistant" },
        { role: "user", content: "How many languages are in the world?" },
    ];

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
            response_format: { type: "json_object" }
        }
    });
}
catch (error) {
    if (error.status_code == 422) {
        var response = JSON.parse(error.response._content)
        if (response.detail) {
            for (const offending of response.detail) {
                var param = offending.loc.join(".")
                var value = offending.input
                console.log(`Looks like the model doesn't support the parameter '${param}' with value '${value}'`)
            }
        }
    }
    else {
        throw error
    }
}
```

# [REST](#tab/rest)

__Request__

```HTTP/1.1
@@ -150,6 +315,7 @@ __Response__
    "message": "One of the parameters contain invalid values."
}
```

---

> [!TIP]
> You can inspect the property `details.loc` to understand the location of the offending parameter and `details.input` to see the value that was passed in the request.
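
As an illustrative sketch only (the exact payload can vary by model and provider), an error body carrying these properties might look like the following:

```json
{
    "detail": [
        {
            "loc": [ "body", "response_format" ],
            "input": "json_object"
        }
    ]
}
```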

@@ -160,6 +326,65 @@ The Azure AI model inference API supports [Azure AI Content Safety](../concepts/

The following example shows the response for a chat completion request that has triggered content safety.

# [Python](#tab/python)

```python
from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
        ]
    )

    print(response.choices[0].message.content)

except HttpResponseError as ex:
    if ex.status_code == 400:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "error" in response:
            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
        else:
            raise ex
    else:
        raise ex
```

# [JavaScript](#tab/javascript)

```javascript
try {
    var messages = [
        { role: "system", content: "You are an AI assistant that helps people find information." },
        { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
    ]

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
        }
    });

    console.log(response.body.choices[0].message.content)
}
catch (error) {
    if (error.status_code == 400) {
        var response = JSON.parse(error.response._content)
        if (response.error) {
            console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`)
        }
        else {
            throw error
        }
    }
}
```

# [REST](#tab/rest)

__Request__

```HTTP/1.1
@@ -196,95 +421,8 @@ __Response__
    "type": null
}
```

---

## Getting started

The Azure AI Model Inference API is currently supported in models deployed as [Serverless API endpoints](../how-to/deploy-models-serverless.md). Deploy any of the [supported models](#availability) to a new [Serverless API endpoint](../how-to/deploy-models-serverless.md) to get started. Then you can consume the API in the following ways:

# [Studio](#tab/azure-studio)

You can use the Azure AI Model Inference API to run evaluations or while building with *Prompt flow*. Create a [Serverless Model connection](../how-to/deploy-models-serverless-connect.md) to a *Serverless API endpoint* and consume its predictions. The Azure AI Model Inference API is used under the hood.

# [Python](#tab/python)

Since the API is OpenAI-compatible, you can use any supported SDK that already supports Azure OpenAI. In the following example, we show how you can use LiteLLM with the common API:

```python
import litellm

client = litellm.LiteLLM(
    base_url="https://<endpoint-name>.<region>.inference.ai.azure.com",
    api_key="<key>",
)

response = client.chat.completions.create(
    messages=[
        {
            "content": "Who is the most renowned French painter?",
            "role": "user"
        }
    ],
    model="azureai",
    custom_llm_provider="custom_openai",
)

print(response.choices[0].message.content)
```

# [REST](#tab/rest)

Models deployed in Azure Machine Learning and Azure AI studio in Serverless API endpoints support the Azure AI Model Inference API. Each endpoint exposes the OpenAPI specification for the modalities the model supports. Use the **Endpoint URI** and the **Key** to download the OpenAPI definition for the model. In the following example, we download it from a bash console. Replace `<TOKEN>` with the **Key** and `<ENDPOINT_URI>` with the **Endpoint URI**.

```bash
wget -d --header="Authorization: Bearer <TOKEN>" <ENDPOINT_URI>/swagger.json
```

Use the **Endpoint URI** and the **Key** to submit requests. The following example sends a request to a Cohere embedding model:

```HTTP/1.1
POST /embeddings?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
```

```JSON
{
    "input": [
        "Explain the theory of strings"
    ],
    "input_type": "query",
    "encoding_format": "float",
    "dimensions": 1024
}
```

__Response__

```json
{
    "id": "ab1c2d34-5678-9efg-hi01-0123456789ea",
    "object": "list",
    "data": [
        {
            "index": 0,
            "object": "embedding",
            "embedding": [
                0.001912117,
                0.048706055,
                -0.06359863,
                //...
                -0.00044369698
            ]
        }
    ],
    "model": "",
    "usage": {
        "prompt_tokens": 7,
        "completion_tokens": 0,
        "total_tokens": 7
    }
}
```

---

The Azure AI Model Inference API is currently supported in certain models deployed as [Serverless API endpoints](../how-to/deploy-models-serverless.md) and Managed Online Endpoints. Deploy any of the [supported models](#availability) and use the exact same code to consume their predictions.

articles/app-service/overview-vnet-integration.md

Lines changed: 2 additions & 2 deletions

@@ -70,9 +70,9 @@ The virtual network integration feature supports two virtual interfaces per work

Virtual network integration depends on a dedicated subnet. When you create a subnet, the Azure subnet consumes five IPs from the start. One address is used from the integration subnet for each App Service plan instance. If you scale your app to four instances, then four addresses are used.

-When you scale up/down in instance size, the amount of IP addresses used by the App Service plan is temporarily doubled while the scale operation completes. The new instances need to be fully operational before the existing instances are deprovisioned. The scale operation affects the real, available supported instances for a given subnet size. Platform upgrades need free IP addresses to ensure upgrades can happen without interruptions to outbound traffic. Finally, after scale up, down, or in operations complete, there might be a short period of time before IP addresses are released. In rare cases, this operation can be up to 12 hours.
+When you scale up/down in instance size, the amount of IP addresses used by the App Service plan is temporarily doubled while the scale operation completes. The new instances need to be fully operational before the existing instances are deprovisioned. The scale operation affects the real, available supported instances for a given subnet size. Platform upgrades need free IP addresses to ensure upgrades can happen without interruptions to outbound traffic. Finally, after scale up, down, or in operations complete, there might be a short period of time before IP addresses are released. In rare cases, this operation can be up to 12 hours, and if you rapidly scale in/out or up/down, you need more IP addresses than your maximum scale.

-Because subnet size can't be changed after assignment, use a subnet that's large enough to accommodate whatever scale your app might reach. You should also reserve IP addresses for platform upgrades. To avoid any issues with subnet capacity, use a `/26` with 64 addresses. When you're creating subnets in Azure portal as part of integrating with the virtual network, a minimum size of `/27` is required. If the subnet already exists before integrating through the portal, you can use a `/28` subnet.
+Because subnet size can't be changed after assignment, use a subnet that's large enough to accommodate whatever scale your app might reach. You should also reserve IP addresses for platform upgrades. To avoid any issues with subnet capacity, we recommend allocating double the IPs of your planned maximum scale. A `/26` with 64 addresses covers the maximum scale of a single multitenant App Service plan. When you're creating subnets in Azure portal as part of integrating with the virtual network, a minimum size of `/27` is required. If the subnet already exists before integrating through the portal, you can use a `/28` subnet.
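
To put the numbers together: a `/26` provides 64 addresses, Azure reserves five of them, and the remainder must cover one address per instance plus the temporary doubling during scale operations and headroom for platform upgrades. The following Azure CLI sketch creates such a subnet; the resource group, virtual network, subnet name, and address range are placeholders, and the delegation shown reflects the typical App Service virtual network integration setup:

```bash
# Create a /26 subnet (64 addresses) for regional virtual network integration.
# Resource names and the address range below are illustrative placeholders.
az network vnet subnet create \
  --resource-group my-resource-group \
  --vnet-name my-vnet \
  --name appservice-integration \
  --address-prefixes 10.0.1.0/26 \
  --delegations Microsoft.Web/serverFarms
```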

With multi plan subnet join (MPSJ), you can join multiple App Service plans into the same subnet. All App Service plans must be in the same subscription, but the virtual network/subnet can be in a different subscription. Each instance from each App Service plan requires an IP address from the subnet, and to use MPSJ a minimum subnet size of `/26` is required. If you plan to join many and/or large scale plans, you should plan for larger subnet ranges.
