Use the reference section to explore the API design and which parameters are available. For example, the reference section for [Chat completions](reference-model-inference-chat-completions.md) details how to use the route `/chat/completions` to generate predictions based on chat-formatted instructions:

__Request__

```HTTP/1.1
POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
```

---
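
For comparison, the same request is a single method call in the client SDKs. The following is a minimal Python sketch that assumes the `azure-ai-inference` package and an already configured `ChatCompletionsClient` instance named `model`, as used in the examples later in this article:

```python
from azure.ai.inference.models import SystemMessage, UserMessage

# The client calls the same /chat/completions route shown above under the hood.
response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
)

print(response.choices[0].message.content)
```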

### Extensibility

The Azure AI Model Inference API specifies a set of modalities and parameters that models can subscribe to. However, some models may have further capabilities than the ones the API indicates. In those cases, the API allows the developer to pass them as extra parameters in the payload.

By setting a header `extra-parameters: allow`, the API will attempt to pass any unknown parameter in the request payload directly to the underlying model.

The following example shows a request passing the parameter `safe_prompt`, which is supported by Mistral-Large but isn't specified in the Azure AI Model Inference API:

# [Python](#tab/python)

```python
response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    model_extras={
        "safe_prompt": True
    }
)
```

# [JavaScript](#tab/javascript)

```javascript
var messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "How many languages are in the world?" },
];

var response = await client.path("/chat/completions").post({
    body: {
        messages: messages,
        safe_prompt: true
    }
});
```

# [REST](#tab/rest)

__Request__

```HTTP/1.1
POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
extra-parameters: allow
```

```JSON
{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "How many languages are in the world?"
        }
    ],
    "safe_prompt": true
}
```

---

> [!TIP]
> Alternatively, you can set `extra-parameters: drop` to drop any unknown parameter in the request. Use this capability when you send requests with extra parameters that you know the model won't support but you want the request to complete anyway. A typical example is indicating the `seed` parameter.
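
For instance, a minimal Python sketch of sending that header with the client used in this article could look like the following. It relies on azure-core's generic per-operation `headers` keyword argument, and the `seed` value is only illustrative:

```python
# Ask the endpoint to drop any parameter the model doesn't support instead of failing.
response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    seed=42,  # a parameter some models may not support
    headers={"extra-parameters": "drop"},  # azure-core per-request custom header
)
```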

The Azure AI Model Inference API indicates a general set of capabilities, but each model can decide which of them to implement.

The following example shows the response for a chat completion request that indicates the parameter `response_format` and asks for a reply in `JSON` format. Because the model doesn't support this capability, an error 422 is returned to the user.

# [Python](#tab/python)

```python
from azure.ai.inference.models import ChatCompletionsResponseFormat
from azure.core.exceptions import HttpResponseError
import json

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="How many languages are in the world?"),
        ],
        response_format=ChatCompletionsResponseFormat.JSON_OBJECT
    )
except HttpResponseError as ex:
    if ex.status_code == 422:
        response = json.loads(ex.response._content.decode("utf-8"))
        if isinstance(response, dict) and "detail" in response:
            for offending in response["detail"]:
                param = ".".join(offending["loc"])
                value = offending["input"]
                print(
                    f"Looks like the model doesn't support the parameter '{param}' with value '{value}'"
                )
    else:
        raise ex
```

# [JavaScript](#tab/javascript)

```javascript
try {
    var messages = [
        { role: "system", content: "You are a helpful assistant" },
        { role: "user", content: "How many languages are in the world?" },
    ];

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
            response_format: { type: "json_object" }
        }
    });
}
catch (error) {
    if (error.status_code == 422) {
        var response = JSON.parse(error.response._content)
        if (response.detail) {
            for (const offending of response.detail) {
                var param = offending.loc.join(".")
                var value = offending.input
                console.log(`Looks like the model doesn't support the parameter '${param}' with value '${value}'`)
            }
        }
    }
    else
    {
        throw error
    }
}
```

# [REST](#tab/rest)

__Request__

```HTTP/1.1
POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
```

```JSON
{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "How many languages are in the world?"
        }
    ],
    "response_format": { "type": "json_object" }
}
```

__Response__

```JSON
{
    //...
    "message": "One of the parameters contain invalid values."
}
```

---

> [!TIP]
155
321
> You can inspect the property `details.loc` to understand the location of the offending parameter and `details.input` to see the value that was passed in the request.

The Azure AI model inference API supports Azure AI Content Safety. The following example shows the response for a chat completion request that has triggered content safety.

# [Python](#tab/python)

```python
from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
from azure.core.exceptions import HttpResponseError
import json

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
        ]
    )

    print(response.choices[0].message.content)

except HttpResponseError as ex:
    if ex.status_code == 400:
        response = json.loads(ex.response._content.decode("utf-8"))
        if isinstance(response, dict) and "error" in response:
            print(f"Your request triggered an {response['error']['code']} error:\n\t{response['error']['message']}")
        else:
            raise ex
    else:
        raise ex
```

# [JavaScript](#tab/javascript)

```javascript
try {
    var messages = [
        { role: "system", content: "You are an AI assistant that helps people find information." },
        { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
    ];

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
        }
    });

    console.log(response.body.choices[0].message.content);
}
catch (error) {
    if (error.status_code == 400) {
        var response = JSON.parse(error.response._content)
        if (response.error) {
            console.log(`Your request triggered an ${response.error.code} error:\n\t${response.error.message}`)
        }
        else
        {
            throw error
        }
    }
}
```

# [REST](#tab/rest)

__Request__

```HTTP/1.1
POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
```

```JSON
{
    "messages": [
        {
            "role": "system",
            "content": "You are an AI assistant that helps people find information."
        },
        {
            "role": "user",
            "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
        }
    ]
}
```

__Response__

```JSON
{
    //...
    "type": null
}
```

---

## Getting started
The Azure AI Model Inference API is currently supported in certain models deployed as [Serverless API endpoints](../how-to/deploy-models-serverless.md) and Managed Online Endpoints. Deploy any of the [supported models](#availability) and use the exact same code to consume their predictions.
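
As a minimal sketch of that flow with the `azure-ai-inference` Python package (the environment variable names below are placeholders for your own endpoint URL and key, not part of the API):

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# The same client code works regardless of where the model is deployed;
# only the endpoint URL and key change.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
)

print(response.choices[0].message.content)
```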