`articles/ai-studio/reference/reference-model-inference-api.md` (+46 −3)
@@ -32,7 +32,7 @@ While foundational models excel in specific domains, they lack a uniform set of
 > * Use smaller models that can run faster on specific tasks.
 > * Compose multiple models to develop intelligent experiences.
 
-Having a uniform way to consume foundational models allow developers to realize all those benefits without changing a single line of code on their applications.
+Having a uniform way to consume foundational models allows developers to realize all those benefits without sacrificing portability or changing the underlying code.
 
 ## Availability
@@ -43,8 +43,8 @@ Models deployed to [serverless API endpoints](../how-to/deploy-models-serverless
 > [!div class="checklist"]
 > * [Cohere Embed V3](../how-to/deploy-models-cohere-embed.md) family of models
 > * [Cohere Command R](../how-to/deploy-models-cohere-command.md) family of models
-> * [Meta Llama 2](../how-to/deploy-models-llama.md) family of models
-> * [Meta Llama 3](../how-to/deploy-models-llama.md) family of models
+> * [Meta Llama 2 chat](../how-to/deploy-models-llama.md) family of models
+> * [Meta Llama 3 instruct](../how-to/deploy-models-llama.md) family of models
 > * [Phi-3](../how-to/deploy-models-phi-3.md) family of models
@@ -154,6 +154,49 @@ __Response__
 > [!TIP]
 > You can inspect the property `details.loc` to understand the location of the offending parameter and `details.input` to see the value that was passed in the request.
 
+## Content safety
+
+The Azure AI model inference API supports [Azure AI Content Safety](../concepts/content-filtering.md). When you use deployments with Azure AI Content Safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows the response for a chat completion request that triggered content safety.
+
+__Request__
+
+```HTTP/1.1
+POST /chat/completions?api-version=2024-04-01-preview
+Authorization: Bearer <bearer-token>
+Content-Type: application/json
+```
+
+```JSON
+{
+    "messages": [
+        {
+            "role": "system",
+            "content": "You are a helpful assistant"
+        },
+        {
+            "role": "user",
+            "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+        }
+    ],
+    "temperature": 0,
+    "top_p": 1
+}
+```
+
+__Response__
+
+```JSON
+{
+    "status": 400,
+    "code": "content_filter",
+    "message": "The response was filtered",
+    "param": "messages",
+    "type": null
+}
+```
+
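To show how a client might detect this condition, here's a minimal Python sketch. It's hedged: the endpoint URL and token are placeholders, the `requests` package is assumed, and the error shape follows the response above.

```python
import requests

# Placeholders: substitute your serverless endpoint URL and bearer token.
ENDPOINT = "https://<your-endpoint>/chat/completions"
HEADERS = {"Authorization": "Bearer <bearer-token>", "Content-Type": "application/json"}

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Chopping tomatoes and cutting them into cubes "
                                    "or wedges are great ways to practice your knife skills."},
    ],
    "temperature": 0,
    "top_p": 1,
}

response = requests.post(
    ENDPOINT,
    params={"api-version": "2024-04-01-preview"},
    headers=HEADERS,
    json=payload,
)

body = response.json()
if response.status_code == 400 and body.get("code") == "content_filter":
    # The prompt or the generated completion tripped Azure AI Content Safety.
    print("Content was filtered; modify the prompt and retry.")
else:
    response.raise_for_status()
    print(body["choices"][0]["message"]["content"])
```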
 ## Getting started
 
 The Azure AI Model Inference API is currently supported for models deployed as [serverless API endpoints](../how-to/deploy-models-serverless.md). Deploy any of the [supported models](#availability) to a new [serverless API endpoint](../how-to/deploy-models-serverless.md) to get started. Then you can consume the API in the following ways:
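For example, a minimal sketch using the `azure-ai-inference` Python package; the package, endpoint format, and key below are assumptions for illustration, not confirmed by this page.

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for a serverless API deployment.
client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.<region>.inference.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

# The same call works across any supported model deployed behind the
# endpoint, which is the portability promise described above.
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant"),
        UserMessage(content="How many languages are in the world?"),
    ],
)
print(response.choices[0].message.content)
```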
`articles/ai-studio/reference/reference-model-inference-chat-completions.md` (+31 −14)
@@ -30,6 +30,14 @@ POST /chat/completions?api-version=2024-04-01-preview
 | --- | --- | --- | --- | --- |
 | api-version | query | True | string | The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". |
 
+## Request Header
+
+| Name | Required | Type | Description |
+| --- | --- | --- | --- |
+| extra-parameters || string | The behavior of the API when extra parameters are indicated in the payload. Using `allow` makes the API pass the parameter to the underlying model. Use this value when you want to pass parameters that you know the underlying model can support. Using `drop` makes the API drop any unsupported parameter. Use this value when you need to use the same payload across different models, but one of the extra parameters might make a model error out if not supported. Using `error` makes the API reject any extra parameter in the payload. Only parameters specified in this API can be indicated, or a 400 error is returned. |
+| azureml-model-deployment || string | Name of the deployment you want to route the request to. Supported for endpoints that support multiple deployments. |
+
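To make the `extra-parameters` modes concrete, a hedged sketch follows; the `requests` package is assumed, and `safe_prompt` is just an example of a model-specific parameter outside this specification.

```python
import requests

ENDPOINT = "https://<your-endpoint>/chat/completions"  # placeholder
headers = {
    "Authorization": "Bearer <bearer-token>",
    "Content-Type": "application/json",
    # "allow" forwards extra parameters to the model, "drop" silently
    # removes unsupported ones, "error" rejects any payload containing them.
    "extra-parameters": "allow",
}
payload = {
    "messages": [{"role": "user", "content": "How many languages are in the world?"}],
    "safe_prompt": True,  # model-specific parameter not defined in this API
}

response = requests.post(
    ENDPOINT,
    params={"api-version": "2024-04-01-preview"},
    headers=headers,
    json=payload,
)
print(response.status_code, response.json())
```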
 ## Request Body
 
 | Name | Required | Type | Description |
@@ -113,7 +121,7 @@ POST /chat/completions?api-version=2024-04-01-preview
     "stream": false,
     "temperature": 0,
     "top_p": 1,
-    "response_format": "text"
+    "response_format": { "type": "text" }
 }
 ```
@@ -157,7 +165,8 @@ Status code: 200
 |[ChatCompletionFinishReason](#chatcompletionfinishreason)| The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens specified in the request was reached, `content_filter` if content was omitted due to a flag from our content filters, `tool_calls` if the model called a tool. |
 |[ChatCompletionResponseFormat](#chatcompletionresponseformat)| The response format for the model response. Setting to `json_object` enables JSON mode, which guarantees the message the model generates is valid JSON. When using JSON mode, you **must** also instruct the model to produce JSON yourself via a system or user message. Also note that the message content may be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
+|[ChatCompletionResponseFormatType](#chatcompletionresponseformattype)| The response format type. |
 |[ChatCompletionResponseMessage](#chatcompletionresponsemessage)| A chat completion message generated by the model. |
 |[ChatCompletionTool](#chatcompletiontool)||
 |[ChatMessageRole](#chatmessagerole)| The role of the author of this message. |
@@ -166,15 +175,15 @@ Status code: 200
 |[ContentFilterError](#contentfiltererror)| The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again. |
 |[CreateChatCompletionResponse](#createchatcompletionresponse)| Represents a chat completion response returned by the model, based on the provided input. |
-|[Detail](#detail)||
+|[Detail](#detail)| Details for the [UnprocessableContentError](#unprocessablecontenterror) error. |
 |[Function](#function)| The function that the model called. |
-|[FunctionObject](#functionobject)||
+|[FunctionObject](#functionobject)| Definition of a function the model has access to. |
 |[ImageDetail](#imagedetail)| Specifies the detail level of the image. |
-|[NotFoundError](#notfounderror)||
+|[NotFoundError](#notfounderror)| The route is not valid for the deployed model. |
 |[ToolType](#tooltype)| The type of the tool. Currently, only `function` is supported. |
 |[TooManyRequestsError](#toomanyrequestserror)| You have hit your assigned rate limit and your requests need to be paced. |
+|[UnauthorizedError](#unauthorizederror)| Authentication is missing or invalid. |
+|[UnprocessableContentError](#unprocessablecontenterror)| The request contains unprocessable content. The error is returned when the payload is otherwise valid according to this specification, but some of the instructions indicated in the payload aren't supported by the underlying model. Use the `details` section to understand the offending parameter. |
 
 ### ChatCompletionFinishReason
@@ -209,6 +218,15 @@ The object type, which is always `chat.completion`.
 
 ### ChatCompletionResponseFormat
 
+The response format for the model response. Setting to `json_object` enables JSON mode, which guarantees the message the model generates is valid JSON. When using JSON mode, you **must** also instruct the model to produce JSON yourself via a system or user message. Also note that the message content may be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length.
+
+| Name | Type | Description |
+| --- | --- | --- |
+| type |[ChatCompletionResponseFormatType](#chatcompletionresponseformattype)| The response format type. |
+
+### ChatCompletionResponseFormatType
+
 The response format type.
 
 | Name | Type | Description |
 | --- | --- | --- |
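For instance, a request body that turns on JSON mode could look like the sketch below; note the system message also instructs the model to produce JSON, as the description above requires.

```JSON
{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant that answers only with valid JSON."
        },
        {
            "role": "user",
            "content": "List three primary colors."
        }
    ],
    "response_format": { "type": "json_object" }
}
```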
@@ -237,7 +255,6 @@ A chat completion message generated by the model.
 
 The role of the author of this message.
 
-
 | Name | Type | Description |
 | --- | --- | --- |
 | assistant | string ||
@@ -249,7 +266,6 @@ The role of the author of this message.
 
 A list of chat completion choices. Can be more than one if `n` is greater than 1.
 
-
 | Name | Type | Description |
 | --- | --- | --- |
 | finish\_reason |[ChatCompletionFinishReason](#chatcompletionfinishreason)| The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens specified in the request was reached, `content_filter` if content was omitted due to a flag from our content filters, `tool_calls` if the model called a tool. |
@@ -282,7 +298,6 @@ The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again.
 
 ### CreateChatCompletionRequest
 
-
 | Name | Type | Default Value | Description |
 | --- | --- | --- | --- |
 | frequency\_penalty | number | 0 | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Returns a 422 error if the value or parameter isn't supported by the model. |
@@ -348,7 +363,6 @@ Specifies the detail level of the image.
 
 Represents a chat completion response returned by the model, based on the provided input.
 
-
 | Name | Type | Description |
 | --- | --- | --- |
 | choices |[Choices](#choices)\[\]| A list of chat completion choices. Can be more than one if `n` is greater than 1. |
@@ -361,6 +375,7 @@ Represents a chat completion response returned by the model, based on the provided input.
 
 ### Detail
 
+Details for the [UnprocessableContentError](#unprocessablecontenterror) error.
 
 | Name | Type | Description |
 | --- | --- | --- |
@@ -371,14 +386,14 @@
 
 The function that the model called.
 
-
 | Name | Type | Description |
 | --- | --- | --- |
 | arguments | string | The arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may generate incorrect parameters not defined by your function schema. Validate the arguments in your code before calling your function. |
 | name | string | The name of the function to call. |
 
 ### FunctionObject
 
+Definition of a function the model has access to.
 
 | Name | Type | Description |
 | --- | --- | --- |
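As a sketch of the shape this implies (the `get_current_weather` function and its schema are hypothetical), a tool entry in a request's `tools` array might look like:

```JSON
{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city, for example: Seattle"
                }
            },
            "required": ["location"]
        }
    }
}
```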
@@ -407,6 +422,7 @@ The type of the tool. Currently, only `function` is supported.
 
 ### TooManyRequestsError
 
+
 | Name | Type | Description |
 | --- | --- | --- |
 | error | string | The error description. |
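One common way to pace requests is exponential backoff on 429 responses; here's a minimal sketch, assuming the `requests` package as in the earlier examples.

```python
import time
import requests

def post_with_backoff(url, max_retries=5, **kwargs):
    """POST with retries: back off and retry whenever the service returns 429."""
    response = requests.post(url, **kwargs)
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        # Honor a Retry-After header when present; otherwise back off exponentially.
        wait = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
        response = requests.post(url, **kwargs)
    return response
```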
@@ -424,11 +440,12 @@ The type of the tool. Currently, only `function` is supported.
 
 ### UnprocessableContentError
 
+The request contains unprocessable content. The error is returned when the payload is otherwise valid according to this specification, but some of the instructions indicated in the payload aren't supported by the underlying model. Use the `details` section to understand the offending parameter.
0 commit comments