Commit 486d999

Update reference-model-inference-api.md

1 parent bb23133 commit 486d999

1 file changed: 210 additions and 91 deletions

articles/ai-studio/reference/reference-model-inference-api.md
@@ -49,6 +49,13 @@ Models deployed to [serverless API endpoints](../how-to/deploy-models-serverless
> * [Mistral-Large](../how-to/deploy-models-mistral.md)
> * [Phi-3](../how-to/deploy-models-phi-3.md) family of models

Models deployed to [managed inference](../concepts/deployments-overview.md):

> [!div class="checklist"]
> * [Meta Llama 3 instruct](../how-to/deploy-models-llama.md) family of models
> * [Phi-3](../how-to/deploy-models-phi-3.md) family of models
> * Mixtral family of models

The API is compatible with Azure OpenAI model deployments.

## Capabilities
@@ -66,6 +73,50 @@ The API indicates how developers can consume predictions for the following modal
* [Image embeddings](reference-model-inference-images-embeddings.md): Creates an embedding vector representing the input text and image.


### Inference SDK support

You can use streamlined inference clients in the language of your choice to consume predictions from models running the API.

# [Python](#tab/python)

```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

model = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)
```

# [JavaScript](#tab/javascript)

```javascript
import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = new ModelClient(
    process.env.AZUREAI_ENDPOINT_URL,
    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
);
```

# [REST](#tab/rest)

Use the reference section to explore the API design and which parameters are available. For example, the reference section for [Chat completions](reference-model-inference-chat-completions.md) details how to use the route `/chat/completions` to generate predictions based on chat-formatted instructions:

__Request__

```HTTP/1.1
POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
```

---
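For instance, once the Python client above is created, a basic chat completion call might look like the following sketch. The message types and the `complete` method are the same ones used in the examples later in this article:

```python
from azure.ai.inference.models import SystemMessage, UserMessage

# Request a single chat completion from the deployed model.
response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
)

print(response.choices[0].message.content)
```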
### Extensibility

The Azure AI Model Inference API specifies a set of modalities and parameters that models can subscribe to. However, some models may have further capabilities beyond the ones the API indicates. In those cases, the API allows the developer to pass them as extra parameters in the payload.
@@ -74,6 +125,38 @@ By setting a header `extra-parameters: allow`, the API will attempt to pass any

The following example shows a request passing the parameter `safe_prompt` supported by Mistral-Large, which isn't specified in the Azure AI Model Inference API:

# [Python](#tab/python)

```python
response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    model_extras={
        "safe_mode": True
    }
)
```

# [JavaScript](#tab/javascript)

```javascript
var messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "How many languages are in the world?" },
];

var response = await client.path("/chat/completions").post({
    body: {
        messages: messages,
        safe_mode: true
    }
});
```

# [REST](#tab/rest)

__Request__

```HTTP/1.1
@@ -102,6 +185,8 @@ extra-parameters: allow
}
```

---

> [!TIP]
> Alternatively, you can set `extra-parameters: drop` to drop any unknown parameter in the request. Use this capability when you're sending requests with extra parameters that you know the model won't support but you want the request to complete anyway. A typical example is the `seed` parameter.
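As a minimal Python sketch of that scenario: it reuses the chat client created earlier and assumes the underlying Azure SDK pipeline forwards per-request custom headers through the `headers` keyword argument (an assumption for illustration, not something this article states):

```python
# Sketch: ask the service to silently drop parameters the target model doesn't support.
# Assumes `model` is the ChatCompletionsClient created earlier and that the
# per-request `headers` keyword is honored by the underlying pipeline.
response = model.complete(
    messages=[
        UserMessage(content="How many languages are in the world?"),
    ],
    seed=42,  # dropped by the service for models that don't support it
    headers={"extra-parameters": "drop"},
)
```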
@@ -111,6 +196,71 @@ The Azure AI Model Inference API indicates a general set of capabilities but eac

The following example shows the response for a chat completion request indicating the parameter `response_format` and asking for a reply in `JSON` format. In the example, since the model doesn't support such a capability, an error 422 is returned to the user.

# [Python](#tab/python)

```python
from azure.ai.inference.models import ChatCompletionsResponseFormat
from azure.core.exceptions import HttpResponseError
import json

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="How many languages are in the world?"),
        ],
        response_format={ "type": ChatCompletionsResponseFormat.JSON_OBJECT }
    )
except HttpResponseError as ex:
    if ex.status_code == 422:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "detail" in response:
            for offending in response["detail"]:
                param = ".".join(offending["loc"])
                value = offending["input"]
                print(
                    f"Looks like the model doesn't support the parameter '{param}' with value '{value}'"
                )
    else:
        raise ex
```

# [JavaScript](#tab/javascript)

```javascript
try {
    var messages = [
        { role: "system", content: "You are a helpful assistant" },
        { role: "user", content: "How many languages are in the world?" },
    ];

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
            response_format: { type: "json_object" }
        }
    });
}
catch (error) {
    if (error.status_code == 422) {
        var response = JSON.parse(error.response._content)
        if (response.detail) {
            for (const offending of response.detail) {
                var param = offending.loc.join(".")
                var value = offending.input
                console.log(`Looks like the model doesn't support the parameter '${param}' with value '${value}'`)
            }
        }
    }
    else
    {
        throw error
    }
}
```

# [REST](#tab/rest)

__Request__

```HTTP/1.1
@@ -150,6 +300,7 @@ __Response__
"message": "One of the parameters contain invalid values."
}
```

---

> [!TIP]
> You can inspect the property `details.loc` to understand the location of the offending parameter and `details.input` to see the value that was passed in the request.
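One way to act on that information in code is to retry without the unsupported parameter. The following Python sketch reuses the client, message types, and exception handling shown above; the fallback behavior is an illustration, not something the article prescribes:

```python
# Sketch: fall back to plain-text output if the model rejects `response_format`.
messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="How many languages are in the world?"),
]

try:
    response = model.complete(
        messages=messages,
        response_format={ "type": ChatCompletionsResponseFormat.JSON_OBJECT }
    )
except HttpResponseError as ex:
    if ex.status_code == 422:
        # The model doesn't support JSON output mode; retry without the parameter.
        response = model.complete(messages=messages)
    else:
        raise
```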
@@ -160,6 +311,65 @@ The Azure AI model inference API supports [Azure AI Content Safety](../concepts/

The following example shows the response for a chat completion request that has triggered content safety.

# [Python](#tab/python)

```python
from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
        ]
    )

    print(response.choices[0].message.content)

except HttpResponseError as ex:
    if ex.status_code == 400:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "error" in response:
            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
        else:
            raise ex
    else:
        raise ex
```

# [JavaScript](#tab/javascript)

```javascript
try {
    var messages = [
        { role: "system", content: "You are an AI assistant that helps people find information." },
        { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
    ]

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
        }
    });

    console.log(response.body.choices[0].message.content)
}
catch (error) {
    if (error.status_code == 400) {
        var response = JSON.parse(error.response._content)
        if (response.error) {
            console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`)
        }
        else
        {
            throw error
        }
    }
}
```

# [REST](#tab/rest)

__Request__

```HTTP/1.1
@@ -196,95 +406,4 @@ __Response__
"type": null
}
```

## Getting started

The Azure AI Model Inference API is currently supported in models deployed as [Serverless API endpoints](../how-to/deploy-models-serverless.md). Deploy any of the [supported models](#availability) to a new [Serverless API endpoint](../how-to/deploy-models-serverless.md) to get started. Then you can consume the API in the following ways:

# [Studio](#tab/azure-studio)

You can use the Azure AI Model Inference API to run evaluations or while building with *Prompt flow*. Create a [Serverless Model connection](../how-to/deploy-models-serverless-connect.md) to a *Serverless API endpoint* and consume its predictions. The Azure AI Model Inference API is used under the hood.

# [Python](#tab/python)

Since the API is OpenAI-compatible, you can use any supported SDK that already supports Azure OpenAI. In the following example, we show how you can use LiteLLM with the common API:

```python
import litellm

client = litellm.LiteLLM(
    base_url="https://<endpoint-name>.<region>.inference.ai.azure.com",
    api_key="<key>",
)

response = client.chat.completions.create(
    messages=[
        {
            "content": "Who is the most renowned French painter?",
            "role": "user"
        }
    ],
    model="azureai",
    custom_llm_provider="custom_openai",
)

print(response.choices[0].message.content)
```

# [REST](#tab/rest)

Models deployed in Azure Machine Learning and Azure AI Studio in Serverless API endpoints support the Azure AI Model Inference API. Each endpoint exposes the OpenAPI specification for the modalities the model supports. Use the **Endpoint URI** and the **Key** to download the OpenAPI definition for the model. In the following example, we download it from a bash console. Replace `<TOKEN>` with the **Key** and `<ENDPOINT_URI>` with the **Endpoint URI**.

```bash
wget -d --header="Authorization: Bearer <TOKEN>" <ENDPOINT_URI>/swagger.json
```

Use the **Endpoint URI** and the **Key** to submit requests. The following example sends a request to a Cohere embedding model:

```HTTP/1.1
POST /embeddings?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
```

```JSON
{
    "input": [
        "Explain the theory of strings"
    ],
    "input_type": "query",
    "encoding_format": "float",
    "dimensions": 1024
}
```

__Response__

```json
{
    "id": "ab1c2d34-5678-9efg-hi01-0123456789ea",
    "object": "list",
    "data": [
        {
            "index": 0,
            "object": "embedding",
            "embedding": [
                0.001912117,
                0.048706055,
                -0.06359863,
                //...
                -0.00044369698
            ]
        }
    ],
    "model": "",
    "usage": {
        "prompt_tokens": 7,
        "completion_tokens": 0,
        "total_tokens": 7
    }
}
```

---
