
Commit f89d746

Update reference-model-inference-api.md
1 parent 486d999 commit f89d746

1 file changed

articles/ai-studio/reference/reference-model-inference-api.md

Lines changed: 29 additions & 10 deletions
@@ -75,9 +75,17 @@ The API indicates how developers can consume predictions for the following modal

### Inference SDK support

-You can use streamlined inference clients in the language of your choice to consume predictions from models running the API.
+You can use streamlined inference clients in the language of your choice to consume predictions from models running the Azure AI model inference API.

-# [REST](#tab/python)
+# [Python](#tab/python)
+
+Install the package `azure-ai-inference` using your package manager, like pip:
+
+```bash
+pip install azure-ai-inference
+```
+
+Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

```python
import os
@@ -90,7 +98,15 @@ model = ChatCompletionsClient(
)
```
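The hunk elides the middle of this Python example. A minimal self-contained sketch of the same pattern, assuming key-based authentication; the environment variable names below are placeholders, not part of the article:

```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder variables; use the endpoint URL and key of your own deployment.
model = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many feet are in a mile?"),
    ],
)
print(response.choices[0].message.content)
```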

-# [REST](#tab/javascript)
+# [JavaScript](#tab/javascript)
+
+Install the package `@azure-rest/ai-inference` using npm:
+
+```bash
+npm install @azure-rest/ai-inference
+```
+
+Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

```javascript
import ModelClient from "@azure-rest/ai-inference";
@@ -114,7 +130,6 @@ POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
```
-
---
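The request headers above are paired with a JSON body that carries the chat messages. As an illustration only (not part of this change), the same call can be made from Python with the `requests` package; the endpoint host, bearer token, and payload below are placeholders:

```python
import requests

# Placeholder values; substitute the URL and key of your deployment.
endpoint = "https://<your-endpoint-host>"
token = "<bearer-token>"

response = requests.post(
    f"{endpoint}/chat/completions",
    params={"api-version": "2024-04-01-preview"},
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "How many feet are in a mile?"},
        ]
    },
)
print(response.json()["choices"][0]["message"]["content"])
```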

### Extensibility
@@ -125,7 +140,7 @@ By setting a header `extra-parameters: allow`, the API will attempt to pass any

The following example shows a request passing the parameter `safe_prompt` supported by Mistral-Large, which isn't specified in the Azure AI Model Inference API:

-# [REST](#tab/python)
+# [Python](#tab/python)

```python
response = model.complete(
@@ -139,7 +154,7 @@ response = model.complete(
)
```
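The visible part of this snippet omits the extra parameter itself. A sketch of one possible shape, assuming the `model_extras` passthrough exposed by the `azure-ai-inference` client (verify against the SDK version you use) and the `model` client created earlier:

```python
from azure.ai.inference.models import SystemMessage, UserMessage

# `safe_prompt` is not defined by the Azure AI Model Inference API; it is forwarded
# unchanged to the underlying model (here, Mistral-Large) as an extra parameter.
response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    model_extras={"safe_prompt": True},
)
print(response.choices[0].message.content)
```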

-# [REST](#tab/javascript)
+# [JavaScript](#tab/javascript)

```javascript
var messages = [
@@ -196,7 +211,7 @@ The Azure AI Model Inference API indicates a general set of capabilities but eac

The following example shows the response for a chat completion request indicating the parameter `response_format` and asking for a reply in `JSON` format. In the example, since the model doesn't support this capability, an error 422 is returned to the user.

-# [REST](#tab/python)
+# [Python](#tab/python)

```python
from azure.ai.inference.models import ChatCompletionsResponseFormat
@@ -225,7 +240,7 @@ except HttpResponseError as ex:
raise ex
```
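Most of the error-handling half of this example sits outside the hunk. A reduced sketch of how a caller can detect the 422 described above; `HttpResponseError` and its `status_code` come from `azure.core`, and the `response_format` value shown is illustrative (it depends on the SDK version):

```python
from azure.core.exceptions import HttpResponseError
from azure.ai.inference.models import SystemMessage, UserMessage

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="How many languages are in the world?"),
        ],
        response_format="json_object",  # illustrative; some SDK versions use ChatCompletionsResponseFormat
    )
    print(response.choices[0].message.content)
except HttpResponseError as ex:
    if ex.status_code == 422:
        # The model behind the endpoint doesn't support the requested capability.
        print(f"Unsupported capability: {ex.message}")
    else:
        raise
```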

-# [REST](#tab/python)
+# [JavaScript](#tab/javascript)

```javascript
try {
@@ -311,7 +326,7 @@ The Azure AI model inference API supports [Azure AI Content Safety](../concepts/

The following example shows the response for a chat completion request that has triggered content safety.

-# [REST](#tab/python)
+# [Python](#tab/python)

```python
from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
@@ -337,7 +352,7 @@ except HttpResponseError as ex:
raise ex
```
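As with the previous example, the interesting branch is elided by the hunk. A compact sketch, assuming a blocked request surfaces as an HTTP 400 whose error payload identifies the content filter; the prompt placeholder and error shape below are assumptions, not taken from the article:

```python
from azure.core.exceptions import HttpResponseError
from azure.ai.inference.models import SystemMessage, UserMessage

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="<a prompt that violates the content policy>"),  # placeholder
        ],
    )
    print(response.choices[0].message.content)
except HttpResponseError as ex:
    if ex.status_code == 400:
        # Assumed error shape: {"error": {"code": "content_filter", "message": "..."}}
        print(f"The request was blocked by Azure AI Content Safety: {ex.message}")
    else:
        raise
```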

-# [REST](#tab/javascript)
+# [JavaScript](#tab/javascript)

```javascript
try {
@@ -407,3 +422,7 @@ __Response__
}
```
---
+
+## Getting started
+
+The Azure AI Model Inference API is currently supported by certain models deployed as [Serverless API endpoints](../how-to/deploy-models-serverless.md) and Managed Online Endpoints. Deploy any of the [supported models](#availability) and use the exact same code to consume their predictions.
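To make the "exact same code" point concrete, a small sketch; the deployment names and environment variables below are hypothetical, and only the endpoint and key change between deployments:

```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

def ask(endpoint: str, key: str, question: str) -> str:
    """Run the same chat-completion call against any deployment that speaks the API."""
    model = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    response = model.complete(messages=[UserMessage(content=question)])
    return response.choices[0].message.content

# Hypothetical deployments; the calling code does not change between them.
print(ask(os.environ["MISTRAL_LARGE_ENDPOINT"], os.environ["MISTRAL_LARGE_KEY"], "How many feet are in a mile?"))
print(ask(os.environ["COHERE_CMD_R_ENDPOINT"], os.environ["COHERE_CMD_R_KEY"], "How many feet are in a mile?"))
```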
