API Management supports two types of language model APIs for this scenario.

- A self-hosted or non-Azure-provided language model deployment with an API endpoint.

## Import language model API using the portal
When you import the LLM API in the portal, API Management automatically configures:
To ensure that your LLM API is working as expected, test it in the API Management test console.
When the test is successful, the backend responds with a successful HTTP response code and some data. Appended to the response is token usage data to help you monitor and manage your language model token consumption.
## Example: Google Gemini
You can import an OpenAI-compatible Google Gemini API to access models such as `gemini-2.0-flash`. For these models, Azure API Management can manage an OpenAI-compatible chat completions endpoint.
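Because the endpoint is OpenAI-compatible, standard OpenAI client libraries can call it with the usual chat completions request shape. The following minimal sketch (not part of the import procedure; it assumes the `openai` Python package and a Gemini API key stored in a `GEMINI_API_KEY` environment variable) illustrates the equivalent direct call:

```python
# Illustrative sketch: calling Gemini's OpenAI-compatible endpoint directly
# with the openai Python SDK. GEMINI_API_KEY is a placeholder for the key
# created in Google AI Studio.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai",
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "How are you?"},
    ],
)
print(response.choices[0].message.content)
```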
To import an OpenAI-compatible Gemini model:

1. Create an API key for the Gemini API at [Google AI Studio](https://aistudio.google.com/apikey) and store it in a safe location.
1. Note the following base URL from the [Gemini OpenAI compatibility documentation](https://ai.google.dev/gemini-api/docs/openai): `https://generativelanguage.googleapis.com/v1beta/openai`
1. In the [Azure portal](https://portal.azure.com), navigate to your API Management instance.
1. In the left menu, under **APIs**, select **APIs** > **+ Add API**.
1. Under **Define a new API**, select **Language Model API**.
1. On the **Configure API** tab:
    1. Enter a **Display name** and optional **Description** for the API.
    1. In **URL**, enter the base URL that you copied previously: `https://generativelanguage.googleapis.com/v1beta/openai`
    1. In **Path**, append a path that your API Management instance uses to route requests to the Gemini API endpoints.
    1. In **Type**, select **Create OpenAI API**.
    1. In **Access key**, enter the following:
        1. **Header name**: *Authorization*.
        1. **Header value (key)**: `Bearer` followed by the API key for the Gemini API that you created previously.
1. On the remaining tabs, optionally configure policies to manage token consumption, semantic caching, and AI content safety.
1. Select **Create**.
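After the API is created, client applications call the Gemini models through your API Management gateway rather than Google's endpoint directly. The following is a minimal sketch with assumed placeholder values: `my-apim` for the gateway hostname, `gemini` for the path you chose during import, and a subscription key passed in the `Ocp-Apim-Subscription-Key` header (needed only if your API requires a subscription):

```python
# Illustrative sketch: calling the imported API through the API Management
# gateway. "my-apim" and "gemini" are placeholders for your instance name and
# the Path configured during import. API Management attaches the stored
# Authorization header when forwarding the request to Gemini.
import os

from openai import OpenAI

client = OpenAI(
    api_key="unused",  # the Gemini key is held by API Management, not the client
    base_url="https://my-apim.azure-api.net/gemini",
    default_headers={
        # Assumes the API requires an API Management subscription key
        "Ocp-Apim-Subscription-Key": os.environ["APIM_SUBSCRIPTION_KEY"],
    },
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "How are you?"}],
)
print(response.choices[0].message.content)
```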
-
121
-
### Test Gemini model
After importing the API, you can test the chat completions endpoint for the API.
1. Select the API you created in the previous step.
1. Select the **Test** tab.
1. Select the `POST Creates a model response for the given chat conversation` operation, which is a `POST` request to the `/chat/completions` endpoint.
1. In the **Request body** section, enter the following JSON to specify the model and an example prompt. In this example, the `gemini-2.0-flash` model is used.

    ```json
    {
        "model": "gemini-2.0-flash",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": "How are you?"
            }
        ],
        "max_tokens": 50
    }
    ```
When the test is successful, the backend responds with a successful HTTP response code and some data. Appended to the response is token usage data to help you monitor and manage your language model token consumption.
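In the OpenAI chat completions format, this usage data appears as a `usage` object in the response body; the token counts below are illustrative only:

```json
{
    "usage": {
        "prompt_tokens": 18,
        "completion_tokens": 12,
        "total_tokens": 30
    }
}
```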
:::image type="content" source="media/openai-compatible-llm-api/gemini-test.png" alt-text="Screenshot of testing a Gemini LLM API in the portal.":::