articles/api-management/openai-compatible-llm-api.md
Lines changed: 53 additions & 1 deletion
Original file line number
Diff line number
Diff line change
@@ -24,14 +24,20 @@ Learn more about managing AI APIs in API Management:
24
24
25
25
API Management supports two types of language model APIs for this scenario. Choose the option suitable for your model deployment. The option determines how clients call the API and how the API Management instance routes requests to the AI service.
26
26
27
-
* **OpenAI-compatible** - Language model endpoints that are compatible with OpenAI's API. Examples include certain models exposed by inference providers such as [Hugging Face Text Generation Inference (TGI)](https://huggingface.co/docs/text-generation-inference/en/index).
27
+
* **OpenAI-compatible** - Language model endpoints that are compatible with OpenAI's API. Examples include certain models exposed by inference providers such as [Hugging Face Text Generation Inference (TGI)](https://huggingface.co/docs/text-generation-inference/en/index) and the [Google Gemini API](https://ai.google.dev/gemini-api/docs).
28
28
29
29
API Management configures an OpenAI-compatible chat completions endpoint.
30
30
31
31
* **Passthrough** - Other language model endpoints that aren't compatible with OpenAI's API. Examples include models deployed in [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) or other providers.
32
32
33
33
API Management configures wildcard operations for common HTTP verbs. Clients can append paths to the wildcard operations, and API Management passes requests to the backend.
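
As a rough mental model of the passthrough option, the following Python sketch shows how a path that a client appends after the API path could map to the backend URL. The function name, the `bedrock` API path, and the `backend.example.com` host are hypothetical placeholders, and the sketch illustrates the routing idea only, not API Management's actual implementation.

```python
def passthrough_backend_url(request_path: str, api_path: str, backend_base_url: str) -> str:
    """Map a gateway request path to a backend URL (illustrative model only)."""
    suffix = api_path.strip("/")
    prefix = f"/{suffix}" if suffix else ""
    # The request must target this API's path for the wildcard operation to match.
    if request_path != prefix and not request_path.startswith(prefix + "/"):
        raise ValueError(f"{request_path!r} is not under API path {prefix!r}")
    # Everything after the API path is forwarded to the backend unchanged.
    return backend_base_url.rstrip("/") + request_path[len(prefix):]


# Hypothetical example values.
print(passthrough_backend_url(
    "/bedrock/model/my-model/invoke", "bedrock", "https://backend.example.com"
))
```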
34
34
35
+
When you import the API, API Management automatically configures:
36
+
37
+
* A [backend](backends.md) resource and a [set-backend-service](set-backend-service-policy.md) policy that direct API requests to the LLM endpoint.
38
+
* (Optionally) Access to the LLM backend using an access key that you provide. The key is protected as a secret [named value](api-management-howto-properties.md) in API Management.
39
+
* (Optionally) Policies to help you monitor and manage the LLM API.
40
+
35
41
## Prerequisites
36
42
37
43
- An existing API Management instance. [Create one if you haven't already](get-started-create-service-instance.md).
@@ -70,6 +76,8 @@ To import a language model API to API Management:
70
76
1. Select **Review**.
71
77
1. After settings are validated, select **Create**.
72
78
79
+
API Management creates the API and configures operations for the LLM endpoints. By default, the API requires an API Management subscription.
80
+
73
81
## Test the LLM API
74
82
75
83
To ensure that your LLM API is working as expected, test it in the API Management test console.
@@ -84,5 +92,49 @@ To ensure that your LLM API is working as expected, test it in the API Managemen
84
92
85
93
When the test is successful, the backend responds with a successful HTTP response code and some data. Appended to the response is token usage data to help you monitor and manage your language model token consumption.
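
As a minimal sketch of reading that token usage data, the following Python snippet parses a response body in the OpenAI chat completions shape, where token counts typically appear under a `usage` object. The sample body and its numbers are made up for illustration and are not captured output from API Management.

```python
import json

# Illustrative response body in the OpenAI chat completions shape.
# The values are invented for this sketch.
sample_body = """
{
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 5, "total_tokens": 17}
}
"""


def token_usage(response_body: str) -> dict:
    """Return the token counts from a chat completions response body."""
    usage = json.loads(response_body).get("usage", {})
    # Default to 0 when a count is missing so callers can always sum the values.
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }


print(token_usage(sample_body))
```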
86
94
95
+
## Example: Google Gemini
96
+
97
+
You can import OpenAI-compatible models from Google Gemini, such as `gemini-2.0-flash`. Azure API Management can manage an OpenAI-compatible chat completions endpoint for these models.
98
+
99
+
To import an OpenAI-compatible Gemini model:
100
+
101
+
1. Create an API key for the Gemini API at [Google AI Studio](https://aistudio.google.com/apikey) and store it in a safe location.
102
+
1. Note the following base URL from the [Gemini OpenAI compatibility documentation](https://ai.google.dev/gemini-api/docs/openai): `https://generativelanguage.googleapis.com/v1beta/openai`
1. In the [Azure portal](https://portal.azure.com), navigate to your API Management instance.
107
+
1. In the left menu, under **APIs**, select **APIs** > **+ Add API**.
108
+
1. Under **Define a new API**, select **Language Model API**.
109
+
1. On the **Configure API** tab:
110
+
1. Enter a **Display name** and optional **Description** for the API.
111
+
1. In **URL**, enter the following base URL that you copied previously: `https://generativelanguage.googleapis.com/v1beta/openai`
112
+
113
+
1. In **Path**, append a path that your API Management instance uses to access the Gemini API endpoints.
114
+
1. In **Type**, select **Create OpenAI API**.
115
+
1. In **Access key**, enter the following:
116
+
1. **Header name**: *Authorization*.
117
+
1. **Header value (key)**: `Bearer` followed by a space and the API key for the Gemini API that you created previously.
118
+
1. On the remaining tabs, optionally configure policies to manage token consumption, semantic caching, and AI content safety.
119
+
1. Select **Create**.
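
To make the URL and access key settings in the preceding steps concrete, the following Python sketch composes the same values: the chat completions URL derived from the Gemini base URL, and the `Authorization` header value. The helper names are hypothetical, and `YOUR_GEMINI_API_KEY` is a placeholder for your own key.

```python
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"


def access_key_header(api_key: str) -> tuple[str, str]:
    """Return the header name and value to enter in the Access key settings."""
    # The value is the word "Bearer", one space, then the key itself.
    return ("Authorization", f"Bearer {api_key.strip()}")


def chat_completions_url(base_url: str = GEMINI_BASE_URL) -> str:
    """Return the backend chat completions URL for a given base URL."""
    return base_url.rstrip("/") + "/chat/completions"


name, value = access_key_header("YOUR_GEMINI_API_KEY")  # placeholder key
print(f"{name}: {value}")
print(chat_completions_url())
```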
120
+
121
+
### Test Gemini model
122
+
123
+
After importing the API, you can test it using the test console in the Azure portal. Choose an OpenAI-compatible model and endpoint for the test.
124
+
125
+
1. Select the API you created in the previous step.
126
+
1. Select the **Test** tab.
127
+
1. Select the `POST Creates a model response for the given chat conversation` operation, which is a `POST` request to the `/chat/completions` endpoint.
128
+
1. In the **Request body** section, enter the following JSON to specify the model and an example prompt. In this example, the OpenAI-compatible `gemini-2.0-flash` model is used.
129
+
130
+
```json
131
+
{"model":"gemini-2.0-flash","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"How are you?"}],"max_tokens":50}
132
+
```
133
+
134
+
When the test is successful, the backend responds with a successful HTTP response code and some data. Appended to the response is token usage data to help you monitor and manage your language model token consumption.
135
+
136
+
:::image type="content" source="media/openai-compatible-llm-api/gemini-test-small.png" lightbox="media/openai-compatible-llm-api/gemini-test.png" alt-text="Screenshot of testing a Gemini LLM API in the portal.":::
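
Outside the portal, a client could send the same request to the API Management gateway. The following Python sketch builds that request without sending it. It assumes a hypothetical gateway URL (`https://contoso.azure-api.net`), a hypothetical API path (`gemini`), and the default API Management subscription key header name, `Ocp-Apim-Subscription-Key`. Adjust these values if your instance is configured differently.

```python
import json
import urllib.request


def build_chat_request(gateway_url: str, api_path: str,
                       subscription_key: str, prompt: str) -> urllib.request.Request:
    """Build (but don't send) a chat completions request to the gateway."""
    url = f"{gateway_url.rstrip('/')}/{api_path.strip('/')}/chat/completions"
    body = {
        "model": "gemini-2.0-flash",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 50,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed default subscription key header for API Management.
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


# Hypothetical values; replace them with your own gateway URL, path, and key.
request = build_chat_request(
    "https://contoso.azure-api.net", "gemini", "<your-subscription-key>", "How are you?"
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)` and read the JSON response body. The Gemini API key isn't included in the client request because API Management attaches it when forwarding the call to the backend.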