---
title: How to deploy Mistral family of models with Azure Machine Learning studio
titleSuffix: Azure Machine Learning
description: Learn how to deploy Mistral-large with Azure Machine Learning studio.
manager: scottpolly
ms.service: machine-learning
ms.subservice: inferencing
ms.topic: how-to
ms.date: 02/23/2024
ms.reviewer: shubhiraj
reviewer: shubhirajMsft
ms.author: mopeakande
author: msakande
ms.custom: [references_regions]

#This functionality is also available in Azure AI Studio: /azure/ai-studio/how-to/deploy-models-mistral.md
---
# How to deploy Mistral models with Azure Machine Learning studio

Mistral AI offers two categories of models in Azure Machine Learning studio:

- Premium models: Mistral-large. These models are available with pay-as-you-go, token-based billing through Models as a Service in the studio model catalog.
- Open models: Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01. These models are also available in the Azure Machine Learning studio model catalog and can be deployed to dedicated VM instances in your own Azure subscription with managed online endpoints.

You can browse the Mistral family of models in the model catalog by filtering on the Mistral collection.
## Mistral-large

In this article, you learn how to use Azure Machine Learning studio to deploy the Mistral-large model as a service with pay-as-you-go billing.

Mistral-large is Mistral AI's most advanced Large Language Model (LLM). It can be used on any language-based task, thanks to its state-of-the-art reasoning and knowledge capabilities.

Additionally, Mistral-large is:

- Straight-to-the-point. Purposely trained to eliminate unnecessary verbosity and generate concise outputs.
- Specialized in RAG. Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
- Strong in coding. Code generation, review, and comments. Can output results as JSON and do function calling.
- Multi-lingual by design. Best-in-class performance in French, German, Spanish, and Italian, in addition to English. Dozens of other languages are supported.

[!INCLUDE [machine-learning-preview-generic-disclaimer](includes/machine-learning-preview-generic-disclaimer.md)]
## Deploy Mistral-large with pay-as-you-go

Certain models in the model catalog can be deployed as a service with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.

Mistral-large can be deployed as a service with pay-as-you-go and is offered by Mistral AI through the Microsoft Azure Marketplace. Mistral AI can change or update the terms of use and pricing of this model.

### Azure Marketplace model offerings

The following models are available in Azure Marketplace for Mistral AI when deployed as a service with pay-as-you-go:

* Mistral-large (preview)
### Prerequisites

- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
- An Azure Machine Learning workspace and a compute instance. If you don't have these, use the steps in the [Quickstart: Create workspace resources](quickstart-create-resources.md) article to create them.

  > [!IMPORTANT]
  > The pay-as-you-go model deployment offering is only available in workspaces created in the **East US 2** and **France Central** regions.

- Azure role-based access control (Azure RBAC) is used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure subscription. Alternatively, your account can be assigned a custom role that has the following permissions:

  - On the Azure subscription, to subscribe the workspace to the Azure Marketplace offering (once for each workspace, per offering):
    - `Microsoft.MarketplaceOrdering/agreements/offers/plans/read`
    - `Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action`
    - `Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read`
    - `Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read`
    - `Microsoft.SaaS/register/action`

  - On the resource group, to create and use the SaaS resource:
    - `Microsoft.SaaS/resources/read`
    - `Microsoft.SaaS/resources/write`

  - On the workspace, to deploy endpoints (the Azure AI Developer role contains these permissions already):
    - `Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*`
    - `Microsoft.MachineLearningServices/workspaces/serverlessEndpoints/*`

  For more information on permissions, see [Manage access to an Azure Machine Learning workspace](how-to-assign-roles.md).

### Create a new deployment

To create a deployment:

1. Go to [Azure Machine Learning studio](https://ml.azure.com/home).
1. Select the workspace in which you want to deploy your models. To use the pay-as-you-go model deployment offering, your workspace must belong to the **East US 2** or **France Central** region.
1. Choose the model (Mistral-large) that you want to deploy from the [model catalog](https://ml.azure.com/model/catalog).

    Alternatively, you can initiate deployment by going to your workspace and selecting **Endpoints** > **Serverless endpoints** > **Create**.

1. On the model's overview page in the model catalog, select **Deploy** and then **Pay-as-you-go**.

    :::image type="content" source="media/how-to-deploy-models-mistral/mistral-deploy-pay-as-you-go.png" alt-text="A screenshot showing how to deploy a model with the pay-as-you-go option." lightbox="media/how-to-deploy-models-mistral/mistral-deploy-pay-as-you-go.png":::

1. In the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use.
1. You can also select the **Marketplace offer details** tab to learn about pricing for the selected model.
1. If this is your first time deploying the model in the workspace, you have to subscribe your workspace to the particular offering (for example, Mistral-large). This step requires that your account has the Azure subscription permissions and resource group permissions listed in the prerequisites. Each workspace has its own subscription to the particular Azure Marketplace offering, which allows you to control and monitor spending. Select **Subscribe and Deploy**. Currently, you can have only one deployment for each model within a workspace.
    > [!NOTE]
    > Subscribing a workspace to a particular Azure Marketplace offering (in this case, Mistral-large) requires that your account has **Contributor** or **Owner** access at the subscription level where the project is created. Alternatively, your user account can be assigned a custom role that has the Azure subscription permissions and resource group permissions listed in the [prerequisites](#prerequisites).

    :::image type="content" source="media/how-to-deploy-models-mistral/mistral-deploy-marketplace-terms.png" alt-text="A screenshot showing the terms and conditions of a given model." lightbox="media/how-to-deploy-models-mistral/mistral-deploy-marketplace-terms.png":::

1. Once you subscribe the workspace to the particular Azure Marketplace offering, subsequent deployments of the _same_ offering in the _same_ workspace don't require subscribing again. Therefore, you don't need the subscription-level permissions for subsequent deployments. If this scenario applies to you, you'll see a **Continue to deploy** option to select.

    :::image type="content" source="media/how-to-deploy-models-mistral/mistral-deploy-pay-as-you-go-project.png" alt-text="A screenshot showing a project that is already subscribed to the offering." lightbox="media/how-to-deploy-models-mistral/mistral-deploy-pay-as-you-go-project.png":::
1. Give the deployment a name. This name becomes part of the deployment API URL, which must be unique in each Azure region.

    :::image type="content" source="media/how-to-deploy-models-mistral/mistral-deployment-name.png" alt-text="A screenshot showing how to indicate the name of the deployment you want to create." lightbox="media/how-to-deploy-models-mistral/mistral-deployment-name.png":::

1. Select **Deploy**. Wait until the deployment finishes and you're redirected to the serverless endpoints page.
1. Select the endpoint to open its details page.
1. Select the **Test** tab to start interacting with the model.
1. Take note of the **Target** URL and the **Secret Key**, which you can use to call the deployment and generate chat completions.
1. You can always find the endpoint's details, URL, and access keys by navigating to **Workspace** > **Endpoints** > **Serverless endpoints**.

To learn about billing for Mistral models deployed with pay-as-you-go, see [Cost and quota considerations for Mistral models deployed as a service](#cost-and-quota-considerations-for-mistral-large-deployed-as-a-service).
### Consume the Mistral-large model as a service

Mistral-large can be consumed by using the chat API.

1. In the **workspace**, select **Endpoints** > **Serverless endpoints**.
1. Find and select the deployment you created.
1. Copy the **Target** URL and the **Key** token values.
1. Make an API request by using the [`<target_url>/v1/chat/completions`](#chat-api) API.

    For more information on using the APIs, see the [reference](#reference-for-mistral-large-deployed-as-a-service) section.
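
The steps above can be sketched in Python using only the standard library. The endpoint URL and key below are placeholders; substitute the **Target** URL and **Key** values you copied from the endpoint's details page.

```python
import json
import urllib.request

# Placeholders -- replace with the Target URL and Key from your endpoint.
ENDPOINT_URL = "https://<your-deployment>.<region>.inference.ai.azure.com"
API_KEY = "<your-key>"

def build_chat_request(messages, temperature=0.8, max_tokens=512):
    """Assemble the JSON payload described in the request schema."""
    return {
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def chat_completion(messages):
    """POST the payload to the endpoint's /v1/chat/completions route."""
    body = json.dumps(build_chat_request(messages)).encode("utf-8")
    request = urllib.request.Request(
        f"{ENDPOINT_URL}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Example (requires a live endpoint):
# result = chat_completion([{"role": "user", "content": "Say hello in Italian."}])
# print(result["choices"][0]["message"]["content"])
```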

### Reference for Mistral-large deployed as a service

#### Chat API

Use the `POST` method to send the request to the `/v1/chat/completions` route:

__Request__

```rest
POST /v1/chat/completions HTTP/1.1
Host: <DEPLOYMENT_URI>
Authorization: Bearer <TOKEN>
Content-type: application/json
```

#### Request schema

The payload is a JSON-formatted string that contains the following parameters:

| Key | Type | Default | Description |
|-----|-----|-----|-----|
| `messages` | `array` | No default. This value must be specified. | The message or history of messages to use to prompt the model. |
| `stream` | `boolean` | `False` | Streaming allows the generated tokens to be sent as data-only server-sent events whenever they become available. |
| `max_tokens` | `integer` | `16` | The maximum number of tokens to generate in the completion. The token count of your prompt plus `max_tokens` can't exceed the model's context length. |
| `top_p` | `float` | `1` | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering `top_p` or `temperature`, but not both. |
| `temperature` | `float` | `1` | The sampling temperature to use, between 0 and 2. Higher values make the output more random, while zero means greedy (deterministic) sampling. We recommend altering this or `top_p`, but not both. |
| `ignore_eos` | `boolean` | `False` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. |
| `safe_prompt` | `boolean` | `False` | Whether to inject a safety prompt before all conversations. |

Each object in the `messages` array has the following fields:

| Key | Type | Value |
|-----------|-----------|------------|
| `content` | `string` | The contents of the message. Content is required for all messages. |
| `role` | `string` | The role of the message's author. One of `system`, `user`, or `assistant`. |


#### Example

__Body__

```json
{
    "messages":
    [
        {
            "role": "system",
            "content": "You are a helpful assistant that translates English to Italian."
        },
        {
            "role": "user",
            "content": "Translate the following sentence from English to Italian: I love programming."
        }
    ],
    "temperature": 0.8,
    "max_tokens": 512
}
```

#### Response schema

The response payload is a dictionary with the following fields.

| Key | Type | Description |
|-----------|-----------|----------------------------------------------------------------------------|
| `id` | `string` | A unique identifier for the completion. |
| `choices` | `array` | The list of completion choices that the model generated for the input messages. |
| `created` | `integer` | The Unix timestamp (in seconds) of when the completion was created. |
| `model` | `string` | The model ID used for the completion. |
| `object` | `string` | The object type, which is always `chat.completion`. |
| `usage` | `object` | Usage statistics for the completion request. |
> [!TIP]
> In streaming mode, `finish_reason` is `null` for every response chunk except the last one, and the stream is terminated by a `[DONE]` payload. In each `choices` object, the `messages` key is replaced by `delta`.

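A client-side sketch of handling a streamed response follows. It assumes the service emits OpenAI-style `data:`-prefixed server-sent events that end with `data: [DONE]`; the exact framing isn't spelled out in this reference, so treat that as an assumption.

```python
import json

def collect_stream(sse_lines):
    """Accumulate assistant text from data-only server-sent events.

    Each event line is `data: <json chunk>`; the stream ends with
    `data: [DONE]`. Content arrives under the `delta` key of each choice.
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # terminal payload, not JSON
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if "content" in delta:
                parts.append(delta["content"])
    return "".join(parts)
```

For example, feeding it the chunks `{"delta": {"content": "Ciao"}}` and `{"delta": {"content": "!"}}` followed by `[DONE]` yields the string `Ciao!`.
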
The `choices` object is a dictionary with the following fields.

| Key | Type | Description |
|---------|-----------|--------------|
| `index` | `integer` | Choice index. When `best_of` > 1, the index in this array might not be in order and might not be `0` to `n-1`. |
| `messages` or `delta` | `string` | Chat completion result in the `messages` object. When streaming mode is used, the `delta` key is used instead. |
| `finish_reason` | `string` | The reason the model stopped generating tokens: <br>- `stop`: The model hit a natural stop point or a provided stop sequence. <br>- `length`: The maximum number of tokens was reached. <br>- `content_filter`: Content was omitted because the content moderation system flagged it. <br>- `content_filter_error`: An error occurred during moderation, and a decision couldn't be made on the response. <br>- `null`: The API response is still in progress or incomplete. |
| `logprobs` | `object` | The log probabilities of the generated tokens in the output text. |

The `usage` object is a dictionary with the following fields.

| Key | Type | Value |
|---------------------|-----------|-----------------------------------------------|
| `prompt_tokens` | `integer` | Number of tokens in the prompt. |
| `completion_tokens` | `integer` | Number of tokens generated in the completion. |
| `total_tokens` | `integer` | Total number of tokens. |

The `logprobs` object is a dictionary with the following fields:

| Key | Type | Value |
|------------------|-------------------------|---------|
| `text_offsets` | `array` of `integers` | The position or index of each token in the completion output. |
| `token_logprobs` | `array` of `float` | The selected log probabilities from the dictionaries in the `top_logprobs` array. |
| `tokens` | `array` of `string` | The selected tokens. |
| `top_logprobs` | `array` of `dictionary` | An array of dictionaries. In each dictionary, the key is the token and the value is its log probability. |

#### Example

The following is an example response:

```json
{
    "id": "12345678-1234-1234-1234-abcdefghijkl",
    "object": "chat.completion",
    "created": 2012359,
    "model": "",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "Sure, I'd be happy to help! The translation of \"I love programming\" from English to Italian is:\n\n\"Amo la programmazione.\"\n\nHere's a breakdown of the translation:\n\n* \"I love\" in English becomes \"Amo\" in Italian.\n* \"programming\" in English becomes \"la programmazione\" in Italian.\n\nI hope that helps! Let me know if you have any other sentences you'd like me to translate."
            }
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "total_tokens": 40,
        "completion_tokens": 30
    }
}
```
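
Given the schema above, a minimal helper can pull out the fields most callers need: the assistant's reply and the token counts. This is an illustrative sketch (the helper name is an assumption, not part of any SDK); the sample dictionary mirrors the example response.

```python
def summarize_completion(response):
    """Extract the assistant's reply and token counts from a chat completion."""
    first_choice = response["choices"][0]
    return {
        "content": first_choice["message"]["content"],
        "finish_reason": first_choice["finish_reason"],
        "prompt_tokens": response["usage"]["prompt_tokens"],
        "completion_tokens": response["usage"]["completion_tokens"],
    }

# A response shaped like the example above:
sample = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "Amo la programmazione."},
        }
    ],
    "usage": {"prompt_tokens": 10, "total_tokens": 40, "completion_tokens": 30},
}
print(summarize_completion(sample)["content"])  # Amo la programmazione.
```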

#### Additional inference examples

| **Sample Type** | **Sample Notebook** |
|----------------|----------------------------------------|
| Curl | [webrequests.ipynb](https://aka.ms/mistral-large/webrequests-sample) |
| OpenAI SDK (experimental) | [openaisdk.ipynb](https://aka.ms/mistral-large/openaisdk) |
| LangChain | [langchain.ipynb](https://aka.ms/mistral-large/langchain-sample) |
| Mistral AI | [mistralai.ipynb](https://aka.ms/mistral-large/mistralai-sample) |
| LiteLLM | [litellm.ipynb](https://aka.ms/mistral-large/litellm-sample) |

## Cost and quotas

### Cost and quota considerations for Mistral-large deployed as a service

Mistral models deployed as a service are offered by Mistral AI through Azure Marketplace and integrated with Azure Machine Learning studio for use. You can find the Azure Marketplace pricing when deploying the models.

Each time a workspace subscribes to a given model offering from Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.

For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](../ai-studio/how-to/costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).

Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per workspace. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
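
Because the per-deployment limits are fixed, long-running clients sometimes add client-side throttling so that bursts don't hit the service limit. The sliding-window sketch below is illustrative only; the class name and the request-count window are assumptions, and the service also meters tokens per minute, which this sketch doesn't track.

```python
from collections import deque

class RequestRateLimiter:
    """Sliding-window throttle for a requests-per-minute limit (a sketch)."""

    def __init__(self, max_requests=1000, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.calls = deque()  # timestamps of recent requests

    def wait_time(self, now):
        """Seconds to wait before the next request is allowed at time `now`."""
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()  # drop calls that have left the window
        if len(self.calls) < self.max_requests:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self, now):
        """Register that a request was sent at time `now`."""
        self.calls.append(now)

# With a toy limit of 2 requests per 60 seconds:
limiter = RequestRateLimiter(max_requests=2, window_seconds=60.0)
limiter.record(0.0)
limiter.record(1.0)
print(limiter.wait_time(2.0))  # 58.0 -- the window is full until t=60
```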

## Data and policy

No user data from models deployed as a service with pay-as-you-go is sent to the model provider (in this case, Mistral AI).

## Content filtering

Models deployed as a service with pay-as-you-go are protected by Azure AI Content Safety. With Azure AI Content Safety enabled, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [Azure AI Content Safety](/azure/ai-services/content-safety/overview).

## Related content

- [Model Catalog and Collections](concept-model-catalog.md)
- [Deploy and score a machine learning model by using an online endpoint](how-to-deploy-online-endpoints.md)
- [Plan and manage costs for Azure AI Studio](concept-plan-manage-cost.md)